Language
Definitions on Language
Subjects to be Learned
- alphabet
- string (word)
- language
- operations on languages: concatenation of strings, union, intersection, Kleene star
Contents
Here we are going to learn the concept of language in very abstract and general sense,
operations on languages and some of their properties.
1. Basic concepts
First, an alphabet is a finite set of symbols. For example
{0, 1} is an alphabet with two symbols, {a, b} is another alphabet with two symbols and English alphabet is
also an alphabet. A string
(also called a word) is a finite sequence
of symbols of an alphabet. b, a and aabab are examples of
string over alphabet {a, b} and 0, 10 and 001 are examples
of string over alphabet {0, 1}.
A language is a set of strings over an alphabet.
Thus {a, ab, baa} is a language (over alphabert {a,b}) and {0, 111}
is a language (over alphabet {0,1}).
The number of symbols in a string is called the length of the string.
For a string w its length is represented by |w|. It can be
defined more formally by recursive definition.
The empty string
(also called null string)
is the string with length 0. That is, it has no symbols. The empty string is denoted
by (capital lambda). Thus || = 0.
Let u and v be strings. Then uv denotes the string obtained
by concatenating
u with v, that is, uv is the string obtained by appending the sequence of symbols
of v to that of u. For example if u = aab and v = bbab, then
uv = aabbbab. Note that vu = bbabaab
uv.
We are going to use first few symbols of English alphabet such as a and b
to denote symbols of an alphabet
and those toward the end such as u and v for strings.
A string x is called a substring of another string y if there are strings
u and v such that y = uxv.
Note that u and v may be an empty string. So a string is
a substring of itself.
A string x is a prefix of another
string y if there is a string v such that y = xv. v is called
a suffix of y.
2. Some special languages
The empty set is
a language which has no strings.
The set {} is a language
which has one string, namely
. Though
has no symbols,
this set has an object in it. So it is not empty.
For any alphabet ,
the set of all strings
over (including the empty string)
is denoted by
.
Thus a language over alphabet
is a subset of .
3. Operations on languages
Since languages are sets, all the set operations can be applied to languages.
Thus the union, intersetion and difference of two languages over an alphabet
are languages over .
The complement of a language L over an alphabet
is - L and it is also a language.
Another operation onlanguages is concatenation.
Let L1 and L2 be languages. Then the
concatenation of
L1 with L2 is denoted as L1L2
and it is defined as L1L2 = { uv | u
L1 and v L2 }.
That is L1L2 is the set of strings obtained by concatenating
strings of L1 with those of L2.
For example {ab, b} {aaa, abb, aaba} = {abaaa, ababb, abaaba, baaa, babb, baaba}.
Powers : For a symbol a
and a natural number k, ak represents
the concatenation of
k a's. For a string u
and a natural number k, uk denotes
the concatenation of
k u's. Similarly for a language L, Lk means
the concatenation of
k L's. Hence Lk is the set of strings that can be obtained by concatenating
k strings of L. These powers can be formally defined recursively. For example Lk
can be defined recursively as follows.
Recursive definition of Lk:
Basis Clause: L0 = { }
Inductive Clause: L(k+1) = Lk L.
Since Lk is defined for natural numbers k, the extremal clause is not necessary.
ak and uk can be defined similarly.
Here a0 =
and u0 = .
The following two types of languages are generalizations of
*
and we are going to see them quite often in this course.
Recursive definition of L*:
Basis Clause:
L*
Inductive Clause: For any x
L* and any w L,
xw L*.
Extremal Clause: Nothing is in L* unless it is obtained from the above two clauses.
L* is the set of strings obtained by concatenating zero or more strings
of L as we are going to see in Theorem 1.
This * is called Kleene star.
For example if L = { aba, bb }, then L*
= { , aba, bb, ababb, abaaba, bbbb, bbaba, ... }
The * in * is also the same Kleene star
defined above.
Recursive definition of L+:
Basis Clause: L
L+
Inductive Clause: For any x
L+ and any w L,
xw L+.
Extremal Clause: Nothing is in L+ unless it is obtained from the above two clauses.
Thus L+ is the set of strings obtained by concatenating one or more strings
of L.
For example if L = { aba, bb }, then L+
= { aba, bb, ababb, abaaba, bbbb, bbaba, ... }
Let us also define
(i.e. L0 L L2 ... )
as
=
{ x | x Lk
for some natural number k } .
Then the following relationships hold on L* and L+.
Theorems 1 and 2 are proven in
"General Induction" which you study in the next unit.
Other proofs are omitted.
Theorem 1: Ln
L*
Theorem 2:
Theorem 3:
Theorem 4: L+ = L L* = L*L
Note: According to Theorems 2 and 3, any nonempty string in L*
or L+ can be expresssed as the concatenation of
strings of L, i.e. w1w2...wk for some k, where wi's are strings of L.
L* and L* have a number of interesting properties.
Let us list one of them as a theorem and prove it.
Theorem 5: L* = (L*)*.
Proof:
Because
by Theorem 2, by applying Theorem 2 to the language L* we can see that
L*
(L*)*.
Conversely ( L* )*
L* can be proven as follows:
Let x be an arbitrary nonempty string of ( L* )*.
Then there are nonempty strings w1, w2, ..., wk
in L* such that x = w1w2...wk
.
Since w1, w2, ..., wk
are strings of L*, for each wi there are strings
wi1, wi2, ..., wimi in L
such that
wi = wi1wi2...wimi
Hence x = w11 ...w1m1w21..
.w2m2...wm1...wmmk .
Hence x is in L* .
If x is an empty string, then it is obviously in L* .
Hence ( L* )*
L* .
Since L*
(L*)*
and ( L* )*
L* ,
L* = ( L* )* .
Test Your Understanding of Language
Indicate which of the following statements are correct and which are not.
Click Yes or No , then Submit.
There are two sets of questions.
In the questions below the following notations are used:
L^* , L^+ and L^k for L*, L+, Lk , respectively
a^k for ak
Next -- Problem Solving and Language
Back to Study Schedule
Back to Table of Contents