Language

Definitions on Language



Subjects to be Learned

Contents

Here we are going to learn the concept of language in very abstract and general sense, operations on languages and some of their properties.

1. Basic concepts

First, an alphabet is a finite set of symbols. For example {0, 1} is an alphabet with two symbols, {a, b} is another alphabet with two symbols and English alphabet is also an alphabet. A string (also called a word) is a finite sequence of symbols of an alphabet. b, a and aabab are examples of string over alphabet {a, b} and 0, 10 and 001 are examples of string over alphabet {0, 1}. A language is a set of strings over an alphabet. Thus {a, ab, baa} is a language (over alphabert {a,b}) and {0, 111} is a language (over alphabet {0,1}). The number of symbols in a string is called the length of the string. For a string w its length is represented by |w|. It can be defined more formally by recursive definition. The empty string (also called null string) is the string with length 0. That is, it has no symbols. The empty string is denoted by (capital lambda). Thus || = 0.
Let u and v be strings. Then uv denotes the string obtained by concatenating u with v, that is, uv is the string obtained by appending the sequence of symbols of v to that of u. For example if u = aab and v = bbab, then uv = aabbbab. Note that vu = bbabaab uv. We are going to use first few symbols of English alphabet such as a and b to denote symbols of an alphabet and those toward the end such as u and v for strings.

A string x is called a substring of another string y if there are strings u and v such that y = uxv. Note that u and v may be an empty string. So a string is a substring of itself. A string x is a prefix of another string y if there is a string v such that y = xv. v is called a suffix of y.

2. Some special languages

The empty set is a language which has no strings. The set {} is a language which has one string, namely . Though has no symbols, this set has an object in it. So it is not empty. For any alphabet , the set of all strings over (including the empty string) is denoted by . Thus a language over alphabet is a subset of .

3. Operations on languages

Since languages are sets, all the set operations can be applied to languages. Thus the union, intersetion and difference of two languages over an alphabet are languages over . The complement of a language L over an alphabet is - L and it is also a language.

Another operation onlanguages is concatenation. Let L1 and L2 be languages. Then the concatenation of L1 with L2 is denoted as L1L2 and it is defined as L1L2 = { uv | u L1 and v L2 }. That is L1L2 is the set of strings obtained by concatenating strings of L1 with those of L2. For example {ab, b} {aaa, abb, aaba} = {abaaa, ababb, abaaba, baaa, babb, baaba}.

Powers : For a symbol a and a natural number k, ak represents the concatenation of k a's. For a string u and a natural number k, uk denotes the concatenation of k u's. Similarly for a language L, Lk means the concatenation of k L's. Hence Lk is the set of strings that can be obtained by concatenating k strings of L. These powers can be formally defined recursively. For example Lk can be defined recursively as follows.

Recursive definition of Lk:

Basis Clause: L0 = { }
Inductive Clause: L(k+1) = Lk L.
Since Lk is defined for natural numbers k, the extremal clause is not necessary.

ak and uk can be defined similarly. Here a0 = and u0 = .

The following two types of languages are generalizations of * and we are going to see them quite often in this course.

Recursive definition of L*:
Basis Clause: L*
Inductive Clause: For any x L* and any w L, xw L*.
Extremal Clause: Nothing is in L* unless it is obtained from the above two clauses.

L* is the set of strings obtained by concatenating zero or more strings of L as we are going to see in Theorem 1. This * is called Kleene star.
For example if L = { aba, bb }, then L* = { , aba, bb, ababb, abaaba, bbbb, bbaba, ... }
The * in * is also the same Kleene star defined above.

Recursive definition of L+:
Basis Clause: L L+
Inductive Clause: For any x L+ and any w L, xw L+.
Extremal Clause: Nothing is in L+ unless it is obtained from the above two clauses.

Thus L+ is the set of strings obtained by concatenating one or more strings of L.
For example if L = { aba, bb }, then L+ = { aba, bb, ababb, abaaba, bbbb, bbaba, ... }

Let us also define (i.e. L0 L L2 ... ) as = { x | x Lk for some natural number k } .

Then the following relationships hold on L* and L+. Theorems 1 and 2 are proven in "General Induction" which you study in the next unit. Other proofs are omitted.

Theorem 1: Ln L*

Theorem 2:

Theorem 3:

Theorem 4: L+ = L L* = L*L


Note: According to Theorems 2 and 3, any nonempty string in L* or L+ can be expresssed as the concatenation of strings of L, i.e. w1w2...wk for some k, where wi's are strings of L.

L* and L* have a number of interesting properties. Let us list one of them as a theorem and prove it.

Theorem 5:   L* = (L*)*.

Proof: Because by Theorem 2, by applying Theorem 2 to the language L* we can see that
L* (L*)*.
Conversely ( L* )* L* can be proven as follows:

Let x be an arbitrary nonempty string of ( L* )*. Then there are nonempty strings w1, w2, ..., wk in L* such that x = w1w2...wk . Since w1, w2, ..., wk are strings of L*, for each wi there are strings wi1, wi2, ..., wimi in L such that wi = wi1wi2...wimi
Hence x = w11 ...w1m1w21.. .w2m2...wm1...wmmk . Hence x is in L* .
If x is an empty string, then it is obviously in L* .
Hence ( L* )* L* .
Since L* (L*)* and ( L* )* L* ,   L* = ( L* )* .




Test Your Understanding of Language

Indicate which of the following statements are correct and which are not.
Click Yes or No , then Submit.
There are two sets of questions.

In the questions below the following notations are used:

L^* , L^+ and L^k   for   L*, L+, Lk , respectively
a^k   for   ak




Next -- Problem Solving and Language

Back to Study Schedule

Back to Table of Contents