##### Regular Languages

## Non-Regular Languages

### Subjects to be Learned

- Existence of non-regular languages
- Myhill - Nerode Theorem for non-regularity test
- Pumping Lemma

### Contents

We have learned regular languages, their properties and their usefulness
for describing various systems. There are, however, languages that are not regular
and therefore require devices other than finite automata to recognize them.
In this section we are going to study some of the methods for testing
given languages for regularity and see some of the languages that are not regular.
The main idea behind these test methods is that finite automata have only finite amount of memory
in the form of states and that they can not distinguish
infinitely many strings. For example to recognize the language
{ a^{n}b^{n} | n is a natural number} ,
a finite automaton must remember how many a's it has read when it starts reading b's.
Thus it must be in different states when it has read different number of a's and starts
reading the first b. But any finite automaton has only finite number of states.
Thus there is no way for a finite automaton to remember how many a's it has read
for all possible strings a^{n}b^{n} .
That is the main limitation of finite automata. Since a regular language must be recognized
by a finite automaton, we can conclude that
{ a^{n}b^{n} | n is a natural number} is not regular.
This is the basis of
two of the regularity test methods we are going to study below: Myhill-Nerode Theorem and Pumping Lemma.

**
Non-regularity test based on Myhill-Nerode's theorem
**

**Indistinguishability of strings:**
Strings x and y in ^{*}
are **indistinguishable with respect to a language L**
if and only if
for every string z in ^{*}, either xz and yz are both
in L or they are both not in L.

For example, a and aa are indistinguishable with respect to the language a^{n}
over alphabet { a }, where n is a positive integer, because aa^{k} and
aaa^{k}
are in the language a^{n} for any positive integer k. However, with respect to
the language
a^{n}b^{n} , a and aa are
not indistinguishable (hence distinguishable), because ab is in the language a^{n}b^{n}
while aab is not in the language.

Using this concept of indistinguishability, the following theorem by Myhill and Nerod
gives a criterion for (non)regularity of a language. It is stated without a proof.
For more on Myhill-Nerode theorem click here.

** Theorem :**
A language L over alphabet is nonregular
if and only if there is an infinite subset of ^{*} ,
whose strings are pairwise
distinguishable with respect to L.

**
Example 1:
**
L_{1} = { a^{n}b^{n} | n is a positive integer } over alphabet
{ a , b } can be shown
to be nonregular using Myhill-Nerode as follows:

Consider the set of strings S_{1} = { a^{n} | n is a positive integer } .
S_{1} is over alphabet { a , b } and it is infinite. We are going to show that
its strings are pairwise distinguishable
with respect to L_{1}. Let a^{k} and a^{m} be arbitrary two different
members of the set S_{1}, where k and m are
positive integers and k m .
Select b^{m} as a string to be appended to a^{k} and a^{m} . Then
a^{k}b^{m} is not in L_{1} while
a^{m}b^{m} is in L_{1} . Hence a^{k} and a^{m} are
distinguishable with respect to L_{1} . Since
a^{k} and a^{m} are arbitrary strings of S_{1},
S_{1} satisfies the conditions of Myhill-Nerode theorem. Hence L_{1} is
nonregular.

**
Example 2:
**
L_{2} = { ww | w {a, b }^{*} }
is nonregular.

Consider the set of strings S_{2} which is the same as S_{1} of Example 1 above. It can be shown to be pairwise distinguishable with respect to L_{2} as follows.

Let a^{k} and a^{m} be arbitrary two different members of the set, where k and m are
positive integers and k m .
Select ba^{k}b as a string to be appended to a^{k} and a^{m} . Then
a^{k}ba^{k}b is in L_{2} while
a^{m}ba^{k}b is not in L_{2} . Hence a^{k} and a^{m}
are distinguishable with respect to L_{2} . Since
a^{k} and a^{m} are arbitrary strings of S_{2},
S_{2} satisfies the conditions of Myhill-Nerode theorem. Hence L_{2} is
nonregular.

**
Example 3:
**
Let L_{3} be the set of algebraic expressions involving identifiers x and y, operations
+ and * and left and right parentheses.

L_{3} can be defined recursively as follows:

Basis Clause: x and y are in L_{3} .

Inductive Clause: If and
are in L_{3} , then
( + )
and ( * ) are in L_{3} .

Extremal Clause: Nothing is in L_{3} unless it is obtained from the above
two clauses.

For example, x , (x*y) , ( ( x + y ) * x ) and (( (x*y) + x ) + (y*y) ) are algebraic
expressions.

Consider the set of strings S_{3} = { (^{k} x | k is a positive integer } ,
that is, the set of strings consisting of one or more right parentheses followed by
identifier x.
This set is infinite and it can be shown to be pairwise distinguishable with respect
to L_{3} as follows:

Let (^{k} x and (^{m} x be arbitrary two strings of S_{3} ,
where k and m are positive integers and k m .
Select [ + x ) ]^{k}
as a string to be appended to (^{k} and (^{m} .
For example [ + x ) ]^{3} is +x)+x)+x) .

Then (^{k} x + [ + x ) ]^{k} is in L_{3} but
(^{m} x + [ + x ) ]^{k} is not in L_{3} because
the number of ('s is not equal to the number of )'s in the latter string.
Hence S_{3} is pairwise distinguishable with respect to L_{3} .
Hence L_{3} is not regular.

**
Pumping Lemma**

Let us consider the NFA given below.

This NFA accepts among others some strings of length greater than 5 such as abbabbb,
abbabbabbb etc.
Those strings which are accepted by this NFA and whose length is greater than 5
have a substring which can be repeated any number of times without being rejected
by the NFA. For example the string abbabbb is accepted by the NFA and if one of its
substrings
bba is repeated any number of times in a*bba*bbb, the resultant strings such as
abbb (bba repeated 0 times), abbabbabbb, abbabbabbabbb etc. are also accepted by the NFA.

In general if a string w (such as abbabbb in the example above) is accepted by an NFA
with *n* states and if its length is
longer than *n*, then there must be a cycle in the NFA along some path from the
initial state to some accepting state (such as the cycle 2-3-4-2 in the above example).
Then the substring representing that cycle (bba in the example) can be
repeated any number of times within the string w without being rejected by the NFA.

The following theorem which is called Pumping Lemma is based on this observation. It states
that if a language is regular, then any long enough string of the language has a substring
which can be repeated any number of times with the resultant strings still in the language.
It is stated without a proof here.

** Pumping Lemma :**
Suppose that a language L is regular. Then there is an FA that accepts L. Let n be the number of states of that FA.
Then for any string x in L with |x| n, there are
strings u, v and w which satisfy the following relationships:

x = uvw

|uv| n

|v| > 0 and

for every integer m 0, uv^{m}w
L.

Note that Pumping Lemma gives a necessity for regular languages
and that it is not a sufficiency, that is, even if there is an integer n
that satisfies the conditions of Pumping Lemma, the language is not necessarily regular.
Thus Pumping Lemma can not be used to prove the regularity of a language. It can only show
that a language is nonregular.

**
Example 4:**
As an example to illustrate how Pumping Lemma might be used to prove that
a language is nonregular, let us prove that the language
L = a^{k}b^{k} is nonregular, where k is a natural number.

Suppose that L is regular and let n be the number of states of an FA that accepts L.
Consider a string x = a^{n}b^{n} for that n.
Then there must be strings u, v, and w
such that

x = uvw,

|uv| n

|v| > 0 , and

for every m 0, uv^{m}w
L.

Since |v| > 0 , v has at least one symbol. Also since |uv|
n,
v = a^{p}, for some p > 0 ,

Let us now consider the string uv^{m}w for m = 2.

Then uv^{2}w = a^{n-p}a^{2p}b^{n}
= a^{n+p}b^{n} . Since p > 0 , n + p
n . Hence
a^{n+p}b^{n} can not be in the language L represented by
a^{k}b^{k} .

This violates the condition that
for every m 0, uv^{m}w
L.
Hence L is not a regular language.

### Test Your Understanding of Non-regularity

Indicate which of the following statements are correct and which are not.

Click True or Fals , then Submit.

**
Next -- Context-Free Grammar **

**
Back to Study Schedule **

**
Back to Table of Contents **