Regular Languages

Non-Regular Languages

Subjects to be Learned

Existence of non-regular languages
Myhill - Nerode Theorem for non-regularity test
Pumping Lemma

We have learned regular languages, their properties and their usefulness for describing various systems. There are, however, languages that are not regular and therefore require devices other than finite automata to recognize them. In this section we are going to study some of the methods for testing given languages for regularity and see some of the languages that are not regular. The main idea behind these test methods is that finite automata have only finite amount of memory in the form of states and that they can not distinguish infinitely many strings. For example to recognize the language { aⁿbⁿ | n is a natural number} , a finite automaton must remember how many a's it has read when it starts reading b's. Thus it must be in different states when it has read different number of a's and starts reading the first b. But any finite automaton has only finite number of states. Thus there is no way for a finite automaton to remember how many a's it has read for all possible strings aⁿbⁿ . That is the main limitation of finite automata. Since a regular language must be recognized by a finite automaton, we can conclude that { aⁿbⁿ | n is a natural number} is not regular. This is the basis of two of the regularity test methods we are going to study below: Myhill-Nerode Theorem and Pumping Lemma.

Non-regularity test based on Myhill-Nerode's theorem

Indistinguishability of strings: Strings x and y in

^* are indistinguishable with respect to a language L if and only if for every string z in

^*, either xz and yz are both in L or they are both not in L.
For example, a and aa are indistinguishable with respect to the language aⁿ over alphabet { a }, where n is a positive integer, because aa^k and aaa^k are in the language aⁿ for any positive integer k. However, with respect to the language aⁿbⁿ , a and aa are not indistinguishable (hence distinguishable), because ab is in the language aⁿbⁿ while aab is not in the language.

Using this concept of indistinguishability, the following theorem by Myhill and Nerod gives a criterion for (non)regularity of a language. It is stated without a proof. For more on Myhill-Nerode theorem click here.

Theorem : A language L over alphabet

is nonregular if and only if there is an infinite subset of

^* , whose strings are pairwise distinguishable with respect to L.

Example 1: L₁ = { aⁿbⁿ | n is a positive integer } over alphabet { a , b } can be shown to be nonregular using Myhill-Nerode as follows:
Consider the set of strings S₁ = { aⁿ | n is a positive integer } . S₁ is over alphabet { a , b } and it is infinite. We are going to show that its strings are pairwise distinguishable with respect to L₁. Let a^k and a^m be arbitrary two different members of the set S₁, where k and m are positive integers and k

m . Select b^m as a string to be appended to a^k and a^m . Then a^kb^m is not in L₁ while a^mb^m is in L₁ . Hence a^k and a^m are distinguishable with respect to L₁ . Since a^k and a^m are arbitrary strings of S₁, S₁ satisfies the conditions of Myhill-Nerode theorem. Hence L₁ is nonregular.

Example 2: L₂ = { ww | w

{a, b }^* } is nonregular.
Consider the set of strings S₂ which is the same as S₁ of Example 1 above. It can be shown to be pairwise distinguishable with respect to L₂ as follows.
Let a^k and a^m be arbitrary two different members of the set, where k and m are positive integers and k

m . Select ba^kb as a string to be appended to a^k and a^m . Then a^kba^kb is in L₂ while a^mba^kb is not in L₂ . Hence a^k and a^m are distinguishable with respect to L₂ . Since a^k and a^m are arbitrary strings of S₂, S₂ satisfies the conditions of Myhill-Nerode theorem. Hence L₂ is nonregular.

Example 3: Let L₃ be the set of algebraic expressions involving identifiers x and y, operations + and * and left and right parentheses.
L₃ can be defined recursively as follows:
Basis Clause: x and y are in L₃ .
Inductive Clause: If

and

are in L₃ , then (

) and (

) are in L₃ .
Extremal Clause: Nothing is in L₃ unless it is obtained from the above two clauses.

For example, x , (x*y) , ( ( x + y ) * x ) and (( (x*y) + x ) + (y*y) ) are algebraic expressions.

Consider the set of strings S₃ = { (^k x | k is a positive integer } , that is, the set of strings consisting of one or more right parentheses followed by identifier x. This set is infinite and it can be shown to be pairwise distinguishable with respect to L₃ as follows:
Let (^k x and (^m x be arbitrary two strings of S₃ , where k and m are positive integers and k

m . Select [ + x ) ]^k as a string to be appended to (^k and (^m . For example [ + x ) ]³ is +x)+x)+x) .
Then (^k x + [ + x ) ]^k is in L₃ but (^m x + [ + x ) ]^k is not in L₃ because the number of ('s is not equal to the number of )'s in the latter string. Hence S₃ is pairwise distinguishable with respect to L₃ . Hence L₃ is not regular.

Pumping Lemma

Let us consider the NFA given below.

This NFA accepts among others some strings of length greater than 5 such as abbabbb, abbabbabbb etc. Those strings which are accepted by this NFA and whose length is greater than 5 have a substring which can be repeated any number of times without being rejected by the NFA. For example the string abbabbb is accepted by the NFA and if one of its substrings bba is repeated any number of times in abbabbb, the resultant strings such as abbb (bba repeated 0 times), abbabbabbb, abbabbabbabbb etc. are also accepted by the NFA.
In general if a string w (such as abbabbb in the example above) is accepted by an NFA with n states and if its length is longer than n, then there must be a cycle in the NFA along some path from the initial state to some accepting state (such as the cycle 2-3-4-2 in the above example). Then the substring representing that cycle (bba in the example) can be repeated any number of times within the string w without being rejected by the NFA.
The following theorem which is called Pumping Lemma is based on this observation. It states that if a language is regular, then any long enough string of the language has a substring which can be repeated any number of times with the resultant strings still in the language. It is stated without a proof here.

Pumping Lemma : Suppose that a language L is regular. Then there is an FA that accepts L. Let n be the number of states of that FA. Then for any string x in L with |x|

n, there are strings u, v and w which satisfy the following relationships:
x = uvw
|uv|

n
|v| > 0 and
for every integer m

0, uv^mw

L.

Note that Pumping Lemma gives a necessity for regular languages and that it is not a sufficiency, that is, even if there is an integer n that satisfies the conditions of Pumping Lemma, the language is not necessarily regular. Thus Pumping Lemma can not be used to prove the regularity of a language. It can only show that a language is nonregular.

Example 4: As an example to illustrate how Pumping Lemma might be used to prove that a language is nonregular, let us prove that the language L = a^kb^k is nonregular, where k is a natural number.

Suppose that L is regular and let n be the number of states of an FA that accepts L. Consider a string x = aⁿbⁿ for that n. Then there must be strings u, v, and w such that

x = uvw,
|uv|

n
|v| > 0 , and
for every m

0, uv^mw

L.

Since |v| > 0 , v has at least one symbol. Also since |uv|

n, v = a^p, for some p > 0 ,
Let us now consider the string uv^mw for m = 2.
Then uv²w = a^n-pa^2pbⁿ = a^n+pbⁿ . Since p > 0 , n + p

n . Hence a^n+pbⁿ can not be in the language L represented by a^kb^k .
This violates the condition that for every m

0, uv^mw

L. Hence L is not a regular language.

Test Your Understanding of Non-regularity

Indicate which of the following statements are correct and which are not.
Click True or Fals , then Submit.

Next -- Context-Free Grammar

Back to Study Schedule

Back to Table of Contents