Context-Free Languages

Context-Free Grammar

Subjects to be Learned

Context-Free Grammar
Context-Free Languages
Push Down Automata

Earlier, in the discussion of grammars, we saw context-free grammars. They are grammars whose productions have the form X -> α, where X is a nonterminal and α is a string of terminals and nonterminals. The set of strings generated by a context-free grammar is called a context-free language, and context-free languages can describe many practically important systems. Most programming languages can be approximated by context-free grammars, and compilers for them have been developed based on properties of context-free languages. Let us define context-free grammars and context-free languages here.

Definition (Context-Free Grammar) : A 4-tuple G = < V , Σ , S , P > is a context-free grammar (CFG) if V and Σ are finite sets sharing no elements between them, S ∈ V is the start symbol, and P is a finite set of productions of the form X -> α , where X ∈ V and α ∈ ( V ∪ Σ )^* .
A language is a context-free language (CFL) if it is exactly the set of strings generated by some context-free grammar.
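The definition translates almost directly into a data structure. The sketch below is a hypothetical Python encoding (the name is_cfg and the representation are ours, not from the notes): V and Σ are sets of symbols, S is a symbol, and P is a set of pairs (X, α) with α a tuple of symbols, so the empty tuple stands for α = ε. The function checks each condition of the definition in turn; finiteness is implicit because Python sets are finite.

```python
from typing import FrozenSet, Tuple

# A production X -> alpha is encoded as (X, alpha); alpha == () means the empty string.
Production = Tuple[str, Tuple[str, ...]]


def is_cfg(V: FrozenSet[str], Sigma: FrozenSet[str], S: str,
           P: FrozenSet[Production]) -> bool:
    """Check the conditions of the definition for a 4-tuple <V, Sigma, S, P>."""
    if V & Sigma:                 # V and Sigma must share no elements
        return False
    if S not in V:                # the start symbol must be a nonterminal
        return False
    symbols = V | Sigma
    for X, alpha in P:
        if X not in V:            # left-hand side: a single nonterminal
            return False
        if any(sym not in symbols for sym in alpha):   # alpha in (V ∪ Sigma)^*
            return False
    return True


# G1 from Example 1: S -> aSb, S -> ab
V1 = frozenset({"S"})
Sigma1 = frozenset({"a", "b"})
P1 = frozenset({("S", ("a", "S", "b")), ("S", ("a", "b"))})
print(is_cfg(V1, Sigma1, "S", P1))  # True
```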

Example 1: L₁ = { aⁿbⁿ | n is a positive integer } is a context-free language. The following context-free grammar G₁ = < V₁ , Σ , S , P₁ > generates L₁ :
V₁ = { S } , Σ = { a , b } and P₁ = { S -> aSb , S -> ab }.
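One way to convince oneself that G₁ generates exactly the strings aⁿbⁿ is to list every string it derives up to some length. The sketch below does this by breadth-first search over leftmost derivations; the function name generate and the dict encoding of P₁ (each right-hand side written as a string of one-character symbols) are illustrative assumptions. Because no production of G₁ has an empty right-hand side, a sentential form never gets shorter, so forms longer than the bound can safely be discarded.

```python
from collections import deque


def generate(P, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    P maps each nonterminal to a list of right-hand sides (strings of
    one-character symbols); any symbol that is not a key of P is a terminal.
    Assumes no right-hand side is empty, so sentential forms never shrink.
    """
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, sym in enumerate(form) if sym in P), None)
        if i is None:                      # only terminals left: a generated string
            results.add(form)
            continue
        for rhs in P[form[i]]:             # rewrite the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


# G1: S -> aSb | ab
P1 = {"S": ["aSb", "ab"]}
print(sorted(generate(P1, "S", 8), key=len))  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```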

Example 2: L₂ = { ww^r | w ∈ { a , b }⁺ } is a context-free language, where w is a non-empty string and w^r denotes the reversal of string w, that is, w spelled backward. The following context-free grammar G₂ = < V₂ , Σ , S , P₂ > generates L₂ :
V₂ = { S } , Σ = { a , b } and P₂ = { S -> aSa , S -> bSb , S -> aa , S -> bb }.
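Every production of G₂ places the same symbol at both ends of the string, so a derivation can be traced from the outside in. The sketch below (function names are ours) follows the productions recursively and compares the result with a direct test of the definition of L₂ on every string over { a , b } of length 1 to 8. It is a finite check, not a proof.

```python
from itertools import product


def derives_s(s: str) -> bool:
    """True if S =>* s in G2 (S -> aSa | bSb | aa | bb)."""
    if s in ("aa", "bb"):                        # S -> aa, S -> bb
        return True
    if len(s) >= 4 and s[0] == s[-1] and s[0] in "ab":
        return derives_s(s[1:-1])                # S -> aSa or S -> bSb
    return False


def in_L2(s: str) -> bool:
    """Direct membership test for L2 = { w w^r | w in {a,b}^+ }."""
    half = len(s) // 2
    return (len(s) >= 2 and len(s) % 2 == 0 and set(s) <= {"a", "b"}
            and s[half:] == s[:half][::-1])


for n in range(1, 9):
    for letters in product("ab", repeat=n):
        s = "".join(letters)
        assert derives_s(s) == in_L2(s), s
print("G2 agrees with L2 on all strings of length 1..8")
```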

Example 3: Let L₃ be the set of algebraic expressions involving identifiers x and y, operations + and *, and left and right parentheses. Then L₃ is a context-free language. The following context-free grammar G₃ = < V₃ , Σ₃ , S , P₃ > generates L₃ :
V₃ = { S } , Σ₃ = { x , y , ( , ) , + , * } and P₃ = { S -> ( S + S ) , S -> S*S , S -> x , S -> y }.
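The production S -> S*S is left-recursive, so a naive top-down recognizer would loop forever on it. The sketch below (the name generated_by_G3 is ours) instead asks, for each substring, whether S derives it, trying every position of the + or * that the production used at the root would introduce, and memoizes the answers with functools.lru_cache. It also makes one property of G₃ visible: every sum must be enclosed in parentheses, so (x+y) is generated but x+y is not.

```python
from functools import lru_cache


def generated_by_G3(s: str) -> bool:
    """True if S =>* s in G3: S -> (S+S) | S*S | x | y."""

    @lru_cache(maxsize=None)
    def derives(i: int, j: int) -> bool:
        """Does S derive the non-empty substring s[i:j]?"""
        if j - i == 1:
            return s[i] in "xy"                          # S -> x, S -> y
        # S -> ( S + S ): parentheses at both ends, a '+' splitting the inside
        if s[i] == "(" and s[j - 1] == ")":
            if any(s[k] == "+" and derives(i + 1, k) and derives(k + 1, j - 1)
                   for k in range(i + 2, j - 2)):
                return True
        # S -> S * S: a '*' splitting the span into two derivable parts
        return any(s[k] == "*" and derives(i, k) and derives(k + 1, j)
                   for k in range(i + 1, j - 1))

    return len(s) > 0 and derives(0, len(s))


print(generated_by_G3("(x+y)*x"), generated_by_G3("x+y"))  # True False
```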

Example 4: Portions of the syntax of programming languages can be described by context-free grammars. For example:
{ < statement > -> < if-statement > , < statement > -> < for-statement > , < statement > -> < assignment > , . . . , < if-statement > -> if ( < expression > ) < statement > , < for-statement > -> for ( < expression > ; < expression > ; < expression > ) < statement > , . . . , < expression > -> < algebraic-expression > , < expression > -> < logical-expression > , . . . } .

Properties of Context-Free Language

Theorem 1: Let L₁ and L₂ be context-free languages. Then L₁ ∪ L₂ , L₁L₂ , and L₁^* are context-free languages.

Outline of Proof

This theorem can be verified by constructing context-free grammars for the union, concatenation and Kleene star of context-free languages as follows:
Let G₁ = < V₁ , Σ , S₁ , P₁ > and G₂ = < V₂ , Σ , S₂ , P₂ > be context-free grammars generating L₁ and L₂ , respectively.
Then for L₁ ∪ L₂ , first relabel symbols of V₂ , if necessary, so that V₁ and V₂ don't share any symbols. Then let S_u be a symbol which is not in V₁ ∪ V₂ . Next define V_u = V₁ ∪ V₂ ∪ { S_u } and P_u = P₁ ∪ P₂ ∪ { S_u -> S₁ , S_u -> S₂ } .
Then it can easily be seen that G_u = < V_u , Σ , S_u , P_u > is a context-free grammar that generates the language L₁ ∪ L₂ .
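The union construction, including the relabeling step, can be sketched in a few lines. In this hypothetical encoding (all names are ours) a grammar is a dict from each nonterminal to a list of right-hand sides given as tuples of symbols; clashing nonterminals of G₂ are relabeled by appending primes, and "Su" plays the role of S_u. The bounded generator only checks, up to a length limit, that the new grammar produces exactly L₁ ∪ L₂ for G₁ and G₂ of Examples 1 and 2; it does not prove the construction correct.

```python
from collections import deque


def generate(P, start, max_len):
    """Terminal strings of length <= max_len derivable from `start` (BFS over
    leftmost derivations). Assumes no right-hand side is empty."""
    results, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, sym in enumerate(form) if sym in P), None)
        if i is None:                          # only terminals left
            results.add("".join(form))
            continue
        for rhs in P[form[i]]:                 # rewrite the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def rename_apart(P1, P2):
    """Copy of P2 whose nonterminals clashing with P1 get primes appended, plus the renaming."""
    taken = set(P1) | set(P2)
    rename = {}
    for X in P2:
        new = X
        if X in P1:
            while new in taken:
                new += "'"
            taken.add(new)
        rename[X] = new
    return {rename[X]: [tuple(rename.get(s, s) for s in rhs) for rhs in rhss]
            for X, rhss in P2.items()}, rename


def union(P1, S1, P2, S2, Su="Su"):
    """P_u = P1 ∪ P2 ∪ {Su -> S1, Su -> S2}, after relabeling P2 apart from P1."""
    P2, rename = rename_apart(P1, P2)
    assert Su not in P1 and Su not in P2, "Su must be a fresh symbol"
    return {**P1, **P2, Su: [(S1,), (rename[S2],)]}, Su


# G1 (a^n b^n) and G2 (w w^r) both use S, so G2's S is relabeled S'.
P1 = {"S": [("a", "S", "b"), ("a", "b")]}
P2 = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a", "a"), ("b", "b")]}
Pu, Su = union(P1, "S", P2, "S")
print(sorted(Pu))  # ['S', "S'", 'Su']
assert generate(Pu, Su, 6) == generate(P1, "S", 6) | generate(P2, "S", 6)
```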

Similarly for L₁L₂ , first relabel symbols of V₂ , if necessary, so that V₁ and V₂ don't share any symbols. Then let S_c be a symbol which is not in V₁ ∪ V₂ . Next define V_c = V₁ ∪ V₂ ∪ { S_c } and P_c = P₁ ∪ P₂ ∪ { S_c -> S₁S₂ } .
Then it can easily be seen that G_c = < V_c , Σ , S_c , P_c > is a context-free grammar that generates the language L₁L₂ .
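The concatenation construction differs from the union only in the new production. The sketch below uses the same hypothetical dict encoding (names are ours; "Sc" plays the role of S_c) and checks, up to length 6, that the constructed grammar generates exactly the strings uv with u from G₁ of Example 1 and v from G₂ of Example 2. This is a bounded check, not a proof.

```python
from collections import deque


def generate(P, start, max_len):
    """Terminal strings of length <= max_len derivable from `start` (BFS over
    leftmost derivations). Assumes no right-hand side is empty."""
    results, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, sym in enumerate(form) if sym in P), None)
        if i is None:                          # only terminals left
            results.add("".join(form))
            continue
        for rhs in P[form[i]]:                 # rewrite the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def rename_apart(P1, P2):
    """Copy of P2 whose nonterminals clashing with P1 get primes appended, plus the renaming."""
    taken = set(P1) | set(P2)
    rename = {}
    for X in P2:
        new = X
        if X in P1:
            while new in taken:
                new += "'"
            taken.add(new)
        rename[X] = new
    return {rename[X]: [tuple(rename.get(s, s) for s in rhs) for rhs in rhss]
            for X, rhss in P2.items()}, rename


def concatenate(P1, S1, P2, S2, Sc="Sc"):
    """P_c = P1 ∪ P2 ∪ {Sc -> S1 S2}, after relabeling P2 apart from P1."""
    P2, rename = rename_apart(P1, P2)
    assert Sc not in P1 and Sc not in P2, "Sc must be a fresh symbol"
    return {**P1, **P2, Sc: [(S1, rename[S2])]}, Sc


P1 = {"S": [("a", "S", "b"), ("a", "b")]}                                # G1
P2 = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a", "a"), ("b", "b")]}   # G2
Pc, Sc = concatenate(P1, "S", P2, "S")
expected = {u + v for u in generate(P1, "S", 6) for v in generate(P2, "S", 6)
            if len(u + v) <= 6}
assert generate(Pc, Sc, 6) == expected
print(sorted(generate(Pc, Sc, 6), key=lambda s: (len(s), s)))
# ['abaa', 'abbb', 'aabbaa', 'aabbbb', 'abaaaa', 'ababba', 'abbaab', 'abbbbb']
```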

For L₁^* , let S_s be a symbol which is not in V₁ . Then let V_s = V₁ ∪ { S_s } and P_s = P₁ ∪ { S_s -> S_s S₁ , S_s -> ε } , where ε denotes the empty string. It can be seen that the grammar G_s = < V_s , Σ , S_s , P_s > is a context-free grammar that generates the language L₁^* .
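The Kleene star construction introduces the ε-production S_s -> ε, so sentential forms can now shrink and the simple length cut-off used for ε-free grammars no longer applies. The sketch below (hypothetical names; "Ss" plays the role of S_s) first computes, by a fixed-point iteration, the length of the shortest terminal string each nonterminal derives, and discards a sentential form only when even its shortest possible yield is too long. That search terminates for this example because S_s occurs at most once in any sentential form; it is not meant as a general-purpose generator.

```python
from collections import deque
import math


def min_lengths(P):
    """Length of the shortest terminal string each nonterminal of P derives."""
    m = {X: math.inf for X in P}
    changed = True
    while changed:
        changed = False
        for X, rhss in P.items():
            # a terminal counts 1; a nonterminal counts its current estimate
            best = min(sum(m.get(s, 1) for s in rhs) for rhs in rhss)
            if best < m[X]:
                m[X] = best
                changed = True
    return m


def generate(P, start, max_len):
    """Terminal strings of length <= max_len derivable from `start` (BFS over
    leftmost derivations), pruning forms whose shortest yield exceeds max_len."""
    m = min_lengths(P)
    bound = lambda form: sum(m.get(sym, 1) for sym in form)
    results, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, sym in enumerate(form) if sym in P), None)
        if i is None:                          # only terminals left
            results.add("".join(form))
            continue
        for rhs in P[form[i]]:                 # rewrite the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if bound(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def star(P1, S1, Ss="Ss"):
    """P_s = P1 ∪ {Ss -> Ss S1, Ss -> ε}; the empty tuple encodes ε."""
    assert Ss not in P1, "Ss must be a fresh symbol"
    return {**P1, Ss: [(Ss, S1), ()]}, Ss


P1 = {"S": [("a", "S", "b"), ("a", "b")]}      # G1: a^n b^n
Ps, Ss = star(P1, "S")
print(sorted(generate(Ps, Ss, 4), key=lambda w: (len(w), w)))
# ['', 'ab', 'aabb', 'abab']
```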

Pushdown Automata

Just as regular languages are accepted by finite automata, context-free languages are also accepted by automata, but in general not by finite automata. They need a somewhat more complex kind of automaton called a pushdown automaton.
Let us consider the context-free language aⁿbⁿ . Membership of a string in this language could be tested by a finite automaton if it had a memory, such as a pushdown stack, that can store the a's of a given input string. For example, as a's are read by the finite automaton, push them onto the stack. As soon as the symbol b appears, stop storing a's and start popping a's one by one every time a b is read. If another a (or anything other than b) is read after the first b, reject the string; likewise, reject it if a b is read when there is no a left to pop. When all the symbols of the input string are read, check the stack. If it is empty, accept the string. Otherwise reject it.
This automaton behaves like a finite automaton except in the following two respects. First, its next state is determined not only by the input symbol being read but also by the symbol at the top of the stack. Second, the contents of the stack can be changed every time an input symbol is read. Thus its transition function specifies the new top of the stack as well as the next state.
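The informal procedure for aⁿbⁿ can be sketched directly with a Python list used as the stack. The function name is ours; the sketch assumes n ≥ 1, as in Example 1, so the empty string is rejected.

```python
def accepts_anbn(s: str) -> bool:
    """Stack-based test for { a^n b^n | n >= 1 }, following the procedure in the text."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:            # an a after the first b: reject
                return False
            stack.append("a")     # store each a on the stack
        elif ch == "b":
            seen_b = True
            if not stack:         # no a left to match this b: reject
                return False
            stack.pop()           # pop one a for every b
        else:                     # a symbol other than a or b: reject
            return False
    # accept only if at least one b was read and every a was matched
    return seen_b and not stack


print([w for w in ["ab", "aabb", "aab", "abb", "aba", ""] if accepts_anbn(w)])  # ['ab', 'aabb']
```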

Let us define this new type of automaton formally.

A pushdown automaton ( or PDA for short ) is a 7-tuple M = < Q , Σ , Γ , q₀ , Z₀ , A , δ > , where
Q is a finite set of states,
Σ and Γ are finite sets ( the input and stack alphabets, respectively ),
q₀ is the initial state,
Z₀ is the initial stack symbol and it is a member of Γ ,
A is the set of accepting states, and
δ is the transition function, δ : Q × ( Σ ∪ { ε } ) × Γ -> 2^( Q × Γ^* ) .

Thus ( q , α ) ∈ δ ( p , a , X ) means the following:
The automaton can move from the current state p to the next state q when it sees an input symbol a at the input and X at the top of the stack, and it replaces X with the string α at the top of the stack. When a = ε, the move is made without reading any input symbol.

Example 1 :

Let us consider the pushdown automaton < Q , Σ , Γ , q₀ , Z₀ , A , δ > , where Q = { q₀ , q₁ , q₂ } , Σ = { a , b } , Γ = { a , b , Z₀ } , A = { q₂ } and let δ be as given in the following table:

State	Input	Top of Stack	Move
q₀	a	Z₀	( q₀ , aZ₀ )
q₀	a	a	( q₀ , aa )
q₀	b	a	( q₁ , ε )
q₁	b	a	( q₁ , ε )
q₁	ε	Z₀	( q₂ , Z₀ )

This pushdown automaton accepts the language aⁿbⁿ . To describe the operation of a PDA we use the notion of a configuration. A configuration of a PDA M = < Q , Σ , Γ , q₀ , Z₀ , A , δ > is a triple ( q , x , α ) , where q is the state the PDA is currently in, x is the unread portion of the input string and α is the current stack contents; the input is read from left to right, and the top of the stack corresponds to the leftmost symbol of α . To express that the PDA moves from configuration ( p , x , α ) to configuration ( q , y , β ) in a single move (a single application of the transition function) we write
( p , x , α ) ⊢ ( q , y , β ) .
If ( q , y , β ) is reached from ( p , x , α ) by a sequence of zero or more moves, we write
( p , x , α ) ⊢^* ( q , y , β ) .
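The relations ⊢ and ⊢^* can be computed directly from a transition table. The sketch below is a hypothetical Python encoding of the PDA of Example 1 (the names DELTA, moves and run are ours): δ is a dict from (state, input symbol, top of stack) to a set of (next state, string replacing the top), the empty string "" stands for ε, "Z" stands for Z₀, and a stack is a string whose leftmost character is the top, as in the convention above. moves returns every configuration reachable in a single move; run follows moves as long as exactly one is available.

```python
# PDA of Example 1: "" stands for ε, "Z" for Z0, top of stack = leftmost character.
DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ")},
    ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q1", "")},
    ("q1", "b", "a"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}


def moves(config):
    """All configurations reachable from `config` in one move (the relation ⊢)."""
    p, x, alpha = config
    if not alpha:                          # empty stack: no move is possible
        return set()
    X, rest = alpha[0], alpha[1:]
    result = set()
    if x:                                  # moves that read the next input symbol
        for q, beta in DELTA.get((p, x[0], X), ()):
            result.add((q, x[1:], beta + rest))
    for q, beta in DELTA.get((p, "", X), ()):   # ε-moves: no input is read
        result.add((q, x, beta + rest))
    return result


def run(config):
    """Configurations visited while exactly one move is available at each step."""
    trace = [config]
    while True:
        nxt = moves(trace[-1])
        if len(nxt) != 1:                  # the PDA halts (or would branch)
            return trace
        trace.append(nxt.pop())


for c in run(("q0", "aabb", "Z")):
    print(c)
```

For the input aabb this prints the six configurations from ( q₀ , aabb , Z₀ ) to ( q₂ , ε , Z₀ ), with the empty string printed as ''.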

Let us now see how the PDA of Example 1 operates when it is given the string aabb , for example.
Initially its configuration is ( q₀ , aabb , Z₀ ). After reading the first a, its configuration is ( q₀ , abb , aZ₀ ). After reading the second a, it is ( q₀ , bb , aaZ₀ ). Then when the first b is read, it moves to state q₁ and pops a from the top of the stack. Thus the configuration is ( q₁ , b , aZ₀ ). When the second b is read, another a is popped from the top of the stack and the PDA stays in state q₁ . Thus the configuration is ( q₁ , ε , Z₀ ). Next, without reading any input, it moves to state q₂ , which is the accepting state. Thus aabb is accepted by this PDA. This entire process can be expressed using the configurations as

( q₀ , aabb , Z₀ )
⊢ ( q₀ , abb , aZ₀ )
⊢ ( q₀ , bb , aaZ₀ )
⊢ ( q₁ , b , aZ₀ )
⊢ ( q₁ , ε , Z₀ )
⊢ ( q₂ , ε , Z₀ ).

If we are not interested in the intermediate steps, we can also write

( q₀ , aabb , Z₀ ) ⊢^* ( q₂ , ε , Z₀ ) .

A string x is accepted by a PDA (this is called acceptance by final state) if ( q₀ , x , Z₀ ) ⊢^* ( q , ε , α ) for some α ∈ Γ^* and some accepting state q.
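Because δ returns a set of moves, a PDA may be nondeterministic, so acceptance has to consider every sequence of moves. The sketch below (hypothetical names, same encoding as before: "" for ε, "Z" for Z₀, top of stack leftmost) searches the configurations reachable from ( q₀ , x , Z₀ ) breadth-first and accepts as soon as it finds an accepting state with no input left. To keep the search finite it ignores stacks longer than len(x) + 2; that bound is an assumption that is safe for the PDAs in these notes, whose stacks grow by at most one symbol per input symbol read, but not for PDAs in general.

```python
from collections import deque

# PDA of Example 1: "" stands for ε, "Z" for Z0, top of stack = leftmost character.
DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ")},
    ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q1", "")},
    ("q1", "b", "a"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}
ACCEPTING = {"q2"}


def accepts(delta, accepting, x, start="q0", z0="Z"):
    """Acceptance by final state: (start, x, z0) ⊢* (q, ε, α) for an accepting q?"""
    limit = len(x) + 2                     # assumed stack bound (see text)
    first = (start, x, z0)
    seen, queue = {first}, deque([first])
    while queue:
        q, rest, alpha = queue.popleft()
        if rest == "" and q in accepting:
            return True
        if not alpha:                      # empty stack: no move possible
            continue
        X, below = alpha[0], alpha[1:]
        successors = [(p, rest, beta + below)                 # ε-moves
                      for p, beta in delta.get((q, "", X), ())]
        if rest:                                              # moves reading rest[0]
            successors += [(p, rest[1:], beta + below)
                           for p, beta in delta.get((q, rest[0], X), ())]
        for c in successors:
            if len(c[2]) <= limit and c not in seen:
                seen.add(c)
                queue.append(c)
    return False


print([w for w in ["", "ab", "aabb", "aab", "abab", "ba"] if accepts(DELTA, ACCEPTING, w)])
# ['ab', 'aabb']
```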

Like FAs, PDAs can also be represented by transition diagrams. For PDAs, however, arcs are labeled differently than for FAs. If ( p , α ) ∈ δ ( q , a , X ) , then an arc from state q to state p is added to the diagram and it is labeled with ( a , X / α ) , indicating that X at the top of the stack is replaced by α upon reading a from the input. For example, the transition diagram of the PDA of Example 1 is as shown below.
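A transition diagram carries exactly the information in the transition table: each entry ( p , α ) ∈ δ ( q , a , X ) becomes one arc. The sketch below (hypothetical names, same encoding as before) lists the arcs of the PDA of Example 1 as (from state, to state, label) triples, with labels written a, X/α and ε shown for an empty input or an empty replacement string.

```python
# PDA of Example 1: "" stands for ε, "Z" for Z0.
DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ")},
    ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q1", "")},
    ("q1", "b", "a"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}


def arcs(delta):
    """Arcs (from_state, to_state, 'a, X/α') of the transition diagram, sorted."""
    show = lambda s: s.replace("Z", "Z₀") if s else "ε"
    return sorted((q, p, f"{show(a)}, {show(X)}/{show(beta)}")
                  for (q, a, X), targets in delta.items()
                  for p, beta in targets)


for arc in arcs(DELTA):
    print(arc)
```

The output has one line per arc, for example ('q0', 'q1', 'b, a/ε') for the move that pops the first a when a b is read.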

Example 2 :

Let us consider the pushdown automaton < Q , Σ , Γ , q₀ , Z₀ , A , δ > , where Q = { q₀ , q₁ , q₂ } , Σ = { a , b , c } , Γ = { a , b , Z₀ } , A = { q₂ } and let δ be as given in the following table:

State	Input	Top of Stack	Move
q₀	a	Z₀	( q₀ , aZ₀ )
q₀	b	Z₀	( q₀ , bZ₀ )
q₀	a	σ	( q₀ , aσ )
q₀	b	σ	( q₀ , bσ )
q₀	c	σ	( q₁ , σ )
q₁	a	a	( q₁ , ε )
q₁	b	b	( q₁ , ε )
q₁	ε	Z₀	( q₂ , Z₀ )

In this table σ represents either a or b.

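This pushdown automaton accepts the language { wcw^r | w ∈ { a , b }^* } , which is the set of palindromes with c in the middle, except for the case w = ε: since the table has no move for input c with Z₀ at the top of the stack, the string c itself is rejected. Adding the move ( q₁ , Z₀ ) for state q₀ , input c and top of stack Z₀ would cover that case.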
For example, for the input abbcbba, it goes through the following configurations and accepts it.

( q₀ , abbcbba , Z₀ )
⊢ ( q₀ , bbcbba , aZ₀ )
⊢ ( q₀ , bcbba , baZ₀ )
⊢ ( q₀ , cbba , bbaZ₀ )
⊢ ( q₁ , bba , bbaZ₀ )
⊢ ( q₁ , ba , baZ₀ )
⊢ ( q₁ , a , aZ₀ )
⊢ ( q₁ , ε , Z₀ )
⊢ ( q₂ , ε , Z₀ ) .

This PDA pushes all the a's and b's in the input onto the stack until c is encountered. When c is detected, it ignores c, and from that point on, whenever the top of the stack matches the input symbol, it pops the stack. When there are no more unread input symbols and Z₀ is at the top of the stack, it accepts the input string. Otherwise it rejects the input string.
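The PDA of Example 2 can be simulated in the same hypothetical encoding ("" for ε, "Z" for Z₀, top of stack leftmost). The rows written with σ are instantiated once for σ = a and once for σ = b, and the table is otherwise followed exactly. Every configuration of this PDA has at most one applicable move, so the sketch simply follows moves until none applies; since the accepting state q₂ has no outgoing moves, it is enough to check the last configuration. Names are ours.

```python
# PDA of Example 2: "" stands for ε, "Z" for Z0, top of stack = leftmost character.
DELTA = {
    ("q0", "a", "Z"): {("q0", "aZ")},
    ("q0", "b", "Z"): {("q0", "bZ")},
    ("q1", "a", "a"): {("q1", "")},
    ("q1", "b", "b"): {("q1", "")},
    ("q1", "", "Z"): {("q2", "Z")},
}
for sigma in "ab":                         # the σ rows, for σ = a and σ = b
    DELTA[("q0", "a", sigma)] = {("q0", "a" + sigma)}
    DELTA[("q0", "b", sigma)] = {("q0", "b" + sigma)}
    DELTA[("q0", "c", sigma)] = {("q1", sigma)}
ACCEPTING = {"q2"}


def run(x):
    """Configurations visited from (q0, x, Z0), following the unique move each time."""
    config = ("q0", x, "Z")
    trace = [config]
    while True:
        q, rest, alpha = config
        X, below = alpha[0], alpha[1:]     # Z0 is never popped, so alpha is non-empty
        if rest and (q, rest[0], X) in DELTA:        # a move that reads input
            p, beta = next(iter(DELTA[(q, rest[0], X)]))
            config = (p, rest[1:], beta + below)
        elif (q, "", X) in DELTA:                    # an ε-move
            p, beta = next(iter(DELTA[(q, "", X)]))
            config = (p, rest, beta + below)
        else:                                        # no move applies: halt
            return trace
        trace.append(config)


def accepts(x):
    """Acceptance by final state: the run ends in q2 with all input read."""
    q, rest, _ = run(x)[-1]
    return q in ACCEPTING and rest == ""


for c in run("abbcbba"):
    print(c)
print(accepts("abbcbba"), accepts("abcab"))  # True False
```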

The transition diagram of the PDA of Example 2 is as shown below. In the figure σ₁ and σ₂ represent a or b.

Further topics on CFL

PDA and Context-Free Language

There is a procedure to construct a PDA that accepts the language generated by a given context-free grammar and, conversely, a procedure to construct a context-free grammar that generates the language accepted by a given PDA. That means that a language is context-free if and only if there is a PDA that accepts it. Those procedures are omitted here.
Pumping Lemma for Context-Free Language

Let L be a CFL. Then there is a positive integer n such that for any string u in L with |u| ≥ n , there are strings v, w, x, y and z which satisfy

u = vwxyz
|wy| > 0
|wxy| ≤ n
for every integer m ≥ 0 , vw^mxy^mz ∈ L
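To see the four conditions at work, the sketch below checks them for L₁ = { aⁿbⁿ | n ≥ 1 } from Example 1. The constant n = 3 (called N in the code) and the decomposition of u = a^k b^k into v = a^(k-1), w = a, x = ε, y = b, z = b^(k-1) are our own choices for illustration; the lemma only guarantees that some n and some decomposition exist. The loop examines finitely many u and m, so it illustrates the statement rather than proving it.

```python
def in_L1(s: str) -> bool:
    """Membership in L1 = { a^k b^k | k >= 1 }."""
    k = len(s) // 2
    return k >= 1 and s == "a" * k + "b" * k


def decompose(u: str):
    """Split u = a^k b^k into (v, w, x, y, z) = (a^(k-1), a, ε, b, b^(k-1))."""
    k = len(u) // 2
    return "a" * (k - 1), "a", "", "b", "b" * (k - 1)


N = 3                                   # pumping constant chosen for this example
for k in range(2, 20):                  # every u in L1 with |u| >= 3 has k >= 2
    u = "a" * k + "b" * k
    v, w, x, y, z = decompose(u)
    assert u == v + w + x + y + z       # u = vwxyz
    assert len(w + y) > 0               # |wy| > 0
    assert len(w + x + y) <= N          # |wxy| <= n
    for m in range(6):                  # v w^m x y^m z stays in L1
        assert in_L1(v + w * m + x + y * m + z)
print("all checks passed")
```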
Parsing and Parsers for CFL

Consider the algebraic expression x + yz. Though we are accustomed to interpreting this as x + (yz), i.e. compute yz first and then add the result to x, it could also be interpreted as ( x + y )z, meaning that we first compute x + y and then multiply the result by z. Thus if a computer is given the string x + yz, it does not know which interpretation to use unless it is explicitly instructed to follow one or the other. Similar things happen when English sentences are processed by computers (or by people, for that matter). For example, in the sentence "A dog bites a man", native English speakers know that it is the dog that bites and not the other way round: "A dog" is the subject, "bites" is the verb and "a man" is the object of the verb. However, a computer, like a person who does not speak English, must be told how to interpret such sentences: for instance, the first noun phrase ("A dog") is usually the subject of a sentence, a verb phrase usually follows the noun phrase, and the first word in the verb phrase is the verb, which is followed by noun phrases representing the object(s) of the verb.
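The two readings of x + yz correspond to two different parse trees. The sketch below makes this concrete with a hypothetical ambiguous grammar E -> E+E | E*E | x | y | z (not one of the grammars in these notes; multiplication is written explicitly as *, as in G₃). It returns every parse tree of a string, each rendered as a fully parenthesized expression; the function name is ours.

```python
from functools import lru_cache


def parses(s: str):
    """All parse trees of s under E -> E+E | E*E | x | y | z, fully parenthesized."""

    @lru_cache(maxsize=None)
    def trees(i: int, j: int):
        if j - i == 1 and s[i] in "xyz":
            return (s[i],)
        out = []
        for k in range(i + 1, j - 1):          # try each operator as the root
            if s[k] in "+*":
                out += [f"({left}{s[k]}{right})"
                        for left in trees(i, k) for right in trees(k + 1, j)]
        return tuple(out)

    return list(trees(0, len(s)))


print(parses("x+y*z"))  # ['(x+(y*z))', '((x+y)*z)']
```

With x = 1, y = 2 and z = 3, the first tree evaluates to 7 and the second to 9, so the choice of tree changes the meaning.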
Parsing is the process of interpreting given input strings according to predetermined rules, i.e. the productions of grammars. By parsing sentences we identify the parts of the sentences and determine the structures of the sentences so that their meanings can be understood correctly.
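As a small illustration of predetermined rules fixing one interpretation, the sketch below is a recursive-descent parser for a hypothetical unambiguous grammar E -> T { + T }, T -> F { * F }, F -> x | y | z | ( E ), where { ... } means zero or more repetitions. This grammar is a common textbook choice and is not taken from these notes; it gives * higher precedence than +, so x+y*z is parsed only as x+(y*z). The parser returns the structure it found, fully parenthesized.

```python
def parse(s: str) -> str:
    """Parse s with E -> T {+ T}, T -> F {* F}, F -> x | y | z | (E)."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expr():                      # E -> T { + T }
        nonlocal pos
        left = term()
        while peek() == "+":
            pos += 1
            left = f"({left}+{term()})"
        return left

    def term():                      # T -> F { * F }
        nonlocal pos
        left = factor()
        while peek() == "*":
            pos += 1
            left = f"({left}*{factor()})"
        return left

    def factor():                    # F -> x | y | z | ( E )
        nonlocal pos
        ch = peek()
        if ch is not None and ch in "xyz":
            pos += 1
            return ch
        expect("(")
        inner = expr()
        expect(")")
        return inner

    result = expr()
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return result


print(parse("x+y*z"))    # (x+(y*z))
print(parse("(x+y)*z"))  # ((x+y)*z)
```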
Context-free grammars are powerful grammars. They can describe much of the syntax of programming languages and the basic structures of natural languages. Thus they are widely used in compilers for high-level programming languages and in natural language processing systems. Parsing for context-free languages and regular languages has been studied extensively. However, we are not going to study parsing here. Interested readers are referred to the textbook and other sources.


