Pushdown Automata
CS390, Fall 2019
Abstract
Pushdown automata (PDAs) can be thought of as combining an NFA “control unit” with a “memory” in the form of an infinite stack. PDAs are more powerful than FAs, being able to recognize languages that FAs cannot. In fact, the set of languages that can be recognized by PDAs is exactly the set of context-free languages from the previous module.
1 The Automaton
1.1 Nondeterministic PDAs
Suppose we couple an NFA (with $\epsilon$ transitions) with a stack:
The moves of this automaton are controlled at each step by
 the current NFA state,
 the current input symbol, and
 the current symbol on top of the stack.
The last is the big change in the way that state transitions are determined.
Like a Mealy machine, we will allow our NFA to produce output at each transition. These outputs are written to the stack after popping the top symbol:

$\epsilon$: nothing is written to the stack (but the top symbol has been popped)

$\alpha$: a string of symbols is pushed onto the stack, one at a time, in effect replacing the symbol that had been at the top.
So on each move, we 1. Change state, and 2. Rewrite the top of the stack.
Definition: A Pushdown Automaton (PDA) is given as $(Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ where
 1. $Q$ is a finite set of states.
 2. $\Sigma$ is the alphabet of input symbols.
 3. $\Gamma$ is an alphabet of stack symbols.
 4. $\delta$ is a transition function mapping $Q \times (\Sigma \cup \{\epsilon\}) \times \Gamma$ to finite subsets of $Q \times \Gamma^*$, i.e., $\delta(q_i,a,Z) = \{(q_j,\alpha), \ldots\}$.
 5. $q_0 \in Q$ is the start state.
 6. $Z_0 \in \Gamma$ is the starting symbol on the stack.
 7. $F \subseteq Q$ is a set of final or accepting states.
Example 1: A PDA for $0^n1^n$
 $\delta(q, 0, Z) = \{(q, XZ)\}$
 Push an X onto the stack for the first 0 in the input.
 $\delta(q, 0, X) = \{(q, XX)\}$.
 Push an X onto the stack for each subsequent 0 in the input.
 $\delta(q, 1, X) = \{(p, \epsilon)\}$.
 At the first $1$, go to state $p$ and pop one $X$.
 $\delta(p, 1, X) = \{(p, \epsilon)\}$.
 Pop an $X$ on each subsequent $1$.
 $\delta(p, \epsilon, Z) = \{(f, Z)\}$.
 When we reach the (original) bottom of the stack, go to the accepting state $f$.
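To make the five transitions concrete, here is a minimal Python sketch that stores them in a dictionary and searches for an accepting run. The names `DELTA` and `accepts` are my own, the empty string `""` stands in for $\epsilon$, and the stack is written as a string with its top at the left. Treat it as an illustration of the definition, not as the way JFLAP or your text implements a PDA.

```python
# Transition table for the 0^n 1^n PDA of Example 1.
# Key:   (state, input symbol or "" for epsilon, symbol on top of the stack)
# Value: set of (next state, string pushed in place of the popped top symbol)
DELTA = {
    ("q", "0", "Z"): {("q", "XZ")},
    ("q", "0", "X"): {("q", "XX")},
    ("q", "1", "X"): {("p", "")},
    ("p", "1", "X"): {("p", "")},
    ("p", "", "Z"): {("f", "Z")},
}


def accepts(w, start="q", stack="Z", finals=frozenset({"f"})):
    """Return True if some run consumes all of w and ends in a final state."""
    frontier = [(start, w, stack)]  # configurations still to be explored
    seen = set()
    while frontier:
        config = frontier.pop()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stk = config
        if not rest and state in finals:
            return True
        if not stk:
            continue  # every move needs a symbol to pop
        top, below = stk[0], stk[1:]
        # Moves that consume no input.
        for nxt, push in DELTA.get((state, "", top), ()):
            frontier.append((nxt, rest, push + below))
        # Moves that consume the next input symbol.
        if rest:
            for nxt, push in DELTA.get((state, rest[0], top), ()):
                frontier.append((nxt, rest[1:], push + below))
    return False
```

With this table, `accepts("0011")` succeeds and `accepts("001")` fails. The empty string is also rejected, because the table has no move out of $q$ until a 0 has been read.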
We can show this PDA in a transition diagram similar to the ones we used for FAs:
Instead of labeling each transition with just one symbol (the input), we have to label it with three components $a; X; \alpha$ where
 $a$ is the next input symbol.
 $X$ is the symbol at the top of the stack.
 $\alpha$ is a string of stack symbols that are to be pushed onto the stack (after popping $X$)
Example 2: A Transition Diagram for $0^n1^n$
The diagram here follows the JFLAP conventions.
 The initial stack symbol is given as Z instead of $Z_0$. As before, $\lambda$ denotes an empty string instead of $\epsilon$.
Here is the JFLAP file for that PDA.
One thing worth pointing out is that the output of $\delta$ is a set. Because the FA controller for a PDA is allowed to be nondeterministic, it is quite possible that the transition on a given (input, stack symbol) pair will actually be to multiple states.
We did not really take advantage of that in the example above. That automaton is, to all appearances, deterministic.
Let’s look at an example of a PDA that does employ nondeterminism.
Example 3: A PDA for $ww^R$
One of the “obvious” things we would expect a machine with a stack to be good at would be reversing strings or, because our automata are about recognizing strings in a language, recognizing when a string has been reversed.
Let’s construct a PDA for the language $ww^R$ where $w$ is any string in $\{0, 1\}^*$ and $w^R$ is the reverse of that string.
Here is the transition diagram for that PDA.
Look at the transitions from q to itself. If the input (first element in the triple label) is 0, then no matter what symbol is on top of the stack (the $\epsilon$ / $\lambda$ in the second position), we push a 0 (the third position) onto the stack. Similarly, if the input is 1, then no matter what symbol is on top of the stack, we push a 1 onto the stack. So basically, we are using q to store up the characters of $w$ on the stack.
 A $\lambda$ for the stack symbol is a shorthand for “no matter what is on the stack”. (This is a JFLAP shorthand – your text does not allow this.)
It stands for all of the transitions that we would obtain by replacing $\lambda$ by each of the possible stack symbols, and modifying the “push string” in the third position to push that symbol back on.
For example, the transition from state $q_0$ to $q_0$ on input “0” is really a shorthand for:
\[ \begin{align} \delta(q_0,0,Z) &= \{ (q_0, 0Z) \} \\ \delta(q_0,0,0) &= \{ (q_0, 00) \} \\ \delta(q_0,0,1) &= \{ (q_0, 01) \} \\ \end{align} \]
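The expansion is mechanical, so it can be sketched in a few lines of Python. The function name `expand_wildcard` and the list `STACK_SYMBOLS` are my own inventions for this illustration; the stack alphabet $\{Z, 0, 1\}$ is taken from the three transitions just shown.

```python
# Stack alphabet assumed for the ww^R PDA.
STACK_SYMBOLS = ["Z", "0", "1"]


def expand_wildcard(state, symbol, next_state, push):
    """Expand one "any stack symbol" transition into explicit transitions.

    Each explicit transition pops a concrete symbol X and then pushes
    `push` followed by X, so X ends up back on the stack under the new symbols.
    """
    return {
        (state, symbol, X): {(next_state, push + X)}
        for X in STACK_SYMBOLS
    }


# The q0-to-q0 transition on input "0" from the text.
expanded = expand_wildcard("q0", "0", "q0", "0")
```

Here `expanded` holds exactly the three transitions listed above, one for each stack symbol.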
Here, I have written the same set of transitions without any $\lambda$s in the stack element position.
Look at the transitions from p to itself. If the input is 0 and the top symbol on the stack is 0, we “return” to p after replacing that 0 on the top of the stack by the empty string; in other words, we pop the matched 0 from the stack. Similarly, if the input is 1 and the top symbol on the stack is 1, we “return” to p after popping the matched symbol from the stack. (If the input is 0 and the symbol is 1, or vice versa, there is no transition from this state, not even back to p.) So, the state p can be seen as doing the actual matching of the symbols in $w^R$.
The “trick” in the design of this PDA is knowing when to stop storing and to start checking for the reversed string. For example, ‘0110’ needs to be accepted because ‘01’, reversed, is ‘10’. But ‘0110110110’ should also be accepted, because ‘01101’ reversed is ‘10110’. But if we go from q to p after seeing ‘01’, and then start popping characters, and it turns out that the string is going to continue as ‘0110110110’ instead of just as ‘0110’, we will have lost the chance to record the characters we need to detect the longer of those two strings.
The answer to this problem is that we don’t actually try to guess when to start reversing the string. Instead, we exploit nondeterminism to simultaneously reverse all prefixes of the input. So, if the input begins with ‘0110’, we will simultaneously explore the possibility that the string is going to be ‘’, ‘00’, ‘0110’, ‘011110’, and ‘01100110’.
We accomplish this by adding the transition from q to p. That transition can fire at every step (including the start of the automaton). So, every time we push a character onto the stack in q, we simultaneously return to q to push some more input and go to p to start detecting reversed input.
That leaves the question of knowing when we are done. Whenever we are in p, pop a matched symbol, and wind up looking at the start symbol $Z$ on the stack, we have detected a $ww^R$ string, and immediately go to a final state r. (Of course, we can’t discount the possibility that more input remains, but if we encounter more input, there is no legal transition from r, so we would leave r on that input.)
To complete the example, here is the transition function:
 $\delta(q, 0, \epsilon) = \{(q, 0)\}$
 $\delta(q, 1, \epsilon) = \{(q, 1)\}$
 $\delta(q, \epsilon, \epsilon) = \{(p, \epsilon)\}$
$\delta(p, 0, 0) = \{(p, \epsilon)\}$
 $\delta(p, 1, 1) = \{(p, \epsilon)\}$
 $\delta(p, \epsilon, Z) = \{(r, \epsilon)\}$
Here is the JFLAP file for that PDA.
1.2 Running a PDA
As noted earlier, because the FA controller for a PDA is allowed to be nondeterministic, it is quite possible that the transition on a given (input, stack symbol) pair will actually be to multiple states. Even messier, the multiple parallel transitions may manipulate the stack in different ways, leading to different stack contents for each of those parallel states.
An instantaneous description of a PDA state (“configuration”) is a description of the PDA as a triple: $(q,w,\gamma)$, where
 $q$ is a current state
 $w$ is the remaining input, and
 $\gamma$ is the stack contents, written top-to-bottom as left-to-right.
For example, if we wanted to run this PDA on the input “0011”, we would describe the starting state as $(q,“0011”,“Z”)$. After processing the first character of input, we would describe the resulting machine as $(q,“011”, “0Z”)$.
We will use the “turnstile” symbol $\vdash$ to denote a “move” or transition of a PDA. So, for example
\[ (q,“0011”,“Z”) \vdash (q,“011”, “0Z”) \]
We could indicate a series of transitions
\[ \begin{align} (q,“0011”,“Z”) & \vdash (q,“011”, “0Z”) \\ & \vdash (q,“11”, “00Z”) \\ & \vdash (p,“1”, “0Z”) \\ & \vdash (p,\epsilon, “Z”) \\ & \vdash (r, \epsilon, \epsilon) \\ \end{align} \]
We use $\vdash^*$ to indicate a series of zero or more transitions. Examples would include
\[ \begin{align} (q,“0011”,“Z”) & \vdash^* (q,“0011”,“Z”) \\ (q,“0011”,“Z”) & \vdash^* (p,“1”, “0Z”) \\ (q,“11”, “00Z”) & \vdash^* (r, \epsilon, \epsilon) \\ \end{align} \]
The $\vdash^*$ provides a useful way to discuss what can and cannot be recognized by a PDA.
It doesn’t really help a lot with the fundamental problem of tracking the multitude of states that can arise simultaneously during a derivation. See Example 6.4 in your text for an attempt to do this. You can see that the solution attempted there is, at best, a textual hack.
This is a place where a program like JFLAP comes in handy.
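A short program can do the same bookkeeping. The sketch below is my own illustration, using the expanded $ww^R$ transitions from Example 3. It represents a configuration as a Python tuple `(state, remaining_input, stack)`, implements $\vdash$ as a function `step`, and implements $\vdash^*$ as a function `reachable`. All of these names are hypothetical, and `""` again stands for $\epsilon$.

```python
# Expanded ww^R PDA (no wildcard stack symbols). "" stands for epsilon.
DELTA = {}
for X in "Z01":
    DELTA[("q", "0", X)] = {("q", "0" + X)}  # store a 0 on top of X
    DELTA[("q", "1", X)] = {("q", "1" + X)}  # store a 1 on top of X
    DELTA[("q", "", X)] = {("p", X)}         # guess that the middle is here
DELTA[("p", "0", "0")] = {("p", "")}         # match and pop a 0
DELTA[("p", "1", "1")] = {("p", "")}         # match and pop a 1
DELTA[("p", "", "Z")] = {("r", "")}          # bottom of stack reached


def step(config):
    """Return every configuration c such that config |- c (one move)."""
    state, rest, stack = config
    if not stack:
        return set()
    top, below = stack[0], stack[1:]
    result = set()
    for nxt, push in DELTA.get((state, "", top), ()):
        result.add((nxt, rest, push + below))
    if rest:
        for nxt, push in DELTA.get((state, rest[0], top), ()):
            result.add((nxt, rest[1:], push + below))
    return result


def reachable(config):
    """Return every configuration c such that config |-* c."""
    seen, frontier = {config}, [config]
    while frontier:
        for c in step(frontier.pop()):
            if c not in seen:
                seen.add(c)
                frontier.append(c)
    return seen


def accepts(w):
    """Accept if state r can be reached with all of the input consumed."""
    return ("r", "", "") in reachable(("q", w, "Z"))
```

For instance, `step(("q", "0110", "Z"))` returns two configurations, one that stays in q after pushing the 0 and one that has jumped to p without consuming any input. Those are the two parallel branches described above.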
1.2.1 Accepting an Input String
Historically, there have been two different ways of indicating that a PDA was to accept an input string:

Accept a string that leaves the FA controller in a final/accepting state.
This is the technique I have used in the examples above.

Ignore the FA state and accept any string that leaves the stack empty (including having popped the starting stack symbol from the stack).
It’s fairly easy to show that these two methods are equivalent – you can easily transform a PDA that uses one of the methods into a PDA that would accept the exact same language using the other method. The proof is in your textbook. (If you have not encountered it yet, you might want to think about how you would do these transformations before reading the textbook’s approach.)
1.3 Deterministic PDAs
A deterministic PDA is a PDA that is never simultaneously in two or more states, no matter what input it is given. This is a “functional” definition of determinism, not a syntactic one.
For example, this is a deterministic PDA, even though the transition diagram includes an $\epsilon$-transition, something that we normally associate with NFAs.
On the other hand, this PDA is very definitely nondeterministic.
Deterministic PDAs are of great practical importance, as discussed later, but not of particularly great theoretical importance.
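The definition above is about behavior, but there is a common syntactic test that is sufficient to guarantee that behavior: no (state, input, stack symbol) triple offers more than one move, and an $\epsilon$-move never competes with an input-consuming move on the same (state, stack symbol) pair. The Python sketch below checks that condition. The function name and the two tables are my own encodings of Example 1 and of the expanded $ww^R$ PDA, so treat this as an illustration, not as a replacement for the functional definition.

```python
def looks_deterministic(delta):
    """Sufficient syntactic test: at most one move is ever applicable."""
    for (state, symbol, top), moves in delta.items():
        if len(moves) > 1:
            return False
        # An epsilon move competes with any input move on the same (state, top).
        if symbol == "" and moves and any(
            s == state and t == top and a != "" and m
            for (s, a, t), m in delta.items()
        ):
            return False
    return True


# Example 1 (0^n 1^n): has an epsilon move, but it never competes.
EXAMPLE_1 = {
    ("q", "0", "Z"): {("q", "XZ")},
    ("q", "0", "X"): {("q", "XX")},
    ("q", "1", "X"): {("p", "")},
    ("p", "1", "X"): {("p", "")},
    ("p", "", "Z"): {("f", "Z")},
}

# Expanded ww^R PDA: in state q, the epsilon move competes with reading input.
WW_R = {}
for X in "Z01":
    WW_R[("q", "0", X)] = {("q", "0" + X)}
    WW_R[("q", "1", X)] = {("q", "1" + X)}
    WW_R[("q", "", X)] = {("p", X)}
WW_R[("p", "0", "0")] = {("p", "")}
WW_R[("p", "1", "1")] = {("p", "")}
WW_R[("p", "", "Z")] = {("r", "")}
```

Under this test the $0^n1^n$ table passes, $\epsilon$-transition and all, while the $ww^R$ table fails because state q always has both an $\epsilon$-move and an input-consuming move available.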
2 Equivalence of PDAs and CFGs
We can prove that PDAs accept exactly the context-free languages by showing that we can convert any CFG into a PDA and any PDA into a CFG.
2.1 Every CFG can be Converted to a PDA
Suppose we have a CFG. We have previously described the process of generating a string by leftmost derivation as
 1. Begin with a string consisting only of the start symbol.
 2. Pick the leftmost variable occurrence in the current string.
 3. Pick any production that has that variable on the left of the $\rightarrow$.
 4. Replace the chosen occurrence of that variable by the right-hand side of the chosen production.
 5. Repeat steps 2-4 until the string contains only terminals.
We can turn this into a “parsing algorithm” to recognize if a string is in the language of that grammar by making just a few changes:
 1. Begin with an input string $s$ and a derived string $\alpha$ consisting only of the start symbol.
 2. If $\alpha$ begins with one or more terminals, then check to see if the input string $s$ begins with the same terminals. If not, stop the algorithm. If they match, remove those matching terminals from the start of both strings. Go to step 6.
 3. Let $A$ denote the variable that begins $\alpha$.
 4. Pick any production that has $A$ on the left of the $\rightarrow$.
 5. Remove $A$ from the front of $\alpha$ and prepend the right-hand side of the chosen production.
 6. If both $s$ and $\alpha$ are empty, we have successfully parsed the string. If only $\alpha$ is empty, the original input string is not in the language. If $\alpha$ is not empty, go back to step 2.
The “catch” to this procedure is step 4, where we pick a production. There might be many productions that have $A$ on the left, and if we pick the wrong one our parse will fail to match strings that it should.
But nondeterminism means that we don’t really need to make a choice. We can, in parallel, choose every production with $A$ on the left, and try to carry out the remainder of the parsing in parallel.
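On an ordinary computer we can imitate “choose every production in parallel” with a search over (remaining input, derived string) pairs. The Python sketch below does that breadth-first. The grammar `GRAMMAR` is a hypothetical example of my own ($S \rightarrow 0S0 \mid 1S1 \mid \epsilon$), symbols are single characters, and the pruning test is an addition of mine to keep the search finite for this grammar. It is not guaranteed to terminate for every grammar, for example one with left recursion.

```python
from collections import deque

# Hypothetical example grammar: S -> 0S0 | 1S1 | epsilon ("" is epsilon).
GRAMMAR = {"S": ["0S0", "1S1", ""]}


def parses(s, grammar=GRAMMAR, start="S"):
    """Explore every choice of production, breadth-first."""
    queue = deque([(s, start)])  # pairs (remaining input, derived string alpha)
    seen = set()
    while queue:
        rest, alpha = queue.popleft()
        if (rest, alpha) in seen:
            continue
        seen.add((rest, alpha))
        # Step 2: strip matching terminals from the front of both strings.
        while alpha and alpha[0] not in grammar:
            if not rest or rest[0] != alpha[0]:
                break  # mismatch: this branch of the search dies
            rest, alpha = rest[1:], alpha[1:]
        else:
            # Step 6: alpha is now empty or begins with a variable.
            if not alpha:
                if not rest:
                    return True
                continue
            # Pruning (my addition): every terminal still in alpha must be
            # matched by input, so too many terminals means failure.
            if sum(1 for c in alpha if c not in grammar) > len(rest):
                continue
            # Steps 3-5: try every production for the leading variable.
            A, tail = alpha[0], alpha[1:]
            for rhs in grammar[A]:
                queue.append((rest, rhs + tail))
    return False
```

Each entry in the queue plays the role of one of the parallel computations. A branch that picks the wrong production simply dies at the next mismatch, without affecting the others.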
With that in mind, we can make an intuitive argument (the formal proof is in the text) that we can map these steps of the above procedure onto a PDA:

Begin with an input string $s$ and a derived string $\alpha$ consisting only of the start symbol.

We will store $\alpha$ on the PDA stack. In this step, we use the start symbol of the grammar as the start symbol on the stack.


If $\alpha$ begins with one or more terminals, then check to see if the input string $s$ begins with the same terminals. If not, stop the algorithm. If they match, remove those matching terminals from the start of both strings. Go to step 6.

This suggests a set of transitions $\delta(p,a,a) = \{(p,\epsilon)\}$, $\forall a \in \Sigma$.


Let $A$ denote the variable that begins $\alpha$.
 Pick any production that has $A$ on the left of the $\rightarrow$.

Again, we’ll pick all of them and let the nondeterminism run rampant.


Remove $A$ from the front of $\alpha$ and prepend the right-hand side of the chosen production.

Remember that a PDA transition can write any number of symbols to the stack. So, for a CFG production $A \rightarrow X Y Z\ldots$, we would get a PDA transition of the form $\delta(q,\epsilon,A) = \{(r, X Y Z\ldots) \}$. (Remember, we push the strings in reverse order of the way we write them.)


If both $s$ and $\alpha$ are empty, we have successfully parsed the string. If only $\alpha$ is empty, the original input string is not in the language. If $\alpha$ is not empty, go back to step 2.
Your textbook connects all of these elements together.
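As a rough illustration of how the pieces fit, the Python sketch below builds a transition table from a grammar and then runs it. It makes some assumptions that go beyond the discussion above: it uses a single state named `p` for everything, it accepts by empty stack, and it reuses a hypothetical grammar $S \rightarrow 0S0 \mid 1S1 \mid \epsilon$ with single-character symbols. The names `cfg_to_pda` and `accepts` are mine. The simple search in `accepts` is fine for this grammar but could loop forever on a left-recursive one.

```python
def cfg_to_pda(grammar, terminals):
    """Build delta for a one-state PDA that accepts by empty stack."""
    delta = {}
    # Matching moves: terminal a on the stack is popped when a is read.
    for a in terminals:
        delta[("p", a, a)] = {("p", "")}
    # Expansion moves: variable A is replaced by each of its right-hand sides.
    for A, bodies in grammar.items():
        delta[("p", "", A)] = {("p", body) for body in bodies}
    return delta


def accepts(delta, w, start_symbol="S"):
    """Search for a run that empties both the input and the stack."""
    frontier, seen = [("p", w, start_symbol)], set()
    while frontier:
        config = frontier.pop()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if not stack:
            if not rest:
                return True
            continue
        top, below = stack[0], stack[1:]
        for nxt, push in delta.get((state, "", top), ()):
            frontier.append((nxt, rest, push + below))
        if rest:
            for nxt, push in delta.get((state, rest[0], top), ()):
                frontier.append((nxt, rest[1:], push + below))
    return False


# Hypothetical example grammar: S -> 0S0 | 1S1 | epsilon.
GRAMMAR = {"S": ["0S0", "1S1", ""]}
delta = cfg_to_pda(GRAMMAR, "01")
```

Because the stack is written top-first, pushing the right-hand side as a single string leaves its first symbol on top, which is exactly what a leftmost derivation needs.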
2.2 Every PDA can be Converted to a CFG
This is, IMO, less obvious. We will create a grammar in which our variable names have the form $[pXq]$ where $p$ and $q$ are states in the PDA and $X$ is a stack symbol.
Now, in a grammar, each variable represents a smaller language in its own right. The variable $[pXq]$ is supposed to denote the language of all strings $w$ for which
\[ (p,wz,X\alpha) \vdash^* (q,z,\alpha) \]
i.e., a set of strings that could be consumed when taking us from state $p$ to state $q$, during which time the single symbol $X$ would be popped from the stack.
If we are able to construct a grammar for which those variables actually fulfill that promise, then $[q_0Z_0q_i]$ (for all states $q_i$) would be the language of strings that take us from the starting state of the PDA to any other state, while popping the initial stack symbol $Z_0$ (which means that we have emptied the stack). So that would be the language recognized by a PDA that accepts upon emptying its stack.
Example 4: Constructing a Grammar for $ww^R$
Your text gives the construction procedure and proves it in Theorem 6.14. I won’t type all that out here, but to help make it clear why this works, let’s apply the construction procedure to this PDA.
However, I’m going to do something a bit odd. Before we construct the grammar, I’m going to do a sample derivation using it. We can do that because of the definition of the $[qXp]$ variable names.
We’re going to derive “0110”.
Looking at the PDA, it empties its stack (removing the starting stack element $Z$) when making the transition from $p$ to $r$. So we need to start in $q$, end up in $r$, and pop $Z$ on the way. The set of strings that would do that is, by definition, $[qZr]$. So we will want a derivation to start:
\[ \begin{align} S & \Rightarrow [qZr] \\ \end{align} \]
So we are predicting that the grammar we generate will include a production $S \rightarrow [qZr]$.
Now, we expect the first 0 to be processed by state $q$. The PDA will push a 0 onto the stack as it consumes that first 0 in the input. That means that, once we have gone past that 0 in the input, we will need to find, in the remaining input, a string that pops that 0 from the stack, leaving us in state $p$, then a string that pops the $Z$ while taking us to state $r$:
\[ \begin{align} S & \Rightarrow [qZr] \\ & \Rightarrow 0 [q0p][pZr] \\ \end{align} \]
So we are predicting that the grammar we generate will include a production $[qZr] \rightarrow 0 [q0p] [pZr]$.
Next up in the input is the first ‘1’. Being intelligent beings who can actually plan ahead, we know that we want this input to be processed in state $q$, pushing a ‘1’ onto the stack, and then immediately take the $\epsilon$-transition to state $p$ so we can start matching and popping the second half of the input.
\[ \begin{align} S & \Rightarrow [qZr] \\ & \Rightarrow 0 [q0p][pZr] \\ & \Rightarrow 0 1 [p1p][p0p][pZr] \\ \end{align} \]
So we will be looking for a string that takes us from state $p$ to state $p$ while popping a 1, then from $p$ to $p$ while popping a $0$, then from $p$ to $r$ popping a $Z$.
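So we are predicting that the grammar we generate will include a production $[q0p] \rightarrow 1 [q1p] [p0p]$. (The derivation step above also folds in the $\epsilon$-move from $q$ to $p$, which corresponds to a further production $[q1p] \rightarrow [p1p]$.)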
Looking at the derivation, and comparing to the PDA, you can start to see what happens with this style of grammar construction.
The variables in the leftmost derivation are encoding the contents that the PDA stack would have once it has recognized all of the terminal symbols to the left of the first variable.
That’s the key insight that motivates the use of this construction to prove that PDAs can be converted to CFGs.
Continuing our derivation, we can fulfill the goal of $[p1p]$ very easily, by simply deriving a ‘1’.
\[ \begin{align} S & \Rightarrow [qZr] \\ & \Rightarrow 0 [q0p][pZr] \\ & \Rightarrow 0 1 [p1p][p0p][pZr] \\ & \Rightarrow 0 1 1 [p0p][pZr] \\ \end{align} \]
We are predicting a production $[p1p] \rightarrow 1$.
We can similarly fulfill the goal of $[p0p]$ by deriving a ‘0’.
\[ \begin{align} S & \Rightarrow [qZr] \\ & \Rightarrow 0 [q0p][pZr] \\ & \Rightarrow 0 1 [p1p][p0p][pZr] \\ & \Rightarrow 0 1 1 [p0p][pZr] \\ & \Rightarrow 0 1 1 0 [pZr] \\ \end{align} \]
We are predicting a production $[p0p] \rightarrow 0$.
Finally, we can fulfill the goal of $[pZr]$ by deriving an empty string.
\[ \begin{align} S & \Rightarrow [qZr] \\ & \Rightarrow 0 [q0p][pZr] \\ & \Rightarrow 0 1 [p1p][p0p][pZr] \\ & \Rightarrow 0 1 1 [p0p][pZr] \\ & \Rightarrow 0 1 1 0 [pZr] \\ & \Rightarrow 0 1 1 0 \\ \end{align} \]
Our derivation is done. We have predicted that we will have the following productions in the grammar:
\[ \begin{align} S & \rightarrow [qZr] \\ [qZr] & \rightarrow 0 [q0p] [pZr] \\ [q0p] & \rightarrow 1 [q1p] [p0p] \\ [p1p] & \rightarrow 1 \\ [p0p] & \rightarrow 0 \\ [pZr] & \rightarrow \epsilon \\ \end{align} \]
That’s not to say that these will be the only productions in our grammar. They are simply the ones we expect to use when deriving “0110”. Other strings in the language may require other productions.
OK, now that we have an idea what we are looking for, let’s construct the grammar.
As noted earlier, the use of $\lambda$ in the stack element position to denote “don’t care” is not used in the textbook, and this construction will not work with that shortcut. So we will work from this expanded version of the PDA.
a) For all states $p$, $G$ has the production $S \rightarrow [qZp]$.
So we start with the productions
\[ \begin{align} S &\rightarrow [qZq] \\ S &\rightarrow [qZp] \\ S &\rightarrow [qZr] \\ \end{align} \]
Now, in fact, we know that there is no possible way to pop $Z$ from the stack and then wind up in states $p$ or $q$, so we know that the languages $[qZq]$ and $[qZp]$ are empty. We could actually drop those productions without affecting the language accepted by the CFG we are going to generate.
b) For each transition $\delta(q,a,X) = \{ \ldots, (r, Y_1Y_2\ldots Y_k) \}$, and for all lists of states $r_1, r_2, \ldots, r_k$, $G$ has a production $[qXr_k] \rightarrow a[rY_1r_1][r_1Y_2r_2]\ldots[r_{k-1}Y_kr_k]$. (When $k=0$, the production is simply $[qXr] \rightarrow a$.)
So let’s look at our transitions, one at a time.
 $\delta(q,1,0) = (q,10)$
 There are two symbols being pushed onto the stack, so $k=2$. That means that we need to consider all lists of states of length 2: $[q, q]$, $[q, p]$, $[q, r]$, $[p, q]$, $[p, p]$, $[p, r]$, $[r, q]$, $[r, p]$, $[r, r]$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [q0q] \rightarrow 1 [q1q] [q0q] \\
q, p & [q0p] \rightarrow 1 [q1q] [q0p] \\
q, r & [q0r] \rightarrow 1 [q1q] [q0r] \\
p, q & [q0q] \rightarrow 1 [q1p] [p0q] \\
p, p & [q0p] \rightarrow 1 [q1p] [p0p] \\
p, r & [q0r] \rightarrow 1 [q1p] [p0r] \\
r, q & [q0q] \rightarrow 1 [q1r] [r0q] \\
r, p & [q0p] \rightarrow 1 [q1r] [r0p] \\
r, r & [q0r] \rightarrow 1 [q1r] [r0r] \\
\end{array} \]
 $\delta(q,1,1) = (q,11)$
 There are two symbols being pushed onto the stack, so $k=2$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [q1q] \rightarrow 1 [q1q] [q1q] \\
q, p & [q1p] \rightarrow 1 [q1q] [q1p] \\
q, r & [q1r] \rightarrow 1 [q1q] [q1r] \\
p, q & [q1q] \rightarrow 1 [q1p] [p1q] \\
p, p & [q1p] \rightarrow 1 [q1p] [p1p] \\
p, r & [q1r] \rightarrow 1 [q1p] [p1r] \\
r, q & [q1q] \rightarrow 1 [q1r] [r1q] \\
r, p & [q1p] \rightarrow 1 [q1r] [r1p] \\
r, r & [q1r] \rightarrow 1 [q1r] [r1r] \\
\end{array} \]
 $\delta(q,1,Z) = (q,1Z)$
 There are two symbols being pushed onto the stack, so $k=2$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [qZq] \rightarrow 1 [q1q] [qZq] \\
q, p & [qZp] \rightarrow 1 [q1q] [qZp] \\
q, r & [qZr] \rightarrow 1 [q1q] [qZr] \\
p, q & [qZq] \rightarrow 1 [q1p] [pZq] \\
p, p & [qZp] \rightarrow 1 [q1p] [pZp] \\
p, r & [qZr] \rightarrow 1 [q1p] [pZr] \\
r, q & [qZq] \rightarrow 1 [q1r] [rZq] \\
r, p & [qZp] \rightarrow 1 [q1r] [rZp] \\
r, r & [qZr] \rightarrow 1 [q1r] [rZr] \\
\end{array} \]
 $\delta(q,0,0) = (q,00)$
 There are two symbols being pushed onto the stack, so $k=2$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [q0q] \rightarrow 0 [q0q] [q0q] \\
q, p & [q0p] \rightarrow 0 [q0q] [q0p] \\
q, r & [q0r] \rightarrow 0 [q0q] [q0r] \\
p, q & [q0q] \rightarrow 0 [q0p] [p0q] \\
p, p & [q0p] \rightarrow 0 [q0p] [p0p] \\
p, r & [q0r] \rightarrow 0 [q0p] [p0r] \\
r, q & [q0q] \rightarrow 0 [q0r] [r0q] \\
r, p & [q0p] \rightarrow 0 [q0r] [r0p] \\
r, r & [q0r] \rightarrow 0 [q0r] [r0r] \\
\end{array} \]
 $\delta(q,0,1) = (q,01)$
 There are two symbols being pushed onto the stack, so $k=2$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [q1q] \rightarrow 0 [q0q] [q1q] \\
q, p & [q1p] \rightarrow 0 [q0q] [q1p] \\
q, r & [q1r] \rightarrow 0 [q0q] [q1r] \\
p, q & [q1q] \rightarrow 0 [q0p] [p1q] \\
p, p & [q1p] \rightarrow 0 [q0p] [p1p] \\
p, r & [q1r] \rightarrow 0 [q0p] [p1r] \\
r, q & [q1q] \rightarrow 0 [q0r] [r1q] \\
r, p & [q1p] \rightarrow 0 [q0r] [r1p] \\
r, r & [q1r] \rightarrow 0 [q0r] [r1r] \\
\end{array} \]
 $\delta(q,0,Z) = (q,0Z)$
 There are two symbols being pushed onto the stack, so $k=2$.
So we get new productions
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q, q & [qZq] \rightarrow 0 [q0q] [qZq] \\
q, p & [qZp] \rightarrow 0 [q0q] [qZp] \\
q, r & [qZr] \rightarrow 0 [q0q] [qZr] \\
p, q & [qZq] \rightarrow 0 [q0p] [pZq] \\
p, p & [qZp] \rightarrow 0 [q0p] [pZp] \\
p, r & [qZr] \rightarrow 0 [q0p] [pZr] \\
r, q & [qZq] \rightarrow 0 [q0r] [rZq] \\
r, p & [qZp] \rightarrow 0 [q0r] [rZp] \\
r, r & [qZr] \rightarrow 0 [q0r] [rZr] \\
\end{array} \]
 $\delta(q,\epsilon,1) = (p,1)$
 This time, $k=1$, and we have only three possible lists of states of length 1.
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q & [q1q] \rightarrow [p1q] \\
p & [q1p] \rightarrow [p1p] \\
r & [q1r] \rightarrow [p1r] \\
\end{array} \]
 $\delta(q,\epsilon,0) = (p,0)$
 $k=1$
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q & [q0q] \rightarrow [p0q] \\
p & [q0p] \rightarrow [p0p] \\
r & [q0r] \rightarrow [p0r] \\
\end{array} \]
 $\delta(q,\epsilon,Z) = (p,Z)$
 $k=1$
\[ \begin{array}{ll}
\text{list of states} & \text{new production} \\ \hline
q & [qZq] \rightarrow [pZq] \\
p & [qZp] \rightarrow [pZp] \\
r & [qZr] \rightarrow [pZr] \\
\end{array} \]
 $\delta(p,1,1) = (p,\epsilon)$
 $k=0$
$[p1p] \rightarrow 1$
 $\delta(p,0,0) = (p,\epsilon)$
 $k=0$
$[p0p] \rightarrow 0$
 $\delta(p,\epsilon,Z) = (r,\epsilon)$
 $k=0$
$[pZr] \rightarrow \epsilon$
And we’re done. The total grammar is
\[ \begin{align} S &\rightarrow [qZq] \\ S &\rightarrow [qZp] \\ \color{red}{S} &\color{red}{\rightarrow [qZr]} \\ [q0q] &\rightarrow 1 [q1q] [q0q] \\ [q0p] &\rightarrow 1 [q1q] [q0p] \\ [q0r] &\rightarrow 1 [q1q] [q0r] \\ [q0q] &\rightarrow 1 [q1p] [p0q] \\ \color{red}{[q0p]} &\color{red}{\rightarrow 1 [q1p] [p0p]} \\ [q0r] &\rightarrow 1 [q1p] [p0r] \\ [q0q] &\rightarrow 1 [q1r] [r0q] \\ [q0p] &\rightarrow 1 [q1r] [r0p] \\ [q0r] &\rightarrow 1 [q1r] [r0r] \\ [q1q] &\rightarrow 1 [q1q] [q1q] \\ [q1p] &\rightarrow 1 [q1q] [q1p] \\ [q1r] &\rightarrow 1 [q1q] [q1r] \\ [q1q] &\rightarrow 1 [q1p] [p1q] \\ [q1p] &\rightarrow 1 [q1p] [p1p] \\ [q1r] &\rightarrow 1 [q1p] [p1r] \\ [q1q] &\rightarrow 1 [q1r] [r1q] \\ [q1p] &\rightarrow 1 [q1r] [r1p] \\ [q1r] &\rightarrow 1 [q1r] [r1r] \\ [qZq] &\rightarrow 1 [q1q] [qZq] \\ [qZp] &\rightarrow 1 [q1q] [qZp] \\ [qZr] &\rightarrow 1 [q1q] [qZr] \\ [qZq] &\rightarrow 1 [q1p] [pZq] \\ [qZp] &\rightarrow 1 [q1p] [pZp] \\ [qZr] &\rightarrow 1 [q1p] [pZr] \\ [qZq] &\rightarrow 1 [q1r] [rZq] \\ [qZp] &\rightarrow 1 [q1r] [rZp] \\ [qZr] &\rightarrow 1 [q1r] [rZr] \\ [q0q] &\rightarrow 0 [q0q] [q0q] \\ [q0p] &\rightarrow 0 [q0q] [q0p] \\ [q0r] &\rightarrow 0 [q0q] [q0r] \\ [q0q] &\rightarrow 0 [q0p] [p0q] \\ [q0p] &\rightarrow 0 [q0p] [p0p] \\ [q0r] &\rightarrow 0 [q0p] [p0r] \\ [q0q] &\rightarrow 0 [q0r] [r0q] \\ [q0p] &\rightarrow 0 [q0r] [r0p] \\ [q0r] &\rightarrow 0 [q0r] [r0r] \\ [q1q] &\rightarrow 0 [q0q] [q1q] \\ [q1p] &\rightarrow 0 [q0q] [q1p] \\ [q1r] &\rightarrow 0 [q0q] [q1r] \\ [q1q] &\rightarrow 0 [q0p] [p1q] \\ [q1p] &\rightarrow 0 [q0p] [p1p] \\ [q1r] &\rightarrow 0 [q0p] [p1r] \\ [q1q] &\rightarrow 0 [q0r] [r1q] \\ [q1p] &\rightarrow 0 [q0r] [r1p] \\ [q1r] &\rightarrow 0 [q0r] [r1r] \\ [qZq] &\rightarrow 0 [q0q] [qZq] \\ [qZp] &\rightarrow 0 [q0q] [qZp] \\ [qZr] &\rightarrow 0 [q0q] [qZr] \\ [qZq] &\rightarrow 0 [q0p] [pZq] \\ [qZp] &\rightarrow 0 [q0p] [pZp] \\ \color{red}{[qZr]} &\color{red}{\rightarrow 0 [q0p] [pZr]} 
\\ [qZq] &\rightarrow 0 [q0r] [rZq] \\ [qZp] &\rightarrow 0 [q0r] [rZp] \\ [qZr] &\rightarrow 0 [q0r] [rZr] \\ [q1q] &\rightarrow [p1q] \\ [q1p] &\rightarrow [p1p] \\ [q1r] &\rightarrow [p1r] \\ [q0q] &\rightarrow [p0q] \\ [q0p] &\rightarrow [p0p] \\ [q0r] &\rightarrow [p0r] \\ [qZq] &\rightarrow [pZq] \\ [qZp] &\rightarrow [pZp] \\ [qZr] &\rightarrow [pZr] \\ \color{red}{[p1p]} &\color{red}{\rightarrow 1} \\ \color{red}{[p0p]} &\color{red}{\rightarrow 0} \\ \color{red}{[pZr]} &\color{red}{\rightarrow \epsilon} \\ \end{align} \]
Not exactly pretty. But it appears plausible, at least. I have marked in red the productions that we predicted would need to be included to derive the string “0110”.
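Since rules (a) and (b) are purely mechanical, a few lines of code can regenerate the whole list. The Python sketch below is my own illustration, not the textbook’s procedure verbatim. It encodes the expanded PDA as a dictionary, writes each variable as a string such as `[q0p]`, and spells out an empty right-hand side as the word `epsilon`. The names `var` and `build_grammar` are hypothetical.

```python
from itertools import product

STATES = ["q", "p", "r"]

# Expanded ww^R PDA. "" stands for epsilon; push strings are written top-first.
DELTA = {}
for X in "Z01":
    DELTA[("q", "0", X)] = {("q", "0" + X)}
    DELTA[("q", "1", X)] = {("q", "1" + X)}
    DELTA[("q", "", X)] = {("p", X)}
DELTA[("p", "0", "0")] = {("p", "")}
DELTA[("p", "1", "1")] = {("p", "")}
DELTA[("p", "", "Z")] = {("r", "")}


def var(p, X, q):
    """Name of the grammar variable [pXq]."""
    return f"[{p}{X}{q}]"


def build_grammar(delta, states, start_state="q", start_symbol="Z"):
    """Return a list of (head, body) productions following rules (a) and (b)."""
    # Rule (a): S -> [q0 Z0 s] for every state s.
    prods = [("S", var(start_state, start_symbol, s)) for s in states]
    # Rule (b): one production per transition and per list r_1 .. r_k.
    for (q, a, X), moves in delta.items():
        for r, push in moves:
            k = len(push)
            for rs in product(states, repeat=k):
                chain = (r,) + rs  # r, r_1, ..., r_k
                head = var(q, X, chain[-1])
                body = [a] if a else []
                body += [var(chain[i], push[i], chain[i + 1]) for i in range(k)]
                prods.append((head, " ".join(body) or "epsilon"))
    return prods


grammar = build_grammar(DELTA, STATES)
```

Running this yields 69 productions: 3 from rule (a), 54 from the six pushing transitions, 9 from the three $\epsilon$-moves out of $q$, and 3 from the popping transitions. That agrees with the list above.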
We can be pretty certain that this grammar is
highly ambiguous,
 Productions like $[q0q] \rightarrow [q0q]$ are kind of a giveaway.
may contain some variables representing an empty language (in which case they and their productions could be dropped),
 For example, the only transition that can empty the stack is the one from $p$ to $r$, and there is no way to get from $r$ back to either of the other states. So there are no strings in $[qZq]$ because there is no way to return to $q$ after having popped $Z$ from the stack.
or both.
Even though the grammar is huge, the derivations are fairly short (once you find them) because every step in this derivation adds an input character and/or removes a variable.
In fact, it is possible to give a grammar for $ww^R$, where $w \in \{0, 1\}^*$, in just 3 productions. Can you do so?
3 Applications: Common Parsing Algorithms
The development of CFGs helped turn compiler development from one of the major challenges of early computer science to a common activity that can be tackled by relatively small teams with a high degree of success.
Part of this is due to the more widely spread understanding of common parsing algorithms, so that people don’t have to work as hard to actually get the code written. Part of it is because programming language designers have learned how to design their languages to keep them easy to parse.
Here is an example of a parsing algorithm, called “predictive parsing”, adapted from Principles of Compiler Design by Aho, Sethi, and Ullman, a classic text in the field.
Predictive parsing is driven by a parse table. This table, called M in the algorithm, is a two-dimensional table indexed by a nonterminal symbol (variable) in a CFG and an input symbol: k = M[X,a] means that if the top of the stack is X and we see input a coming up, then we want to derive using the $k^{th}$ production of the grammar. A negative entry means that no production applies, which is a syntax error.
parse (String w, Table M)
{
    ip = 0;
    stack = emptyStack;
    push S (starting symbol of grammar) onto stack
    repeat {
        X = stack.top();
        a = w[ip];
        if (X is a terminal) {
            if (X == a) {
                stack.pop();
                ++ip;
            } else {
                syntaxerror();
            }
        } else { // X is a nonterminal
            k = M[X,a];
            if (k >= 0) {
                pop X from the stack;
                R[] = symbols from right hand side of kth production, R[0] ... R[n-1]
                push R[n-1], R[n-2], ..., R[0] onto stack;
            } else {
                syntaxerror();
            }
        }
    } until stack is empty
}
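For readers who would like to run it, here is one way the pseudocode might look in Python. Several details are my own assumptions and are not part of the original algorithm. The grammar is a hypothetical two-production grammar for $0^n1^n$ (with $n \geq 0$). The table `TABLE` was filled in by hand for that grammar. The input is padded with an end marker `$` so that `tokens[ip]` is always defined. Missing table entries play the role of the negative entries.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls syntaxerror()."""


# Hypothetical grammar, numbered from 0:
#   production 0: S -> 0 S 1
#   production 1: S -> epsilon
PRODUCTIONS = [("S", ["0", "S", "1"]), ("S", [])]
NONTERMINALS = {"S"}

# Hand-built parse table M[X, a] -> production number ("$" marks end of input).
TABLE = {("S", "0"): 0, ("S", "1"): 1, ("S", "$"): 1}


def parse(w, table=TABLE, productions=PRODUCTIONS, start="S"):
    """Return the list of production numbers used, or raise ParseError."""
    tokens = list(w) + ["$"]
    ip = 0
    stack = [start]  # top of the stack is the end of the list
    used = []
    while stack:
        X = stack[-1]
        a = tokens[ip]
        if X not in NONTERMINALS:
            if X != a:
                raise ParseError(f"expected {X!r} but saw {a!r}")
            stack.pop()
            ip += 1
        else:
            k = table.get((X, a), -1)
            if k < 0:
                raise ParseError(f"no production for {X!r} on {a!r}")
            stack.pop()
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(reversed(productions[k][1]))
            used.append(k)
    if tokens[ip] != "$":
        raise ParseError("input remains after the stack emptied")
    return used
```

On the input `"0011"` this uses production 0 twice and then production 1. A string such as `"001"` raises `ParseError` when the parser expects a second 1 and finds the end marker instead.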
Looking at this algorithm,

The resemblance to the parsing algorithm I gave earlier should be fairly clear.

However, that earlier algorithm relied on nondeterminism to explore all possible derivation steps from a given variable in parallel. There’s nothing like that in this algorithm. Instead, the table “magically” chooses the best production right hand side to employ.


At its heart, we have a table, indexed by an input and a stack symbol, that tells us what to push onto the stack and what inputs must match to continue running.

Sounds like a PDA – not even a particularly complicated one. There are no explicit states, but you might argue that there are a few implicit states embedded in the algorithm logic. Not many though.

One thing that may seem un-PDA-like: this algorithm may look at the next incoming input and use it to index the table without actually “consuming” it. In compiler parlance, this is referred to as a lookahead character. So the same input symbol can trigger several transitions.
There are ways to fake that behavior with a PDA, however. In fact, one could make an argument that $\epsilon$-transitions already allow something very similar to take place.

So this algorithm appears to have its roots in a PDA, but without nondeterminism. This is why the comment has been made that most programming languages fall into the class of languages that can be recognized by deterministic PDAs.
So how do we actually get tables smart enough to always choose the proper production rather than exploring several in parallel? Well, that’s really a topic for a compilers’ course, but it’s fair to say that the algorithms for computing the parse tables are more complicated and more time consuming than the parsing algorithms that use them.