Citation
Syntax parsing for natural language understanding in artificial intelligence

Material Information

Title:
Syntax parsing for natural language understanding in artificial intelligence
Creator:
Qin, Bin
Publication Date:
Language:
English
Physical Description:
vii, 70 leaves : illustrations ; 29 cm

Subjects

Subjects / Keywords:
Natural language processing (Computer science) ( lcsh )
Natural language processing (Computer science) ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references.
General Note:
Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Electrical Engineering.
Statement of Responsibility:
by Bin Qin.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
26909610 ( OCLC )
ocm26909610
Classification:
LD1190.E54 1992m .Q56 ( lcc )

Full Text
SYNTAX PARSING
FOR NATURAL LANGUAGE UNDERSTANDING
IN ARTIFICIAL INTELLIGENCE
by
Bin Qin
B.S., University of Science and Technology of Beijing, 1984
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering
1992


This thesis for the Master of Science
degree by
Bin Qin
has been approved for the
Department of
Electrical Engineering
by
William Wolfe
Date
Date


Bin Qin (M.S., Electrical Engineering)
Syntax Parsing for Natural Language Understanding in Artificial Intelligence
Thesis directed by Professor William Wolfe
Syntax parsing is an important part of the theory of Natural Language
Understanding. It examines how the syntactic structure of a sentence can be
computed. For the computation, two things need to be considered: the grammar,
which is a formal specification of the structures allowable in the language, and the
parsing technique, which is the method of analyzing a sentence to determine its
structure according to the grammar. A syntax parsing study is presented in BQSysl. In
BQSysl, the grammatical formalism is Context-Free grammar, and the parsing
technique is a top-down backtracking algorithm used in conjunction with a chart. BQSysl
can parse a small set of simple English sentences. After parsing, it provides linguistic
messages for each word of an input sentence and parsing messages for the sentence
structure. A very efficient search function applied to a binary tree structure of a
dictionary is shown to produce very quick and accurate results. The computational
results show that, using this grammar and parsing algorithm, BQSysl gives very good
agreement in both word and parsing messages.
The form and content of this abstract are approved. I recommend its publication.
Signed
William Wolfe


To my wife and parents


ACKNOWLEDGMENTS
I would like to take this opportunity to thank my advisor, Professor William
Wolfe, for the guidance and encouragement that he provided during all phases of the
preparation of this thesis. It has been a privilege to work with him, and I hope that I
have acquired some of his insight in the process. Thanks also go to the other two
members of my committee, Carol Keene and Joe Thomas, for their contribution to my
development.
I would also like to thank my wife, my parents and my friends for their unflagging
support. Their encouragement kept me going, and I am in their debt. It is impossible to
list them all, but a few deserve special mention: Dr. Low Edwin, Steve Frenzsufiak
and Randy Wicker.


CONTENT
List of Figures..............................................viii
1 INTRODUCTION..................................................1
What Is Natural Language Understanding?.........................1
Why Solve The Natural Language Understanding Problem?...........2
Is The Natural Language Understanding Problem Solvable?.........3
Why Do We Need Syntax Parsing?..................................7
2 LINGUISTIC ANALYSIS OF SYNTAX PARSING........................10
Simple Finite State Diagram....................................11
Recursive Finite State Diagrams................................13
Context-Free Grammar...........................................14
The Weakness Of CFG Parsing....................................17
Generative Capacity............................................18
3 PARSING ALGORITHM............................................21
4 BQSysl.......................................................23
CFG Rules......................................................24
Parsing Algorithm..............................................28


Lexicon................................................................29
Test Cases.............................................................30
5 CONCLUSIONS AND OPEN QUESTIONS............................. 37
Synopsis Of The Thesis..................................... 37
Open Questions..............................................38
APPENDIX A......................................40
APPENDIX B.................................... 53
BIBLIOGRAPHY....................................69


LIST OF FIGURES
1.1 A Dialogue with ELIZA..........................................6
1.2 The Flow of Information..........................................8
2.1 The Finite State Diagram for Example 2.1........................12
2.2 Grammar 1.......................................................14
2.3 Grammar 2.......................................................14
2.4 A Parse Tree for a Sentence.................................17
2.5 An Ambiguity in Parsing for a Sentence..........................19
3.1 Grammar 3, A Simple Context-Free Grammar....................40
4.1 A Context-Free Grammar in BQSysl................................27


CHAPTER 1
INTRODUCTION
Syntax Parsing is used to transform sequences of words into structures
that show how the words relate to each other. It is one step in the process
of natural language understanding. In order to understand the role of syntax
parsing in a Natural Language Understanding system, we must address the
following questions:
1. "What is Natural Language Understanding?"
2. "Why solve the Natural Language Understanding problem?"
3. "Is the Natural Language understanding problem solvable?"
4. "Why do we need Syntax Parsing?"
What Is Natural Language Understanding?
The question "What is natural language understanding?" is very
difficult to answer. We use language in such a wide variety of situations
that no single definition of understanding is able to account for them all.
Basically, to understand something is to transform it from one


representation into another, where this second representation has been
chosen to correspond to a set of available actions that could be performed
and where the mapping has been designed so that for each event, an
appropriate action will be performed (Rich and Knight [1]). As we talk
about understanding, it is important to keep in mind that the success or
failure of an "understanding" problem can rarely be measured in an absolute
sense but must instead be measured with respect to a particular task to be
performed. For example, the sentence "A Honda Civic Hatchback car is useful."
may be true for a small family, but not for a big family.
Why Solve The Natural Language Understanding Problem?
Many useful tasks could be performed if the computer could be
programmed to extract the content of a message that has been delivered to it
or stored in it in natural language form. As computers get better at
reflecting human thought processes, they also should become as easy to use
as televisions: we will just turn them on and type (or possibly speak) to
them as we would to another person (Schank [2]).
The real problem with today's computer is that it does not understand
us when we use language the natural way. When we sit down in front of the


computer, we can tell right away that we are not interacting with an
intelligent entity that understands our words in the ways we intended. The
extent to which the computer understands is precisely and only the extent to
which it responds to what we say in a way we expect.
The computer's ability to understand anything is merely a function of
our ability to program it to do so. If we can program it to understand
English and to respond to sentences and stories with the kind of logical
conclusions and inferences an average human would make, this would be
quite an achievement. But before we even tackle such a problem we will
have to learn how humans understand such sentences and form their
responses to them. Now, other questions arise: "Do we really know
how humans understand sentences?" and "Is the Natural Language
Understanding problem solvable?"
Is The Natural Language Understanding Problem Solvable?
"Understanding" is hard. It is very unlikely that two people would ever
agree on what the word "love" represents. In general, the most problematic
"things" we try to represent are distinctly human things or ideas, such as
"justice", "virtue", "democracy", "beauty" and so on. No two humans have


the same views or feelings about these "things", and this fact severely limits
our understanding of one another when we use these representations. The
goal in getting a computer to understand a natural language such as English
clearly can not be to get a computer to understand what "love" or "truth" or
"beauty" are and then tell us the answer. No human really understands what
any other human means when he uses these words or tries to explain what
they mean, so it indeed would be surprising if a computer could exhibit this
level of understanding. This is an abstract concept problem. Before
deciding whether the understanding problem is solvable, we need to define
solvable. Here, solvable means that, given any word, we know what it means.
Otherwise it is unsolvable. As long as the abstract concept problem cannot
be solved, the understanding problem is unsolvable. It is a fact that so far
no existing system or theory can prove that "Understanding" is solvable.
However, it is still interesting if we can partially solve the
"Understanding" problem. The fact that people do not understand
abstract concepts in the same way does not prevent people from
understanding the more practical, everyday uses of language to
communicate wants and needs. Much of our communication concerns
concrete and simple concepts. We understand statements like "Please pass


the book." and "I'm so tired." without too much difficulty. Programming a
computer to process this portion of our communication just the way we do
is possible and is potentially of great use.
Our ability to understand a sentence lies in just how we associate
meanings and add our own ideas to help find a meaning for the original
sentence. In order to understand a sentence, we must use quite a
substantial amount of knowledge.
Consider a concrete example: the ELIZA program, which was
developed in the mid-1960s at MIT. This program must be one of the most
popular artificial intelligence programs of all time. ELIZA was never
claimed to embody a theory of language comprehension and production,
and yet its behavior seems impressive. The system plays the role of a
therapist and, to obtain the best results, the user should correspondingly
play the role of a patient. Figure 1.1 (Allen [3]), presents a transcript of
this system in operation.
Here is a simple description of the basic mechanism of ELIZA. There
is a database of particular words that are called keywords. For each
keyword, the system stores an integer, a pattern to match against the input,
and a specification of the output. The algorithm is as follows: given a
sentence S, find a keyword in S whose pattern matches S. If there is more


User: Men are all alike.
ELIZA: In what way?
User: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
User: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
User: He says I'm depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
User: It's true, I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
User: I need some help; that much seems certain.
ELIZA: What would it mean to you if you got some help?
User: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
User: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
User: My father.
ELIZA: Your father.
User: You are like my father in some ways.
ELIZA: What resemblance do you see?
Figure 1.1 A dialogue with ELIZA
than one keyword, pick the one with the highest integer value. Use the
output specification that is associated with this keyword to generate the next
sentence. If there are no keywords, generate an innocuous continuation
statement, such as Tell me more or Go on.
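This keyword mechanism is easy to sketch. The fragment below is an illustration in Python, not ELIZA's actual database: the keywords, ranks, patterns, and response templates are invented for this sketch.

```python
import re

# Illustrative keyword table: (rank, pattern, response template).
# These entries are assumptions for the sketch, not ELIZA's real data.
KEYWORDS = [
    (10, re.compile(r"\bI am (.*)", re.IGNORECASE),
         "I am sorry to hear you are {0}."),
    (5,  re.compile(r"\bmy (\w+)", re.IGNORECASE),
         "Tell me more about your {0}."),
    (1,  re.compile(r"\byou\b", re.IGNORECASE),
         "We were discussing you, not me."),
]
DEFAULT = "Tell me more."   # innocuous continuation when nothing matches

def respond(sentence):
    """Find the matching keyword with the highest rank and instantiate its
    output specification; fall back to a stock continuation."""
    best = None
    for rank, pattern, template in KEYWORDS:
        match = pattern.search(sentence)
        if match and (best is None or rank > best[0]):
            best = (rank, template.format(*match.groups()))
    return best[1] if best else DEFAULT

print(respond("My mother takes care of me"))   # prints: Tell me more about your mother.
```

Even this tiny table produces superficially sensible replies, which is exactly the "collection of tricks" the text describes.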
This program does not understand the conversation that it is
participating in. Rather, it is a simple collection of tricks. Given this, why
does ELIZA appear to function so well? There are several reasons. Perhaps
the most important reason is that, when people hear or read a sequence of
words that they understand as a sentence, they attribute meaning to the
sentence and assume that the person (or machine) that produced the
sentence actually intended that meaning. People are extremely good at
distinguishing word meanings and interpreting sentences to fit the context.
As a result, ELIZA appears to be intelligent because you use your own


intelligence to make sense of what it says.
From this example, we can see that Natural Language Understanding is
very attractive. Many computer scientists, educators, psychologists,
linguists, information specialists, medical administrators, file managers, and
also nonspecialists are working on it.
Why Do We Need Syntax Parsing?
A language-comprehension program must have considerable knowledge
about the structure of the language itself, including what the words are,
how to combine the words into sentences, what the words mean, how these
word meanings contribute to the sentence meaning, and so on.
According to Allen's book [3], a Natural Language Understanding
system consists of three large phases of processing that are based on the
computational techniques used in each phase, as shown in Figure 1.2 (Allen
[3]).
The parsing phase covers those steps that affect the processing of
sentences into structural descriptions by using a grammatical description of
linguistic structure. This phase uses the syntactic and the morphological
knowledge. The semantic interpretation phase consists of the mapping of


Input: "I ate the pizza"
        |
The Parser:
(S SUBJ (NP PRO I)
   MAIN-V ate
   TENSE PAST
   OBJ (NP DET the
        HEAD pizza))
        |
The Semantic Interpreter:
(PAST e1 EAT-EVENT [AGENT (PRO p1 PERSON "I")] [THEME (DEF/SING p2 PIZZA)])
        |
The Contextual Interpreter:
ATE(JAMES1, PIZZA4, TIME3) & BEFORE-NOW(TIME3)
Figure 1.2 The flow of information
the structural description of the sentence into a logical form, which
represents the meaning of the sentence independent of the context. This
phase uses the semantic knowledge plus parts of the pragmatic knowledge.
Finally, the contextual interpretation phase maps the syntactic and logical
forms into a final representation of the effects of understanding the
sentence. This final meaning is represented using a language that supports
an inference component to model natural reasoning. This phase includes the
remainder of the pragmatic knowledge and general world knowledge.


In a Natural Language Understanding system based on the linguistic
view, we need syntax parsing to analyze how words are put together to
form sentences that look correct in the language and to identify how one
word relates to another (for example, whether one word modifies another,
or is unrelated). In my thesis, I will focus on syntax parsing.


CHAPTER 2
LINGUISTIC ANALYSIS OF SYNTAX PARSING
"Syntax means sentence structure: the orderly arrangement, relation,
agreement of parts of the sentence in accordance with usage, or custom. It
has to do with the use (or construction) of words, phrases, and clauses in a
given sentence." (House and Harman [4]).
To examine how the syntactic structure of a sentence can be computed,
two things need to be considered: the grammar, which is a formal
specification of the structures allowable in the language, and the parsing
technique, which is the method of analyzing a sentence to determine its
structure according to the grammar.
The most common ways to represent grammars are as a finite state
diagram (FSD) and as a set of production rules (i.e., a context-free grammar,
CFG).


Simple Finite State Diagram
"One way of providing a finite representation of morphemic structure
of English sentences is by means of a machine called a 'finite state
grammar'. A finite state grammar can be represented graphically in the
form of a 'state diagram'." (Chomsky [5]).
A finite state diagram is a labeled directed graph in which the nodes
represent the states of the machine and the arcs are obtained from the
transition function.
Definition 2.1
"The state diagram of a finite state machine (Q, Σ, δ, q0, F) is a
labeled directed graph G defined by the following conditions:
1) The nodes of G are the elements of Q.
2) The labels on the arcs of G are elements of Σ.
3) q0 is the initial node, depicted with an entering arrow.
4) F is the set of accepting nodes; each accepting node is depicted with a
double circle.
5) There is an arc from node qi to qj labeled a if δ(qi, a) = qj.
6) For every node qi and symbol a, there is exactly one arc labeled a
leaving qi." (Sudkamp [6]).


Example 2.1
The finite state diagram shown in Figure 2.1
Figure 2.1 The finite state diagram for Example 2.1
accepts the sentences: The books are red;
The book is red;
The wet books are red;
The wet book is red.
The sentences are accepted since the halting state of the machine,
which is also the terminal state of the paths, is the accepting state.
Given a state diagram, we produce a sentence by tracing a path from
the initial point on the left to the final point on the right, always proceeding
in the direction of the arrows. Having reached a certain point in the
diagram, we can proceed along any path leading from this point, whether or
not this path has been traversed before in constructing the sentence in
question. Each node in such a diagram thus corresponds to a state of the


machine. We can allow transition from one state to another in several ways,
and we can have any number of closed loops of any length.
If we adopt this conception of language, we can view the speaker as
being essentially a machine of the type considered. In producing a sentence,
the speaker begins in the initial state, produces the first word of the
sentence, thereby switching into a second state which limits the choice of
the second word, etc. Each state through which he passes represents the
grammatical restrictions that limit the choice of the next word at this point
in the utterance.
From the example above, it is clear that a finite state machine can be
used to check the syntax of a sentence, by testing whether the last word of the
sentence leaves the machine in the accepting state, and to parse a sentence, by
identifying which part of the sentence belongs to the subject, verb, and object;
the states reached after the first two or three words indicate that those words
could be the subject of the sentence if it is a valid sentence according to the
finite state grammar.
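As a concrete sketch of such a checker, the table below encodes a deterministic finite state diagram in the spirit of Figure 2.1. Since the figure is not reproduced in this text, the state names and transitions are assumptions reconstructed from the four accepted sentences.

```python
# Transition function as a dictionary: (state, word) -> next state.
# States q0..q5 are assumed names; the adj loop on q1 lets "wet" repeat.
TRANSITIONS = {
    ("q0", "the"): "q1",
    ("q1", "wet"): "q1",       # a closed loop, as the text allows
    ("q1", "book"): "q2",
    ("q1", "books"): "q3",
    ("q2", "is"): "q4",        # singular subject takes "is"
    ("q3", "are"): "q4",       # plural subject takes "are"
    ("q4", "red"): "q5",
}
ACCEPTING = {"q5"}

def accepts(sentence):
    """Trace the unique path for the sentence; accept if it halts in q5."""
    state = "q0"
    for word in sentence.lower().split():
        state = TRANSITIONS.get((state, word))
        if state is None:          # no arc: the machine rejects
            return False
    return state in ACCEPTING
```

Because exactly one arc leaves each state for each word, the trace is deterministic, matching condition 6 of Definition 2.1.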
Recursive Finite State Diagrams
The simple state diagram formalism is not powerful enough to describe
all languages that can be described by a Context-Free Grammar (CFG). To


get the descriptive power of CFGs, a notion of recursion is needed. A
recursive state diagram is like a simple finite state diagram, except that it
allows arc labels that refer to other finite state diagrams rather than word
categories. Thus, given the NP diagram in Grammar 1, a state diagram
for simple English sentences can be expressed as shown in Grammar 2.
Uppercase labels refer to state diagrams. The arc from S to S1 can be
followed only if the NP state diagram can be successfully traversed to a
halting state.
Figure 2.2 Grammar 1
Figure 2.3 Grammar 2
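The recursion can be sketched as follows. Because the diagrams of Figures 2.2 and 2.3 are not reproduced in this text, the NP network (a determiner, any number of adjectives, then a noun) and the S network below are assumptions in their spirit; an arc whose label names another network is followed by recursively traversing that network.

```python
# Each network: start state, accepting states, and arcs (state, label) -> state.
# Labels naming another network (here "NP") trigger a recursive traversal.
NETWORKS = {
    "NP": {"start": "n0", "accept": {"n2"},
           "arcs": {("n0", "art"): "n1", ("n1", "adj"): "n1",
                    ("n1", "noun"): "n2"}},
    "S":  {"start": "s0", "accept": {"s2"},
           "arcs": {("s0", "NP"): "s1", ("s1", "verb"): "s2"}},
}
# A tiny word-category table assumed for the sketch.
CATEGORIES = {"the": "art", "a": "art", "wet": "adj",
              "book": "noun", "dog": "noun", "barks": "verb"}

def traverse(name, words, start):
    """Return every word position at which network `name` can halt in an
    accepting state, having begun at position `start`."""
    net = NETWORKS[name]
    seen, stack, halts = set(), [(net["start"], start)], set()
    while stack:
        state, i = stack.pop()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if state in net["accept"]:
            halts.add(i)
        for (src, label), dst in net["arcs"].items():
            if src != state:
                continue
            if label in NETWORKS:      # recursive arc: traverse the sub-network
                stack.extend((dst, j) for j in traverse(label, words, i))
            elif i < len(words) and CATEGORIES.get(words[i]) == label:
                stack.append((dst, i + 1))
    return halts

def accepts(sentence):
    words = sentence.lower().split()
    return len(words) in traverse("S", words, 0)
```

The S network consumes a whole NP in a single arc, which is exactly the extra descriptive power the simple diagrams lack.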
Context-Free Grammar
In 1957, the linguist Noam Chomsky first suggested that, instead of
trying to list the sentence types of each language, linguists should treat
syntax as a problem in set theory: how to give a finite description of an
infinite set. One way to describe an infinite set is to give a generative
procedure that "generates," or points out, all the members of the set. In
order to do that, Chomsky defined the Context-Free Grammar, which is used
in the form of context-free phrase structure rules.
A context-free grammar (CFG) is a finite set of variables (also called
nonterminals or syntactic categories), each of which represents a
language. The languages represented by the variables are described
recursively in terms of each other and primitive symbols called terminals.
The rules relating the variables are called productions. A typical
production states that the language associated with a given variable contains
strings that are formed by concatenating strings from the languages of
certain other variables, possibly along with some terminals.
Example 2.2
A CFG is
S --> NP VP;
NP --> d adj NP;
NP --> n;
VP --> v NP;
v --> likes;
n --> boy;
n --> gun;
d --> a;
d --> the;
adj --> little.
where S is a sentence; NP, a noun phrase; VP, a verb phrase; v, a
verb; n, a noun; d, a determiner; adj, an adjective.
Definition 2.2
"A context-free grammar (CFG) is denoted G = (V, T, P, S), where
V and T are finite sets of variables and terminals, respectively. We
assume that V and T are disjoint. P is a finite set of productions; each
production is of the form A --> α, where A is a variable and α is a string
of symbols from (V ∪ T)*. Finally, S is a special variable called the start
symbol." (Hopcroft and Ullman [6]).
CFGs are a very important class of grammars for two reasons. The
formalism is powerful enough to be able to describe most of the structure in
natural languages, and yet it is restricted enough so that efficient parsers
can be built to analyze sentences. Symbols that cannot be further
decomposed in a grammar, such as d, n, adj, and v in the preceding
example, are called terminal symbols. The other symbols, such as NP,
VP, and S, are called nonterminal (or variable) symbols. The terminal
symbols are actually word categories, and a structure called the lexicon
maintains a list of all words that fall in each category. Of course, many
words will be listed under multiple categories. For example, can would be
listed under v and n.
When we use a grammar to check a sentence, we need to parse it.
The parsing process takes the rules of the grammar and compares them
against the input sentence. Each rule that matches adds something to the
complete structure that is being built for the sentence. The simplest
structure to build is a parse tree, which simply records the rules of Example
2.2 and how they are matched. Figure 2.4 shows the parse tree that would be
produced for the sentence "The little girl likes flowers."
(S (NP (d The) (adj little) (NP (n girl)))
   (VP (v likes) (NP (n flowers))))
Figure 2.4 A parse tree for a sentence
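The matching process can be sketched with a small recursive-descent routine for the grammar of Example 2.2, returning the parse tree as nested tuples. This is only an illustration; "girl" and "flowers" are added to the lexicon here (an assumption) so that the figure's sentence can be parsed.

```python
# Lexicon for the sketch: word category -> words. "girl" and "flowers" are
# assumed additions; the rest come from Example 2.2.
LEXICON = {"d": {"a", "the"}, "adj": {"little"},
           "n": {"boy", "gun", "girl", "flowers"}, "v": {"likes"}}

def parse_np(words, i):
    """Try NP --> d adj NP, then NP --> n; return (tree, next position) or None."""
    if (i + 2 < len(words) and words[i] in LEXICON["d"]
            and words[i + 1] in LEXICON["adj"]):
        sub = parse_np(words, i + 2)
        if sub:
            return ("NP", ("d", words[i]), ("adj", words[i + 1]), sub[0]), sub[1]
    if i < len(words) and words[i] in LEXICON["n"]:
        return ("NP", ("n", words[i])), i + 1
    return None

def parse_s(words):
    """S --> NP VP and VP --> v NP; return a tree covering all words, or None."""
    np = parse_np(words, 0)
    if not np:
        return None
    subj, i = np
    if i < len(words) and words[i] in LEXICON["v"]:
        obj = parse_np(words, i + 1)
        if obj and obj[1] == len(words):
            return ("S", subj, ("VP", ("v", words[i]), obj[0]))
    return None

print(parse_s("the little girl likes flowers".split()))
```

The nested tuples mirror the shape of the tree in Figure 2.4: each rule that matches contributes one node.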
The Weakness Of CFG Parsing
Parsing is used to analyze a sentence to determine its structure
according to the grammar. Unfortunately, there exists an ambiguity
problem in parsing. Figure 2.5 (Rich and Knight [1]) shows that for the
sentence "John saw the boy in the park with a telescope." there are three
valid parse trees. The first denotes that John used a telescope to see the
boy. The second denotes that the boy has a telescope with him. The third
denotes that there is a telescope in the park. Obviously they do not denote
the same meaning of the sentence. In practice, depth-first search, breadth-first
search, or the semantic component will decide which parse tree is
selected.
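The ambiguity can be demonstrated by counting analyses with a chart. The small CNF grammar below is my own construction, not the grammar behind Figure 2.5; because it lets each prepositional phrase attach to either a noun phrase or the verb phrase (including nested attachments), it finds five analyses for this sentence rather than the figure's three.

```python
from collections import defaultdict

# Toy CNF grammar and lexicon, assumed for the sketch.
RULES = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
         ("NP", "NP", "PP"), ("NP", "D", "N"), ("PP", "P", "NP")]
LEXICON = {"John": "NP", "saw": "V", "the": "D", "a": "D",
           "boy": "N", "park": "N", "telescope": "N", "in": "P", "with": "P"}

def count_parses(words):
    """CKY-style chart that counts, for each span and category, how many
    distinct parse trees cover it."""
    n = len(words)
    chart = defaultdict(int)               # (start, end, category) -> tree count
    for i, w in enumerate(words):
        chart[(i, i + 1, LEXICON[w])] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):      # every split point
                for lhs, left, right in RULES:
                    chart[(i, k, lhs)] += chart[(i, j, left)] * chart[(j, k, right)]
    return chart[(0, n, "S")]

print(count_parses("John saw the boy in the park with a telescope".split()))  # prints 5
```

The count grows quickly with each additional prepositional phrase, which is why a practical system needs a policy, such as the search order or the semantic component, for choosing among the trees.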
Generative Capacity
Any language generated by a CFG can be generated by a recursive
finite state diagram, and vice versa. Thus they are equivalent in their
generative capacity. No proof is shown here, but it is easy to prove that,
given any language generated by a CFG, we can find a recursive
finite state diagram that generates the language.
Some languages are not generatable by a CFG. One example is the
language that consists of a sequence of a's, followed by the same number of
b's, followed by the same number of c's; that is, abc, aabbcc, aaabbbccc,


and so on. Similarly, no context-free grammar can generate the language
[Figure 2.5, showing three parse trees for "John saw the boy in the park
with a telescope", appears here.]
Figure 2.5 An ambiguity in parsing for a sentence
that consists of any sequence of letters repeated in the same order twice,
such as abab, abcabc, acdabacdab, and so on. A grammatical formalism,
called context-sensitive grammar (CSG), can generate such sequences,
however. A context-sensitive grammar may have more than one symbol on
the left-hand side, as long as the number of symbols on that side is less than
or equal to the number on the right-hand side. For example, XY --> YX
is a legal context-sensitive rule that would change the order of constituents
X and Y.
It is very hard to apply CSG to natural languages. Thus, most natural
language understanding applications use the CFG approach in syntax
parsing.


CHAPTER 3
PARSING ALGORITHM
In this chapter a Top-Down CFG parser with a Chart (Allen [3]) is
described. The algorithm for processing a single position is as follows:
1. If the leftmost symbol in the current state names an entry on the chart,
then generate the new state(s) by removing the symbol and updating
the sentence position to the position(s) after the chart entry(ies). For
example, given the following position and chart, where the chart contains
an NP from position 1 to 2 and an NP from position 1 to 4,

Current State    Backup States    Position    Action
(NP VP)          NIL              1           Step 1

two NPs are found on the chart and two new states are generated. One
becomes the new current state, and the other a backup. That is, the
resulting situation is

Current State    Backup States           Position    Action
(VP)             ((VP) at position 4)    2           Step 4
2. If the leftmost symbol is a construction flag, such as <NP1>, add a
constituent onto the chart for the symbol (NP) ranging from the starting
position (1) to the current position. For example, given a current state
of (<NP2> VP) at position 5, you would add an NP to the chart from
position 2 to 5.


3. Otherwise, if the symbol is a terminal symbol, check the next word in
the sentence for inclusion in that category, and add to the chart if
successful.
4. Otherwise, if the symbol is a nonterminal symbol, add a construction
flag to the position and rewrite the symbol according to the grammar
rules. For example, given the state (NP VP) at position 1 and an empty
chart, you would produce three new states using Grammar 3: (ART
NOUN <NP1> VP), (ART ADJ NOUN <NP1> VP), and (ADJ
NOUN <NP1> VP), all at position 1.
5. Otherwise, this state is rejected and a backup state is moved to be the
current state.
Using this algorithm, consider parsing the sentence The green
water evaporated (see Appendix A).
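Step 4 (rewriting a leading nonterminal) can be sketched as follows. Grammar 3 appears in an appendix not reproduced here, so the NP rules below are assumed from the example in the text, and the construction-flag bookkeeping is omitted for brevity.

```python
# Assumed NP rules of Grammar 3, taken from the worked example in the text.
GRAMMAR = {"NP": [["ART", "NOUN"], ["ART", "ADJ", "NOUN"], ["ADJ", "NOUN"]]}

def expand(state, position):
    """Rewrite the leftmost nonterminal of `state` with each of its rules,
    keeping the rest of the state and the sentence position unchanged."""
    head, rest = state[0], state[1:]
    return [(rhs + rest, position) for rhs in GRAMMAR[head]]

# The state (NP VP) at position 1 yields three new states, all at position 1.
print(expand(["NP", "VP"], 1))
```

One of the resulting states becomes the new current state and the others are pushed as backups, exactly as in the step 1 example above.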


CHAPTER 4
BQSysl
BQSysl is the beginning phase of my syntax parsing system. The
programs of BQSysl are written in Borland C++, version 3.0, and run
under Microsoft Windows 3.0 on an IBM PC 386 (33 MHz) machine.
BQSysl uses a set of CFG rules to parse sentences. The rules are
chosen for the following reason:
The theory of Context-Free grammar is rooted in the idea that there is
a finite number of core or kernel sentence patterns with which all native
speakers become familiar. These kernel sentences are the basic elementary
sentences of the language. All other sentences are then created, by means of
a systematic set of transformation rules (see the CFG Rules section below),
from the terminal strings which underlie the kernel sentences.
Kernel sentences can be defined as those sentences which are derived
exclusively from CFG rules. It is from the terminal strings underlying these
kernel sentences that all the other sentences of the language are then
formed.


The parsing algorithm in BQSysl is revised from Allen's book [3];
a recursive, depth-first search is applied to a binary tree dictionary
and a set of CFG rules.
CFG Rules
The Context-Free Grammar rules in BQSysl are used to handle a set of
kernel sentences. The rules given here will, admittedly, generate only a
small number of English sentences, but they show the general idea of how
CFG rules can be used in the syntax parsing of natural language
processing.
CFG Rule: Interpretation of CFG Rule:
1. S --> NP+VP  A sentence (S) consists of, or is rewritten as (-->), a Noun Phrase (NP) plus a Verb Phrase (VP).
2. NP --> D+N  A Noun Phrase (NP) consists of a Determiner (D) plus a Noun (N).
3. NP --> Pro  A Noun Phrase (NP) consists of a Pronoun (Pro).
4. NP --> N  A Noun Phrase (NP) consists of a Noun (N).
5. NP --> Adj+N  A Noun Phrase (NP) consists of an Adjective (Adj) plus a Noun (N).
6. NP --> D+Adj+N  A Noun Phrase (NP) consists of a Determiner (D) plus an Adjective (Adj) plus a Noun (N).
7. VP --> MV  A Verb Phrase (VP) consists of a Main Verb (MV).
8. VP --> MV+Comp+Adv  A Verb Phrase (VP) consists of a Main Verb (MV) plus a Complement (Comp) plus an Adverb (Adv).
9. VP --> MV+Adv  A Verb Phrase (VP) consists of a Main Verb (MV) plus an Adverb (Adv).
10. VP --> MV+Comp  A Verb Phrase (VP) consists of a Main Verb (MV) plus a Complement (Comp).
11. MV --> M+V  A Main Verb (MV) consists of a Modal auxiliary (M) plus a Verb (V).
12. MV --> V  A Main Verb (MV) consists of a nonterminal Verb (V).
13. V --> v  A nonterminal Verb (V) consists of a sub-nonterminal verb (v).
14. V --> be  A nonterminal Verb (V) consists of a be verb (be).
15. v --> Vi  A sub-nonterminal verb (v) consists of an intransitive verb (Vi).
16. v --> Vl  A sub-nonterminal verb (v) consists of a link verb (Vl).
17. v --> Vtm  A sub-nonterminal verb (v) consists of a mid-transitive verb (Vtm).
18. v --> Vt+part  A sub-nonterminal verb (v) consists of a transitive verb (Vt) plus a particle (part).
19. v --> Vt  A sub-nonterminal verb (v) consists of a transitive verb (Vt).
20. v --> Vi+part  A sub-nonterminal verb (v) consists of an intransitive verb (Vi) plus a particle (part).
21. Comp --> NP  A Complement (Comp) consists of a Noun Phrase (NP).
22. Comp --> Adv  A Complement (Comp) consists of a nonterminal Adverb (Adv).
23. Comp --> Q+Adj  A Complement (Comp) consists of a Qualifier (Q) plus an Adjective (Adj).
24. Comp --> Adj  A Complement (Comp) consists of an Adjective (Adj).
25. Comp --> NP+Q+Adj  A Complement (Comp) consists of a Noun Phrase (NP) plus a Qualifier (Q) plus an Adjective (Adj).
26. Comp --> NP+Adj  A Complement (Comp) consists of a Noun Phrase (NP) plus an Adjective (Adj).
27. Comp --> NP+NP  A Complement (Comp) consists of a Noun Phrase (NP) plus another Noun Phrase (NP).
28. Adv --> advp  A nonterminal Adverb (Adv) consists of a place adverb (advp).
29. Adv --> Q+advm  A nonterminal Adverb (Adv) consists of a Qualifier (Q) plus a manner adverb (advm).
30. Adv --> advm  A nonterminal Adverb (Adv) consists of a manner adverb (advm).
31. Adv --> advt  A nonterminal Adverb (Adv) consists of a time adverb (advt).
Figure 4.1 A Context-Free Grammar in BQSysl
When applying the CFG rules, we must begin at the top with the term
Sentence (S) and proceed toward the bottom, applying only one rule at a
time. That is, we rewrite only one element or constituent with each rule
application. As we progress with a derivation, we will occasionally find it
necessary to go back to a previously stated rule and apply it over and over
again. Note, for example, that Rule 2 tells us to rewrite NP as D+N. Rules
10 and 21 then instruct us to rewrite VP as MV+Comp and Comp as NP,
which means we once again have the symbol NP to convert to D+N by the
application of Rule 2. Thus, the phrase structure (PS) rules can be said to
account for the repetitive or recursive nature of language: the linguistic
fact that a noun phrase can be contained within a verb phrase, a verb
phrase can be contained within a noun phrase, and so forth.
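This leftmost, one-rule-at-a-time procedure can be sketched by encoding a few of Figure 4.1's rules as rewrite data. The encoding and the alternative indices below are my own abridged sketch, not the C++ structures BQSysl actually uses.

```python
# An abridged subset of the 31 rules in Figure 4.1.
RULES = {
    "S":    [["NP", "VP"]],
    "NP":   [["D", "N"], ["Pro"], ["N"], ["Adj", "N"], ["D", "Adj", "N"]],
    "VP":   [["MV"], ["MV", "Comp", "Adv"], ["MV", "Adv"], ["MV", "Comp"]],
    "MV":   [["M", "V"], ["V"]],
    "Comp": [["NP"], ["Adv"], ["Q", "Adj"], ["Adj"]],
}

def derive(choices):
    """Start from S and, for each choice, rewrite the leftmost symbol that
    still has rules, using the alternative at that index."""
    form = ["S"]
    for c in choices:
        i = next(j for j, sym in enumerate(form) if sym in RULES)
        form[i:i + 1] = RULES[form[i]][c]
    return form

# S => NP VP => D N VP => D N MV Comp => D N V Comp => D N V NP => D N V D N
print(derive([0, 0, 3, 1, 0, 0]))
```

The last derivation rewrites NP twice, first for the subject and again inside the complement, which is the recursion the paragraph describes.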


Parsing Algorithm
The algorithm is revised from Allen's book [3]. It uses a top-down
parsing method. Because parsing can be thought of as a special case of a
search problem, as defined in artificial intelligence, the algorithm can be
described in terms of a search strategy. The search strategy used in
my system is a recursive, depth-first search. Let the current state and the
backup states be considered as one list of possible states called the
possibilities list. It is initially set to the start state of the parse, namely
S --> NP+VP. Then the following steps are repeated until there is a success
or failure:
1. If the leftmost symbol in the current state is labeled as a flag, then put
the symbol into a chart stack, record the left and right positions of the
flag in the input sentence, and erase the symbol from the current state.
2. Otherwise, if all of the input sentence has been applied and the current
state is not empty and the backup states are not empty, then copy the
top backup state to the current state.
3. Otherwise, if the leftmost symbol in the current state is a terminal and
the corresponding word in the input sentence is defined in the
dictionary, then label it as a terminal, and erase it from the current
state.
4. Otherwise, if the leftmost symbol in the current state is a terminal and
the corresponding word in the input sentence is not defined in the
dictionary, then copy the top backup state to the current state.
5. Otherwise, if the leftmost symbol in the current state is a nonterminal,
then derive the current state, and push the new derivations onto
the backup states' stack if there is more than one derivation.
6. Otherwise, if all of the input sentence has been applied and the
current state is empty, namely the parse is successful, then stop parsing.
7. Otherwise, if the current state is empty and all of the input sentence
has not been applied and the backup states are not empty, then copy
the top backup state in the stack to the current state.
8. Otherwise, print "This sentence is invalid, or is not defined. Please
check.", and stop parsing.
Lexicon
The lexicon words are the terminals (i.e., Pro, N, D, ...) in Figure 4.1.
Each of them has a set of English words which are constructed in a binary
tree structure under the lexicon word. Because a binary tree structure is
used, searching is very efficient. The lexicon words, binary trees, and node
structures are defined in Appendix B.
Test Cases
Now, using this algorithm, we test four cases by inputting sentences:
Case 1: it is a valid sentence syntactically speaking, and is defined in
BQSys1.
Case 2: it is a valid sentence syntactically speaking, but is not defined
in BQSys1.
Case 3: it is an invalid sentence syntactically speaking.
Case 4: it is a valid sentence syntactically speaking and is defined in
BQSys1, but is nonsense.
Case 1
The input sentence is: the boy likes the dogs
Word 0: the
[Word Category]: D [ Number ]: sing/pl
Word 1: boy
[Word Category]: N [ Countable ]: +
[ Number ]: sing [ Gender ]: masculine
Word 2: likes
[Word Category]: Vt [ Number ]: sing
[ Tense ]: v_s
Word 3: the
[Word Category]: D [ Number ]: sing/pl
Word 4: dogs
[Word Category]: N [ Countable ]: +
[ Number ]: pl [ Gender ]: masculine/feminine
The following are the parsing steps:
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=0
chart1.con_flag[i].to =2
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=VP
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=S
chart1.con_flag[i].from=0
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=Comp
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=Comp
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=VP
chart1.con_flag[i].from=2
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=S
chart1.con_flag[i].from=0
chart1.con_flag[i].to =5
Case 2
The input sentence is: Please tell me more
This sentence is invalid, or is not defined. Please check.
Word 0: Please
Word 1: tell
Word 2: me
Word 3: more
The following are the parsing steps:
Case 3
The input sentence is: he told much you
This sentence is invalid, or is not defined. Please check.
Word 0: he
Word 1: told
Word 2: much
Word 3: you
The following are the parsing steps:
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=0
chart1.con_flag[i].to =1
Case 4
The input sentence is: a book writes a dog
Word 0: a
[Word Category]: D [ Number ]: sing
Word 1: book
[Word Category]: N [ Countable ]: +
[ Number ]: sing [ Gender ]: neuter
Word 2: writes
[Word Category]: Vt [ Number ]: sing
[ Tense ]: v_s
Word 3: a
[Word Category]: D [ Number ]: sing
Word 4: dog
[Word Category]: N [ Countable ]: +
[ Number ]: sing [ Gender ]: masculine/feminine
The following are the parsing steps:
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=0
chart1.con_flag[i].to =2
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=VP
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=S
chart1.con_flag[i].from=0
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=Comp
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=v
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=V
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=MV
chart1.con_flag[i].from=2
chart1.con_flag[i].to =3
chart1.con_flag[i].flag=NP
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=Comp
chart1.con_flag[i].from=3
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=VP
chart1.con_flag[i].from=2
chart1.con_flag[i].to =5
chart1.con_flag[i].flag=S
chart1.con_flag[i].from=0
chart1.con_flag[i].to =5
From Case 4, we can see that a nonsense sentence with correct syntax
can pass syntax parsing, so syntax parsing is not the final phase of
language processing. Semantic and further steps are needed to check and
interpret an input sentence.
CHAPTER 5
CONCLUSIONS AND OPEN QUESTIONS
Synopsis of the Thesis
The fundamental questions motivating this work are: What is Natural
Language Understanding? Why solve the Natural Language Understanding
problem? Is the Natural Language Understanding problem solvable? Why
do we need syntax parsing?
In Chapter 1, the above questions were answered (Rich and Knight [1],
Schank [2], Allen [3] and Chomsky [5]).
In Chapter 2, the major syntax parsing methods were introduced,
namely the simple finite state diagram, recursive finite state diagrams, and
context-free grammar (Allen [3], Rich and Knight [1], LaPalombara [8],
Hopcroft and Ullman [7], Sudkamp [6], House and Harman [4], Lester [9],
and Sager [10]). The emphasis was on the CFG. Also, the weaknesses and
generative capacity of the CFG were analyzed.
In Chapter 3, an algorithm which uses a top-down CFG parser with a
chart was introduced (Allen [3]). Using this algorithm, I listed a
complete parsing procedure for an input sentence in an example.
In Chapter 4, my system, BQSys1, was introduced. The system parses
a small set of English sentences. Whether a sentence can be parsed by
BQSys1 depends on whether the pattern of the sentence and its words are
predefined in BQSys1. Four test cases were shown there. The most
interesting case is Case 4: for any nonsense sentence with valid syntax,
the parser will produce a wrong answer. This indicates that we need further
steps (i.e., a semantic interpreter) to solve the problem.
Open Questions
Though syntax parsing of natural language understanding has
experienced some progress in the last decade, there are many remaining
unsolved problems. In the special direction of my research, a number of
avenues for further investigation appear to be open.
My system in Chapter 4 is very effective for the simple model
sentences whose patterns are predefined. It seems to promise a very
efficient and feasible approach for more practical problems and derivations
of the CFG rules. However, no CFG can generate the entire set of
sentences of a natural language (cf. Hopcroft and Ullman [7]). My goal is to
improve BQSys1, apply it to some applications, and reach an 80%
understanding of natural language sentences in an application. If that is the
case, don't you think BQSys1 is intelligent? So far, BQSys1 does only
syntax parsing. Several further steps (i.e., semantic and contextual
interpreters) are needed in BQSys1 to reach the 80% goal.
The open questions are:
1. How to find a method which can completely solve the Natural
Language Syntax Parsing problem?
2. How to find a method which can entirely solve the Natural
Language Understanding problem?
The study of syntax parsing methods for natural languages, an area
which is not well explored, appears to be a rich area for continuing
research.
APPENDIX A
A PARSING EXAMPLE
Before we start parsing the sentence The green water evaporated, some
preconditions need to be defined.
Given:
1) A simple CFG is:
1. S <- NP VP            5. NP <- ART ADJ NOUN
2. S <- NP AUX VERB      6. NP <- ADJ NOUN
3. S <- NP VERB          7. VP <- AUX VERB NP
4. NP <- ART NOUN        8. VP <- VERB NP
Figure 3.1 Grammar 3 A simple context-free grammar
2) A lexicon:
the: ART green: ADJ, NOUN
water: NOUN, VERB evaporated: VERB
3) An input sentence:
1 The 2 green 3 water 4 evaporated 5
The following phases constitute the procedure of parsing the input
sentence using the algorithm above.
Phase 1 (initial state and chart):
Current State Backup State Position Actions
(S ) NIL 1 Step 4
1 The 2 green 3 water 4 evaporated 5

Phase 2:
Current State Backup State Position Actions
(NP VP ) 1 Step 4
(NP AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 3:
Current State Backup State Position Actions
(ART NOUN VP ) 1 Step 1, 3
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 4:
Current State Backup State Position Actions
(NOUN VP ) 2 Step 1, 3
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
ART
1 The 2 green 3 water 4 evaporated 5


Phase 5:
Current State Backup State Position Actions
( VP ) 3 Step 1, 2
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
ART NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 6:
Current State Backup State Position Actions
( VP ) 3 Step 4
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 7:
Current State Backup State Position Actions
(AUX VERB NP ) 3 Step 1, 3 failed, 5
(VERB NP ) 3
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1


NP
ART NOUN
1 The 2 green 3 water 4 evaporated 5
Phase 8:
Current State Backup State Position Actions
(VERB NP ) 3 Step 1, 3 failed, 5
(ART ADJ NOUN VP ) 1
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 9:
Current State Backup State Position Actions
(ART ADJ NOUN VP ) 1 Step 1, 3, 1, 3, 1, 3
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 10:
Current State Backup State Position Actions
( VP ) 4 Step 1, 2
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1


ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5
Phase 11:
Current State Backup State Position Actions
( VP ) 4 Step 4
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 12:
Current State Backup State Position Actions
(AUX VERB NP ) 4 Step 1, 3 failed, 5
(VERB NP ) 4
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 13:
Current State Backup State Position Actions
(VERB NP ) 4 Step 1, 3
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1


NP
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 14:
Current State Backup State Position Actions
(NP ) 5 Step 4
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 15:
Current State Backup State Position Actions
(ART NOUN ) 5 Step 1, 3 failed, 5
(ART ADJ NOUN ) 5
(ADJ NOUN ) 5
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5


Phase 16:
Current State Backup State Position Actions
(ART ADJ NOUN ) 5 Step 1, 3 failed, 5
(ADJ NOUN ) 5
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 17:
Current State Backup State Position Actions
(ADJ NOUN ) 5 Step 1, 3 failed, 5
(ADJ NOUN VP ) 1
(NP AUX VERB ) 1
(NP VERB ) 1
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 18:
Current State Backup State Position Actions
(ADJ NOUN VP ) 1 Step 1, 3 failed
(NP AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5


Phase 19:
Current State Backup State Position Actions
(NP AUX VERB ) 1 Step 4
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 20:
Current State Backup State Position Actions
(ART NOUN AUX VERB ) 1 Step 1, 3, 1, 3
(ART ADJ NOUN AUX VERB ) 1
(ADJ NOUN AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 21:
Current State Backup State Position Actions
( AUX VERB ) 3 Step 1, 2
(ART ADJ NOUN AUX VERB ) 1
(ADJ NOUN AUX VERB ) 1
(NP VERB ) 1
ART NOUN
1 The 2 green 3 water 4 evaporated 5


Phase 22:
Current State Backup State Position Actions
(AUX VERB ) 3 Step 1, 3 failed, 5
(ART ADJ NOUN AUX VERB ) 1
(ADJ NOUN AUX VERB ) 1
(NP VERB ) 1
ART NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 23:
Current State Backup State Position Actions
(ART ADJ NOUN AUX VERB ) 1 Step 1, 3, 1, 3, 1, 3
(ADJ NOUN AUX VERB ) 1
(NP VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 24:
Current State Backup State Position Actions
( AUX VERB ) 4 Step 1, 2
(ADJ NOUN AUX VERB ) 1
(NP VERB ) 1
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 25:
Current State Backup State Position Actions
(ADJ NOUN AUX VERB ) 1 Step 1, 3 failed, 5
(NP VERB ) 1


1 The 2 green 3 water 4 evaporated 5
Phase 26:
Current State Backup State Position Actions
(NP VERB ) 1 Step 4
1 The 2 green 3 water 4 evaporated 5

Phase 27:
Current State Backup State Position Actions
(ART NOUN VERB ) 1 Step 1, 3, 1, 3
(ART ADJ NOUN VERB ) 1
(ADJ NOUN VERB ) 1
1 The 2 green 3 water 4 evaporated 5

Phase 28:
Current State Backup State Position Actions
( VERB ) 3 Step 1, 2
(ART ADJ NOUN VERB ) 1
(ADJ NOUN VERB ) 1
ART NOUN
1 The 2 green 3 water 4 evaporated 5


Phase 29:
Current State Backup State Position Actions
(VERB ) 3 Step 1, 3
(ART ADJ NOUN VERB ) 1
(ADJ NOUN VERB ) 1
NP
ART NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 30:
Current State Backup State Position Actions
() 4 Step 1, 2
(ART ADJ NOUN VERB ) 1
(ADJ NOUN VERB ) 1
NP
ART NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 31:
Current State Backup State Position Actions
() 4 Step 1 failed
(ART ADJ NOUN VERB ) 1
(ADJ NOUN VERB ) 1
S
NP
ART NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 32:
Current State Backup State Position Actions
(ART ADJ NOUN VERB ) 1 Step 1, 3, 1, 3, 1, 3
(ADJ NOUN VERB ) 1
1 The 2 green 3 water 4 evaporated 5
Phase 33:
Current State Backup State Position Actions
( VERB ) 4 Step 1, 2
(ADJ NOUN VERB ) 1
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 34:
Current State Backup State Position Actions
(VERB ) 4 Step 1, 3
(ADJ NOUN VERB ) 1
NP
ART ADJ NOUN
1 The 2 green 3 water 4 evaporated 5

Phase 35:
Current State Backup State Position Actions
() 5 Step 1, 2
(ADJ NOUN VERB ) 1
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5

Phase 36:
Current State Backup State Position Actions
() 5 Step 1 success
(ADJ NOUN VERB ) 1
S
NP
ART ADJ NOUN VERB
1 The 2 green 3 water 4 evaporated 5


APPENDIX B
LEXICON WORDS
The lexicon words are:
D: stands for determiners.
N: stands for nouns.
Vtm: stands for mid-transitive verbs.
Vi: stands for intransitive verbs.
Vt: stands for transitive verbs.
be: stands for be verbs.
Vl: stands for link verbs.
M: stands for modal auxiliaries.
part: stands for particles.
Adj: stands for adjectives.
Advp: stands for place adverbs.
Advt: stands for time adverbs.
Advm: stands for manner adverbs.
Q: stands for qualifiers.
Pro: stands for pronouns.
The following are the binary trees and node structures of the lexicon
words:


D: a, an, the, some, several, any, every
[Binary tree diagram of the D words]
struct determiner
{
struct determiner *left;
struct determiner *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
}D[NUMBEROFDETERMINERS];


N: book, books, boy, boys, chaos, dog, dogs, health, man, men, riches
[Binary tree diagram of the N words]
struct noun
{
struct noun *left;
struct noun *right;
char word[WORDLENGTH];
char countable[COUNTABLE];
char number[NUMBERLENGTH];
char gender[GENDERLENGTH];
char animate[ANIMATELENGTH];
char gcase[CASELENGTH]; /* "case" is a C keyword, so the field is renamed */
}N[NUMBEROFNOUNS];


Vtm: cost, costed, costing, costs, had, has, have, having, total, totaled, totaling, totals, weight, weighted, weighting, weights
[Binary tree diagram of the Vtm words]
struct midtransitiveverb
{
struct midtransitiveverb *left;
struct midtransitiveverb *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
char tense[TENSELENGTH];
} Vtm[NUMBEROFMIDTRANSITIVEVERBS];


Vi: fall, falls, fallen, falling, fell, flew, flies, flown, fly, flying, ran, run, running, runs, walk, walked, walking, walks
[Binary tree diagram of the Vi words]
struct intransitiveverb
{
struct intransitiveverb *left;
struct intransitiveverb *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
char tense[TENSELENGTH];
} Vi[NUMBEROFINTRANSITIVEVERBS];


Vt: consider, considered, considering, considers, gave, give, given, gives, giving, like, liked, likes, liking, made, make, makes, making, take, takes, taking, took, write, writes, writing, written, wrote
[Binary tree diagram of the Vt words]
struct transitiveverb
{
struct transitiveverb *left;
struct transitiveverb *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
char tense[TENSELENGTH];
} Vt[NUMBEROFTRANSITIVEVERBS];


be: is, are, was, were, be, being, been
struct beverb
{
struct beverb *left;
struct beverb *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
char tense[TENSELENGTH];
} be[NUMBEROFBES];


Vl: appear, appeared, appearing, appears, feel, feeling, feels, felt, remain, remained, remaining, remains, seem, seemed, seeming, seems
[Binary tree diagram of the Vl words]
struct linkverb
{
struct linkverb *left;
struct linkverb *right;
char word[WORDLENGTH];
char number[NUMBERLENGTH];
char tense[TENSELENGTH];
} Vl[NUMBEROFLINKVERBS];


M: can, could, may, might, must, ought to, will, would
[Binary tree diagram of the M words]
struct modal
{
struct modal *left;
struct modal *right;
char word[WORDLENGTH];
char tense[TENSELENGTH];
} M[NUMBEROFMODALS];


part: in, out, up, down, on, off, over, with
[Binary tree diagram of the part words]
struct particle
{
struct particle *left;
struct particle *right;
char word[WORDLENGTH];
} part[NUMBEROFPARTICLES];


Adj: as, bad, beautiful, good, in the pink, out of sorts
[Binary tree diagram of the Adj words]
struct adjective
{
struct adjective *left;
struct adjective *right;
char word[WORDLENGTH];
} Adj[NUMBEROFADJECTIVES];


advp: at the store, here, in the room, there
[Binary tree diagram of the advp words]
struct adverbplace
{
struct adverbplace *left;
struct adverbplace *right;
char word[WORDLENGTH];
} advp[NUMBEROFADVERBPLACES];


advt: now, then, often, in the morning, during recess
[Binary tree diagram of the advt words]
struct adverbtime
{
struct adverbtime *left;
struct adverbtime *right;
char word[WORDLENGTH];
} advt[NUMBEROFADVERBTIMES];


advm: quickly, rapidly, in a minute, on the double
[Binary tree diagram of the advm words]
struct adverbmanner
{
struct adverbmanner *left;
struct adverbmanner *right;
char word[WORDLENGTH];
} advm[NUMBEROFADVERBMANNERS];


Q: quite, very, more, much, less, extremely, absolutely
[Binary tree diagram of the Q words]
struct qualifier
{
struct qualifier *left;
struct qualifier *right;
char word[WORDLENGTH];
} Q[NUMBEROFQUALIFIERS];


Pro: he, her, hers, him, his, I, it, its, me, mine, my, our, ours, she, their, theirs, them, they, us, we, who, whom, whose, you, your, yours
[Binary tree diagram of the Pro words]
struct pronoun
{
struct pronoun *left;
struct pronoun *right;
char word[WORDLENGTH];
char number_person[NUMBERPERSONLENGTH];
char gender[GENDERLENGTH];
char gcase[CASELENGTH]; /* "case" is a C keyword, so the field is renamed */
} Pro[NUMBEROFPRONOUNS];


BIBLIOGRAPHY
[1] Elaine Rich and Kevin Knight. 1983. Artificial Intelligence: 14-15. Second
Edition. New York, N.Y.: McGraw-Hill, Inc.
[2] Roger C. Schank. 1985. The Cognitive Computer: On Language, Learning, and
Artificial Intelligence: 1-2. Second Printing. Reading, Massachusetts: Addison-
Wesley Publishing Company, Inc.
[3] James Allen. 1988. Natural Language Understanding: 1-6. Menlo Park, CA: The
Benjamin/Cummings Publishing Company, Inc.
[4] Homer C. House and Susan Emolyn Harman. 1950. Descriptive English
Grammar. Revised by Susan Emolyn Harman. Second Edition. Englewood Cliffs,
N.J.: Prentice-Hall, Inc.
[5] Chomsky, N. 1971. Chomsky: Selected Readings: 1-3. Edited by J. P. B. Allen and
Paul Van Buren. London: Oxford University Press.
[6] Thomas A. Sudkamp. Languages and Machines, An Introduction to the Theory of
Computer Science. Addison-Wesley Publishing Company, Inc.
[7] John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction To Automata
Theory, Languages, and Computation: 2, 4-6. Reading, Massachusetts: Addison-
Wesley Publishing Company, Inc.


[8] Lyda E. LaPalombara. 1976. An Introduction to Grammar: Traditional,
Structural, Transformational: 17-23. Cambridge, Massachusetts: Winthrop
Publishers, Inc.
[9] Mark Lester. 1971. Introductory Transformational Grammar of English: 1-3.
New York, N.Y.: Holt, Rinehart and Winston, Inc.
[10] Naomi Sager. 1981. Natural Language Information Processing, A Computer
Grammar of English and Its Applications: 1-2. Reading, Massachusetts: Addison-
Wesley Publishing Company, Inc.
[11] Randolph Quirk and Sidney Greenbaum. 1973. A Concise Grammar of
Contemporary English: 1-7. New York, N.Y.: Harcourt Brace Jovanovich, Inc.
[12] Randolph Quirk, Sidney Greenbaum, and Jan Svartvik. 1972. A Grammar of
Contemporary English: 1-7. New York, N.Y.: Seminar Press New York and
London, a Division of Harcourt Brace Jovanovich, Inc.