Citation
Subjective Bayesian inference network

Material Information

Title:
Subjective Bayesian inference network
Creator:
Messegee, Lori Ann
Publication Date:
1992
Language:
English
Physical Description:
xi, 66 leaves : illustrations ; 29 cm

Subjects

Subjects / Keywords:
Expert systems (Computer science) -- Design ( lcsh )
Bayesian statistical decision theory ( lcsh )
Bayesian statistical decision theory ( fast )
Expert systems (Computer science) -- Design ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references.
General Note:
Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Electrical Engineering.
Statement of Responsibility:
by Lori Ann Messegee.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
26794130 ( OCLC )
ocm26794130
Classification:
LD1190.E54 1992m .M47 ( lcc )

Full Text
SUBJECTIVE BAYESIAN INFERENCE NETWORK
by
Lori Ann Messegee
B.S., South Dakota School of Mines and Technology, 1988

A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science

Electrical Engineering
1992


This thesis for the Master of Science
degree by
Lori Ann Messegee
has been approved for the
Department of
Electrical Engineering
by
Date


Messegee, Lori Ann (M.S., Electrical Engineering)
Subjective Bayesian Inference Network
Thesis directed by Associate Professor William J. Wolfe

This paper details the building and analysis of an inference network in Lisp. The system utilizes probabilities to weight the inputs and rules, and subjective Bayesian inference to propagate the probabilities through the network.

Uncertain information is used as the inputs, which are then summarized into abstractions. At the opposite end of the network, problem reduction is utilized to break the output down into simpler, smaller problems. These two sets are then combined to form the final network.

The particular problem utilized to illustrate subjective Bayesian inference networks is the Traveller's Restaurant Selection Problem, which predicts overall food quality from observable qualities. The knowledge gained through this example can easily be applied to a large group of general design projects.

This abstract accurately represents the content of the candidate's thesis. I recommend its publication.

Signed


DEDICATION
to my parents
and
husband


ACKNOWLEDGMENTS

Thank you to Hughes Aircraft Company for financial assistance and support.


CONTENTS

Tables
Figures

CHAPTER
1. INTRODUCTION
   Topic Selection Process
   Abstraction and Problem Reduction
   Inference vs. Back Propagation or Rule-Based Systems
   Subjective Bayesian Inference Networks
      Probability
      Bayes' Rule
      Odds and Bayes' Rule
      Updating Probabilities
      Traveller's Restaurant Selection Problem
2. INFERENCE NETWORK #1
   System Behavior and Analysis
   Behavior at a Node
   In-depth Case Analysis
   Final Version
3. KNOWLEDGE ENGINEERING
   Inference Network #2
4. COMPARING INFERENCE NETWORKS #1 AND #2
   Different Desired Behavior
5. SUMMARY
APPENDIX
   A. LISP Code for Traveler's Restaurant Selection Problem
REFERENCES


TABLES

Table
1.1 Inference rules for propositional calculus and possibilistic and probabilistic logic
2.1 Current probability inputs for four restaurants
2.2 Interior node values and outputs for four restaurants
2.3 Revised current probability inputs for four restaurants
2.4 Interior node values and outputs for four restaurants using revised current probability inputs
2.5 Interior node values and outputs for four restaurants using revised program
2.6 Current probability inputs for four submarine sandwich restaurants
2.7 Interior node values and outputs for four submarine sandwich restaurants
3.1 Expert #2's current probability inputs for three Italian restaurants
3.2 Expert #2's interior node values and outputs for three Italian restaurants
4.1 Expert #2's current probability inputs for four restaurants
4.2 Expert #2's interior node values and outputs for four restaurants
4.3 Expert #2's current probability inputs for submarine sandwich restaurants
4.4 Expert #2's interior node values and outputs for submarine sandwich restaurants


FIGURES

Figure
1.1 Food poisoning and the possible symptoms
1.2 Sufficiency relationship between E and H
1.3 E is sufficient evidence to assume -H
1.4 Possible hypotheses, given the evidence flour
1.5 Arc in an inference network, labelled with the sufficiency and necessity coefficients
1.6 Inference with uncertain evidence E'
1.7 A linear interpolation for computing P(H|E') from P(E|E')
1.8 Inconsistencies in prior probabilities for E and H
1.9 Piecewise-linear function with dead zone for probability updating
1.10 "Sufficiency-only" updating function
1.11 "Necessity-only" updating function
1.12 Interpolation function used in the MYCIN system, where Pt(E) = P(E) + t[1 - P(E)], with t = 0.2
1.13 Piecewise-linear function for updating the probability of a hypothesis
1.14 Probabilistic inference network for the Traveller's Restaurant Selection Problem
2.1 The Popularity node and its inputs, Sounds and Clientele
2.2 The resulting values of the Popularity node with Input Scenario #1: Sounds and Clientele vary together from 0 to 1
2.3 The resulting values of the Popularity node with Input Scenario #2: Sounds varies from 0 to 1, while Clientele varies from 1 to 0
2.4 The resulting values of the Popularity node with Input Scenarios #3, #4, and #5: Sounds varies from 0 to 1, while Clientele is held constant at 0.5, 0.3, and 0.8, respectively


CHAPTER 1
INTRODUCTION

An expert system can be defined as a computer program for capturing valuable knowledge and delivering it at the point of decision making [1]. Humans excel in knowledge processing, the analysis of information to make decisions. Expert systems mimic the human reasoning process, applying knowledge to analyze information to make decisions. The emergence of expert systems as one of the major areas of activity in information science and technology is providing a strong incentive for the development of theories of approximate reasoning as a basis for representing, and reasoning with, expert knowledge [2].

Topic Selection Process

With a master's degree curriculum heavily involved in artificial intelligence courses, expert systems were an ideal thesis topic for research and development. The first step was choosing an appropriate problem to solve with an expert system.


The first candidate for the expert system was a cloud-recognition tool. The data were in many different formats and extremely raw. This would be an intriguing problem, but would have required much data manipulation before the work on the actual expert system could commence. This project also seemed better suited to a neural network.

The second idea was an expert system to troubleshoot remote racks of equipment to diagnose problems. This project did not require an expert system, though, just a troubleshooting algorithm.

The third idea consisted of an expert system to diagnose problems with nuclear reactors. At first, this topic appeared to require an extreme amount of expertise and would provide an ideal topic for an expert system. But further investigation showed that the rules are extremely rigid. The operator is an expert at controlling the nuclear reactors. But this expertise is not gained through years and years of experience and "gut feelings." Rather, it is knowledge of the strict rules that govern the operation of a nuclear reactor. In other words, every operator would create exactly the same expert system to perform the job. This project quickly slips out of the realm of interesting expert systems.


The fourth candidate was a joint effort with two people who are pursuing an expert system to schedule time and resources for a scientific spacecraft. This optimization problem is NP-complete and also appeared to be better suited for a neural network.

University of Colorado at Denver Assistant Professor Wolfe proposed coding and analyzing an inference network from Tanimoto's book The Elements of Artificial Intelligence [3]. The design concepts of this example problem can be applied to many other problems such as automobile diagnosis, weather prediction, medical diagnosis, and stock market predictions. Implementation and analysis of this class of inference networks is the topic of this thesis.
Abstraction and Problem Reduction

After the introduction of subjective Bayesian inference networks, an example problem called the Traveller's Restaurant Selection Problem is coded and analyzed. Although the subject matter may be considered trivial, this problem provides a very good means of learning the underlying design principles of inference networks. One of the nicest aspects of this problem is that anyone can be the "expert," which makes the network's behavior easier to understand.


With the chosen problem, the inputs and output are given. This corresponds to real-life problems, in which the available information is often limited to a set of sensor readings or incomplete information. Any system must be created with the intended purpose in mind and the major goals clearly established and understood so that the implementation is realistic [4].

Some highly technical fields contain a great deal of noisy or fuzzy information. The difference between noisy and fuzzy information can be defined as follows: noise is uncertainty in what the expert observes, whereas fuzziness describes the expert's uncertainty in how to rate or classify what he has seen. Noise is introduced when the expert is unable to measure a quality with an adequate degree of precision. Fuzziness can be introduced when the quality is linguistic and the possible data values are selected from a list of descriptors [5]. In the network to follow, the input nodes and their rating scale are well defined, thus eliminating the fuzziness of the information. This information is used as inputs, which form the first layer of the network. The inputs are then summarized into abstractions, which will form the second layer of the network. Since the abstracted nodes are evaluated from several inputs, the integration reduces the effects of noise.


On the other end of the network, the output forms the fourth layer. As in most design problems, the problem reduction stage breaks the desired output down into smaller problems. In this case, the specific characteristics that directly affect the output are determined, as well as the extent of their individual importance to the output. These smaller problems form the third layer of the network.

After the inputs are abstracted and the output is reduced to simpler problems, these two sets of nodes must be linked to form the network. This inference network consists of four layers, but the designer may use any number of intermediate layers, depending on the problem. If the first two steps of abstraction and problem reduction were properly performed, the interior layers of nodes should match up fairly well.
Inference vs. Back Propagation or Rule-Based Systems

Inference networks are similar to back propagation networks [6] [7], in that they both utilize intermediate layers of nodes between the inputs and the outputs of the network. But the intermediate nodes of the back propagation system have no special meanings attached to them. The system is "trained" using many input/output scenarios, and the values of the connecting arcs are adjusted via a process of gradient descent on an error function until the desired behavior is achieved. By contrast, the practice of attaching particular meanings to each node of the inference network gives the expert and the knowledge engineer a very straightforward method of localizing the values at the intermediate nodes to perform troubleshooting.

Inference networks are also closely related to rule-based systems. Each rule in a rule-based system has a confidence factor associated with it. In the inference network, probability theory takes the role of the confidence factors. Thus, probability theory provides a stronger mathematical foundation for the inference network than the ad hoc confidence factor provides for the rule-based system. Each probability of the inference network provides information that may be stated in an explicit rule format:

P(H|E) = X means
IF: E is known to be true
THEN: conclude that H is true with probability X.
Information may be propagated through inference networks using fuzzy-set theory or subjective Bayesian updating functions. Fuzzy inference rules propagate probabilities by taking several probabilities as inputs and generating one new probability as the output. A set of fuzzy inference rules, referred to as possibilistic or probabilistic logic, is used to simulate propositional calculus because its behavior is similar to that of intuition [8]. REVEAL and FLOPS are examples which are currently commercially marketed [9]. Fuzzy-set theory extends the operations of Boolean algebra to cover fractional truth values between 0 (false) and 1 (true). See Table 1.1: Inference rules for propositional calculus and possibilistic and probabilistic logic, for the relationship between the two. Note that the possibilistic and probabilistic logic rules for A ⊕ B are xor(a,b) = max(min(a, 1-b), min(1-a, b)) and Xor(a,b) = a + b - 3ab + a²b + ab² - a²b², respectively.

A B    -A     A ∧ B      A ∨ B        A → B         A ⊕ B
a b    1-a    min(a,b)   max(a,b)     max(1-a,b)    xor(a,b)
a b    1-a    ab         a + b - ab   1 - a + ab    Xor(a,b)

Table 1.1: Inference rules for propositional calculus and possibilistic and probabilistic logic.
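For concreteness, both rule sets can be written directly as small functions. The following sketch is in Lisp, the implementation language of this project; it is illustrative only and is not taken from the appendix code:

    ;; Possibilistic (fuzzy) connectives: Boolean algebra extended
    ;; to fractional truth values in [0, 1].
    (defun f-not (a) (- 1 a))
    (defun f-and (a b) (min a b))
    (defun f-or (a b) (max a b))
    (defun f-implies (a b) (max (- 1 a) b))
    (defun f-xor (a b) (max (min a (- 1 b)) (min (- 1 a) b)))

    ;; Probabilistic connectives, assuming independent inputs.
    (defun p-not (a) (- 1 a))
    (defun p-and (a b) (* a b))
    (defun p-or (a b) (- (+ a b) (* a b)))
    (defun p-implies (a b) (+ (- 1 a) (* a b)))
    ;; Composing the rules over (A and not B) or (not A and B)
    ;; expands to a + b - 3ab + a²b + ab² - a²b².
    (defun p-xor (a b) (p-or (p-and a (p-not b)) (p-and (p-not a) b)))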
The more appealing calculus of uncertainty, subjective Bayesian updating, was utilized for the project to follow. The advantages and disadvantages of this approach will unfold as the paper progresses.


As noted above, this problem encompasses a large group of general design projects. The knowledge gained in the building and refining of this project is easily transferred to many areas.

Expert #1 will be myself. The majority of system analysis will be performed during the learning curve as Inference Network #1 reaches the desired performance. Expert #2 will provide his opinions for Inference Network #2. An explanation of expert systems, neural networks, Bayes' rule, and the objective of this particular inference network will be provided to Expert #2 before the knowledge acquisition portion of the project begins. Networks #1 and #2 will then be compared to one another.
Subjective Bayesian Inference Networks

The project is to build an expert system that draws conclusions from inexact or incomplete information. The uncertainty of the inputs prevents the system from being a set of hard and fast logic rules. Often, the logic rules themselves cannot be formulated in a strict manner.


The inference network was selected to model general decision-making in a practical, yet mathematically meaningful way. This system utilizes probabilistic knowledge to take the uncertainties into account and produce a logical hypothesis similar to that which an expert's formal criteria and intuition would produce.

The expert provides his degree of belief or uncertainty in the inputs, the hypotheses, and the relationships between the two categories. These are then represented as probabilities and linked together via Bayes' rule and a subjective form of Bayes' rule. These concepts will be further discussed in the following chapter.
Probability

Probability ranges from 0 to 1. A value close to 0 means that a particular outcome is not very likely; a value close to 0.5 indicates that the outcome has equal chances of occurring or not occurring; and a value close to 1 indicates that the particular outcome is very likely to occur. Laplace's work led to the representation of probability as the following ratio:

probability = (# of desired outcomes) / (total # of outcomes)


For example, the probability of drawing an ace from a normal deck of playing cards is the number of aces (4) divided by the total number of cards (52), which is 4/52 or 1/13. This assumes that each of the 52 cards has an equal chance of being drawn.

Probability follows two rules, the Additive Law and the Multiplicative Law, shown below:

Additive Law: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Multiplicative Law: P(A ∧ B) = P(A) × P(B|A) = P(B) × P(A|B)

The notation P(A) denotes the probability of A occurring under any circumstances. P(B|A) denotes the probability of B being true, given that A is true. If A and B are mutually exclusive, so that P(A ∧ B) = 0, the Additive Law reduces to the following:

P(A ∨ B) = P(A) + P(B)

In the network to follow, all inputs are assumed to be statistically independent, so that the Multiplicative Law reduces to P(A ∧ B) = P(A) × P(B).
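As a small worked example of the Additive Law (using the card draw above), Lisp evaluates the exact rational directly:

    ;; P(ace or heart) = P(ace) + P(heart) - P(ace and heart)
    ;;                 = 4/52 + 13/52 - 1/52 = 16/52 = 4/13.
    (+ 4/52 13/52 (- 1/52))   ; => 4/13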


Bayes' Rule

There is a great deal of uncertainty in the inputs and the rules of this system. Degrees of belief are treated as probabilities. Probabilities are applied to the inputs and to the decision-making within the network. One method of approaching this task is through Bayes' rule, which is commonly applied to the exercise of computing conclusions from premises. There are two types of information: a) general knowledge about any case, and b) specific knowledge about a particular case. In this network, general knowledge is referred to as "prior probability" and specific knowledge is referred to as "current probability." In Bayes' rule, listed below, H is the hypothesis and E is the evidence.

P(H|E) = [P(E|H) P(H)] / P(E)

where

P(E) = P(E|H) P(H) + P(E|-H) P(-H)

Bayes' rule states that the probability of the hypothesis H being true, given that the evidence E is true, is equal to the ratio of the probability that the evidence and the hypothesis are both true, divided by the probability that the evidence is true, whether or not the hypothesis is true. This way, the network can be adjusted via previous knowledge about the probability of an event.


Bayes' rule can better be described using an example of a fictional medical-diagnosis problem. The objective of this problem is to determine the probability that Jane has food poisoning, given that she has a symptom of food poisoning: severe stomach ache. It is assumed that the two types of information mentioned above are available to compute the probabilities for our analysis. The first is general knowledge: (1) the probability that a person has food poisoning regardless of any symptom, (2) the probability that a person has a symptom of severe stomach ache, given that they have food poisoning, and (3) the probability that a person has a severe stomach ache, given that they do not have food poisoning. The second type of information is the information particular to the person, in this case, Jane: that she has the symptom. Using the symbols H for hypothesis and E for evidence:

H = Jane has food poisoning, and
E = Jane has a severe stomach ache.

Therefore, the general knowledge of the problem is:

1. P(H) = the probability that someone has food poisoning.
2. P(E|H) = the probability that a person has a severe stomach ache, given that she has food poisoning.


3. P(E|-H) = the probability that a person has a stomach ache, given that she does not have food poisoning.

The specific knowledge is: Jane has the symptom of severe stomach ache.

The conclusion to be determined is P(H|E), the probability that Jane has food poisoning based on the fact that she has a severe stomach ache. This is obtained using Bayes' rule:

P(H|E) = [P(E|H) P(H)] / P(E)

where

P(E) = P(E|H) P(H) + P(E|-H) P(-H)

For our problem, this is interpreted as saying the probability that Jane has food poisoning, given that she has a severe stomach ache, is equal to the ratio of the probability that she has both the severe stomach ache and food poisoning, to the probability that someone has a severe stomach ache, whether or not he has food poisoning. The probability of having a severe stomach ache is given as the sum of the conditional probabilities of having a severe stomach ache, given food poisoning or given no food poisoning, weighted by the probability of food poisoning and not food poisoning, respectively. If the general knowledge were as follows:

P(H) = 0.0002    P(E|H) = 0.8    P(E|-H) = 0.16

then P(E) = (0.8)(0.0002) + (0.16)(0.9998) = 0.160128, and P(H|E) = (0.8)(0.0002) / (0.160128) = 0.0009992. Therefore, Jane has a 0.09992 percent chance of having food poisoning, given she has a severe stomach ache.
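The same computation can be expressed as a short Lisp function (a minimal sketch; the function name is illustrative and does not appear in the appendix code):

    ;; Bayes' rule: P(H|E) = P(E|H) P(H) / P(E), with
    ;; P(E) = P(E|H) P(H) + P(E|-H) P(-H).
    (defun bayes (p-e-given-h p-h p-e-given-not-h)
      (let ((p-e (+ (* p-e-given-h p-h)
                    (* p-e-given-not-h (- 1 p-h)))))
        (/ (* p-e-given-h p-h) p-e)))

    (bayes 0.8 0.0002 0.16)   ; => 0.0009992..., about a 0.1% chance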
The pie chart of Figure 1.1: Food poisoning and the possible symptoms, gives a graphical illustration of Bayes' rule.

[Figure 1.1 legend: B = B1 + B2 + B3 + B4, where B1 = stomach aches, B2 = fevers, B3 = headaches, and B4 = vomiting. A = A1 + A2 + A3 + A4, where A1 = food poisoning with stomach aches, A2 = food poisoning with fevers, A3 = food poisoning with headaches, and A4 = food poisoning with vomiting.]

Figure 1.1: Food poisoning and the possible symptoms.

The following discussion is with respect to the pie chart of Figure 1.1 [10]. The outer circle represents all symptoms or evidence leading to the conclusion of food poisoning.


The darker shaded area in the center of the circle represents the hypothesis of food poisoning. Other hypotheses for these symptoms are not shown on this chart. Although this chart is not drawn to a realistic scale, the areas of the pie chart represent percentages. Now, to illustrate Bayes' rule, P(H|E) will be determined with the evidence E being a stomach ache and the hypothesis H being food poisoning. The probabilities are defined as follows:

P(H) = A
P(E|H) = A1 / A
P(E|-H) = (B1 - A1) / (B - A)
P(E) = P(E|H) P(H) + P(E|-H) P(-H)
     = (A1 / A) A + [(B1 - A1) / (B - A)] (B - A)
     = A1 + (B1 - A1)
     = B1

Now these are inserted into Bayes' rule:

P(H|E) = [P(E|H) P(H)] / P(E)
       = [(A1 / A) A] / B1
       = A1 / B1

The outcome of A1 / B1 is just as expected. The probability of A1, given B1, should equal the number of A1's divided by the total number of B1's (that is, B1's with A and without A).
Bayes' rule can be successfully applied in the inference network only when the exact probabilities of the evidence or the hypotheses are known. If the inputs (the evidence) of a network are known with certainty, Bayes' rule can be used to determine the probability of the first interior layer of hypotheses in the network. The subsequent layers must be updated using "subjective Bayesian" rules. This technique of updating has proved useful in expert systems such as PROSPECTOR, which has successfully helped locate deposits of several minerals [11]. The main problem with Bayes' rule is that it depends on the global assumption that the evidential variables are independent.
Odds and Bayes' Rule

As mentioned above, Bayes' rule is generally stated as:

P(H|E) = [P(E|H) P(H)] / P(E)

The probability for the negation of the hypothesis using Bayes' rule is

P(-H|E) = [P(E|-H) P(-H)] / P(E)

The odds-likelihood formulation for Bayes' rule is obtained by dividing the two formulas above. The odds of an event X relate to the probability as follows:

O(X) = P(X) / [1 - P(X)]

Or, to express the probability in terms of the odds, the equation is rewritten as:

P(X) = O(X) / [1 + O(X)]
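These two conversions are one-liners in Lisp (a sketch; the names are illustrative):

    ;; Convert between probability and odds.
    (defun odds (p) (/ p (- 1 p)))    ; O(X) = P(X) / [1 - P(X)]
    (defun prob (o) (/ o (+ 1 o)))    ; P(X) = O(X) / [1 + O(X)]

    (odds 4/5)   ; => 4     (a probability of 0.8 is 4-to-1 odds)
    (prob 4)     ; => 4/5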
Now the odds-likelihood form of Bayes' rule is

O(H|E) = λ O(H)

where O(H) is the prior odds on H and λ is defined as the likelihood ratio P(E|H) / P(E|-H). In the network, the odds on H, with evidence E, are updated by multiplying the prior odds on H by the likelihood ratio λ. λ is supplied by the expert for each rule in the network. If λ is much larger than 1, the rule has high strength, indicating that the presence of E makes it much more probable that H is true. λ is called the sufficiency coefficient because the presence of E is sufficient evidence to assume that H is true. This condition is depicted in Figure 1.2: Sufficiency relationship between E and H.

Figure 1.2: Sufficiency relationship between E and H.


If λ is much less than 1, the presence of E reduces the probability of H, but is sufficient for -H. This case is depicted in Figure 1.3: E is sufficient evidence to assume -H.

A simple analogy to sufficiency would be: the presence of chocolate chip cookies is sufficient evidence that chocolate chips are present. But the presence of chocolate chips is not sufficient evidence that chocolate chip cookies are present.

The following equation can be utilized when E is absent:

O(H|-E) = λ' O(H)

where λ' is defined as

P(-E|H) / P(-E|-H) = [1 - P(E|H)] / [1 - P(E|-H)]

λ' is called the necessity coefficient because E is necessary for H to be true, but not sufficient to guarantee that


H is true. A simple analogy of this is the following: flour is necessary for chocolate chip cookies, but not sufficient; the recipe still requires chocolate chips, eggs, sugar, and butter. The presence of flour does not guarantee chocolate chip cookies; the outcome could be a loaf of bread or muffins. What λ' does provide is a means of updating the odds on H when the information about E is in the negative. These ideas are depicted in Figure 1.4: Possible hypotheses, given the evidence flour.

Figure 1.4: Possible hypotheses, given the evidence flour.

λ' and λ are not dependent on each other, but can be related by the following equation:

λ' = [1 - λ P(E|-H)] / [1 - P(E|-H)]
Therefore, the expert must supply values for both λ and λ' for every arc in the inference network. See Figure 1.5: Arc in an inference network, labelled with the sufficiency and necessity coefficients. These values should be somewhat
consistent, in that if λ is greater than 1, indicating that the presence of E supports H, then λ' should be less than 1 so that the absence of E inhibits H, or vice versa. In this network, the evidence E either supports or inhibits the hypothesis H, but some systems such as MYCIN [12] allow positive knowledge of E to support H (i.e., λ > 1), while negative knowledge of E has no effect (λ' = 1).

[Arc diagram labelled (λ = 1.8, λ' = 0.6).]

Figure 1.5: Arc in an inference network, labelled with the sufficiency and necessity coefficients.
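The consistency relation above is easy to check mechanically. A sketch in Lisp (illustrative names; the numeric values are arbitrary):

    ;; lambda' = [1 - lambda P(E|-H)] / [1 - P(E|-H)].
    ;; With a sufficiency coefficient above 1, the derived necessity
    ;; coefficient falls below 1, so the presence of E supports H
    ;; while the absence of E inhibits it.
    (defun necessity-from-sufficiency (suff p-e-given-not-h)
      (/ (- 1 (* suff p-e-given-not-h))
         (- 1 p-e-given-not-h)))

    (necessity-from-sufficiency 1.8 0.4)   ; => 0.4666...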
Updating Probabilities

P(H|E) is dependent on the certainty with which E is known. E is observed through E', and H is updated via E, yielding Figure 1.6: Inference with uncertain evidence E'. Figure 1.7: A linear interpolation for computing P(H|E') from P(E|E'), could be used to obtain values for P(H|E') with P(E|E') between 0 and 1.


[Plot: P(H|E'), the updated probability of H, rising linearly from P(H|-E) at P(E|E') = 0 to P(H|E) at P(E|E') = 1; the horizontal axis is P(E|E'), the current probability of E.]

Figure 1.7: A linear interpolation for computing P(H|E') from P(E|E').
This simple interpolation restricts the prior probabilities of E and H to be consistent with one another. If the prior probability of E, P(E), is used for P(E|E'), the resulting update for P(H|E') should equal P(H), the prior probability of H. But as shown in Figure 1.8: Inconsistencies in prior probabilities for E and H, these do not always coincide. Because it is difficult for the expert to choose consistent prior probabilities, the linear interpolation is not utilized for this network.


[Plot: P(H|E'), the updated probability of H, versus P(E|E'), the current probability of E, with the levels P(H|E), P(H), and P(H|-E) marked.]

Figure 1.8: Inconsistencies in prior probabilities for E and H.
In this case, Pc(E), the prior probability of E which would be consistent with the prior probability chosen for H, is considerably larger than P(E). There are many piecewise-linear methods to combat the zone of inconsistency between P(E) and Pc(E).

One method of solving this problem is to introduce a dead zone between P(E) and Pc(E), in which the updated P(H|E') would remain at P(H). The reasoning behind this algorithm is that if the user cannot give a response outside this interval, then he is not sufficiently certain of his response to justify a change in the current probability of H. This is shown in Figure 1.9: Piecewise-linear function with dead zone for probability updating.


[Plot: P(H|E'), the updated probability of H, versus P(E|E'), the current probability of E, flat at P(H) over the dead zone between P(E) and Pc(E).]

Figure 1.9: Piecewise-linear function with dead zone for probability updating.
If the expert states rules in which the presence of E enhances the odds on H, but the absence of E has no significant impact, the knowledge engineer may wish to use a piecewise-linear function in which the lack of E has no effect. In other words, if P(E|E') is less than Pc(E), P(H|E') remains at P(H). The evidence E may be sufficient for H, but is not necessary. Such an algorithm is depicted in Figure 1.10: "Sufficiency-only" updating function. Holding all values of P(H|E') at or above P(H) corresponds to the expert allowing λ > 1 with λ' = 1.


[Plot: P(H|E'), the updated probability of H, versus P(E|E'), the current probability of E, flat at P(H) below Pc(E) and rising to P(H|E) above it.]

Figure 1.10: "Sufficiency-only" updating function.
Or conversely, if the expert states his rules in terms of necessity, the inference network may be built with the function shown in Figure 1.11: "Necessity-only" updating function. In this case, a low value for P(E|E') (i.e., below Pc(E)) can negatively influence P(H|E'), but a high value of P(E|E') cannot raise the value of P(H|E') over P(H). This is an appropriate function when E is necessary for H, but not sufficient.


Figure 1.11: "Necessity-only" updating function.
There are numerous options to choose from in selecting the proper function for an expert system. The function depicted in Figure 1.12: Interpolation function used in the MYCIN system, successfully handled uncertainty in the MYCIN system [13].


[Plot: updated probability of H versus P(E|E'), the current probability of E.]

Figure 1.12: Interpolation function used in the MYCIN system, where Pt(E) = P(E) + t[1 - P(E)], with t = 0.2.
The updating method shown in Figure 1.13: Piecewise-linear function for updating the probability of a hypothesis, is later utilized in the inference network developed in this thesis. This method solves the inconsistency problem without inhibiting the updating in any region. The following equations are used to implement this technique:

1. P(H|E) is computed from P(H) and λ:

   P(H|E) = O(H|E) / [1 + O(H|E)] = λ O(H) / [1 + λ O(H)]

2. P(H|-E) is computed from P(H) and λ':

   P(H|-E) = O(H|-E) / [1 + O(H|-E)] = λ' O(H) / [1 + λ' O(H)]

3. P(H|E') is computed from P(E|E') as follows:

   P(H|E') = P(H|-E) + P(E|E') [P(H) - P(H|-E)] / P(E)
   for P(E|E') < P(E), and

   P(H|E') = P(H) + [P(E|E') - P(E)] [P(H|E) - P(H)] / [1 - P(E)]
   for P(E|E') ≥ P(E).

Figure 1.13: Piecewise-linear function for updating the probability of a hypothesis.
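A direct Lisp rendering of these three equations might look like the following (a sketch only, not the appendix listing; the argument names are illustrative):

    ;; Piecewise-linear update of Figure 1.13.  Given the prior
    ;; probabilities P(H) and P(E), the coefficients lambda (SUFF)
    ;; and lambda' (NEC), and the current probability P(E|E') of
    ;; the evidence, return the updated probability P(H|E').
    (defun update-hypothesis (p-h p-e suff nec p-e-current)
      (let* ((o-h (/ p-h (- 1 p-h)))                          ; O(H)
             (p-h-e (/ (* suff o-h) (+ 1 (* suff o-h))))      ; P(H|E)
             (p-h-not-e (/ (* nec o-h) (+ 1 (* nec o-h)))))   ; P(H|-E)
        (if (< p-e-current p-e)
            ;; segment from P(H|-E) at 0 up to P(H) at P(E)
            (+ p-h-not-e (* p-e-current (/ (- p-h p-h-not-e) p-e)))
            ;; segment from P(H) at P(E) up to P(H|E) at 1
            (+ p-h (* (- p-e-current p-e)
                      (/ (- p-h-e p-h) (- 1 p-e)))))))

Note that feeding P(E|E') = P(E) into either branch returns P(H), so the inconsistency of Figure 1.8 cannot arise.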
Traveller's Restaurant Selection Problem

The topic of this inference network is "The Traveller's Restaurant Selection Problem." A traveller wishes to choose a good restaurant for dinner. To make his selection, he can observe many details about the restaurant before going inside and being seated. These observations are then used as the inputs to the network to predict the overall food quality.


The nice aspect of this problem is that anyone can be an "expert" in determining his own taste in restaurants. The expert forms the basic network structure by determining the relative inputs, the interior states, the outputs, and the relationships among them. After the basic network is established, the expert and knowledge engineer tweak the system to the desired behavior.

Figure 1.14: Probabilistic inference network for the Traveller's Restaurant Selection Problem, shows the basic framework of the network. The first column of nodes is formed from the observable details of the restaurant. These include the following:

1. DECOR is high when the restaurant is nicely decorated.
2. TABLE-SETTING is high when the setting contains nice dishes, silverware, linen napkins, flowers, or candles.
3. SURFACE-CLEANLINESS is high when the table, dishes, and floor appear to be clean.
4. AIR is high when the air is fresh.
5. SOUNDS is high when the noise is low and the music is pleasant.
6. CLIENTELE is high when diners are well dressed and behaved.


7. MENU is high when the menu contains interesting and sufficient selections.
8. PRICES is high when the prices are reasonable.
9. SERVICE is high when the staff appears courteous and timely.

The second column of the network represents abstract qualities of the restaurant that form a summary of the inputs. These include Popularity, Elegance, Artistry, and Cleanliness. The third column represents specific qualities of the food, which are predicted from the previous column of abstract qualities of the restaurant. These include Taste, Texture, Appearance, Quantity, Correctness, Nutrition, and Hygiene. And the final column is the output node: Overall-food-quality.


[Network diagram, left to right: Primary Evidential Variables (DECOR, TABLE-SETTING, SURFACE-CLEANLINESS, AIR, SOUNDS, CLIENTELE, MENU, PRICES, SERVICE), Lumped Evidential Variables, Predicted Component Variables, and the Predicted Summary Variable (OVERALL FOOD QUALITY).]

Figure 1.14: Probabilistic inference network for the Traveller's Restaurant Selection Problem.


CHAPTER 2
INFERENCE NETWORK #1

Expert #1 provides her opinions about restaurants as a whole. The first question asked of the expert is "In your opinion, what is the probability that a restaurant (any restaurant) has good decor?" This probability question is asked about every node in the network. These values are written into the program as the "prior probabilities." Next, the expert is asked for "current probabilities," which are her opinions about a specific restaurant. The expert supplies opinions only for the network inputs, the first column of the network, because these are the only observable items. The current probabilities for the nodes of the subsequent layers are set equal to the prior probabilities.

Next, the expert supplies values for the sufficiency and necessity coefficients, λ and λ', for every arc in the network. Note that the input nodes do not have any incoming arcs to update their values; the expert simply chooses their current probabilities when evaluating a particular restaurant.


The prior probabilities and the necessity and sufficiency coefficients form the basic inference network. The first restaurant is ready to be evaluated when the expert has chosen current probabilities for the inputs.
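One plausible way to represent this structure in Lisp is sketched below. This is illustrative only; the appendix code may organize the data differently:

    ;; A node carries its prior probability and a current probability;
    ;; an arc carries the expert's lambda and lambda' for one link.
    (defstruct node
      name      ; e.g. POPULARITY
      prior     ; prior probability P(H)
      current)  ; current probability; set by the expert for inputs

    (defstruct arc
      from to   ; evidence node name and hypothesis node name
      suff nec) ; sufficiency lambda and necessity lambda'

    ;; Example: the arc of Figure 1.5.
    (make-arc :from 'sounds :to 'popularity :suff 1.8 :nec 0.6)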
System Behavior and Analysis

Chili's Bar and Grill was the first restaurant evaluated by Expert #1. The program rated Chili's Overall-food-quality at 79%. But this number does not have any meaning until there are other restaurant evaluations to compare with it. Expert #1 may have an opinion that is skewed, thus creating a network that doesn't fully utilize the 0 to 100% scale. For example, an expert may feel that most restaurants should be above 75%, or that there should be a bell curve centered at 50%. Either may be correct, as long as the network performs as the expert desires. In the first attempt, Expert #1 supplied the values shown in Table 2.1: Current probability inputs for four restaurants: Chili's Bar and Grill, Grisanti's Restaurant, Benigan's, and Wolfs Restaurant and Lounge. The network response is shown in Table 2.2: Interior node values and outputs for four restaurants. Expert #1 was not satisfied with the initial results.


Table 2.1: Current probability inputs for four restaurants.

                Chili's  Grisanti's  Benigan's  Wolfs
Decor             0.7       0.9        0.8       0.6
Table-setting     0.6       0.6        0.6       0.6
Surface-clean     0.5       0.9        0.5       0.5
Air               0.7       0.9        0.8       0.4
Sounds            0.7       0.3        0.7       0.5
Clientele         0.9       0.8        0.7       0.2
Menu              0.8       0.9        0.8       0.3
Prices            0.6       0.4        0.7       0.9
Service           0.7       0.6        0.2       0.7

Table 2.2: Interior node values and outputs for four restaurants.

                Chili's  Grisanti's  Benigan's  Wolfs
Popularity        0.78      0.52       0.69      0.30
Elegance          0.81      0.73       0.51      0.21
Artistry          0.85      0.91       0.69      0.61
Cleanliness       0.56      0.88       0.62      0.36
Taste             0.72      0.61       0.68      0.37
Texture           0.72      0.61       0.68      0.37
Appearance        0.75      0.78       0.68      0.65
Quantity          0.68      0.61       0.66      0.55
Correctness       0.71      0.68       0.63      0.48
Nutrition         0.76      0.60       0.59      0.31
Hygiene           0.44      0.65       0.48      0.31
Food-quality      0.79      0.67       0.65      0.11
Out of the four restaurants, Grisanti's is Expert #1's favorite, with Chili's as a close second, Benigan's as a more distant third, and Wolfs at the very bottom. At this point, the expert decided to verify that the current probabilities in Table
2.1 were consistent with one another. For example, is the Decor of the restaurants properly ranked with Grisanti's = 0.9, Benigan's = 0.8, Chili's = 0.7, and Wolfs = 0.6? In retrospect, the expert decided Chili's and Benigan's Decor were very much alike and should be equally ranked at 0.8. This comparison was done for each input. The updated values are listed in Table 2.3: Revised current probability inputs for four restaurants. The outcome of these adjustments is shown in Table 2.4: Interior node values and outputs for four restaurants using revised current probability inputs.

Table 2.3: Revised current probability inputs for four restaurants.

                Chili's  Grisanti's  Benigan's  Wolfs
Decor             0.8       0.9        0.8       0.6
Table-setting     0.6       0.7        0.6       0.6
Surface-clean     0.5       0.9        0.5       0.6
Air               0.8       0.9        0.8       0.5
Sounds            0.7       0.4        0.7       0.5
Clientele         0.7       0.9        0.7       0.4
Menu              0.8       0.9        0.8       0.3
Prices            0.7       0.4        0.7       0.9
Service           0.7       0.6        0.1       0.7


Table 2.4: Interior node values and outputs for four restaurants using revised current probability inputs.

                Chili's  Grisanti's  Benigan's  Wolfs
Popularity        0.69      0.64       0.69      0.37
Elegance          0.74      0.84       0.37      0.29
Artistry          0.88      0.92       0.55      0.61
Cleanliness       0.62      0.88       0.62      0.43
Taste             0.68      0.66       0.68      0.47
Texture           0.68      0.66       0.68      0.47
Appearance        0.77      0.78       0.62      0.65
Quantity          0.66      0.64       0.66      0.57
Correctness       0.69      0.71       0.57      0.59
Nutrition         0.71      0.67       0.53      0.44
Hygiene           0.48      0.65       0.48      0.35
Food-quality      0.75      0.74       0.60      0.30
The restaurants are still not properly ranked. Instead of experimenting with four restaurants and the entire network, a few simple tests were utilized to determine the basic behavior at a node.
Behavior at a Node
When a node and its inputs are singled out of the network, an isolated analysis of its value, given different input scenarios, can be performed. The Popularity node is chosen for simplicity because it has only two inputs, as shown in Figure 2.1: The Popularity node and its inputs, Sounds and Clientele.

[Diagram: Sounds and Clientele feeding the Popularity node.]

Figure 2.1: The Popularity node and its inputs, Sounds and Clientele.

The results of the following input scenarios will be analyzed and plotted (a sketch of the sweep appears after this list):

1. Sounds and Clientele vary from 0 to 1, together.
2. Sounds varies from 0 to 1, and Clientele varies from 1 to 0.
3. Sounds varies from 0 to 1, while Clientele is held at 0.5.
4. Sounds varies from 0 to 1, while Clientele is held at 0.3.
5. Sounds varies from 0 to 1, while Clientele is held at 0.8.
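The sweeps are easy to script once a combination rule for a node's several incoming arcs is fixed. The thesis text does not reproduce that rule; the sketch below assumes the PROSPECTOR-style convention in which each arc's effective odds update on the hypothesis is computed independently and the updates multiply. All priors and coefficients here are invented for illustration, and odds, prob, and update-hypothesis are the sketches given earlier:

    ;; Effective odds update O(H|E') / O(H) contributed by one arc.
    (defun effective-ratio (p-h p-e suff nec p-e-current)
      (/ (odds (update-hypothesis p-h p-e suff nec p-e-current))
         (odds p-h)))

    ;; Popularity from its two inputs; every number is illustrative.
    (defun popularity (p-sounds p-clientele)
      (let* ((p-h 0.5)   ; assumed prior on Popularity
             (r1 (effective-ratio p-h 0.5 1.8 0.6 p-sounds))
             (r2 (effective-ratio p-h 0.5 2.5 0.4 p-clientele)))
        (prob (* r1 r2 (odds p-h)))))

    ;; Scenario #3: Sounds sweeps from 0 to 1, Clientele held at 0.5.
    (loop for i from 0 to 10
          collect (popularity (/ i 10.0) 0.5))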
The first plot to be analyzed is shown in Figure 2.2: The resulting value of the Popularity node with Input Scenario #1: Sounds and Clientele vary together from 0 to 1. Popularity starts at 0.11 and ends at 0.84. As expected, this scenario produced both the minimum value for Popularity, when both inputs were 0, and the maximum value, when both inputs were 1.


[Plot: Popularity versus Sounds, with Clientele varying together with Sounds from 0 to 1; the curve rises from 0.11 to 0.84.]

Figure 2.2: The resulting value of the Popularity node with Input Scenario #1: Sounds and Clientele vary together from 0 to 1.
The second plot to be analyzed is shown in Figure 2.3: The resulting value of the Popularity node with Input Scenario #2: Sounds varies from 0 to 1, while Clientele varies from 1 to 0. Popularity starts at 0.45 and ends at 0.45, with a peak value of 0.50 at the center of the curve. Equal starting and ending values indicate that Expert #1 has placed approximately equal importance on Sounds and Clientele in determining if a restaurant is Popular. She plans to change this behavior because she believes that the Clientele is the dominating input in determining Popularity. The peak in the center shows that when both inputs are equal to 0.5, the system gives the best rating. This is a desirable trait, in that, if


one of the two inputs is unusually low, it cannot be overshadowed by another exceptionally high input. Thus, a group of average inputs will obtain a higher rating than a group of high and low inputs that sum to the average.

Figure 2.3: The resulting value of the Popularity node with Input Scenario #2: Sounds varies from 0 to 1, while Clientele varies from 1 to 0.
The third plot to be analyzed is shown in Figure 2.4: The resulting values of the Popularity node with Input Scenarios #3, #4, and #5: Sounds varies from 0 to 1, while Clientele is held constant at 0.5, 0.3, and 0.8, respectively. Since these three lines are nearly parallel, the response of this system is close to linear. Consider Scenario #3 to be the norm for purposes of analyzing Scenarios #4 and #5. Note that at the


start, Scenario #4 is 0.12 less than Scenario #3, but at the end Scenario #4 is 0.18 less. This supports the system trait mentioned above: the high input on Sounds cannot overshadow the low input of 0.3 on Clientele. Scenario #5 starts with a value 0.13 greater than Scenario #3, but this declines to 0.11 greater at the end. This indicates that the higher value on Clientele has more impact on Popularity when the input on Sounds is low.
[Plot: Popularity versus Sounds, three nearly parallel curves for Clientele held constant at 0.5, 0.3, and 0.8.]

Figure 2.4: The resulting values of the Popularity node with Input Scenarios #3, #4, and #5: Sounds varies from 0 to 1, while Clientele is held constant at 0.5, 0.3, and 0.8, respectively.
With knowledge from the experiments above, each interior node of one restaurant will be analyzed to improve system performance. Expert #1 has chosen to perform this in-depth analysis on her most frequented restaurant: Chili's.

In-depth Case Analysis

As shown in the first series of tests, one small change in Sounds dramatically alters the value of Popularity.
This experiment was carried one step farther to observe the effect on Overall-food-quality. A 0.1 increase in Sounds increased Overall-food-quality from approximately 0.5 to 0.7. Expert #1 felt that a small change in what she considered to be an insignificant variable should not affect the Overall-food-quality that drastically. Reference to Figure 1.14: Probabilistic inference network for the Traveller's Restaurant Selection Problem, shows that Sounds is an input to Popularity and Elegance. Popularity is an input to Taste, Texture, Appearance, Quantity, and Nutrition. Elegance is an input to Taste, Texture, Correctness, and Nutrition. All of the latter are inputs to Overall-food-quality. With this much fan-out, it is easy to see how one small change in Sounds could drastically affect the Overall-food-quality. Each arc is considered to be an independent input, but the expert must weight each arc with respect to the other incoming arcs. Although Expert #1 didn't agree with all of the intermediate assertions of this network,


she kept them in place for the sake of comparison, and weighted them according to her opinion.

Inconsistencies between λ and λ' seemed to cause havoc. Expert #1 reevaluated her λ and λ' values, following the premise that if positive knowledge of E increased the probability of H, then negative knowledge of E could not increase the probability of H.
Next, Expert #1 wanted to increase the levels of Taste, Texture, and Hygiene so that they would have more significance in determining the Overall-food-quality. The first reaction is to increase the current probabilities of the inputs that have outgoing arcs to these nodes. This may not produce the desired effect, though. These increased current probabilities will feed into other nodes that perhaps do not need their values increased.

The next method of boosting these nodes' values was to lower the prior probability of these nodes. This experiment was performed on the premise that the prior probability acted as an average. Thus, to receive a high value, the current probability had to exceed the prior probability. This premise appears to be incorrect. In actuality, a higher prior probability produced a higher nodal value. This seems to


imply that the system expects a higher nodal value, given prior information that it will be high.

Increasing the sufficiency coefficient values (λ) on the nodes' incoming arcs produced considerable increases in the nodal values. This works only if the input on the arc is high. Varying the sufficiencies on the incoming arcs to a node helped Expert #1 weight the inputs to overcome her dislike of the connectivity of the network.

Changing the necessity coefficient (λ') did not have a significant effect on the nodal values in this case. If the incoming arcs had low inputs, then higher necessity coefficients would increase the nodal values.
Final Version

After these methods of increasing the values of Taste, Texture, and Hygiene were learned, Expert #1 realized the actual desire was to boost their importance to Overall-food-quality. Therefore, the final adjustments were made to the sufficiency coefficients of the incoming arcs to Overall-food-quality. In its final form, Expert #1's network yielded the information given in Table 2.5: Interior node values and outputs for four restaurants using revised program. See Appendix A: LISP Code for Traveler's Restaurant Selection Problem [14], [15], [16].
Table 2.5: Interior node values and outputs for four restaurants using revised program.

                Chili's  Grisanti's  Benigan's  Wolfs
Popularity        0.69      0.53       0.69      0.44
Elegance          0.85      0.87       0.57      0.37
Artistry          0.88      0.92       0.70      0.64
Cleanliness       0.62      0.88       0.62      0.52
Taste             0.71      0.65       0.64      0.45
Texture           0.71      0.65       0.64      0.45
Appearance        0.77      0.78       0.69      0.66
Quantity          0.66      0.61       0.66      0.57
Correctness       0.71      0.72       0.64      0.58
Nutrition         0.68      0.64       0.61      0.45
Hygiene           0.50      0.70       0.50      0.41
Food-quality      0.79      0.82       0.71      0.42
Now that the system had desirable behavior for these four restaurants, it was used to rate four new restaurants. This time, Expert #1 chose to compare restaurants serving submarine sandwiches. The inputs are shown in Table 2.6: Current probability inputs for four submarine sandwich restaurants. These current probabilities were chosen with respect to the other submarine sandwich restaurants; they would be too high if they were placed in the same rating category with the restaurants studied above. Expert #1 was very pleased with the resulting ranking shown in Table 2.7: Interior node values and outputs for four submarine sandwich restaurants: Subcenter, Quizno's Classic Subs, Subway Sandwiches and Salads, and Michele's Sub Shop.
Table 2.6: Current probability inputs for four submarine sandwich restaurants.

                Subcenter  Quizno's  Subway  Michele's
Decor              0.6        0.9      0.6      0.6
Table-setting      0.6        0.7      0.5      0.5
Surface-clean      0.4        0.9      0.5      0.4
Air                0.8        0.9      0.8      0.2
Sounds             0.6        0.7      0.5      0.4
Clientele          0.6        0.9      0.7      0.7
Menu               0.8        0.8      0.7      0.6
Prices             0.8        0.6      0.5      0.5
Service            0.8        0.6      0.4      0.3

Table 2.7: Interior node values and outputs for four submarine sandwich restaurants.

                Subcenter  Quizno's  Subway  Michele's
Popularity         0.60       0.78     0.60     0.54
Elegance           0.76       0.92     0.67     0.53
Artistry           0.84       0.91     0.71     0.61
Cleanliness        0.55       0.88     0.62     0.24
Taste              0.65       0.76     0.62     0.56
Texture            0.65       0.76     0.62     0.56
Appearance         0.75       0.78     0.69     0.65
Quantity           0.63       0.68     0.63     0.61
Correctness        0.69       0.73     0.67     0.63
Nutrition          0.63       0.73     0.61     0.55
Hygiene            0.44       0.70     0.50     0.21
Food-quality       0.70       0.89     0.70     0.37


Now that the system is performing to Expert #1's satisfaction, she should rate four restaurants in which she has never eaten. Then she should dine at all four restaurants, and rate the Overall-food-quality according to her experience. Comparison of the predicted results and actual results would aid in determining if this inference network is a valid means of selecting good restaurants. If the comparison is not favorable, the following steps should be reworked:

1. Determination of the relevant inputs.
2. Determination of states of nature or decision alternatives.
3. Determination of intermediate assertions.
4. Formulation of inference links.
5. Tuning the probabilities and the fuzzy inference functions.


CHAPTER 3
KNOWLEDGE ENGINEERING

Knowledge engineering is the process of capturing, analyzing, and transforming the domain expert's knowledge into an artificially intelligent (AI) system [17]. The domain expert must have special knowledge on a topic, which he can explain in a detailed manner and which can be mapped into an AI system. Judgment, experience, and intuition have proven to be the most difficult aspects for the domain expert to describe. The knowledge engineer bridges the gap between the domain expert's knowledge and the explicit rules required by a computer system [18].

Inference Network #2

Although the example problem may seem fairly simple, the knowledge acquisition was somewhat complex due to the fact that the expert did not have the project background that the domain expert of a real-life problem would possess. Expert #2 is an electrical engineer. He had some familiarity with probability, but no knowledge of Bayes' rule, neural networks, or expert systems. Communicating these concepts to


someone of another discipline may have been more difficult for the knowledge engineer, but it might have added an interesting twist to the resulting inference network.

The first thing Expert #2 did was to rule out all fast food restaurants. The reason for this was that he did not need an expert system to predict their overall food quality. Once the diner has eaten at each type of fast food restaurant, he knows what to expect. This observation assumes all restaurants within a franchise are identical.

Next, Expert #2 chose the prior probabilities for the nodes of the network, and the sufficiency and necessity coefficients for the arcs of the network. Although the relationship between the sufficiency and necessity coefficients was thoroughly explained, Expert #2 allowed both positive and negative knowledge of an event to support the same hypothesis. Knowing full well that the values provided by Expert #2 would not produce the desired system behavior, the knowledge engineer generated the initial results. Producing a prototype working model allows the domain expert an easy method of correcting the specific errors and learning about the nature of the problem. Expert #2 changed his approach after viewing the initial results.


Due to the knowledge engineer's newly acquired experience with Inference Network #1, UNIX, and the vi editor, Inference Network #2 reached the desired performance much faster than the first. This speed was fortunate due to the lengthy turn-around time required to implement each change to the network. The inputs for Expert #2's first group of restaurants are shown in Table 3.1: Expert #2's current probability inputs for three Italian restaurants. Expert #2 was pleased with the final results, which are shown in Table 3.2: Expert #2's interior node values and outputs for three Italian restaurants: Olive Garden Italian Restaurant, Grisanti's Restaurant, and The Saucy Noodle Ristorante.

Table 3.1: Expert #2's current probability inputs for three Italian restaurants.

                Olive Garden  Grisanti's  Saucy Noodle
Decor                0.8          0.8          0.6
Table-setting        0.6          0.8          0.5
Surface-clean        0.9          0.7          0.7
Air                  0.8          0.8          0.8
Sounds               0.8          0.4          0.8
Clientele            0.6          0.8          0.6
Menu                 0.9          0.7          0.9
Prices               0.5          0.4          0.4
Service              0.8          0.5          0.4


Table 3.2: Expert #2's interior node values and outputs for three Italian restaurants.

                Olive Garden  Grisanti's  Saucy Noodle
Popularity           0.58         0.50         0.58
Elegance             0.89         0.78         0.74
Artistry             0.79         0.69         0.61
Cleanliness          0.82         0.71         0.71
Taste                0.78         0.75         0.74
Texture              0.49         0.41         0.45
Appearance           0.48         0.45         0.42
Quantity             0.63         0.60         0.63
Correctness          0.58         0.53         0.51
Nutrition            0.20         0.20         0.20
Hygiene              0.66         0.57         0.57
Food-quality         0.79         0.70         0.72


CHAPTER 4
COMPARING INFERENCE NETWORKS #1 AND #2

Different Desired Behavior

For the sake of comparison, Experts #1 and #2 evaluated two identical sets of restaurants. Expert #1's inputs and results were shown in Tables 2.1 and 2.2. Expert #2's inputs are shown below in Table 4.1: Expert #2's current probability inputs for four restaurants. The resulting values are shown in Table 4.2: Expert #2's interior node values and outputs for four restaurants.

Comparison of Expert #1's and Expert #2's inputs shows that Expert #2 generally gives lower scores on the observable qualities than Expert #1. This doesn't mean that he is more critical than Expert #1; his system is tailored to his method of scoring.

Both experts felt that the taste and texture were very important to the network. The main difference between the two experts is the importance they placed on the elements of Service, Cleanliness, and Prices. Expert #1 felt that prices and cleanliness were of substantial importance, while Expert #2 favored service. Although Expert #2's desired behavior was different from Expert #1's, he utilized many of the same methods of adjusting the system as Expert #1, which are described in Chapter 2.
Table 4.1: Expert #2's current probability inputs for four restaurants.

                Chili's  Grisanti's  Benigan's  Wolfs
Decor             0.6       0.8        0.6       0.3
Table-setting     0.6       0.8        0.6       0.3
Surface-clean     0.5       0.7        0.5       0.2
Air               0.8       0.8        0.8       0.3
Sounds            0.8       0.4        0.8       0.5
Clientele         0.6       0.8        0.4       0.2
Menu              0.4       0.7        0.4       0.5
Prices            0.9       0.4        0.4       0.9
Service           0.8       0.5        0.1       0.5


'l l!
I .
Table 4.2: Expert #2's interior node values and outputs for four
restaurants.

                 Chili's   Grisanti's   Bennigan's   Wolf's
Popularity         0.58        0.50         0.48       0.23
Elegance           0.65        0.78         0.23       0.15
Artistry           0.53        0.69         0.25       0.31
Cleanliness        0.58        0.71         0.58       0.14
Taste              0.71        0.75         0.53       0.30
Texture            0.43        0.41         0.25       0.13
Appearance         0.39        0.45         0.27       0.30
Quantity           0.63        0.60         0.59       0.46
Correctness        0.47        0.53         0.25       0.20
Nutrition          0.20        0.20         0.17       0.09
Hygiene            0.47        0.57         0.47       0.15
Food-quality       0.65        0.70         0.40       0.05
Table 4.3: Expert #2's current probability inputs for submarine
sandwich restaurants.

                 Subcenter   Quizno's   Subway   Michele's
Decor               0.3         0.5       0.5       0.5
Table-setting       0.3         0.5       0.4       0.5
Surface-clean       0.6         0.4       0.3       0.5
Air                 0.4         0.5       0.4       0.4
Sounds              0.5         0.6       0.5       0.5
Clientele           0.6         0.5       0.5       0.5
Menu                0.8         0.5       0.5       0.6
Prices              0.9         0.6       0.4       0.3
Service             0.9         0.6       0.5       0.4

Table 4.4: Expert #2's interior node values and outputs for
submarine sandwich restaurants.

                 Subcenter   Quizno's   Subway   Michele's
Popularity          0.41        0.41      0.37      0.37
Elegance            0.55        0.43      0.32      0.32
Artistry            0.55        0.46      0.41      0.44
Cleanliness         0.44        0.34      0.23      0.37
Taste               0.63        0.58      0.52      0.52
Texture             0.32        0.29      0.25      0.25
Appearance          0.39        0.36      0.34      0.35
Quantity            0.55        0.55      0.53      0.53
Correctness         0.42        0.36      0.31      0.31
Nutrition           0.18        0.18      0.17      0.17
Hygiene             0.35        0.28      0.21      0.31
Food-quality        0.47        0.37      0.24      0.31
CHAPTER 5
SUMMARY

This paper details the implementation and analysis
of an inference network utilizing subjective Bayesian updating.
Several possible candidates involving back propagation and
rule-based systems were explored before arriving at this
choice. The inference network models general decision-making
in a practical, yet mathematically meaningful way.
Uncertain information is used as the inputs, which
form the first layer of the network. These inputs are then
summarized into abstractions, which form the first interior
layer(s) of the network. Since the abstracted nodes are
evaluated from several inputs, the integration reduces the
effects of noise.
At the other end of the network, the outputs form
the final layer. As in most design problems, the problem
reduction stage breaks the desired output down into smaller
problems. In this case, the specific characteristics that directly
affect the output are determined, as well as the extent of their
individual importance to the output.
After the inputs are abstracted and the output is
reduced to simpler problems, these two sets of nodes must be
linked to form the network. One method of approaching this
task is through Bayes' rule, which is commonly applied to the
exercise of computing conclusions from premises.
The expert provides his degree of belief or
uncertainty in the inputs, the hypotheses, and the relationships
between the two categories. These are then represented as
probabilities and linked together via Bayes' rule and a
subjective form of Bayes' rule. Bayes' rule states that the
probability of the hypothesis H being true, given that the
evidence E is true, is equal to the probability that the
evidence and the hypothesis are both true, divided by the
probability that the evidence is true, whether or not the
hypothesis is true. Bayes' rule is normally converted to an odds-
likelihood ratio form for the domain expert's benefit.
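
In symbols, with O(H) denoting the odds P(H)/(1 - P(H)), these
two forms can be restated compactly (a summary of the rules
developed earlier in this paper, not a new result):

$$P(H \mid E) = \frac{P(E \wedge H)}{P(E)}
             = \frac{P(E \mid H)\,P(H)}{P(E)},
  \qquad O(H \mid E) = \lambda\, O(H),$$

where the sufficiency factor $\lambda = P(E \mid H)/P(E \mid \neg E)$
is replaced by $\lambda = P(E \mid H)/P(E \mid \neg H)$ as supplied by
the expert, and the necessity factor
$\lambda' = P(\neg E \mid H)/P(\neg E \mid \neg H)$ plays the
corresponding role when the evidence is found to be absent.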
There are many possible updating functions, which depend
on how the domain expert expresses the rules. The chosen
method solves prior probability inconsistencies without
inhibiting the updating in any regions.
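
Concretely, the piecewise-linear interpolation implemented by
the UPDATE_PROB function in Appendix A can be written as follows,
where $P(E \mid E')$ is the current (uncertain) probability of
the evidence:

$$P(H \mid E') =
\begin{cases}
P(H \mid \neg E) + \dfrac{P(H) - P(H \mid \neg E)}{P(E)}\,
  P(E \mid E'), & P(E \mid E') \le P(E),\\[1ex]
P(H) + \dfrac{P(H \mid E) - P(H)}{1 - P(E)}\,
  \bigl(P(E \mid E') - P(E)\bigr), & P(E \mid E') > P(E),
\end{cases}$$

with $P(H \mid E)$ and $P(H \mid \neg E)$ obtained from the odds
$\lambda\,O(H)$ and $\lambda'\,O(H)$. When $P(E \mid E') = P(E)$,
the update returns the prior $P(H)$ unchanged, which is exactly
the prior-consistency property noted above.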
The Traveller's Restaurant Selection Problem from
Tanimoto's book The Elements of Artificial Intelligence was
utilized to demonstrate this method. Although the subject
matter may seem trivial, the design concepts of this example
problem can be applied to an entire class of inference
networks, including automobile diagnosis, weather prediction,
medical diagnosis, and stock market prediction. When the
example problem is comprised of simple subject matter, the
underlying design principles of inference networks can be
more clearly focused upon.
The behavior analysis was performed at the nodal
level as well as at the system level. An exercise in knowledge
engineering was also performed.
As noted above, this problem encompasses a large
group of general design projects. The knowledge gained in
building and refining this project is easily transferred to
many other areas.
With the proper design, expert systems can provide
valuable tools for many commercial applications. It is my
sincerest hope that corporations continue to allocate resources
and funds toward further developments in this field.
APPENDIX A
LISP CODE FOR TRAVELLER'S RESTAURANT SELECTION PROBLEM
; The Traveling Diner -- INFNET.LSP
;
; A LISP implementation of an INFerence NETwork that
; uses subjective-Bayesian updating of certainty values.
; Given values of particular evidential variables,
; the system determines the certainty value of several
; intermediate variables and a summary variable.
; The problem solved is the prediction of food quality
; in a restaurant:
; the "Traveller's Restaurant Selection Problem".
(defvar h)

(defun odds (prob)
  (/ prob (- 1.0 prob)))

(defun prob (odds)
  (/ odds (+ 1.0 odds)))
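
; Note: the two functions above convert between probability and
; odds form; for example, (odds 0.75) returns 3.0 and (prob 3.0)
; returns 0.75.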
; Each node of the inference network is represented by an
; atom with its property list. The next function helps
; to set up these representations.
(defun define_node (list)
  (setf (get (first list) 'name)         (first list))
  (setf (get (first list) 'prior_prob)   (second list))
  (setf (get (first list) 'prior_odds)   (odds (second list)))
  (setf (get (first list) 'current_prob) (third list))
  (setf (get (first list) 'current_odds) (odds (third list)))
  (setf (get (first list) 'arcs)         (fourth list)))
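
; Note: each DEFINE_NODE call below stores a node's name, its prior
; and current probability (with the equivalent odds), and its list
; of incoming arcs on the property list of the node's symbol.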


(defun current_prob (n) (get n 'current_prob))
(defun prior_prob   (n) (get n 'prior_prob))
(defun current_odds (n) (get n 'current_odds))
(defun prior_odds   (n) (get n 'prior_odds))

(defun sufficiency (arc) (second arc))
(defun necessity   (arc) (third arc))
; Here we set up nodes for the Primary Evidential Variables:

(define_node '(decor 0.5 0.8 ()))
(define_node '(table_setting 0.4 0.6 ()))
(define_node '(surface_cleanliness 0.5 0.5 ()))
(define_node '(air 0.6 0.8 ()))
(define_node '(sounds 0.5 0.7 ()))
(define_node '(clientele 0.5 0.7 ()))
(define_node '(menu 0.4 0.8 ()))
(define_node '(prices 0.5 0.7 ()))
(define_node '(service 0.4 0.7 ()))

(defvar reporting nil)
; Here are declarations for the Lumped Evidential Variables:

(define_node
  '(popularity 0.5 0.5
     (indep
       (arc sounds 3.0 0.25)
       (arc clientele 3.0 0.25))))
(define_node
  '(elegance 0.4 0.4
     (indep
       (arc decor 2.0 0.33)
       (arc table_setting 3.0 0.25)
       (arc sounds 2.0 0.33)
       (arc clientele 3.0 0.25)
       (arc menu 2.0 0.33)
       (arc prices 0.4 1.0)
       (arc service 3.0 0.25))))
(define_node
  '(artistry 0.5 0.5
     (indep
       (arc decor 3.0 0.25)
       (arc table_setting 3.0 0.25)
       (arc menu 3.0 0.25)
       (arc service 2.0 0.33))))
(define_node
  '(cleanliness 0.5 0.5
     (indep
       (arc surface_cleanliness 5.0 0.1)
       (arc air 3.0 0.25))))
; Here are node definitions for the Predicted Component Variables:

(define_node '(taste 0.5 0.5 (indep (arc popularity 3.0 0.25)
                                    (arc elegance 2.0 0.33))))
(define_node '(texture 0.5 0.6 (indep (arc popularity 3.0 0.25)
                                      (arc elegance 2.0 0.33))))
(define_node '(appearance 0.6 0.6 (indep (arc artistry 3.0 0.25))))
(define_node '(quantity 0.6 0.6 (indep (arc popularity 2.0 0.33))))
(define_node '(correctness 0.6 0.6 (indep (arc elegance 2.0 0.33))))
(define_node '(nutrition 0.5 0.5 (indep (arc popularity 2.0 0.33)
                                        (arc elegance 2.0 0.33))))
(define_node '(hygiene 0.4 0.4 (indep (arc cleanliness 6.0 0.07))))
; Here is the Predicted Summary Variable node:

(define_node
  '(overall_food_quality 0.5 0.5
     (indep
       (and-marker
         (arc taste 5.0 0.1)
         (arc texture 5.0 0.1)
         (arc appearance 2.0 0.33)
         (arc correctness 2.0 0.33))
       (arc quantity 2.0 0.33)
       (arc nutrition 2.0 0.33)
       (arc hygiene 5.0 0.1))))
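
; Note: the arcs grouped under AND-MARKER are combined conjunctively;
; as COMBINE_CONJUNCTIVE_LAMBDAS below shows, the group contributes
; only the minimum of its members' effective updating factors.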
; Compute P(H | E') for a single arc.

(defun update_prob (h arc)
  (cond
    ((> (current_prob (first arc))
        (prior_prob (first arc)))
     (report_progress 'supportive arc)
     (+ (prior_prob h)
        (* (/ (- (prob (* (sufficiency arc) (prior_odds h)))
                 (prior_prob h))
              (- 1.0 (prior_prob (car arc))))
           (- (current_prob (car arc))
              (prior_prob (car arc))))))
    (t (report_progress 'inhibitive arc)
       (+ (prob (* (necessity arc) (prior_odds h)))
          (* (/ (- (prior_prob h)
                   (prob (* (necessity arc) (prior_odds h))))
                (prior_prob (car arc)))
             (current_prob (car arc)))))))
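
; Note: UPDATE_PROB linearly interpolates the expert's posterior.
; When the evidence node is more probable than its prior, P(H|E')
; moves from P(H) (at P(E|E') = P(E)) toward P(H|E) (at P(E|E') = 1);
; otherwise it moves from P(H|~E) (at P(E|E') = 0) up to P(H)
; (at P(E|E') = P(E)), keeping the update consistent with the priors.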
; If REPORTING is not NIL then describe progress of updating:

(defun report_progress (x arc)
  (cond ((null reporting) nil)
        (t (format t "~%~a along arc ~a~% with prior odds ~a"
                   x arc (prior_odds h))
           (format t "~% The prior and current prob of E are ~a, ~a"
                   (prior_prob (first arc)) (current_prob (first arc)))
           (format t "~%"))))
; Determine the odds updating factor along the ARC specified,
; given the prior and current probabilities and odds for
; the predecessor node, the priors for the node H, and
; SUFFICIENCY and NECESSITY values along the arc.

(defun effective_arc_lambda (arc)
  (/ (odds (update_prob h arc))
     (prior_odds h)))
; Determine the updating factors for all arcs coming into H
; and multiply them to get an overall odds updating factor.
; This scheme assumes that the arcs are treated as if their
; influences were independent.

(defun combine_indep_lambdas (arc_exp)
  (print_lambdas (mapcar #'eval_arc_exp (rest arc_exp)))
  (apply #'* (mapcar #'eval_arc_exp (rest arc_exp))))

(defun print_lambdas (x)
  ; (format t "effective arc lambdas= ")
  ; (dolist (y x) (format t "~4,2f " y))
  x)
(defun combine_conjunctive_lambdas (arc_exp)
  (apply #'min
         (mapcar #'eval_arc_exp
                 (cdr arc_exp))))

(defun combine_disjunctive_lambdas (arc_exp)
  (apply #'max
         (mapcar #'eval_arc_exp
                 (cdr arc_exp))))
(defun update_nodes (nodes)
  (cond ((null nodes) nil)
        (t (update_node (car nodes))
           (update_nodes (cdr nodes)))))
; The function EVAL_ARC_EXP evaluates an arc expression, finding
; an effective odds updating factor that takes effects of all
; the arcs in the expression into account.

(defun eval_arc_exp (arc_exp)
  (cond ((eq (car arc_exp) 'arc)
         (effective_arc_lambda (cdr arc_exp)))
        ((eq (car arc_exp) 'indep)
         (combine_indep_lambdas arc_exp))
        ((eq (car arc_exp) 'and-marker)
         (combine_conjunctive_lambdas arc_exp))
        ((eq (car arc_exp) 'or-marker)
         (combine_disjunctive_lambdas arc_exp))
        (t (print '(illegal arc expression))
           (print arc_exp))))
(defun update_node (h)
  (prog nil
    (setf (get h 'current_odds)
          (* (prior_odds h)
             (eval_arc_exp (get h 'arcs))))
    (setf (get h 'current_prob)
          (prob (current_odds h)))
    (format t "~% Updated value of node ~a= ~4,2f "
            h (current_prob h))
    ; (format t "~%")
    ))
; Make a pass through the non-input nodes,
; updating their probabilities:

(defun test ()
  (update_nodes '(popularity elegance artistry cleanliness
                  taste texture appearance quantity
                  correctness nutrition hygiene
                  overall_food_quality)))

(defun simple_test ()
  (update_nodes '(popularity elegance)))

; (simple_test)
(test)
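
; The following helper is not part of the original listing; it is a
; minimal sketch of how a new restaurant's observations might be
; entered before re-running TEST. The name SET_INPUT is hypothetical.

(defun set_input (node p)
  ; Record probability P as the current value of input NODE,
  ; keeping the stored odds consistent with it.
  (setf (get node 'current_prob) p)
  (setf (get node 'current_odds) (odds p)))

; Example usage:
;   (set_input 'decor 0.8)
;   (set_input 'service 0.5)
;   (test)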


REFERENCES

[1]  Jeffrey Richardson and Marjorie J. DeFries, Intelligent
     Systems in Business. Norwood, New Jersey: Ablex, 1990.

[2]  E. Sanchez and L. A. Zadeh, Approximate Reasoning in
     Intelligent Systems, Decision and Control. New York:
     Pergamon Press, 1987.

[3]  Steven L. Tanimoto, The Elements of Artificial
     Intelligence Using Common LISP. New York: Computer
     Science Press, 1990.

[4]  Linda A. Murray and John T. E. Richardson, Intelligent
     Systems in a Human Context. New York: Oxford
     University Press, 1989.

[5]  Ian Graham and Peter Llewelyn Jones, Expert Systems:
     Knowledge, Uncertainty, and Decision. New York:
     Chapman and Hall Ltd., 1988.

[6]  J. Stephen Judd, Neural Network Design and the
     Complexity of Learning. Cambridge, Massachusetts: The
     MIT Press, 1990.

[7]  Murray Shanahan and Richard Southwick, Search,
     Inference and Dependencies in Artificial Intelligence.
     New York: Ellis Horwood, 1989.

[8]  David Anderson, Artificial Intelligence and Intelligent
     Systems: The Implications. New York: Halsted Press,
     1989.

[9]  Yun Peng, Abductive Inference Models for Diagnostic
     Problem-Solving. New York: Springer-Verlag, 1990.

[10] George C. Canavos, Applied Probability and Statistical
     Methods. Boston, Massachusetts: Little, Brown &
     Company Limited, 1984.

[11] R. O. Duda, P. E. Hart, K. Konolige, and R. Reboh, A
     Computer-Based Consultant for Mineral Exploration.
     Tech. rep., SRI International, 1979.

[12] Richard O. Duda, Peter E. Hart, and Nils J. Nilsson,
     "Subjective Bayesian Methods for Rule-Based Inference
     Systems," Proceedings of the National Computer
     Conference (AFIPS), Volume 45, 1976.

[13] Bruce G. Buchanan and Edward H. Shortliffe, Rule-Based
     Expert Systems: The MYCIN Experiments of the Stanford
     Heuristic Programming Project. Reading, Massachusetts:
     Addison-Wesley Publishing Company, Inc., 1989.

[14] Keri Tracton, Programmer's Guide to LISP. Blue Ridge
     Summit, Pennsylvania: TAB Books Inc., 1980.

[15] Guy L. Steele Jr., Common LISP. Bedford, Massachusetts:
     Digital Press, 1990.

[16] Patrick Henry Winston and Berthold Klaus Paul Horn,
     LISP. Reading, Massachusetts: Addison-Wesley
     Publishing Company, 1989.

[17] Dimitris N. Chorafas, Knowledge Engineering. New York:
     Van Nostrand Reinhold, 1990.

[18] Richard Forsyth, Expert Systems: Principles and Case
     Studies. New York: Chapman and Hall Ltd., 1989.