EIGENSPACE ANALYSIS AND DESIGN
OF
HOPFIELD NEURAL NETWORKS
by
Garth A. Ripton
B.S., Lafayette College, 1983
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering
1993
Â£v^Vi5\V'.^
This thesis for the Master of Science
degree by
Garth A. Ripton
has been approved for the
Department of
Electrical Engineering
Date y
Ripton, Garth A. (M.S., Electrical Engineering)
Eigenspace Analysis and Design of Hopfield Neural Networks
Thesis directed by Associate Professor William J. Wolfe
ABSTRACT
The Hopfield model is a biologically inspired symmetrical network that has
mainly been applied to optimization problems. It is one of several models of the class
of parallel distributed processing (PDP) models. The network solves problems by
dynamically adjusting its state. A given network is set to an initial state and it pro
gresses through many intermediate states until a final stable state is reached. The
final state represents a solution to the problem being modeled. The dynamics of the
model are nonlinear and, hence, are difficult to analyze. This fact also makes the
design of networks to solve new problems cumbersome.
In this thesis, a new perspective for the analysis and design of these networks
is described. This "eigenspace analysis" uses several simplifications to allow a clear
er visualization of network dynamics. A linear system which approximates the non
linear behavior of the network used. This allows the entire range of linear analysis
techniques to be used. Specifically, the eigensystem of the coefficient matrix or
"weight matrix" defines a flow field which approximates the motion of the network
in state space. This eigenspace analysis technique is explained in detail. A specific
symmetrical weight matrix called a homogeneous matrix is introduced to further sim
plify analysis. For a homogeneous weight matrix, the eigensystem is defined by the
discrete Fourier transform (DFT) of a subset of its weights. The correspondence
in
between eigenspaces of the linear approximation and final states of the network is
shown.
The focus on eigenspace analysis results in a useful technique for designing
networks with homogeneous weight matrices. This method starts with desired feasi
ble states or set of solutions to a given problem. Transforming to eigenspace using
the DFT, the eigenvalues of a weight matrix that forces the feasible states to be final
states of the network can be developed. Once other factors such as network stability
are considered, the design is complete. Several examples of the design technique are
given. With further refinement this technique of analysis and design could supple
ment those currently being used.
This abstract accurately represents the content of the candidate's thesis. I recommend
its publication. ;
Signed
IV
Dedication
To Charmaine, my chief inspiration.
v
Contents
Chapter
1. Introduction......:................................................1
1.1 Why Study Hopfield Neural Networks?.................................1
1.2 Parallel Distributed Processing (PDP) Models in General.............2
1.2.1 PDP Concepts........................................................3
1.2.2 Motivations for PDP.................................................4
1.2.3 Components of PDP Models............................................6
2. Hopfield Neural Networks...........................................12
2.1 Hopfield Network Dynamics..........................................14
2.1.1 State Space........................................................15
2.1.2 Update Algorithms..................................................15
2.1.3 The Linear Dynamics Approximation..................................21
2.1.4 Analysis of Nonlinear Dynamics....................................23
2.2 Applications of Hopfield Networks..................................26
2.3 Motivation for Design of Hopfield Networks.........................28
2.4 Problem Statement..................................................29
3. Eigenspace Analysis................................................30
3.1 Problem Modeling...................................................32
3.1.1 Architecture.......................................................32
3.1.2 Activations........................................................34
3.1.3 Definitions of Performance.........................................36
3.2 Weight Matrix......................................................38
vi
3.2.1 Choosing Weights by Intuition.............................
3.2.2 Weight Matrix Design......................................
3.3 Weight Distributions......................................
3.4 Eigenvalues and Eigenspaces of Homogeneous Weight Matrix
3.5 Homogeneous Network Linear Dynamics.......................
3.5.1 Eigenspace Perspective....................................
3.5.2 Stability of Eigenspaces..................................
3.5.3 Eigenspaces and Stable States of the Network................
3.5.4 Eigenspace Analysis of the Assignment Problem.............
3.6 Network Stability.........................................
3.6.1 Approximation of Nonlinear "Neural" Dynamics.............
3.6.2 Effect of External Input..................................
3.7 Final States of the Network and Feasibility...............
3.7.1 States of Activation in the Eigenspace Domain.............
3.7.2 Eigenspaces of the Linear Approximation and Final States....
3.7.3 Factors Affecting Which Final State Is Chosen...............
3.8 Performance of the Network..................................
3.8.1 Feasibility.................................................
3.8.2 Optimality..................................................
4. Eigenspace Design.........................................
4.1 Desired Feasible States...................................
4.2 Transformation of Desired Feasible States to Eigenspace...
4.2.1 Transformed State Domain..................................
vii
.38
.42
.42
.46
.51
.52
.55
.56
.60
.61
.61
.63
.65
.66
.68
.69
.72
.72
.73
.77
.79
.80
.80
4.2.2 Superposition to Form Composite State..................................82
4.3 Design of Linear Dynamics.........................................86
4.4 Determination of Eigenvalues of the Weight Matrix.................90
4.5 Construction of the Weight Matrix.................................93
4.6 Determining the External Input and Stability......................96
4.7 Performance of the Network and Design Examples....................99
5. Conclusion.......................................................104
Appendix
A. Proof of Equivalence of Discrete Fourier Transform and EVP for
Circulant Matrices................................................107
B. Analytical Solution to Linear System...............................112
C. Equilibrium Point and Performance..................................135
D. Design Examples....................................................147
References...............................................................191
vui
Figures
Figure
1.1 The Neuron or Computational Unit.......................................7
1.2 Network Architecture..................................................10
2.1 Activation Functions of Hopfield Style Networks.......................13
2.2 Change in Activation versus Activation................................18
2.3 Interactive Activation Surface........................................19
2.4 Cross Section of Interactive Activation Surface.......................20
2.5 Sigmoidal Activation Surface..........................................20
2.6 Cross Section of Sigmoidal Activation Surface.........................21
3.1 Summary of Eigenspace Analysis........................................31
3.2 One Dimensional and Two Dimensional Network Architecture..............33
3.3 Summary of Network Notation...........................................34
3.4 Optimal State for the Assignment Problem..............................37
3.5 A Network with Two Mutually Inhibitory Units..........................39
3.6 WTA and Assignment Problem Weight Matrices............................41
3.7 Toeplitz and Circulant Matrices.......................................44
3.8 Block Circulant Matrix................................................46
3.9 Weight Distribution for the Assignment Problem........................47
3.10 Eigenvalues of the Assignment Problem Weight Matrix...................50
3.11 Comparison of Eigenspace Solution and Analytical Solution.............59
3.12 Activations in the Linear Region......................................62
IX
3.13 Shifted Versions of Activations........................................67
3.14 Unstable Eigenspaces and DC Eigenspace.................................71
4.1 Summary of Eigenspace Design...........................................78
4.2 Superposition of States with the Same Eigenspace Magnitude.............84
4.3 Superposition of States with Different Eigenspace Magnitudes...........85
4.4 Superposition of the Assignment Problem Eigenspace Magnitude...........87
4.5 Two Dimensional Symmetrical Weight Distribution........................94
4.6 Ranges of Stability....................................................97
x
1. Introduction
1.1 Why Study Hopfield Neural Networks?
The Hopfield neural network model is a specific architecture and dynamic
behavior from the general class of parallel distributed processing (PDP) models. Its
architecture is quite simple but its dynamic behavior is nonlinear resulting in some
interesting and surprising capabilities. This model is probably the simplest example
of the subclass of recursive networks. For this reason the study of the range of
"Hopfield style" networks (variations on the original Hopfield neural network model)
is important. Once the subtilties of these networks are understood more complex
recursive models can be built and effectively used.
The qualities that justify the examination of the Hopfield model subclass
include some inherited from all PDP models, as well as, some unique to the subclass.
The general motivation for PDP is biological, hence, the popular synonym "neural
networks." Massively parallel architecture leads to fast and efficient solutions to
many problems in biological systems which traditional VonNeuman, sequential archi
tecture cannot handle efficiently. PDP models incorporate the style of parallelism
found in the brain. It is the Hopfield model that, perhaps more than others, captures
some of the most fundamental microstructure of the biological nervous system. The
characteristic of a simple, symmetrical layout separates the Hopfield model from
most others. Just as in the brain, computations are distributed across the network not
centralized in a given area. Hopfield networks that solve simple problems can be set
up or designed intuitively. Specifically, networks that solve some optimization prob
lems are easy to construct.
1
The clear advantages of Hopfield style models are tempered by several diffi
culties. Although Hopfield models do capture certain important aspects of biological
neural systems, they are not complex enough to model the nervous system even on a
very small scale. Modification of the model itself is required to more accurately rep
resent its source, the nervous system. These more theoretical difficulties are in the
realm of the neurobiologist. There are several practical difficulties in analyzing the
model as it stands. It is nonlinear, meaning the classical techniques of analysis and
design of linear1 dynamic systems must be modified. Although nonlinear systems
have been the subject of much study, analysis of them is not subject to the type of
generalized techniques that linear systems are. Since analysis techniques cannot be
generalized easily, the techniques for design of nonlinear systems are equally, if not
more, difficult. Designing a Hopfield network to solve a specific problem essentially
amounts to designing a nonlinear system. The solutions which Hopfield networks
find in the case of optimization problems can be nonoptimal.
It is the object here to address some of the difficulties specifically in the areas
of analysis and design. A simplified system of analysis is proposed as well as a
design technique for certain types of networks. The hope is that these techniques and
refinements to them will make the application of Hopfield models more attractive.
1.2 Parallel Distributed Processing (PDP) Models in General
Parallel distributed processing (PDP) is a paradigm essentially different from
both the sequential processing of classical computer science and from more recent
parallel processing approaches. The concepts of and motivations for PDP are impor
tant underpinnings for any discussion of any specific PDP model such as the
Hopfield network.
2
1.2.1 PDP Concepts
The PDP model involves a large number of highly interconnected processing
elements [17]. Highly interconnected means a large number of processing elements
are connected to each other, hence, the network is "massively parallel." Each element
or unit is simple in its computational abilities. Computational power is not concen
trated in the individual unit but distributed throughout the network. The simple pro
cessing units are not under centralized control either. There is no director of the com
putation. This leads to a distributed representation of the "knowledge." Individual
units do not know how to find a solution or a part of the solution but all units collec
tively hold the "knowledge" required for a solution.
The concept that a number of elements with simple behavior can exhibit com
plex behavior collectively is called "emergent" behavior. When the behavior happens
to correspond to the solution of a given problem the behavior is called emergent com
putation [4]. The effect is the abilities of the whole (the network) are more than the
sum of its parts (the individual units). In traditional computer science it is essential
that the sum of the parts equal the whole. For instance, the meaning of a single state
ment in a traditional computer language must be clear without reference to other
statements. This makes a traditional computer program a collection of individual
statements that behave predictably without reference to other statements. In general,
computer languages are "context free." In the physical world, many examples of non
linear emergent behavior exist. For example, chemical reactions with the presence of
a catalyst. PDP or "connectionist" models are an attempt to put emergent behavior to
use in computer science.
3
The ability to adapt to surroundings or "learn" is often associated with PDP
models. Learning amounts to modification of the factors or "coefficients" of a model
using information about the environment in which the model exists. A network with
the capability of learning can dynamically modify its behavior. The modification is
undertaken to improve network behavior.
1.2.2 Motivations for PDP
The need for higher performance in specific areas, as always, pushes investi
gations onto new computational schemes. The performance characteristics of PDP
models are unique and interest in them has arisen due to the limitations of other tech
niques. The limitations in parallel processing paradigms are among the major reasons
for interest in PDP. The use of parallel computational models is essential for an
increase in processing speed or efficiency. This type of performance increase is
required either to provide a faster solution to problems which can be handled by
sequential computational models or to solve problems that are larger than those that
can be solved sequentially in a reasonable time [10]. The dominant method of paral
lel processing involves decomposing a problem into subproblems which can be
solved concurrently. PDP takes quite a different approach. It considers the problem as
a whole and does not localize component parts of the problem. The simple interaction
between units when viewed globally solves the problem. This means PDP is a paral
lel processing method but it is a distributed one rather than a localized one.
Conceptually PDP can take advantage of the increase in speed or efficiency inherent
in parallel processing without the requiring decomposition of the problem.
An advantage of the distributed nature of PDP is its robustness. This charac
4
teristic is defined by the ability to come up with a solution in the presence of faults in
the system or network. If a processor fails in a sequential system, the entire process
fails no solution is possible. In a localized parallel system, if one processor fails, a
portion of the process fails a solution is possible but it is a very poor solution. In a
PDP system, if a single processor or unit fails, the process is degraded a solution
which is a poorer solution results. It is "poorer" from the standpoint that it is not as
good as the solution resulting when all units are functioning. Degradation of perfor
mance is proportional to the number of failed processors making it somewhat pre
dictable. Degradation in the case of the other computational models is either com
plete (single processor case) or unpredictable (multiprocessor locally parallel case
degradation is dependent on which processor fails).
As mentioned, decomposition of a problem into subproblems for concurrent
solution is one of the major complexities of parallel computing approaches. PDP mod
els not have to deal with that problem. Some do not require any traditional "program
ming" at all. Network performance is modified by the environment in the case of PDP
models that "learn." After some of the network architectural characteristics are
defined, the network can modify its own internal variables during a learning period.
Once the solution has been learned the network matches unknown inputs to outputs
which solve the problem. The advantage is that there is no need for traditional algo
rithm development. This is tempered by the drawback that a training scheme and a set
of training data are needed [3]. The required training set can be very large and the
training period can be prohibitively long. The capability of learning is an attractive
characteristic but it has limitations and is not viable for some PDP models.
5
It is most important to make the distinction between the "traditional" pro
gramming mentioned in connection with straight parallel processing and the modifi
cation of internal factors of PDP models either by a learning scheme or by design.
Traditionally, the "program" controls the method by which the processor solves a
problem. It is a,direct centralized control, even if the program is run on several
processors. In the PDP model, the internal factors being modified are highly decen
tralized. These factors are usually simple coefficients and are difficult to view in
terms of programming. To see this clearly, a discussion of the exact nature of the
components of a PDP system is necessary.
1.2.3 Components of PDP Models
It is easy to see the biological origins of PDP models when viewing their
components. The basis of the biological nervous system is the nerve cell or neuron.
The processing elements or units from above are often called neurons. The point of
connection between neurons is called a synapse. The network interconnections are
the PDP analog to the synapses. These will usually be called connections rather than
synapses. Just as the nervous system is based on neurons and synapses (nervous sys
tem = neurons + synapses), the PDP system is built of units and connections (net
works = units + connections). An implicit part of both the biological and PDP models
is the environment. In the case of an organism, it is the physical environment in
which it exists and from which it receives sensory input to the nervous system. In the
case of PDP, it is the "problem space" from which the model receives its inputs and
to which it conveys outputs. From units and connections between units, PDP net
works are built.
6
Self connection
from output of i to
input of i
The microscale function of a unit is to receive inputs and generate an output.
Units receive inputs from three sources; from the output of other units, from its own
output, through a weighted "self connection" or feedback, and from the environment,
called external inputs. While the inputs from other units are time varying, the exter
nal inputs can be fixed or time varying. When fixed, they serve as constant biases.
The function which maps a unit's input to its output is the activation function. The
activation function can be linear, as in the case of the WidrowHoff LMS filter [19],
but is usually a nonlinear limiting function. This function generally sums weighted
versions of its inputs and then passes the sum through the linear or nonlinear func
tion. The activation function can be time invariant or changing.
7
The timing with which the function is applied to inputs effects the dynamics
of the network. This factor is called the activation update style and, hence, the appli
cation of the activation function to the summed input is called a unit update. The
update style is called synchronous when all units update at the same time or asyn
chronous when each unit updates without regard to timing for the other units. The
asynchronous style can be deterministic (can follow a set order for when each unit
will update) or probabilistic (random order for updates). The value of the unit's out
put, a scalar value, is called the unit's activation. The state of the entire network is
quantified at a given time by the vector comprised of the activations of each unit the
activation vector. Although there are many variations on the way unit activation is
determined, it is usually some form of simple processing that is performed by a unit
to generate an activation for a given input.
At the "local" level the interconnections determine the inputs each unit
receives. These interconnection are the analogs to the synapses of the nervous
system [7]. Each connection has associated with it a scalar value or weight by which
the output of a unit is scaled before it becomes the input to another unit or to itself.
Units that are connected have nonzero magnitude weights between them while units
that are not connected have zero magnitude weights between them. In addition to
magnitude, weights have a sign associated with them. The sign determines whether
the connection is inhibitive is subtracted from sum of inputs or excitatory is added
to the sum of inputs. The magnitude determines the strength of the inhibitive or exci
tatory force. Self connections are weighted connections from a unit's output to its
input and are a type of feedback term. Several terms are used to describe connection
8
types. A directed connection occurs when the weight from unit i to unit j has non
zero magnitude but the weight from unit j to unit i is zero. The connection is undi
rected when the weight between i and j and the weight between j and i are nonzero.
If the two weights are nonzero and equal then the connection is symmetric.
When viewed globally the connections between all units can be represented in
matrix form called the weight matrix. This matrix stores the "knowledge" of the net
work and affects its ability to solve a given problem. It is the weight matrix that must
be varied based on the required response of a network to its environment. Weights are
determined either through the process of learning described briefly above or by
design. Design takes place outside the network environment and is a metafunction
which fixes the weights before the network is applied to find problem solutions.
Learning actually occurs within the problem environment in a special learning phase
during which weights are assumed to be modifiable. After the learning phase, which
involves the application of special learning rules to the network, weights are fixed
and the network can then be used for finding problem solutions.
Viewing the entire network globally, both neurons and connections, gives rise
to general architectural concepts. Networks can be described by the connection styles
shared by all units. If all units have directed connections the network is called a for
ward feed network. The initial activation proceeds in one direction from input to out
put. Units in these networks are specialized into "layers." Generally one layer
receives the inputs and passes them forward to the next layer. One layer passes out
puts from the network. Between the input and output layers are "hidden" layers. The
forward feed network is analogous to a finite impulse response (FIR) filter and, in
9
fact, a FIR filter can be constructed from a forward feed network [14]. If the connec
tions are undirected and symmetrical the network is called a recurrent network. Units
are not specialized as in the forward feed case. The input units are the same as the
output units and the "hidden" units [16]. The architecture has a feedback element
analogous to a infinite impulse response (HR) filter.
Outputs
Forward Feed Network
Figure 1.2
Network architecture forward feed and recurrent
10
Although there is a wide range of variation, PDP models are found to be quite
simple when viewed either locally or globally. The behavior resulting from these mod
els can be highly interesting and complex even for simpler versions as will be shown.
11
2. Hopfield Neural Networks
The Hopfield model is a simple form of a recurrent PDP model. Recalling the
components inherited from these PDP networks, we now focus specifically on the
Hopfield style network. The activation function common to each unit is a limiting
nonlinearity. This results in convergence of each unit's activation to a stable state. At
some time, t > 0, the activation of each unit settles to one of two values called max
and min. Analysis of the dynamics of this convergence comprises the bulk of this
study so details will be saved for later.
Several types of nonlinear functions are proposed by Hopfield and others in
[7], [8], [17]. The binary activation function outputs max or min based on the sum of
inputs to a unit. If the sum is positive the unit's output is max and if the sum of inputs
is negative the unit's output is min. If max and min are assigned the values 1 and 0 the
binary nature is clear. The definition of the binary activation function result in a dis
continuity at input equal to zero.
The next type of activation is the threshold activation which is the same as the
binary function when inputs are either large negative values or large positive values.
The difference occurs in the region near zero where the threshold function is linear.
Although the function is first order continuous, higher order discontinuities occur in the
transition regions where the function goes from its linear value to its saturation value.
The sigmoidal activation function is similar but with a very important distinc
tion. Like the threshold function, the activation value is min for large negative inputs,
max for large positive inputs, and linearly related to the input near zero. Unlike the
threshold function, the sigmoid is only approximately linear in the transition region
12
but it is smooth there. This makes it higher degree continuous. Just like g(x) = e, the
sigmoidal activation function: f(x) ='/2{ 1 + tanhx) e C(R). It is a function with a
full set of continuous derivatives. The region in which/(x) is approximately linear,
from the negative transition region to the positive transition region, is defined by
Y < x < y, where y is the "gain" of the sigmoid. In the expression from [1],[8]:
f(x) = y2 (1 + tanh x / x), the xQ term is the gain factor. This particular sigmoid which
will be used as an example throughout varies from max = 1 to min = 0. Figure 2.1
shows this sigmoidal and the other activation functions mentioned. For the sigmoidal
activation function, examples of a high gain and low gain case are shown. It is the
sigmoidal activation function which, because of its high order continuity, lends itself
to concise analysis.
(a) (b)
(c) (d)
Figure 2.1
Activation junctions ofHopfield style networks:
(a) binary, (b) threshold, (c) low gain sigmoid, x0 = 1, (d) high gain sigmoid, x0 = 0.1
13
The connection style of the Hopfield network shows the symmetry inherent in
this model. Weight factors connecting the units are symmetrical resulting in a weight
matrix that is symmetric. Since the individual elements of a symmetric weight matrix
W are related by w.. = w.., the weight matrix and its transpose are equal: W = W .
There is a special constraint on self connections: w{.. Hopfield showed that if
weights of all self connections are set equal to zero convergence to a stable state
where activations are all either min or max is assured [7], [8]. The weights are usually
set a priori by a design method. An efficient learning scheme has not been developed
for simple Hopfield networks.
The global architecture for these types of networks is recurrent. This results in
nonspecialized units. The input is not fed to certain units only and output is not
obtained from another subset of units. The transition from an input state to an output
state occurs over time. Network dynamics, rather than a physical or architectural dif
ferences, distinguishes input from output. A solution, then, is a stable equilibrium
point in the evolving dynamic system defined by network parameters, inputs, and the
nonlinear activation function. Analyzing how the network solves problems amounts
to the analysis of the dynamic system.
2.1 Hopfield Network Dynamics
The term "settling" generally describes the type of dynamics displayed by a
Hopfield network. From an initial state the network progresses through a series of
intermediate states until a final stable state is reached. The network "settles" to a
solution. Understanding this process is central to understanding all aspects of the
Hopfield model.
14
2.1.1 State Space
The vague term "state" of the network must be clarified. Above, the vector
form of activation was defined as the collection of the activation values or outputs of
each unit into vector form. The shorthand for this vector through this discussion will
be act(t). (Note that, generally, variable types will be distinguished as follows: vec
tors will be bold variables that begin with lower case letters like act, matrices will
bold upper case letters like W, and scalars will be italic and lower case like ext.) The
time dependence of the activation vector is shown explicitly here. The activation of
an individual unit, i, is denoted by the scalar act ft). Since time is a continuous vari
able, activation is continuous as well.
The state of activation can be viewed in "state space." For a network with n
units, act(f) is ^dimensional. At time t, the vector activation will be represented by
act(t) E R". Since activation values are actually bounded by max and min, if we set
max = 1 and min = 0 then act(t) E [0,1]". The state space is in effect confined to an n
dimensional hypercube with sides of length {max min). In the "binary" case where
max = 1 and min = 0 the ncube is the "unit" ncube. When reference is made to the
ncube it will be the unit ncube unless otherwise stated. The evolution in state space
occurs in the time interval from tQ to fFina]. The activation at tQ, act(tQ) is referred to as
the initial activation and activation at tFjna is, of course, the final activation.
2.1.2 Update Algorithms
The change in activation from initial to final, through many intermediate
states, is controlled by the network update algorithm. Different Hopfield models use
different update styles from the two categories described above. The original
15
Hopfield model updates each neuron asynchronously and stochastically [7]. It also
used the binary update function. Asynchronous updating is used to avoid wild
changes in the activation vector. The choice of this style was motivated by biological
accuracy. The nervous system has no central time clock to coordinate the updating of
neurons, hence it can be considered asynchronous. This model has been found to be
more stable than some others [17]. The later models of Hopfield and others use a syn
chronous update style [8], [17]. They generally use the sigmoidal activation function
which is continuous and, therefore, more easily realizable in physical systems includ
ing biological ones. Unlike the former style, synchronous updates which are deter
ministic are used making the behavior of the network deterministic. The focus will
remain on synchronously updated deterministic networks.
The dynamic behavior of the network is fully described by a set of nonlinear
differential equations defined by Hopfield. In the notation from above, these equa
tions are:
where/[ ] is a sigmoidal activation such as /(*) = K(l + tanhx), and ext is the exter
nal input to each unit in vector form [1], [8]. The vector quantity net(r) is the total
input to each unit. While act is a function of time, generally, we assume that W and
ext are time invariant. This makes analysis of the system significantly easier. The
n
actj (t) = f ^ wLj actj (t) + extj = f\neti (7)] where
net(r) = W act(t) + ext,
2.1
16
synchronous nature of these update equations is implicit. The activation vector gets
updated all at once rather than one component at a time.
Since the equations are nonlinear, they usually evaluated numerically. The
time steps for numerical evaluation are discrete denoted by At rather than continu
ous and infinitesimally small as implied by dt. The discrete form of Equation 2.1 is:
So, a general set of discrete update equations will look like:
act(f + At) = act(r) + Aact(/) where
Aact(r) = /[net(r), act(r), max, min, At]
with net(t) defined in Equation 2.1. For the sigmoid activation function:
f(x) = y = K(l+ tanhx) so we get x = ]41n(y/ly) and from [1]:
The term A act ft) defines the change in each component of the activation vector for a
single discrete time step. This change in activation for a given unit i can be defined in
terms of the sigmoidal relationship or in other terms. One relationship specifically
2.2
neti(t) = y2 In UCt ^ therefore
,w [lacti(t)_
A acti (t) = At actl (/)(l acti (t))neti (t)
2.4
17
designed to update activations discretely is called interactive activation. Change in
activation of unit i is defined by:
A act ft) = rj
<
netft)[max
netftfactft)
act fit))
min)
if net > 0
if net < 0
2.5
where r is the step size factor and is analogous but not the same as time step size At
[111 [21]
Because interactive activation update equations can be easily applied in simu
lations of Hopfield network dynamics, we focus on them for a moment. The simula
tions in this paper will use interactive activation, so it is useful to show the similari
Figure 2.2
Change in activation versus activation for sigmoidal and interactive activation
update equations (for min 0 and max = 1 and r = At)
18
ties between it and sigmoidal activation updating. Note from Figure 2.2 that
A.actft)/A.t with respect to act ft) for the sigmoid and interactive activation cases is the
same in the region near max or min for max = 1 and min = 0. If we look at the sur
faces generated with actft+ht) (or actft+r]) for interactive activation) as a function of
act ft) and net ft), the diagonal cross section is similar and sigmoidal in shape (Figure
2.32.6). The cross section figures also show that near the extremes of actft+At)
(zero and one), the updated activation, the behavior of the two different activation
functions is similar. The step size T] in interactive activation and the time step At in
sigmoidal activation serve the same purpose; to limit the size of the changes in acti
vation to small incremental changes.
Interactive activation surface for net = 10 ...+10 on xaxis, act(t) = 0... 1 onyaxis
and act(t+r\) on z axis with r\ = 0.1
(The dark line is act(t+r\) for act(t) = H20 net(t) +1/2)
19
ac/(/+r)
Figure 2.4
Cross section of interactive activation surface from Figure 2.3 for
act(t)=l/20net(t)+l/2
Figure 2.5
Sigmoidal activation surface for net = 10 ...+10 on xaxis, act(t) = 0 ... 1 onyaxis
and act(t+At) on z axis with At = 0.1.
(The dark line is act(t+At) for act(t) = 1/20 net(t) +1/2)
20
ac/(MAt)
Figure 2.6
Cross section of sigmoidal activation surface from Figure 2.5 for
act(t)=H20net(t)+l/2
2.1.3 The Linear Dynamics Approximation
Although the dynamics of the Hopfield network are nonlinear, a good deal
can be learned from the analysis of a linear system that approximates the nonlinear
one. From Equation 2.1, a linear estimate of dact(t)/dt can be made:
d*C^Ydt = act'(0 ~ dnet^%t= W' act(0 + ext = net(0 or
Aact(/) = At net(r)
2.6
The continuous time equations can be used to estimate the state of the network at any
time for a given initial activation, act(?0) and are very useful for analysis. The expres
sion defines a simple set of linear differential equations that has a straightforward
analytical solution since W, the coefficient matrix, and ext, the "driving function,"
are constant with respect to time [11]. For a full determination of linear dynamics, the
analytical solution is necessary but simpler techniques exist for visualizing the linear
system's solution.
21
The simplest technique for examining time invariant linear systems is eigen
system analysis. The eigenvalues and eigenvectors of the weight matrix define a
dynamical flow field which is an estimate of the time evolution of the activation vec
tor. This flow field is symmetrical with respect to the origin for a system when there
is no driving function. This is the case for a homogeneous system (dx/dt = A x). The
external input skews the flow field. The solution to the nonhomogenous system
(dx/dt = A x + b) is not symmetrical the origin is offset. The motion of the linear
system in state space follows the flow field, so characterizing the field determines
dynamic behavior of the system. How the eigensystem of a specific type of weight
matrix determines this flow field is taken up later.
Since the object of the network is to settle to a stable equilibrium point repre
senting a problem solution, the analysis of the equilibria and their stability is most
important. For the nonhomogeneous system defined by Equation 2.6 there is one
equilibrium point, act(t). If we assume that W is nonsingular (it does not have zero
for an eigenvalue) then the equilibrium point is defined by:
act'(r) = 0 = W act(r) + ext so
act(r) = W_1 ext 2 7
The equilibrium point is stable if, and only if, all of the eigenvalues of W are in the
left half of the complex plane; meaning that all real eigenvalues must have a negative
sign and all complex eigenvalues must have negative real parts. Since Hopfield net
work weight matrices are symmetrical, all eigenvalues will be real so that eliminates
22
the second case. In general, weight matrices are not negative definite so positive real
eigenvalues are present. This means that the equilibrium point will be unstable so the
existence of positive eigenvalues means the linear system is unstable. The stable
equilibria that characterize solutions cannot be determined looking only at linear
dynamic approximation.
The linear approximation does determine the direction in which the activation
vector will flow towards a stable equilibrium point. Recall that activation is bounded
by max and min, and that the unstable dynamic motion of the linear system is limited
by this at the edges or surfaces of the rccube. Within the ncube, however, the linear
system approximates the behavior of the nonlinear "neural" dynamics.
2.1.4 Analysis of Nonlinear Dynamics
To actually determine the stable points of the network, nonlinear analysis
must be done. The behavior for a nonlinear system can be "drastically different" than
the behavior of a linear system [11]. Unlike the analysis of the simple linear system
outlined above, general analytical solutions to nonlinear systems are difficult to find.
Luckily, in this case, we are primarily interested in the stability of the nonlinear sys
tem a subject well researched. Good definitions of equilibrium points and stability are
necessary. An equilibrium point of the nonlinear system is defined by:
act'(r) = 0 = /[act(r)] where/[ ] is the nonlinear function. Once the activation vec
tor is at act(f), it will stay there since the change in activation there is zero. The equi
librium point is stable if there is a region around the equilibrium point in which the
activation vector will remain. An activation vector slightly disturbed from an equilibri
um point should either return to the equilibrium point, or not drift arbitrarily far from
23
it.
For a symmetrical weight matrix with zero self connections, the equilibrium
points are "corners" of the ncube. A corner state occurs when the activations for all
units either equalRiax or min or in the special case we are considering
actco^r ~ {act e R" I ach = 1 or o}. This was proven by Hopfield [7], [8] but it does
make sense intuitively. Since the one equilibrium point that is inside the ncube is
unstable for weight matrices with positive eigenvalues, the activation vector grows
arbitrarily large limited only by the sigmoid nonlinearity. The other options for equi
libria are the surfaces, edges, or vertexes (corners) of the ncube. The only stable
points in this set are the corners because only here has the activation of all units
reached it greatest or smallest possible value. The conditions for convergence to sta
ble equilibria at the surface or edge of the ncube are outlined by Abe [1]. For the
systems presented here the stable equilibria will only be corner states.
There are two techniques for characterizing the stability of equilibria in non
linear systems indirect methods and direct methods. The system can be "linearized"
for a small region around the equilibrium point. Once this is done the stability of the
resulting linear system can be determined by its eigenvalues. If the region is small
enough, the behavior of the nonlinear system is the same or close to the behavior of
the linear system. This is called Liapunov's indirect method and it is used by Abe [1].
The behavior of the multivariate nonlinear system can be a approximated by
a function of a single variable. The so called "summarizing" function gives an indica
tion of the nonlinear system's dynamics while suppressing the details. The summa
rizing function is not a full solution to the system but an indication of some of the
24
characteristics of the solution. This method of using a summarizing function on a
nonlinear system is called Liapunov's direct method and the particular summarizing
function used to determine stability characteristics is called a Liapunov function. A
Liapunov function must be continuous, have a unique minimum in some region about
an equilibrium point, and it must be constantly decreasing. Proof of the existence of a
Liapunov function for an equilibrium point implies that the equilibrium point is sta
ble by Liapunov's theorem [11]. The energy function defined by Hopfield is a
Liapunov function for all equilibrium points of Equation 2.1 [7], [8]. The existence of
an energy function with minima at the corners of the ncube implies the stability of
the corners by Liapunov's theorem.
The linear dynamics of Equation 2.6 approximate the dynamic movement of
the activation vector within the ncube but it is the "limiting" behavior of the nonlin
ear system (Equation 2.1) that determines the stability and existence of the final acti
vation values. This leads to two different ways of visualizing the neural dynamics of
the Hopfield network. Both ways mask some of the details of network operation but
are necessary to develop reasonable analysis and design techniques. The eigenspace
perspective uses global linearization (as opposed to the local linearization of
Liapunov's indirect method) under the assumption that the eigenspaces of the linear
system direct flow of the activation vector towards stable equilibria. However, no sta
ble equilibria exist in the linear system. The details of convergence to comer states
are not available. The energy function approach uses summarization assuming that
the scalar value E, energy, indicates the behavior of the activation vector, act. The
stable equilibria are found at the points where energy is a local minimum, dEldt = 0.
25
The details of the activation value that corresponds to the equilibrium point and the
path which act took to reach the equilibrium are not available using this method. The
technique that gives a better picture of network dynamics is the eigenspace technique
although it is just an estimate of the dynamics.
2.2 Applications of Hopfield Networks
Before proceeding, a brief introduction to the applications of Hopfield net
works is necessary. Some of the terminology describing specific applications is used
to clarify analysis techniques below. The first proposed application of the Hopfield
model was as a content addressable memory [7], [8]. In this case, the input to the net
work is an incomplete or inexact "memory." The desired output is the complete, exact
memory. The network uses the input to find the "closest" match and outputs that
match. The output is found using the content of the input, not by referencing the loca
tion of the memory as in a conventional memory storage systems. In this type of
application memories correspond to local minima of the energy function. These mini
ma are final stable activations of the network. So the term "closest," used loosely
above, means the final activation (output) that is the minimum euclidian distance
from the initial activation (input) is chosen. The flow field directs initial activation to
the nearest stable equilibrium point. In this case, the number of distinguishable local
minima is proportional to the storage capacity of the network so the existence of
many local minima is a positive characteristic.
Hopfield networks can also be applied to optimization problems. Of particular
interest are constraint satisfaction problems [15]. A network designed to solve such a
problem is attempting to settle to a final activation subject to specific constraints on
26
the set of final activations. A simple example would be to find a final activation sub
ject to the constraint that one unit is on and all others are off. (A unit with activation
equal to max is "on" and a unit with activation equal to min is "off".) This is the "win
ner take all" (WTA) problem. If more than one state satisfies the constraints, which is
usually the case, the initial activation determines which state is selected. So the solu
tion to a problem is the output that satisfies the constraints and is optimal best in
some sense for the given input.
There are several subclasses of constraint satisfaction problems to which the
Hopfield network has been applied [9]. The linear type of optimization problem is
characterized by final states that have equal "energies" (defined below in Equation
2.8). They are in effect the same types of local minima required for content address
able memory. Again, the final state that is desired is the state that is closest to the ini
tial activation. In addition, the final state must satisfy the problem constraints.
Closeness is defined in the same way as above. Examples of linear optimization
problems are the WTA problem, described above, the ^winner problem, and the
assignment problem, analyzed below. The ^winner problem is the same as the WTA
problem but states with k units on and nk units off are sought.
Networks have also been applied to NPcomplete problems. Generally, the
final states of this type of network have unequal energies. This means, in addition to
satisfying the constraints of the linear problem, the final state should have the lowest
energy in global sense. The best solution should be close to the initial activation but,
more importantly, it should be lowest in energy. In this case, the existence of many
local energy minima causes a problem. An initial activation close to a local minimum
27
will be drawn to that minimum and have no chance of reaching the global minimum.
The basic Hopfield model has difficulty with this particular problem so several modi
fications have been proposed and tested [17]. Since the focus here is on the basic
model these will not be discussed. Examples of NPcomplete type optimization prob
lems are the 01 knapsack problem and the travelling salesman problem (TSP)
described in [2], [9], [12], [20].
2.3 Motivation for Design of Hopfield Networks
To get a network to solve a given problem we must determine the weight
matrix, W, and external input, ext, that result in desired stable states of activation. In
general, these factors can be "designed" or "learned." The method which is most
practical varies from network to network.
Methods which use learning have several advantages. Learning is "automat
ic." Once the learning rule is established the weights are adjusted to give desired
behavior. This is useful when intuitive rules for weight determination are difficult.
The adjustment to the weights is calculated by comparing desired output to actual
output (final activation) for a given input (initial activation). In the case of a forward
feed network, the result is that an adjustment or "error vector" that is calculated for
each layer. Since there are few layers the calculation of only a few error vectors is
required. The depth of a forward feed network is usually not more than three or four
layers. The "backpropagation" algorithm is an example of the learning rule used for
forward feed networks.
Learning presents difficulties in recurrent networks, however. Unlike forward
feed networks of few layers, adjusting weights for a recurrent network is computa
28
tionally costly. Where a forward feed network had one error vector per layer, the
recurrent network requires one error vector per update step [16]. Since these steps are
small the number of error vectors that must be calculated is large. This makes train
ing by traditional learning rules too time consuming. For this reason, design methods
are used for recurrent networks such as the Hopfield network.
Design of Hopfield networks for specific problems has been accomplished
several ways. First, sets of intuitive rules for determining connection weights have
been used. These work for simple optimization problems but are not precise enough
for difficult problems. As an alternative, Hopfield [9] outlined an energy function
based design technique. By finding an energy function, E, with minima at the desired
final stable states, a weight matrix can be found using:
n n n
E = ^ W,J actj act, Â£ act, ext,
'=1 j=i m 2.8
This technique can be difficult to apply and adjustment factors that are less than intu
itive are often required for good performance of the resulting designs. Chapter 4 will
introduce an eigenspace based design technique that is applicable to a range of opti
mization problems as an alternative to the energy function based technique.
2.4 Problem Statement
The object of this study is twofold. In the next two chapters the following
questions are answered: Can eigenspace domain techniques be used to gain a clearer
understanding of network dynamics? If so, can the insights gained in analysis in the
eigenspace domain be used to formulate a method for designing networks to solve
new problems?
29
3. Eigenspace Analysis
Finding a method for visualizing and explaining the dynamics of the Hopfield
network is a necessary prelude to formalizing a design technique. Since the chapter
covering design is based on the eigenspace view of network dynamics, the first step
is to fully layout the techniques of eigenspace analysis that were mentioned in
Chapter 2. The analysis techniques developed are applied to a reasonably well under
stood sample problem, the assignment problem.
The chapter is organized in the same manner as outlined by Wolfe [20]. First,
problem modeling is introduced. The general focus is on optimization problems of
the linear type. From the problem model, a weight matrix is either given or can be
constructed using simple intuitive rules. Wolfe showed that many linear optimization
problems can be solved by weight matrices with homogeneous symmetry. This type
of weight matrix is introduced and its properties are outlined. The concept of "weight
distribution" and its relationship to the homogeneous weight matrix is defined. Once
the weight matrix is defined, its eigenspace characteristics can be quantified via the
symmetric eigenvalue problem (EVP). Recall that one of the defining characteristics
of the Hopfield model is a symmetrical weight matrix. In the case of a homogeneous
matrix, solving the EVP is equivalent to finding the Fourier transform of the weight
distribution. This means transformation from state space domain to eigenspace
domain is well defined. Next, the nonlinear dynamics must be addressed in the form
of "stability" analysis. The question is, given the weight matrix and the external
input, which states will the nonlinear dynamics cause to be stable. Wolfe [20]
addressed stability before looking at eigenspace dynamics. Here, we address it after
30
the problem has been transformed to the eigenspace domain to gain added insight.
Finally, the performance of the network can be determined and explained based on its
characterization in the eigenspace domain.
Figure 3.1
Summary of eigenspace analysis
31
3.1 Problem Modeling
Modeling in this case is attaching significance to the various parts of the gen
eral Hopfield model described above. The network has architectural characteristics
and dynamic characteristics both of which define a particular problem or instance of
a particular problem.
3.1.1 Architecture
The number, layout of and connectivity between units is defined by the prob
lem. The actual number of units is usually based on some physical factor, usually the
size of the input vector. Since the Hopfield model is recurrent, the number of input
units equals the number of output units. In fact, they are the same units because "out
put" and "input" are only defined in terms of the time evolution of the network. In
terms of linear algebra the network is an operator which maps from R"  R" where n
is the number of units.
Another characteristic given to the network by the problem layout is "dimen
sionality." A one dimensional network can be pictured as a number line or single axis
of independent variables with a unit at each variable. If there are n increments of the
independent variable there will be n units in the network. The two dimensional net
work is like a grid or two axis coordinate system with a unit at each grid point. A
relationship of the form y =/(x) exists between two sets of the units. There is a set of
independent variables and a set of dependent variables. With n increments of the
independent variable and m increments of the dependent variable there must be a grid
of nxm units in the network. Usually two dimensional networks will be considered
to have an equal number of independent and dependent variables so the number of
32
2
units will be n x n or n '. The notation n* will be used to refer to the number of units
2
in a generic Hopfield network so n* = n (for a one dimensional network) or n* = n
(for a two dimensional network). Most observations that apply to two dimensional
networks can be simplified to apply to one dimensional networks. The WTA net
work, which chooses one unit from n units, is an example of a one dimensional net
work. The assignment problem network, which chooses n units from a grid of
(HKHKHK)
One dimensional network n = 6
Dependent
variable
(KKH)
CX>C^O
KKH)
Independent variable
Two dimensional network n = 5
Figure 3.2
One dimensional and two dimensional network architecture
n units, is an example of a two dimensional network.
In some networks the weights are defined by the problem model. On occa
33
W Weight matrix
w^j Element of weight matrix
act(/) Activation vector
actjfj) Element of activation vector
ext External input vector
extf Element of external input vector
n Number of units in a single dimension
* Total number of units
(n for one dimensional network or
n^ for a two dimensional network)
Figure 3.3
Summary of network notation
sion, the weights can take on a "physical" interpretation. The network that solves the
travelling salesman problem is an example [9]. The basic objective is to find the
shortest path between all the "cities" or units. In this problem model, weights corre
spond to physical distances between cities. The weights function to inhibit cities that
are more distant and favor cities that are closer. The use of weights as part of the
problem model is an exception to their normal use as design factors.
3.1.2 Activations
Since the Hopfield network settles to problem solutions over time, two partic
ular states have direct significance in problem modeling the initial state and the final
state. The initial state, act(tQ), is usually given a straightforward physical interpreta
tion. The magnitude of the initial activation of unit i is in most cases: 0 < actft0) < 1,
where min = 0 and max = 1. (Although it is possible to envision a network where
34
act ft0 or 1.) If each unit initially signifies some proposition the closer its activa
tion is to max the stronger the likelihood of the proposition. It can also be an encoded
input of some form based on the problem model. For WTA or the assignment prob
lem, the initial activations are input values normalized to between 0 and 1. The WTA
network picks the largest initial activation. For the assignment problem, each inde
pendent variable can be thought of as a time and each dependent variable a resource.
Initial activations represent the preferences of the assignment of a time to a resource.
The closer the unit is to 1 the stronger the preference and the more likely the time
will be assigned to the resource. The constraints make it possible to assign only one
resource to a given time slot.
The final activation must be a solution to the problem. Comer states or binary
states are the stable states of the Hopfield network. While the initial activations are
continuous, 0 < actftf) < 1, the expected solution must be in binary terms, actftFia^
= 1 or 0. The final states are deterministic given a W, ext, and act(?0). The network
should always reach the same final state when these factors are constant. In addition
to being stable, a characteristic of the network dynamics, the states should be feasi
ble. This means they should be of the form of a solution to the problem as defined by
the problem model. For an optimization problem this means that the solutions should
satisfy all constraints. Note that not every initial activation should necessarily result
in a stable feasible state. In some cases, a solution for a given set of inputs should be
ambiguous according to the problem model. The feasible final states of the WTA
problem are states with one unit on and all others off. In the WTA problem, an
ambiguous state should result when two or more initial activations start at the same
35
value. For the assignment problem, the feasible states are those with one unit on per
row and unit on per column. This corresponds to the assignment of one resource
(dependent variable) to a given time slot (independent variable). Feasible states of the
assignment problem can also be viewed as permutation matrices of size n x n.
3.1.3 Definitions of Performance
Measuring performance is measuring the accuracy with which a network sim
ulates the problem being modeled. Does the network settle to stable "final" states?
Are the final states solutions to the problem in all cases? Are they the best solutions?
These questions will be taken up at the end of the chapter but, since the terminology
of performance is used throughout, some definitions are necessary.
Final states should be stable, feasible, and optimal. Stability is defined in
terms of network nonlinear dynamics. If there are no activations which correspond
to stable equilibria, then no "final" states exist. If all inputs that have unambiguous
solutions settle to final states that correspond to solutions, then the network exhibits
feasibility. Feasibility is a test of how well the network simulates the problem. In the
case of an optimization problem, there are many solutions which fulfill the con
straints but usually only one solution that is "best" for a given set of initial condi
tions. A network exhibits optimal behavior if the final state to which it settles satis
fies the constraints and is the "best" solution with respect to the problem model. In
the case of the WTA problem, the optimal solution is the state with the unit with the
largest initial activation on and the rest off. It is feasible because one unit is on and
the rest are off. This is the constraint which must be fulfilled. It is optimal due to the
fact that the particular state selected has the unit with the largest initial activation on
36
and all others off. For the assignment problem the final state shown in Figure 3.4 is
feasible because it is a permutation matrix (one unit on per row per column) and opti
0.76 0.63 0.93 0.66 0.04 0.34 0.33 0.31 0.82 0.17
0.96 0.90 0.78 0.32 0.04 0.97 0.28 0.78 0.35 0.69
0.35 0.47 0.34 0.12 0.33 0.85 0.10 0.93 0.29 0.35
0.33 0.24 0.72 0.04 0.79 0.34 0.17 0.38 0.27 0.91
0.72 0.68 0.51 0.59 0.10 0.99 0.95 0.57 0.34 0.31
0.91 0.04 0.28 0.01 0.52 0.72 0.83 0.20 0.32 0.26
0.20 0.47 0.18 0.42 0.99 0.95 0.57 0.43 0.90 0.73
0.55 0.09 0.74 0.91 0.43 0.03 0.55 0.52 0.77 0.76
0.31 0.65 0.77 0.58 0.56 0.26 0.54 0.04 0.15 0.76
0.47 0.12 0.93 0.98 0.11 0.28 0.46 0.98 0.56 0.99
(a) l
0 0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
(b)
Figure 3.4
Optimal state for assignment problem for n = 10.
(a) initial activation, (b) final activation from Wolfe [21]
37
mal because the sum of the initial activations of the on units is larger than that of any
other permutation matrix.
The goal of problem modeling is to be able to intuitively see why a certain
network can solve a certain problem. Elements of the network are given significance
in terms of the problem and the performance of the network is measured in terms of
problem solutions. This thought should remain in the background throughout the dis
cussion, even though the focus now shifts to the network itself.
3.2 Weight Matrix
The weight matrix shapes the dynamic characteristics of the Hopfield net
work. Although, in some cases, the weight matrix is defined by the problem model,
normally it must be derived by some rule or design process. Some explicit require
ments for the weight matrix do exist. The general model assumes a symmetrical
weight matrix with zero magnitude self connections. Hopfield showed that these con
ditions are sufficient to force the network to converge to stable states which are cor
ners of the ncube [7], [8]. Given these requirements we must establish weights which
allow convergence to the comer states which are feasible solutions to a given problem.
3.2.1 Choosing Weights by Intuition
In cases where the required behavior of the network is easy to visualize, there
are simple rules for weight matrix construction. The physical significance of each
connection can be stated in terms of excitation and inhibition. Two units that can
have the same final activation (either 1 or 0) should excite each other. Excitation
translates to a positive signed connection between the units. Two units that should
have different final activation should inhibit each other. Inhibition translates to a neg
38
ative signed connection. If the final activations are unrelated there should be no
force, inhibitory or excitatory, between the units. Neutral force corresponds to a zero
magnitude connection. The magnitude itself corresponds to the relative strength of
association between units the greater the magnitude the greater the inhibition or
excitation [3].
These guides translate to specific rules. One such rule, called mutual inhibi
tion, is of interest here because it easily translates a constraint satisfaction problem to
an appropriate weight matrix. A mutually exclusive constraint occurs when the fulfill
ment of one premise leads to the exclusion of other premises. This is the same as say
ing, if the unit corresponding to that premise is on, all units related by mutual exclu
flC/j = 1
(a)
ac*2 = 0
act i = 1
(C)
act2 =1
acti = 0
(b)
act2 = 1
acti = 0
(d)
act2 = 0
Figure 3.5
A network with two mutually inhibitory units
Cases (a) and (b) are stable while cases (c) and (d) are unstable.
39
sion must be off. The mutual inhibition rule says to assign negative weights between
units that are mutually exclusive. The result for a simple case of two units connected
by negative weights is shown in Figure 3.5. There are two stable states according to
the activation update rule from Equation 2.5. One when unit 1 is on and unit 2 is off
and the other when unit 2 is on and unit 1 is off. (Note that self connections are zero
and the external inputs are adjusted to cause the values of net. indicated in the figure.)
The case with both units on is unstable. The case with both units off is marginally sta
ble but indeterminate because neither unit is on to inhibit the other.
Mutual inhibition can be used to construct a WTA weight matrix or an assign
ment problem weight matrix quite easily. The desired feasible states for a WTA net
work are those with one unit on and all others off. The mutual inhibition rule sug
gests inhibitory connections between all units. The WTA weight matrix that results
has zero self connections, w.. = 0, as required, and all other connections, wi. = a for
i ^ j as shown in Figure 3.6. For the assignment problem, the mutually exclusive con
straint is that only one unit per row per column can be on and all other units in that
row and column must be off. Since the network is two dimensional, the weight
matrix is n x n. When the mutual inhibition rule is applied, a matrix of the form
shown in Figure 3.6 results;. It has zero magnitude self connections, (/^,
between unit l,m and unit l,m, inhibitory connections of a between the units in the
same row or column, between unit l,m and unit p,q iilpormq (but not
both exclusive or), and zero magnitude connections elsewhere. The zero connections
"elsewhere" imply that there are no constraints between units not in the same row or
column.
40
i i 0 1 2 3 4 5 6 7 8 9
0 0 a a a a a a a a a
i a 0 a a a a a a a a
2 a a 0 a a a a a a a
3 a a a 0 a a a a a a
4 a a a a 0 a a a a a
5 a a a a a 0 a a a a
6 a a a a a a 0 a a a
7 a a a a a a a 0 a a
8 a a a a a a a a 0 a
9 a a a a a a a a a 0
WTA weight matrix for n = 10
l,m p>q 0,0 0,1 0,2 1,0 1,1 1,2 2,0 2,1 2,2
0,0 0 a a a 0 0 a 0 0
0,1 a 0 a 0 a 0 0 a 0
0,2 a a 0 0 0 a 0 0 a
1,0 a 0 0 0 a a a 0 0
1,1 0 a 0 a 0 a 0 a 0
1,2 0 0 a a a 0 0 0 a
2,0 a 0 0 a 0 0 0 a a
2,1 0 a 0 0 a 0 a 0 a
2,2 0 0 a 0 0 a a a 0
Assignment problem weight matrix for n = 3
Figure 3.6
WTA and assignment problem weight matrices
41
3.2.2 Weight Matrix Design
Although simple rules can determine weight matrices that solve relatively inter
esting problems, some problems require more methodical design approaches. The ener
gy function approach to design finds a specific function with minima at the desired fea
sible states. Once this function is found the weights of the network can be determined
using Equation 2.8. This approach has several disadvantages. Like the general prob
lem of finding Liapunov functions for nonlinear systems, it is often difficult to find
the proper energy function. It usually must be found by trial and error, as there are
few formal methods for constructing it. Once it is found, it is often necessary to
adjust the equation with factors known as penalty terms which are not very intuitive.
It is difficult to gauge and improve network performance because the energy function
suppresses much of the detail of the network dynamics.
For the Hopfield model to become more applicable to complicated optimiza
tion problems, a better design method is necessary. The ideal design method would
be more algorithmic and less empirical than the energy function approach. It would
maintain rather than suppress the essential details of network dynamics. In short, it
would be easier to apply. Chapter 4 addresses such concerns but it does so by using
several simplifications to the general weight matrix. These simplifications are
addressed next.
3.3 Weight Distributions
The definition of the Hopfield model includes symmetry factors in the weight
matrix. Now, we look for a subset of Hopfield weight matrices with special symme
try that can be easily analyzed in the eigenspace domain. This type of symmetry
42
allows a partial set of weights, called the weight distribution, to define the entire
weight matrix. The matrix described by permutations of a single weight distribution
that will be focused on is called a homogeneous weight matrix. Networks with homo
geneous weight matrices are called the homogeneous Hopfield models or networks.
The weight distribution is the set of weights seen by a given unit. For unit i in
a one dimensional network of size n, the weight distribution consists of the set of n
weights that connect it to all other units: wdist(z) = {w,.J j = 0...n l}. Note that the
cardinality of this weight distribution is n so vector notation is used: wdjsl(z) = n. For
unit l,m in a two dimensional network of size n the weight distribution is a set of n
weights defined by: wdist(/,w) = p = 0...H1, q 0...nl}. The weight
distribution has cardinality of: wdist = n2, and can take on two forms. It can be
vector of length n2, in which case the notation wdist(/,m) is used, or it can be in the
form of an n x n matrix, in which case the notation Wdht(l,m) is used. Recall that in
the two dimensional case the weight matrix is n x n. A complete set of weight dis
tributions: {wdis; (z) z = 0...n 1} or {wdist (/,m)\ l = 0...n 1, m = 0...n 1} defines
the weight matrix for a one or two dimensional network.
If the weight distribution for each unit is the same, the matrix is homoge
neous. The entire weight matrix can be determined by a single weight distribution of
size n* (n for a one dimensional network or n for a two dimensional network). The
symmetry of this weight matrix is a special form of a Toeplitz symmetry called circu
lant symmetry. The components of the Toeplitz matrix are defined by its first row and
column, as well as, the relationship: T.;. = T._7 j } for i s n,j a 2. Its "weight distribu
tion" the set of weights that defines all other weights is of size 2n 1 [10]. From
43
Figure 3.7, we see that elements along the diagonals are equal. The circulant matrix
defined by a one dimensional weight distribution has circulant symmetry. In this
case, the first row defines the other components of the matrix. The elements along the
diagonal are equal only, here, the diagonals repeat in a circulant manner. Each row is
t9 *8 *7 *6 *5 *4 *3 *2 *1 *0
*10 *9 *8 *7 *6 *5 *4 *3 *2 *1
*11 *10 *9 *8 *7 *6 *5 *4 *3 *2
*12 *11 *10 *9 *8 *7 *6 *5 *4 *3
*13 *12 *11 *10 *9 *8 *7 ^6 *5 *4
*14 *13 *12 *11 *10 *9 ^8 ^7 *6 *5
*15 *14 *13 *12 *11 *10 ^9 ^8 *7 *6
*16 *15 *14 *13 *12 *11 *10 *9 *8 *7
*17 *16 *15 *14 *13 *12 *11 *10 *9 *8
*18 *17 *16 *15 *14 *13 *12 *11 *10 *9
Toeplitz matrix for n = 10
C0 C1 C2 c3 c4 c5 C6 c7 C8 c9
c9 C0 C, C2 c3 c4 c5 c6 c7 C8
C8 C9 Co Cl c2 c3 c4 c5 C6 c7
C7 C8 C9 co Cl C2 C3 c4 C5 c6
C6 C7 C8 c9 c0 Cl C2 c3 c4 c5
C5 C6 c7 c8 c9 Co Cl c2 c3 c4
C4 C5 C6 c7 C8 c9 c0 C1 c2 c3
c3 c4 C5 C6 C7 C8 c9 C0 Cl C2
C2 c3 C4 c5 C6 c7 C8 c9 C0 Cl
C1 C2 C3 C4 C5 C6 C7 C8 c9 c0
One dimensional circulant matrix for n = 10
Figure 3.7
Toeplitz and circulant matrices
44
the same as the row above it, only shifted right one unit (with wrap around). For the
weight matrix the first row is the weight distribution: wdjst(0) = {wQQ, wQ v ... ,wQ nl},
the weights seen by the first unit. The relationship that defines the other weights is:
wij= wdist(0)*: where k = (i j)%n (and % is the "mod" operator). For instance, from
Figure 3.7, wdjst(0) = {cQ, cv ... ,cg} so w35 = wdist(0)8 = cg since k = (3 5)%10 = 8.
The circulant nature of weight matrices with two dimensional weight distribu
tions is somewhat more complicated. These weight matrices are circulant but circu
lant in block form. For annxn weight distribution, the elements within the n x n
"submatrix" blocks are circulant and these submatrix blocks are circulant with
respect to the whole matrix. This block nature is illustrated in Figure 3.8 for n = 3.
For a general n x n weight distribution, in matrix form, given by:
W(0,0),(0,0)
Wdist(0,0) = w(OW)
W(0,0),(n,0)
W(0,0),(0,q) ' W(0,0),(0,n)
W(0,0),(p,q) W(0,0),(p,n)
W(0,0),(n,q) ''' W(0,0),(n,n)
with an individual weight in the weight distribution denoted by wdis((0,0)p =
>,0),^)the other weights are defined by w(y)j(Â¥J) = wdist(0,0)M where p = (i r)%n
and q = (j s)%n. Using Figure 3.8 as an example, ^ ^ ^ = wdjs((0,0)21 = c21 since
p = ( 1 2)%3 = 2 and q = (1 0)%3 = 1.
From the block circulant symmetry it is possible to extrapolate to higher
dimensional networks with n x n x n ... x n weight distributions such that wdjst = nm.
This leads to weight matrices that are nm x nm that have m levels of circulant symmetry.
45
For a two dimensional C0 C1 C2 C3 C4 C5 C6 C7 C8
distribution in vector notation C2 C0 C1 C5 C3 C4 C8 C6 C7
C1 C2 C() C4 C5 C3 c7 C8 c6
c0 c, c3 C6 C7 C8 C0 ci C2 C3 C4 C5
C4 C5 C6 C8 C6 C7 C2 c0 CY C5 C3 c4
C1 CK C9 c7 C8 C6 ci C2 co C4 C5 c3
C3 C4 C5 C6 C7 C8 C0 C1 C2
CS C3 c4 C8 C6 C7 C2 ^0 C1
C4 C5 C3 C7 C8 C6 ci C2 C0
For a two dimensional distribution in matrix notation C0,0 CQ,2 C0,l C0,l C0,0 C0,2 C0,2 C0,l cn,o Cl,0 Cl,2 Cl,l CU Cl,0 Cl,2 C\,2 Cl,l Cl,0 C2,0 C2,2 C2,1 C2,l C2,0 ^2,2 C2,2 C2,l C2,0
C0,0 C0,l C0,2 C2,0 C2,l C2,2 co,o C0,l C0,2 Cl,0 cu Cl,2
Cl,0 Cl,l Cl,2 C2,2 C2,0 C2,l C0,2 C0,0 C0,l Cl,2 Cl,0 CU
C2,0 C2,l C2,2 C2,l C2,2 C2,0 C0,l C0,2 C0,0 cu Cl,2 Cl,0
Cl,0 Cl,l Cl,2 C2,0 C2,l C2,2 C0,0 C0,l C0,2
Cl,2 CX,0 CW C2,2 C2,0 C2,l C0,2 cn,o C0,l
Cl,l Cl,2 Cl,0 C2,\ C2,2 C2,0 C0,l C0,2 C0,0
Figure 3.8
Block circulant matrix for n = 3 two dimensional weight distribution
Here, we will confine our discussion to two dimensional networks.
The assignment problem, with its weight matrix shown in Figure 3.6, pro
vides a practical example of block circulant symmetry. The weight distribution can
be extracted in vector form as the first row of the weight matrix or in matrix form as
the first row partitioned to an n x n matrix. Examples of this are given in Figure 3.9.
3.4 Eigenvalues and Eigenspaces of Homogeneous Weight Matrix
The reason for focusing on homogeneous Hopfield networks becomes clear
46
0 1 1 I 1 0 0 I 1 0 0
(a)
0 1 1
1 0 0
1 0 0
(b)
0 1 1 1 1 1 0 0 0 0 I 1 0 0 0 0 I
1 0 0 0 0 11 0 0 0 0
(c)
0 1 1 1 1
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
(d)
Figure 3.9
Weight distribution for the assignment problem.
(a)n 3 vector form, (b) n = 3 matrix form,
(c) n = 5 vector form, (d) n = 5 matrix form.
when looking at the eigensystems of homogeneous matrices. The general method for
determining eigenvalues and eigenvectors, the nonsymmetric eigenvalue problem
(EVP), is a computationally difficult problem that has be studied extensively [5],
[18]. The most elementary method of finding eigenvalues is by solving for the roots
of the characteristic equations of a matrix: det(W )J) = 0 where det( ) is the deter
minant, W is the n x n weight matrix, I is the n x n identity matrix and X, the set of n
roots of the equation, is the set of n eigenvalues. Backsubstituting the eigenvalues
and solving the resulting linear system yields the eigenvectors: (W XI)v = 0 where
47
v is an eigenvector. The set of eigenvalues and eigenvectors form the eigensystem
which is expressed in the familiar relationship: Wv; = Xv. where v. and X. are the ith
eigenvector and eigenvalue or, in matrix form, WM = AM where M is the modal
matrix which has the eigenvectors as its columns M = [vp v2,... ,vj and A is
[Xj, X2,..., XJ x I defined in [11] and shown in Appendix A.
A more sophisticated technique of solving the EVP involves finding matrix
similarity transforms. The concept is to transform the general matrix into a matrix
with eigenvalues that are easily calculated or determined by inspection. Forms of
matrices with easily calculated eigenvalues include upper and lower triangular matri
ces or diagonal matrices which have eigenvalues that are simply the values of the
diagonal elements. The transformation equation is of the form: XlAX = B where X
is the transforming matrix (X_1is its inverse), A is the matrix with unknown eigenval
ues and B is the matrix with easily calculated eigenvalues. Matrices A and B that sat
isfy this relationship are said to be similar and similar matrices have the same eigen
values [18]. This method will be used in a proof found in Appendix A.
The symmetric EVP (finding the eigenvalues of a symmetric matrix) is quite
a bit easier than the general EVP. Symmetry can be exploited to make the similarity
transformation easier. The transforming matrix does not have to be inverted, rather
the relationship, X AX = B can be used.
For circulant matrices the discrete Fourier transform (DFT) of the weight dis
tribution immediately yields the weight matrix eigenvalues. This is shown for one
and two dimensional weight distributions in Appendix A. Using WQ = QA where W
is an n x n circulant matrix defined by first row of the weight matrix wdist(0) =
48
a = [a0, av ..., an ,], Q is an n x n modal matrix composed of the complex roots of
unity, and A is as defined above, it is shown that:
DFT[a] = a Q. = Aw where Aw = [/L0,...,^2
is a vector of the eigenvalues of W. The two dimensional case, for block circulant
matrices, differs only in that the weight distribution, A, is a n x n matrix (or an n
vector, a2D) and the transform used is the two dimensional DFT as described in
Gonzalez and Wintz[6].
The DFT describes the entire eigensystem of the homogeneous weight matrix.
The Fourier transform components from Equation 3.2 are the eigenvalues. The nx n
modal matrix, Qn, has the eigenvectors as columns. The components of this matrix
are described by Qn(i,j) = ooy where to, the complex root of unity is, co = e So:
Q) to0 ft)0
a10 ft)11
n)("1)0
(for a one dimensional weight distribution). This means any circulant weight matrix
of size nxn will have eigenvectors described by Equation 3.3. For a two dimension
al weight matrix, the components of the modal matrix, are Qn(2)[(i,/),(&,/)] =
(o,k+jl. An example of an eigenvector in this case for the k,lth column of is
(with m = n 1):
49
L J 3.4
1 2 2
Again the size of the weight matrix, n x n determines the eigenvectors. (The eigen
values are expressed by Equation A.19.) Since the eigenvectors are the basis vectors
of the Fourier transform, they are called Fourier vectors or (discrete) Fourier
harmonics.
The DFT has several important characteristics. Generally, the DFT maps
R" * C", which means that eigenvalues can be complex scalars. For real symmetric
matrices, however, the eigenvalues are real so the DFT maps Rn > R [18]. The val
ues of the components of the DFT are not necessarily unique. It is this fact that leads to
higher dimensional "invariant" subspaces within the state space of the homogeneous
2( 1) (n 2) ( 2)
(2) 2 2
(n 2) 2 2
(a)
8 3 3 3 3
3 2 2 2 2
3 2 2 2 2
3 2 2 2 2
3 2 2 2 2
(b)
Figure 3.10
Eigenvalues of the assignment problem weight matrix.
(a) For a general assignment problem matrix [20], (b) For n = 5.
50
network that will be described.
The eigensystem defines subspaces of the state space known as eigenspaces.
An eigenspace is a subspace spanned by eigenvectors with the same eigenvalue.
Since the Fourier vectors are orthogonal, a set of m of them spans an m dimensional
subspace. This means that the dimension of an eigenspace is the number of eigenvec
tors in the spanning set or the number of eigenvectors with equal eigenvalues. The
eigenspace characteristics of the homogeneous matrix are the key to understanding
the linear approximation to the network's dynamics.
The eigenvalues of the assignment problem are determined by taking the two
dimensional DFT on the weight distribution matrix shown in Figure 3.9. The out
come of this process is shown in Figure 3.10. The eigenvalues calculated here are
equal to those calculated by Wolfe [20] in a different manner.
3.5 Homogeneous Network Linear Dynamics
Shifting perspectives from the state domain to the eigenspace domain gives a
clearer view of network dynamics. Eigenspace analysis determines the equilibrium
point of the linear approximation (Equation 2.6) to the system. The stability of this
equilibrium with respect to individual eigenspaces leads to insight into which final
states of the network will be stable. As mentioned above, these final states cannot be
fully characterized by the linear approximation because the equilibrium point of the
linear system is unstable. What the linear approximation does show is the push
toward states which are made stable by the nonlinear portion of the network dynam
ics (Equation 2.1).
51
3.5.1 Eigenspace Perspective
The starting point for eigenspace analysis is the linear time invariant system
described by Equation 2.6. It is assumed that the weight matrix, W, and the external
input, ext, are constant with respect to time. This results in system eigenvalues and
external biases which are constant. Aside from making the linear system easy to
solve analytically, this assumption results in a constant dynamic flow field defined by
act'(t) = Wact(t) + ext.
The characteristic which makes analysis of eigenspaces important is that
eigenspaces are invariant subspaces of linear system dynamics. If A G C" x (i.e A is
a matrix of complex values) then it defines a linear transformation from C"  C". A
subspace Â£ of C" is an invariant subspace of A if A x G Â§ whenever x G The
effect of this is that, if a state vector is in a invariant subspace  it remains in that
subspace when acted upon by A. Using the definition of eigenspaces, we can show
that eigenspaces are invariant subspaces. An eigenspace, Â§ES, is spanned by a set of
eigenvectors with the same eigenvalue, k'. So a basis for ^ES is the set of eigenvectors
defined by span[^ES] = {v G C" (W AT)v 0}. Since W v = k v for any eigensys
tem, W x = kr x for x G ^ES. The means for x G Â§ES; W x G ^ES since W x is just
a scalar multiple of x: k' x. Since this holds, ^ES is an invariant subspace of W.
Because of the invariance of eigenspaces of the weight matrix, the eigensys
tem determines the linear motion of the state vector in each subspace. Obviously,
once the state vector is completely within a single eigenspace it will stay in that
eigenspace due to the invariance property. The direction of motion in that eigenspace
is determined by the sign of eigenvalue. The velocity or strength of the motion is
52
determined by the magnitude of the eigenvalue. More interestingly, if the state vector
is not strictly in one eigenspace it can be decomposed into components in each eigen
space. The eigenvalue of each component eigenspace determines the motion of the
state vector in that particular eigenspace. For the eigenspaces defined by a homoge
neous weight matrix, there is an easy way to decompose any vector using the DFT.
All that is required is a change of domain from state space to eigenspace. Whereas
state space was defined in terms of a standard basis, the basis for eigenspace is the set
of eigenvectors or, in the homogeneous network case, the Fourier harmonics.
The motion of the state vector with respect to the "linear part" of the system
can be directly expressed in eigenspace terms. From Equation 2.6 and Wolfe [20] for
a homogeneous weight matrix we find:
Aact(r) = At net(r) = At (W act(r) + ext)
= MWdist*act(0 + ext) 3>5
where Wdjst is the one or two dimensional weight distribution and the operator "*" is
the one or two dimensional convolution as defined in [6] and [13], One detail must be
noted; for the one dimensional network wdjst, act(t), and ext are all expressed in vec
tor form. For Equation 3.5 to be consistent for the two dimensional network, the
weight distribution should be in its n x n matrix form, Wdist, and both act(f) and ext
must be expressed in matrix form. To do this, the n vectors, act(t) and ext, must be
partitioned into nx n matrices. Since we are focusing on two dimensional networks,
act(f) and ext will be assumed to be in matrix form.
The linear system expressed in Equation 3.5 has a direct eigenspace interpre
tation. Convolution in the state domain is simple multiplication in the Fourier or
53
frequency domain. If we denote the transformed versions of Wdis( and act(t) as
DFT[WdiJ = K;st and DFT[act(t)] = act(i) then wdist*act(0 = DFT1 [Wiis( Xact(t)]
where DFT_1[ ] is the inverse discrete Fourier transform. The DFT of the activation
transforms it from state space to Fourier space. The same operation on the homoge
neous weight distribution defines the eigenvalues of W. In the homogeneous case,
the transformation from state space to Fourier space is the same as the transformation
from state space to eigenspace. This is because, as was shown above, the transform
ing matrix for the DFT and the modal matrix for any homogeneous matrix are the
same for a given size and dimension. The product above, WAist x act(t), amounts to
the multiplication of the eigenspace/Fourier components of act(t) by the eigenvalues
of W: act. A... This component by component multiplication defines the linear
motion in the eigenspace domain. When the external input vector, DFT[ext] = e%t, is
added Equation 3.5 is:
A act (t) = At (Wiist Xact(t) + e%t) and in state space terms
Aact(r) = DFT1 [Aact(t)] 3 6
This expression justifies the statement that the eigenspaces completely determine the
linear portion of motion.
The dynamic evolution of the linear system can be characterized for the sepa
rate eigenspaces. If there are m<,m unique eigenvalues then there are m unique
eigenspaces. Each of these eigenspaces is of dimension one or larger, based on the
number of harmonics that span it. The whole linear system is decoupled into m
54
separate and independent linear systems [11]. This means the dynamics in each
eigenspace is independent of the others. This is of great use in characterizing the
overall motion of the state vector in eigenspace and will eventually lead to insight
into the complete dynamics of the network.
3.5.2 Stability of Eigenspaces
The external input is of primary importance in determining the equilibrium
point of the linear system. So far, the effects of ext have not been explored in eigen
space terms. As we did above, we assume that W is nonsingular so there is only one
equilibrium point. Equation 2.7 defines that equilibrium point in state space terms.
The homogeneous linear system (not to be confused with the homogeneous weight
matrix) has an equilibrium point at the origin of state space defined by 0 = [0,0,..., 0]
where  0  = n*. The time invariant driving function of a nonhomogeneous linear
system displaces the origin. That displacement can be viewed in eigenspace just as
the time varying activation vector was. Here the bias in eigenspace is DFT[ext] = e%t.
The importance of this eigenspace bias will be taken up in Section 3.6.2. For now, it is
necessary to realize that the equilibrium point is displaced from the state space origin.
Just as the stability of the whole linear system is determined by the signs of
all of the eigenvalues of the coefficient matrix, the stability of the individual eigen
spaces can be characterized by the signs of their eigenvalues. The linear system can
be separated or decoupled into eigenspaces, each of which has a single distinct eigen
value.The stability of the individual eigenspaces can then be determined based on
this eigenvalue.
Eigenspaces with negative eigenvalues are stable. States in these eigenspaces
55
are attracted to the equilibrium point. Assuming real eigenvalues, which is the case
for symmetric matrices, for eigenspace p with Xp< 0 the change in activation in that
eigenspace will be Aactp{t) = A,poctp(t) < 0 where actp(f) E p is the component of act(t)
in eigenspace p. This expression, of course, ignores the constant bias of ext if one
exists in this particular eigenspace. The flow in these eigenspaces can be visualized
as an inward flow toward the equilibrium point because the solution to dactp(t)/dt =
Xpactp(t) is a decreasing exponential. Eigenspaces with positive eigenvalues are
unstable. States that are in these spaces increase in magnitude without bounds. For
eigenspace a with Xa> 0 the change in activation in o is Aactjt) = Xoacta(t) > 0 where
act (t) E a. The flow in these eigenspaces is outward away from the equilibrium
point because the solution to dactjt)/dt = Xaactc(t) is an increasing exponential. The
magnitude of the eigenvalue determines the strength of the flow either inward toward
equilibrium or outward toward infinity. As noted before, in general, weight matrices
will have both positive and negative eigenvalues so the overall linear system defined
by Equation 2.6 will be unstable. This means there is at least one unstable eigen
space.
3.5.3 Eigenspaces and Stable States of the Network
Analysis of the stability of the individual eigenspaces just described leads to a
better understanding of which states of the network will be final stable states. The
linear system cannot fully describe or explain network stability but it can give insight
into the qualitative composition of final states.
Since eigenspaces with negative eigenvalues are stable, the components of
activations that are in these eigenspaces tend to an equilibrium value. If we assume
56
there is no bias in an eigenspace p that is e%tp = 0 then the components tend to
zero: actp(t) 0 as t  oo. This means all stable states of the network will have com
ponents with zero magnitudes in these eigenspaces.
The usual exception to zero bias assumption is the DC eigenspace where
e%tDC I* 's called the DC eigenspace because it corresponds to the DC Fourier
component. In eigensystem terms, it is the eigenspace that corresponds to K0 or XQ 0.
This eigenspace will normally have a negative eigenvalue but the component of the
activation which corresponds to it cannot go to zero. This is because a zero DC com
ponent corresponds to only one state in the wcube, 0, the origin. This is a rather triv
ial feasible state. Any initial activation given to a network that forced its DC eigen
space component to zero would end up at 0. A nonzero bias in the DC eigenspace
allows other states to be final states. This will be brought up again when discussing
the stability of the nonlinear system that defines network dynamics.
Since eigenspaces with positive eigenvalues are unstable, the components of
activation that are in these eigenspaces tend away from the equilibrium point. States
of the linear system will have larger and larger components in the positive eigen
spaces as time progresses. In fact, these components will grow exponentially:
actjf) ^ 00 as t co for Xa> 0 [11]. This can be seen in the analytical solution exam
ple in Appendix B. The nonlinear network dynamics must limit this exponential
growth to come up with stable states at some t = fFi The details of nonlinear
dynamics are left for later but now it is possible to see that the stable final states of
the network will have nonzero components in eigenspaces with positive eigenvalues.
57
Without even looking at the full nonlinear system, it is clear that the final
states of the network will have zero magnitude components in eigenspaces with nega
tive eigenvalues (or constant magnitude components due to a bias, as in the case of
the DC eigenspace) and nonzero magnitude components in eigenspaces with posi
tive eigenvalues. A further observation is that all weight matrices that display any
thing other than trivial behavior must have some positive eigenvalues. These are the
eigenvalues that force activation away from the equilibrium point of the linear sys
tem. A weight matrix with only negative eigenvalues will just tend to the equilibrium
point for all initial conditions. Since "interesting" weight matrices have positive
eigenvalues, the linear systems that approximate them (Equation 2.6) will always be
unstable. The nonlinear limiting of the activation function is ultimately responsible
for network stability.
The analysis so far has been operating under the assumption of "regional" lin
earity which now must be clearly stated. State space can be divided into two regions:
the first is in the middle of the ncube and the second is "near" the boundaries of the
ncube. The assumption is that the linear dynamics of the network (Equation 2.6)
approximate the overall behavior of the network (Equation 2.1) in the first region,
inside the ncube. The full nonlinear model is only necessary to explain network
dynamics in the second region near the surfaces, edges, and vertices of the cube.
The motion in the first region is characterized by the hyperbolic flow field of the lin
ear system while the motion in the second region is characterized by the limiting or
"comer seeking" behavior of the sigmoidal activation function. For the regional lin
earity assumption to be of any value, it must be true that the region of nonlinear
58
behavior is relatively small. This will be shown in Section 3.6.1. The effect is that by
the time the corner seeking (nonlinear) behavior dominates, the linear system has in
effect chosen the final state.
This is the bulk of the discussion on the background of eigenspace analysis.
The remainder of this chapter focuses on the application of this technique to explain
the feasible states of a given network. First, however, we apply eigenspace analysis
to the assignment problem.
+c 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
(a)
.625 108 h* o 1 00 10~8 10'8
108 441 12280 1517 1049
1CT8 999 441 2034 11980
io8 11980 2034 441 999
10"8 1049 1517 12280 441
(b)
Figure 3.11
Comparison of eigenspace solution and analytical solution (from Appendix B) for the
n = 5 assignment problem.
(a) Predicted solution from eigenvalues from Figure 3.10.
(b) Analytical solution from Appendix B for t = 5.
59
3.5.4 Eigenspace Analysis of the Assignment Problem
The object of this section is to look at the analytical solution to the linear sys
tem resulting from Equation 2.6 with the assignment problem weight matrix Wap
(shown in Figure 3.6) and compare it to the eigenspace qualitative "solution." The
response of a general linear system is characterized by its coefficient matrix, W, initial
conditions, act(t0), and its driving function, the constant vector ext. The full analytical
solution, outlined in Appendix B, is based on the analysis in Luenberger [11].
From the eigenspace vantage point many of the characteristics of the analytical
solution can be seen much more clearly. Figure 3.10 shows the eigenvalues of the
assignment problem weight matrix forn = 5. From this, three separate eigenspaces can
be seen: the DC eigenspace, the "stable" eigenspace, and the "unstable" eigenspace. For
an assignment problem matrix that is n x n the stable eigenspace has an eigenvalue of
(n 2) and is 2(n 1) dimensional while the DC eigenspace has an eigenvalue of
2(n 1) and is one dimensional. The unstable eigenspace has an eigenvalue of +2 and
is (n l)x(1) [20]. As the linear system evolves in time, the stable (and the DC)
eigenspace components will head toward the origin. At this point, we assume that the
driving function is a constant vector in the DC eigenspace such as ext = {b, b,..., b}. In
the eigenspace domain this is e?(t= 1/n {n b, 0, 0,..., 0}. So the eigenspace compo
nents of act(tFina]) should be zero in the stable eigenspace and a constant in the DC
eigenspace. In the unstable eigenspace, the components of act(tFinal) should be large
numbers. If we take the DFT of the output of the analytical solution for t 0, we find
that this is the case; the components in the stable eigenspace are near zero, the compo
nent in the DC eigenspace is a constant, and the components in the unstable eigenspace
are large numbers. This is shown for the assignment problem in Appendix B and Figure
60
3.11.
The decomposition into stable and unstable eigenspaces indicates the content
of final stable states of the network. Since the feasible states of the assignment prob
lem are known, their eigenspace domain components can be compared to eigenspace
content of the states of the linear system when t 0.
3.6 Network Stability
Determining network stability involves finding the final states of activation
that will be stable. Whereas the linear approximation of the system (Equation 2.6)
had no stable equilibria, the system defined by Equation 2.1, the "neural" or network
dynamics equation, can have many stable equilibria. Two major factors contribute to
which states, if any, are stable: the activation function and the driving function or
external input. A look at the activation function shows the limiting nature of the sig
moid and the existence of stable states. This was covered briefly in Chapter 2 and is
not directly related to eigenspace based analysis. The details of this stability analysis
are covered elsewhere [1]. Here the discussion is limited to a justification for the use
of "regional" linearity assumption. In effect, this explains the conditions under which
the nonlinear dynamics are overshadowed by the linear approximation. The external
input plays a large part in the types of states which will be stable. This facet of stabil
ity can be clearly seen from an eigenspace perspective.
3.6.1 Approximation of Nonlinear "Neural" Dynamics
To get an informal justification for approximating the nonlinear dynamics
using the linear dynamics inside the ncube, the sigmoidal activation function,
f{xi) = y2 (l + tanh Xj / x0) is analyzed. The main characteristic of this sigmoid is that
it has linear and nonlinear regions. There is a nearly linear region around the origin
61
0
k .119
.881
Activations which fall within the linear region of the sigmoidal activation function
(inverse of the sigmoid from Figure 2.1)
that can be approximated by f(xi) ~y2(l + x(. / x0) over the range from approximately
xQ < x. < xQ. This is because tanh x = x for x close to 0. There are nonlinear regions
from approximately 2x0 < xj < xQ and xQ < xj < 2xQ called transition regions. There
are constant valued regions for x. < 2x0 where /(x.) = 0 and for x. > 2x0 where
/(x;) = 1. These are called the limiting regions and are due to the fact that tanh x = 1
for x > 2 and tanh x 1 for x < 2.
For states of activation inside the ncube, the linear approximation for the sig
moidal activation function is used. From Equation 2.1 and 2.4:
62
3.7
act, (7) = f\neti (t)] = Y2(l + tanh neti (t) / g) and
neti (t) = g / [act, (*)] = g K ln[acf,(0 / (l art, (f))]
where g is "gain." For activations in a region inside the ncube approximately,
.119 < act ft) <.881 or e2/(l + e2) < act ft) < e2/(l + e), the net inputs, net ft), are in
the range g < net ft) < g (see Figure 3.12). This is the linear region of the sigmoid
from Figure 2.1. The system of equations that approximates network dynamics is lin
ear for this region of activations, in fact, it is a scaled version of the system from
Equation 2.6:
act'(f) = K [ 1+(W act(r)+ext)/g] = /2(l+net (t)/g)
J.O
For the majority of the inside of the ncube, the linear approximation is at least a fair
approximation. This allows for conclusions drawn from eigenspace analysis (analysis
of the linear system) to be valid for most of the network state space.
The actual proof of stability for comer states of activation, actftm^ = 0 or 1
for i = 1,..., n*, or vertexes of the ncube is covered by Abe [1]. The nonlinear set
of equations (from Equation 2.1) can be linearized in the region near the comer
states. Then it can be shown that corner states are stable if for actftFiml) = 1,
netft?imi) > 0 and for act.(rFinaI) = 0, netftFmsf) < 0. These stability rules are consistent
with intuition gained by looking at the activation function and will be used below.
3.6.2 Effect of External Input
The external input is a constant bias which effects the components of stable
63
states of the network. If the external input is limited to an equal input to each unit,
ext = [b, b, b,..., b] or ext. b for i = 1,..., n*, then the equilibrium point of the net
work is biased along the diagonal of ncube. The relation act(t) = W1 ext defines,
act(t), the equilibrium point. From this act,(i) = b = b wu >
since for homogeneous weight matrices ^(IF1) = l/^. whj. Another property
of homogeneous matrices is that the sum of any row j is the same so:
act, (0 =... = act, (t) =... actn, (f) = bj Â£. w.
where the sum is the sum of all weights in any row of the weight matrix. This equi
librium point will be on the diagonal of the ncube if the sum of the weights in a row
is negative. This is generally true for a mutually inhibitory weight matrix. The equi
librium point is derived using the assignment problem weight matrix in Appendix C.
For the assignment problem: ^ w/>;. = 2(n 1) therefore act^t) = b/2(nl)
From the eigenspace point of view, the equal external input to each unit is a
bias in the DC eigenspace or the 0th Fourier harmonic. Every state of activation
except 0 has a nonzero DC component so any corner state other than 0 has some DC
component. The DC eigenspace component of a stable state will be yn^.acti(tFim])
including the scaling factor, 1 In, from the definition of the DFT [6], [13]. Any final
activation with at least one unit on will have a nonzero DC component. However,
the eigenvalue of the DC subspace is negative. Motion in this eigenspace is toward
the origin. If its1 eigenvalue were positive, the activation vector would head away
from the eigenspace origin and, due to the network "limiting" dynamics, end up at 1
(the state at the opposite end of the diagonal [1,1,..., 1]). So a bias in the DC sub
64
space must be included so that its origin is not at zero. The necessity of having a bias
in the DC eigenspace, which is a result of ext, should be clear at this point. For the
assignment problem, using Equation 3.9 and the definition of the DFT, the DC eigen
space bias is 1/n = n b/[2(nl) With n = 5 and b = 1, the DC eigenspace
bias is 0.625 as seen in Figure 3.11.
The actual value of exti = b, the bias, is easiest to determine in the state
domain. The value of b required is one that causes states which are solutions to a
given problem (feasible states) to be stable and all other states (nonfeasible states) to
be unstable. This is tested using the stability rule from above. For stability of a feasi
ble state the net input to each unit must follow netftFim]) & 0 for actftFimi) = 1, and
net(?FinaI) < 0 for actftF[n3f) = 0. For the nonfeasible states to be unstable they must
violate the above rules. This is why the value of b is of prime importance in deter
mining the states which will be stable, which is the main subject of the next section.
3.7 Final States of the Network and Feasibility
The major concern now is how the stable and unstable eigenspaces of the lin
ear approximation manifest themselves in the final states of a network with a given
weight matrix and external input. Since eigenspace based methods are emphasized
here, the first step is determining the composition of final states in the eigenspace
domain and relating them to the linear dynamic system. The final states of the net
work defined by eigenspace analysis can then be viewed in the state domain. To com
plete the analysis the effect of each individual element the weight matrix, W, the
external input vector, ext, and the initial activation, act(tQ) is discussed.
65
3.7.1 States of Activation in the Eigenspace Domain
Every state of activation of the network, act(t), has a representation in the
eigenspace domain. The transform to this domain for a homogeneous weight matrix
is the discrete Fourier transform since the Fourier harmonics are eigenvectors of this
type of weight matrix. Since act ft) E R and the DFT maps real input to the complex
domain, each eigenspace component of the state vector has a magnitude and a phase.
It is useful to look at the mapping of activations in terms of the magnitude of compo
nents alone as the magnitude carries all the necessary information. The use of the
DFT on any given activation vector results in conversion: R" PFT > Cn*. The map
ping is one to one, therefore, an activation vector has a unique representation in
eigenspace when its magnitude and phase are considered. Looking at magnitude
alone, R" DFTI > Rn This mapping is not one to one but many to one. Several acti
vations in the state domain map to one magnitude vector in the eigenspace domain. A
set of several activations share a single representation in the eigenspace domain, if
only magnitude is considered. The magnitude of transformed states will be called
eigenspace magnitude and will be represented by act(t).
The states of the network that share the same eigenspace magnitude represen
tation are related by a special symmetry. Given a single state with an eigenspace
magnitude, the other states with that same eigenspace magnitude will be shifted ver
sions of that state in the state domain. This is a direct consequence of the equivalence
of a "phase shift" in the frequency domain here, the eigenspace domain of the DFT
and a "time shift" in the time domain here, the state domain [13]. All of the "time"
shifted versions of an activation differ only by phase in the frequency/eigenspace
66
Figure 3.13
Shifted versions of activations in the state domain
domain. Since phase is ignored here, all shifted versions look the same in the fre
quency domain. "Shift" for a one dimensional network implies left to right rotation
and for a two dimensional network implies left to right as well as top to bottom shift
ing. The number of shifts for a one dimensional network of size n before returning to
the original state is n so the number of activations in state space that share an eigen
space magnitude is n. For a two dimensional network, since n shifts are possible (n
left/right rotates for each of n top/bottom rotates), there are n activations that share
an eigenspace magnitude. For these shifted states to be distinguished from one anoth
er, the phase information is needed.
For an activation to be "in" a certain eigenspace, it must have eigenspace
components only in that eigenspace. Since an eigenspace is a subspace spanned by a
67
set of eigenvectors, any element in the eigenspace must be defined by a linear combi
nation of this spanning set. A linear combination of a set of vectors u = {u15 u2,...,
u } is defined as t,u. + Lu~ + ... + t u where t. is a scalar. This result is from basic
linear algebra. If eigenspace p is spanned by a subset of m* eigenvectors called
v', v'Cv (where the set of all n* eigenvectors is v), then act(f) E p if its eigenspace
domain representation fulfills: act(t) = [tjV, +t2v2 + ... + Tm,vm,] where x. E C. Each
x. is an individual component of the DFT and can have zero or nonzero magnitude.
This is the same as saying if a C p and act(t) E a then act(t) E p. Activation in a
subset of an eigenspace is in that eigenspace. The components in the complimentary
subspace, > p (which is spanned by > [v' (T v] ), must have zero magnitudes for the
activation to be in p. Since only magnitude is considered, we know that if a state is in
an eigenspace then all shift versions of it are in that eigenspace.
3.7.2 Eigenspaces of Linear Approximation and Final States
The utility of using the linear approximation to determine the final stable
states of the network becomes apparent here. Using the concept of stable and unsta
ble eigenspaces of the linear approximation, if it is assumed that there is one unstable
eigenspace, the final states of the network must be in that unstable eigenspace. The
unstable eigenspace is the only eigenspace of the linear approximation that can have
nonzero components when t 0. Therefore, the final state can have only nonzero
components corresponding to the unstable eigenspace. From the definition of mem
bership in an eigenspace these final states are "in"the unstable eigenspace. Put anoth
er way, the eigenspace magnitude components of the final states can only be nonzero
where the eigenspace components of the linear approximation are nonzero for t 0.
68
(Note again that the eigenspace components of the final state can be zero where the
linear system components are nonzero but not vice versa. In the first case, the final
state is in a subspace of unstable eigenspace which is, by definition, a part of that
eigenspace. In the second case, the state is not in the unstable eigenspace.) This
directly ties the eigensystem of the network's weight matrix to the final states of the
network.
It is a necessary condition that the final states are in the unstable eigenspace
of the linear approximation. The final states must also be comer states stable equilib
ria of the nonlinear system. This, too, is a necessary condition but it is based on the
nonlinear dynamics. When paired together, these requirements form a qualitative
definition of the final states of the network. The actual final states can be determined
using these conditions and the specifics of the network, W, ext and act(tQ).
The make up of the final states in state space can be determined from their
eigenspace components. The transform from eigenspace to state space is the inverse
of the transform from state space to eigenspace. Since the DFT was used for the lat
ter, the inverse DFT is used here. To determine the final state both phase and magni
tude are required. Since, in general, Cn* > Cn* care must be taken so that the
final state turns out to be real. This is brought about by assuring the final eigenspace
state exhibits complex conjugate symmetry: ct,y(^FinaI) = actFinai) where x*(t) is
the complex conjugate of jc(t)[13]. This fact will be discussed in Section 4.5.
3.7.3 Factors Affecting Which Final State Is Chosen
The factors mentioned in the last section determine the qualities of the final
states but other factors determine which states become final states for given network
69
conditions. The weight matrix and external input are usually fixed for a given net
work while the initial activation is varied for each run of the network. The first two
factors determine the class of problem that is being solved and the third determines
which specific instance of the problem is being solved.
The weight matrix determines the linear dynamics of the network. Since the
linear dynamics determine many of the qualities of the final states, if not the exact
states themselves, the weight matrix contributes greatly to overall network dynamics.
The weight matrix, by determining stable and unstable eigenspaces, constrains the set
of possible final states. This set is limited to all states in the unstable eigenspace. The
weight matrix does not limit final states to corner states. That condition is motivated
by the nonlinear activation function.
The main function of the constant external input is to assure that the DC com
ponent of the final state is nonzero. Since the DC eigenspace has a negative value,
the linear flow is toward the origin of this eigenspace. The bias assures that the origin
is not zero but at some point along the diagonal of the ncube. As this point varies, so
does the composition of the final state. If we take the DC component of the final state
to be the scaled sum of all activations 1/n ^(.flcf,.(rFinal), then the DC component is an
indication of the number of units that are "on" in a final state since act.(tVin3^) = 1 or 0
for i = 1,..., n*. Ranges of external input can be derived so that a given number of
units are on for that range. This is easy to see graphically as shown in Figure 3.14. It
shows the relationship between the DC eigenspace and the "planes" or eigenspaces
containing final states. Using Equation 3.9, the general relationship between the
value of external input, b from above, and the DC component of final states is:
70
3.10
i / X"" / \ 1 n* b
c = V* X, (0= x v
n Lj wu
for a homogeneous weight matrix. For a two dimensional homogeneous network with
n* = n2, the relationship is: c = n b/^. w:j.
A broad range of final states of the network can be determined by finding the
unstable eigenspaces of W and DC component due to the bias of driving force, ext,
but the initial state of the network must be know to determine a single final state.
Figure 3.14
Unstable eigenspaces and the DC eigenspace
Since the dynamics of the network are deterministic for a set of weights, external
inputs, and initial activations, the network will flow to the same final activation if all
71
of these are constant. The initial state provides the initial conditions for the set of dif
ferential equations, described by Equation 2.1, and determines the starting point of
the dynamical flow. In the problem definition, the initial activation is usually the
input to problem being solved.
Looking at the linear optimization problems, the factor that determines which
final activation is chosen is "closeness" to the initial activation. The corner state that
fulfills the constraints, defined by W and ext, and is closest in euclidian distance to
the initial activation will be the final state. The design factors, W and ext, determine
the nature of the flow field while act(tQ) determines where in the flow field the acti
vation starts. From this starting point, the activation vector moves toward the closest
attractor state or local minimum in the flow field. Since for the linear optimization
problem the attraction toward all final states is equal, the activation generally stops at
the closest local minimum.
3.8 Performance of the Network
Network performance measures how well the network solves the problem it is
modeling. Since the scope here is limited to linear optimization problems, the quality
of performance is easy to judge. The terms feasibility and optimality were defined
above and now they are put to use in examining network performance.
3.8.1 Feasibility
The feasibility of the network is defined in terms of feasible states. These
states must be corner states, they must be stable, and they must be solutions to the
problem being modeled. The characteristics necessary for stability are covered exten
sively above, so here, the last part of the definition of feasibility is emphasized. A
tentative definition of feasibility would require that all initial activations converge to
72
a feasible state. But this does not take into consideration states or inputs for which the
problem model dictates no solution or an ambiguous solution. This includes things
like ties between two or more possible solutions, where any of the solutions is equal
ly valid from the standpoint of the problem definition. Input states such as this should
result in an ambiguous outcome. So a revised definition of feasibility would be, that
the network converge to feasible states for all inputs (initial activations) for which the
problem defines an unambiguous output (final activation).
Eigenspace analysis assures network feasibility. The reason for examining the
stable and unstable eigenspaces of the linear approximation so closely is to assure
that the final states of the network correspond to feasible states. The eigenspaces of
W limit the set of final network states. The weight matrix is constructed (as will be
shown in the next chapter) so that the only final states are feasible states. The exter
nal input determines the DC component of the final state and, thereby, the number of
units which will be "on" in the final states. Feasible states are used directly to deter
mine the range of the external input value which makes feasible states stable. In
short, the factors of the network can be set so that all final states of the network are
feasible states.
3.8.2 Optimality
The concern here is not whether the network finds solutions but whether it
finds the best solutions to the given problems. Feasibility is defined by the existence
of a final state, representing a solution, for each input that has a solution, so its chief
concern is the form of the solution. Optimality is a stricter condition than feasibility.
In addition to the existence of a solution, it requires the single best solution be cho
73
sen. The solution must be, at least, "good" for it to be of any use. It is desirable that
the solution be the "best" possible. The terms "good" and "best" are, of course, deter
mined with respect to the problem model. While the WTA network performs optimal
ly, the assignment problem network from above has suboptimal performance. This
means, where a best solution exists, it does not always find the best solution [20], [21].
The purpose of this discussion of optimality is to suggest one test and apply it
to the suboptimal assignment problem. The states of the network that give ambiguous
solutions are most important in this test. Between any two feasible final states of the
network, there is an equidistant subspace. If state space is n* dimensional, then this
subspace is ri* 1 dimensional so it is called the equidistant hyperplane. Its compli
mentary space is a line (one dimensional), connecting one feasible state to the other,
that is normal to the hyperplane. Equidistant means geometrically equidistant, such
that:
n*
Final)i X,)2
_/= 1
where act(?Fina)1 is one feasible state and act(fFinal)2 is the other and x is an unknown
state in the equidistant hyperplane. Solving for state x defines the equation of the
hyperplane.
Since distance between the initial activation and the final states of the net
work determines the specific state to which the network will converge, activations
which start in an equidistant plane should be unable to choose a final state. This is a
geometric definition of the "ambiguous" states or tie states mentioned above. Initial
states in an equidistant hyperplane should result in ambiguous final activations. The
^(acr(.(rFinaI)2 *,)
,/=i
3.11
74
final activation should never reach a stable corner state because the network will try
to turn on sets of units corresponding to both equidistant feasible states. But both fea
sible states cannot be on because of the limits placed on the final states by the DC
bias, ext, which allows only a certain number of units to be on. The result is that a
stable state is never reached. This is the type of behavior that should occur for any
initial activation in an equidistant hyperplane.
If the definition of invariance is recalled, a condition for optimality can be
developed from the dynamics of activations in equidistant hyperplanes. From an
algebraic point of view, the plane separating any two feasible states should be invari
ant. Once the activation is in one of these planes, it should stay there. It should not be
able to choose between the equidistant final states. The predicted result from network
dynamics would be that neither final state could be reached. This leads to a necessary
condition for optimality. For optimal performance, there must be an invariant hyper
plane separating each pair of feasible states. This is equivalent to saying that any
input which should result in a tie does result in a tie (or an ambiguous final activation
that is a combination of the two final states involved in the tie).
This condition seems relatively restrictive and, hence, difficult to pass. The
performance of the Hopfield networks in optimization problems is generally subopti
mal, lending weight to this conclusion. Since the linear approximation to the network
was used for most of the analysis, the first question that revolves around the perfor
mance of the linear approximation. Is it "optimal" from the standpoint of the invari
ant hyperplane condition? To prove suboptimality, we must show that at least one
hyperplane separating two feasible states is not invariant. This is a sufficient condi
75
tion for suboptimality. Taking the assignment problem, which we know to be subopti
mal, a test of invariance between two feasible states is done using the linear update
approximation and the results shown in Appendix C. For the example states used, the
equidistant hyperplanes is invariant and, therefore, the performance of the linear sys
tem looks optimal. The evidence is not complete and the question of optimality of the
linear system is still open. The nonlinear system, however, definitely introduces sub
optimality into the network's solutions. An example of this is also shown in
Appendix C. This is one explanation of the overall suboptimal behavior of the assign
ment problem Hopfield network. The examples in the appendix are just an indication
of the type of analysis that must be done to gain a full understanding of network per
formance. The performance of Hopfield networks that are applied to optimization
problems must be analyzed more closely before they can be used for more complex
problems where their performance must be predictable.
76
4. Eigenspace Design
The techniques of eigenspace analysis lend themselves to a very logical
design technique. By applying the method of analysis in reverse order, we can begin
with a set of feasible states and design a weight matrix and external bias which will
cause the network to converge to these feasible states. Emphasis must be placed on
assembling the feasible states into a form which is easy to work with in this case.
Once this is done, the dynamics can be "designed" in the eigenspace domain. Implicit
in this is the use of the linearity assumption. From the dynamics defined by the feasi
ble states, the weight distribution can be found. The transformation to the "state"
domain occurs here. After calculating the weight distribution, a homogenous weight
matrix can be formed. The external input that assures stability for desired final states
must, then, be found. This turns out to be an iterative process of adjusting the weights
of the weight matrix to find a range of external inputs that result in stable feasible
states. Once the weight matrix and external inputs are defined, the network can be
tested and its performance gauged. This process is summarized in Figure 4.1.
The chief advantage of this design technique is that it does not rely on the
energy function. The energy function suppresses much of the detail of network
dynamics, making visualization of the dynamics difficult. The eigenspace perspective
gives a relatively clear view of where the network is going and how it gets there. The
dynamics can be fine tuned in the eigenspace domain and errors can be seen more
easily.
The resulting eigenspace design technique will be applied to two dimensional
network design examples in the last section of the this chapter, with the details
77
Figure 4.1
Summary ofeigenspace design.
included in Appendix D. For this reason, the two dimensional case will be empha
sized throughout the chapter. The technique for design of one dimensional networks
is analogous and, in most cases, is just a simplification of techniques used for design
ing two dimensional networks.
78
4.1 Desired Feasible States
Eigenspace design begins in the state domain with the set of states that are
solutions to the specific problem, the desired feasible states. This set comes directly
from the problem model. The size of the set of states can be quite large. For this rea
son, it is desirable to group the states by the symmetry which they may share. Once
the set is grouped into representative states, these states can be transformed into
eigenspace representations.
The first step is deciding on a representation for the outputs of the problem
model in a twostate, max and min, network compatible form. Feasible states are
always assumed to be vectors that correspond to comers of the ncube. Since all final
states are binary, the desired feasible states must be represented as binary states.
There are 2 (n* = n for a one dimensional network and n* = n for a two dimensional
network) possible feasible states for a network of size n one for each corner or ver
tex of the ncube. Although certain weight matrices do result in final activations at
the edges of or inside the ncube, this will be considered degenerate behavior.
Networks of this kind will not be designed by this process.
With this large number of possible feasible states, it is necessary to reduce the
set of feasible states to a manageable number of representative states. This is where
symmetry factors within the set of feasible states are considered. The idea is to find a
small subset of the feasible states of a given problem that typifies the other feasible
states. States in this subset will be called archetypal feasible states. At this point, we
are looking at the physical characteristics of the feasible states in state space. Later, a
greater reduction of dimensionality can be realized in the eigenspace domain.
79
Descriptive terms will suffice to define "symmetry" here. For example, the desired
feasible states for the WTA problem are: all states with one unit "on" and all other
units (n 1) "off." The assignment problem feasible states are all permutation matri
ces of size n x n (or matrices with one unit on per row and per column). For the gen
eral linear optimization or constraint satisfaction problem, the feasible states are all
comer states subject to the given constraint or constraints.
The starting point for eigenspace design is the same starting point for energy
function design so the territory is familiar. From this point, the energy function
approach uses the relation defined in Equation 2.8 to get the weight matrix. Although
energy function design requires manipulations of the energy equation which give
insight into the set of desired feasible states, network dynamics are not explicitly con
sidered [9].
4.2 Transformation of Desired Feasible States to Eigenspace
The dimensionality of the set of desired feasible states must be reduced to a
single state which describes the entire set. "Transformation" is the approach used for
this reduction. First, the nature of the transformation is clarified and it is shown how
symmetry is visible in the transformed domain. Second, the technique for combining
representations in the transformed domain using superposition is developed. The
result is a single state, in the transformed domain, from which a dynamic system can
be designed.
4.2.1 Transformed State Domain
In analyzing homogeneous networks, the transformation from the state
domain to the "eigenspace domain" is accomplished using the discrete Fourier trans
80
form (DFT). The eigenspace domain is defined in terms of the eigensystem of the
homogeneous weight matrix. The set of transformation vectors used is the set of
Fourier vectors. (See Section 3.4 and Appendix A.) It was shown above that this set
of vectors forms the modal matrix (matrix of eigenvectors) for any homogeneous
weight matrix of a given size and dimension. Since the Fourier harmonics are the
same as eigenvectors of any homogeneous weight matrix, they are the factors which
define the basis for "eigenspace." The transformation from the state domain to the
eigenspace domain of some ambiguous homogeneous matrix (of a given size and
dimension) can be made using the DFT, even though the exact matrix is unknown.
We will call this the transformation to the general homogeneous eigenspace domain.
The transformation required here serves the opposite purpose of the one used
in the analysis section. There, the eigenspace characteristics of the final stable states
were transformed to the state domain to get the final stable states of the network. For
this step, which was somewhat implicit, the inverse DFT was used. In the design
process, we have the desired feasible states which will form the final states of the net
work. An eigenspace version of these states is needed to determine the desired eigen
space characteristics of the network. This eventually leads to a weight matrix defini
tion. Using the DFT, we can transform the set of desired feasible states to the general
homogeneous eigenspace domain. (From this point, we will refer to this domain as,
simply, the eigenspace domain.)
Symmetry will reduce the number of states that must physically be trans
formed to the eigenspace domain. If only eigenspace magnitudes are considered, just
as before, the first element of symmetry to become apparent is the symmetry due to
81
shifting. Shifted feasible states in the state domain have the same eigenspace magni
tudes. For the n* feasible states that are shifted versions of each other, there is one
archetypal eigenspace magnitude. This imposes the limitation that, if a state is feasi
ble then all shifted versions of it must be feasible. Shifted states can only be distin
guished by phase information which is not considered by this design process. The
reduction in the number of states which must be transformed due to shift symmetry is
significant but does not fulfill the goal of reduction to a single eigenspace state unless
all desired feasible states are just shifted version of each other. States which are not
shifted versions of an other state must be transformed individually.
Only the magnitude components of the transformed states are of interest. Of
particular concern are the components with zero magnitudes. Later, they will distin
guish the stable eigenspaces of the linear dynamics from the unstable eigenspaces.
The set of archetypal eigenspace states is the set of states with unique eigenspace
magnitudes. This set must be determined before the next step, superposition, can take
place. If the states are defined in terms of their characteristics such as: "all permuta
tion matrices of size nxn," then it is most fruitful to transform several examples of
the set and look for characteristics that define them in eigenspace. If this cannot be
done, then all states must be transformed and then superimposed.
4.2.2 Superposition to Form the Composite State
Once the set of transformed feasible states with unique eigenspace magni
tudes (the set of archetypal states) has been constructed, superposition can be used to
reduce the set to a single state called the composite state. The principle of superposi
tion applies to all linear systems [4], [11]. The goal of this design is a linear system
82
which captures the important part of the nonlinear network dynamics. Since the fea
sible states are' used to determine this linear system, superposition can be used on
them. Above, it was shown that a single set of eigenspace characteristics defined the
linear dynamics of the network and that all final states of the network had to have
these characteristics. In design, the object is, first, to find the eigenspace characteris
tics using each of the feasible states and, then, to use these characteristics to define
the required linear dynamics.
Superposition of transformed feasible states serves to define the set of eigen
space characteristics. It is defined as term by term addition of the eigenspace
magnitudes:
super[jF ] =
Xtfy]* I *' = l>>and;=l,...,n
.*=i
]f + [^1,1 L + + \Fi,i L ]> j [[Tjj ]i + + [7i,j ]*+ [Fij ],]> 4 I
(for a two dimensional network) where super[^] is the superposition in matrix form
of the set of archetypal states so the partitioning of the second line into annxn
matrix is assumed. The element \T^k is the i,jth eigenspace magnitude of the kth
archetypal state (with archetypal states from 1 to m). Superposition maintains the
form the states as they are input. In other words, if the archetypal states are input in
their vector form, f. for i = 1,..., n then the composite state will be in vector form.
If the archetypal states are in matrix form, J.. for i = 1,..., n and j 1,..., n, as
shown, then the composite state will be in matrix form. Unless otherwise stated, we
will assume that the composite state is in matrix form, super [J].
83
The superposition of any two states can broken into two cases: where the
states have the same eigenspace magnitude components and where the eigenspace
components are different. In the first case, the result of superposition is just scaling of
each component. For instance, the superposition of two states with exactly the same
eigenspace magnitudes is two times the eigenspace magnitude of either state. This is
the case for states which are shifted versions of each other in state space. No new
information about the components of the composite state is gained by such a super
position, so it is, in essence, unnecessary to perform it. For this reason, only one
archetypal state representing all shifted versions of a given state needs to be included
in the superposition sum. It is especially important to note that all the zeros of the
eigenspace magnitude (individual components of the eigenspace state that have zero
magnitude) are preserved in there places, as in Figure 4.2.
.5 .2 0 .2 .2 .5 .2 0 .2 .2 1.0 .4 .4 .4
.3 .2 0 .2 .2 .3 .2 0 .2 .2 .6 .4 .4 .4
.4 .3 .5 .3 .3 + .4 .3 .5 .3 .3 = .8 .6 1.0 .6 .6
.3 .2 0 .2 .2 .3 .2 0 .2 .2 .6 .2 El .2 .2
.3 .2 0 .2 .2 .3 .2 0 .2 .2 .6 .4 El .4 .4
  Zeros that are preserved all of them in this case
Figure 4.2
Superposition of states with the same eigenspace magnitude
More interesting results occur when two states with different eigenspace mag
nitudes are superimposed. All magnitudes are, by definition, positive so the addition
of two nonzero magnitude components results in another nonzero magnitude com
84
ponent. This is similar to what happened in the previous case. The distinction comes
when one or both of the components are zero. Because addition is the operation that
defined superposition, two things can happen to a zero of the eigenspace magnitude:
it can be cancelled or it can be preserved. For the zero in a given position J.. to be
preserved, both of the states being superimposed must have zeros in that position,
[f..]. = [J. ]2 = 0. If one of the states has a nonzero component, [J..]. or \f.^ 0,
then the zero in position is cancelled and replaced by a nonzero magnitude. The
zeros that are not cancelled when superposition of all states is complete are important
.4 .2 .1 .2 .3 .5 .2 0 .2 .2 .9 .4 (l) .4 .5
.2 .1 0 .1 .1 .3 .2 0 .2 .2 .5 .3 [0] .3 .3
.3 .2 .2 .2 .2 + .4 .3 .5 .3 .3 = .7 .5 .7 .5 .5
.2 .1 0 .1 .1 .3 .2 0 .2 .2 .5 .3 [0] .3 .3
.2 .1 .1 .1 .1 .3 .2 0 .2 .2 .5 .3 .3 .3
 [ Zeros that are preserved
O Zeros that are cancelled
Figure 4.3
Superposition of states with different eigenspace magnitudes
for determining network dynamics. Figure 4.3 shows an example of cancellation due
to superposition.
The result of superposition is a single composite state. The only zeros that
will be preserved in the process of superposition are the zeros that occur in all of the
states. This is the information that is needed for determining network dynamics. The
existence of a zero in the composite state, super[yr]f.. = 0, means that all the desired
85
feasible states have a zero magnitude component at i,j in the eigenspace domain:
= [jF(y]2 = = [jF(y]m = 0. It is the same as saying that the final activations of the
network should have zeros in this position in eigenspace: 0ct y(*Final) = 0. This is the
key to defining network dynamics.
Before we proceed to the next step, we look at the limitations imposed by this
superposition process. If complimentary states are superimposed the result is a com
posite state with no zeros. For example, if the assignment problem eigenspace magni
tude is superimposed with the eigenspace magnitudes of the "row space" and the "col
umn space," as shown in Figure 4.4, the result is a composite state with no zeros. The
dynamics resulting from a composite state with no zeros lead to a network where all of
the nspace is in an unstable eigenspace in terms of linear dynamics. This means all
comers are feasible, as in the case of the ^winner extension of the WTA problem
[20]. This problem allows any number of units to be "on," regulated only by the value
of the external input. So any superposition which results in a composite state with no
zeros, degenerates into the kwinner network for reason fully described below.
4.3 Design of Linear Dynamics
Now that a full eigenspace description of the desired feasible states has been
calculated, this description can be used to define a linear dynamic system. The main
concern here; is the description of eigenspaces in terms of stability. This entails
grouping thei states by eigenspace magnitudes. These groupings are categorized by
dynamical properties defined in the analysis section. Once the eigenspaces are cate
gorized, a partial eigenspace representation of a homogeneous weight distribution
can be made.
86
Column eigen Row eigenspace Feasible state
space magnitude magnitude eigenspace
magnitude
1 1 1 i 1 1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 .4 .3 .7 .4
0 0 0 0 0 + 1 0 0 0 0 + 0 .3 .4 .4 .7
0 0 0 0 0 1 0 0 0 0 0 .7 .4 .4 .3
0 0 0 0 0 1 0 0 0 0 0 .4 .7 .3 .4
3 1 1 1 1
1 .4 .3 .7 .4
1 .3 .4 .4 .7
1 .7 .4 .4 .3
1 .4 .7 .3 .4
Figure 4.4
Superposition of assignment problem eigenspace magnitude
with row and column eigenspace magnitudes
The grouping of one or more components of the composite state defines a sub
space. In the analysis section, the subspaces in the eigenspace domain were grouped
by eigenvalue. A subspace spanned by eigenvectors with the same eigenvalue was
called an eigenspace. This is the proper definition of an eigenspace. Here, the defini
tion of eigenspace is temporarily relaxed. We will use the term "eigenspace" to denote
a subspace formed from a group of composite state components that share magnitude
characteristics. Later, we will see that these "eigenspaces" do turn out to be eigen
spaces proper because of the way we will define the eigenvalues.
Eigenspaces are created from the composite state based on eigenspace magni
tudes. All components with zero magnitudes are grouped into one "eigenspace" and
the components with nonzero magnitudes are grouped into another "eigenspace."
The DC component is treated separately as the DC eigenspace, even though it has
nonzero magnitude.
The linear motion in the eigenspaces is characterized by the stability of the
eigenspaces. ^Motion inward toward the origin is a characteristic of the stable eigen
space of the linear system. As time progresses, the components of activation in this
eigenspace approach zero. In the final eigenspace state of the network, these compo
nents become zero. Motion away from the origin characterizes the unstable eigen
space. The components of activation in this eigenspace get large. In the final eigen
space state of the network, these components are nonzero. The composite state rep
resents a "sum" of final activations for the network being designed. This means that
the "eigenspaces" of the composite state grouped by zero and nonzero magnitudes
represent eigenspaces of the desired linear dynamics. The group of composite state
components with zero magnitude corresponds to the stable eigenspace since the final
activation components, in the stable eigenspace are zero. Put another way,
the zeros of the composite state define the span of the stable eigenspace. The same is
true of the group of nonzero components. They correspond to the unstable eigen
space because the final activation components of the unstable eigenspace are non
zero. The nonzero magnitude components of the composite state define the span of
the unstable eigenspace. (The DC eigenspace is discussed below.)
Calling to mind the case where there were no zeros in the composite state, the
^winner network, we can now explain the conclusion drawn above. Since there are
no zeros the in composite state, the entire space is the unstable eigenspace (except for
88
the DC subspace which is treated separately). Therefore, the network final states can
have nonzero magnitude for any or all of the eigenspace components. This trans
lates, in the state domain, to any state. (The final state of the network can have zero
components at any location because then it is in a subset of the unstable eigenspace
and, from above, an activation in a subspace of a given eigenspace is still in that
eigenspace.) The factor that determines which states will be stable is the external
input. This DC bias defines the number of units that will be on or "k" of the ^winner
network [20],
The DC eigenspace is treated as a special case. It must have a nonzero com
ponent in the composite state because the desired feasible states have nonzero DC
components.:(The only exception is except in the degenerate case where the only
desired feasible state is 0.) But the DC eigenspace must be part of the stable eigen
space. Its component in the final activation must end up at the origin of the DC
eigenspace but the origin will be nonzero. The freedom allowed in defining the bias
in the DC eigenspace helps in defining the states of the network which can be made
stable as will be described in Section 4.6.
The outcome of this phase of design is a definition of the make up, in eigen
space terms, of the stable, unstable, and DC eigenspaces. This definition of eigen
spaces will result in final activations of the form dictated by the desired feasible
states. At this point, the dimensionality and "span" of each eigenspace is known.
(From linear algebra, span is defined by a set of vectors, in this case the eigenvectors,
such that any state in the subspace can be written as a linear combination of these
vectors.) This is a complete definition of the required linear dynamics of the system
89
we are designing. It does not describe how stable states are reached but it does
describe the allowable characteristics of stable states in essence, the constraints of
the optimization problem in the eigenspace domain. The eigenspace description,
here, is qualitative. To complete the design of a network, the nature of this descrip
tion must be quantified by defining the magnitudes and signs of the eigenvalues.
4.4 Determination of the Eigenvalues of the Weight Matrix
The main goal of the design process is to determine the weight matrix. To
accomplish this, the eigenvalues of the matrix are used. The composite state repre
sents the form of the desired final activations in the eigenspace domain. It is com
posed of zero magnitude and nonzero magnitude components. The eigenvalue, X,
determines the incremental changes in each component of the state vector in the
eigenspace domain, Aact^ hact^t), as it heads toward its final stable value. Note that
the equality of this expression is approximate () for the nonlinear "neural" dynam
ics but is exact (=) for the dynamics of the linear approximation. A conversion from
the essentially binary values extracted from the composite state components (zero or
nonzero) to the eigenvalues, which are assumed to be X G R, is required. The con
version must address two issues: the sign of the eigenvalue and the magnitude of the
eigenvalue.
Analysis of linear dynamics is essentially concerned with the "polarity" of the
eigenspaces. We know that the eigenspaces with nonzero magnitudes repel their
components from the origin while zero valued eigenspaces attract their components
to the origin. This concept of polarity defines the signs of the eigenvalues. The eigen
spaces which are attractors (cause zero magnitude components in network final
90
