PAGE 1
ON NEWTON'S METHOD TO MAXIMIZE THE LIKELIHOOD FUNCTION FOR PAIRWISE COMPARISONS by Vinata Gangolli A thesis submitted to the Faculty of' the Graduate School of the University of Colorado at Denver in partial fulfillment of the requirements for the degree of Master of Science Applied Mathematics 1994
PAGE 2
This thesis for the Master of Science degree by Vinata Gangolli has been approved for the Graduate School by David C. Fisher r James Koehler Burt Simon ii Date
PAGE 3
Gangolli, Vinata (M.S. Applied Mathematics) On Newton's Method to Maximize the Likelihood Function for Pairwise Comparisons Thesis directed by Associate Professor David C. Fisher Abstract We consider the problem of ranking a set of objects based on a number of pairwise comparisons made among them. This becomes a more complex problem if each object is compared only with some of the remaining ones in that set, and if this experiment is repeated unequal number of times for different pairs. A directed graph can be used to represent such a scenario. Given such a directed graph representing pairwise comparisons of objects in a set of data, a likelihood function based on the pairwise comparisons of these objects is constructed with an underlying assumption, that the performance of the object is an independently distributed random normal variable with a common variance. Newton's method is then used to get the maximum solution to this function. It is shown that there exists a unique maximum solution to this problem for the case of a strongly connected directed graph. This abstract accurately represents the content of the candidate's thesis. I recommend its publication. David C. Fisher Ill
PAGE 4
To Chinmoy for his love and support IV
PAGE 5
Chapter 1. Introduction ............................................................................................... 1 2. Ranking of Performances .................................................................... .4 3. Mathematical Model. .............................................................................. ? 4. Solution to the ModeL.. .......................................................................... 1 5 5. Computation of the Solution ............................................................... 2 1 6. Applications ............................................................................................... 2 6 Appendix A. Rankings based on NFL Pre-Playoff Games (1993-94 Season) .............................................................................................................. 2 8 B. Computer program .................................................................................. 3 0 References .......................................................................................................... 4 1 v
PAGE 6
Figure 1. Digraph representing games between teams .............................. 5 2. Plot of Normal CD F ................................................................................ S 3. Plot of r(X) .................................................................................................. l 7 4. Plot of r'(X) ................................................................................................. l 7 5. Plot of r"(X) ............................................................................................... .! 7 vi
PAGE 7
ACKNOWLEDGMENTS I would like to thank David Fisher, James Koehler and Burt Simon for giving me the opportunity to learn by sharing their time and expertise. I especially thank Dr. Fisher for his support, invaluable guidance, patience and wonderful enthusiasm. vii
PAGE 8
1. Introduction We are often confronted by the problem of ranking a set of pairwise alternatives, based on the data available on the merit of each alternative. We consider a population of n objects with vanous degrees of merit in the attributes of our interest. It is desired that these n objects be ranked in a fair manner according to their merits. A number of pairwise comparisons may be made to decide the better object among a pair to be compared. So we would have (;) possible pairs to compare. Suppose we compare only some of these pairs, and repeat this experiment of comparison unequal number of times for each pair, then we need to formulate a way to rank these objects based on the experiments. The method of paired comparisons IS used primarily m cases when the objects to be compared can be judged only subjectively; that 1s, it is impossible to make an individual measurement for each object to judge its merit. Paired compansons are widely employed by psychometricians. The method was introduced by Fechner ([3],[4]) and, popularized by Thurstone after several extensions ([13], [14], [15]). This problem has been widely studied ([2], [7], [9], [10], [11], [12]). The process of ranking is complicated for the case of the absence of a clear-cut winner. Each alternative may have, some characteristic or merit in which it scores over others. Various
PAGE 9
schemes have been suggested and implemented to decide the best alternative. Saaty [ l 0] has introduced a procedure for prioritizing decision alternatives based on an n x n positive matrix, R = (ru ), of pairwise compansons of n <: 2 items expressed on a ratio scale. His method treats r,j > 0 as an estimate of the perceived intensity, w; /,, of a jwj judge's preference m favor of alternative i versus 1, where ( ' ')T d l 1 f w = w"w2 ,w, IS an un er ymg co umn vector o nonnegative weights that sum up to one. The judgment matrix R =h), is thus taken to be reciprocal, namely, rj, = fru, 1 :> i,j :> n, with the assumption that the judge's preference in favor of item i or j IS elicited with only one of the pairs (i,j) or (j,i) for i j. Saaty (A. -n1)/ proposed to use J1 = mox /(n -1) > 0, a linear transform of the Perron eigenvalue A.mox of R, as a measure of the cardinal consistency in the judge's responses (i.e., r,j rj, = rik, 1 :> i,j,:> n ), and posed the problem of determining how it might vary as a function of the rij's. Aupetit and Genest [6] answered this using the Perron-Frobenius theorem. Noether [7] has suggested a method of ranking based on two response curves. A linear model of the form Xu =a,aj + e,j, (1 :> i,j :> n), is defined for the amount of preference of item i over item j. Weights are assigned for each item i (1 :> i :> n), and they are estimated using the method of maximum likelihood estimation. We introduce here a model to rank a set of objects in a probabilistic manner, based on the performance of each object relative to the rest of the population. A directed graph or 2
PAGE 10
digraph is used to represent results of the compansons between different pairs of objects. The probability of an alternative, say i being superior to another alternative, say j, is calculated, based on the results of the comparisons made so far. A variable or weight associated with each object, gives a measure of its merit. Hence difference between weights of two objects gives a measure of superiority of one object over the other. A random normal variable is associated with each object to represent the random error associated with the outcome of each comparison. The probabilities of each alternative i sconng over the alternative j are calculated using this normal variable and then the max1mum likelihood function is formulated. This is solved to get a maximum using Newton's method. It is also shown that a unique maximum solution exists for a strongly connected digraph. If normal density function is used for calculating the probabilities, then the model becomes similar to the one underlying the Law of Comparative Judgment or the ThurstoneMosteller model [2]. 3
PAGE 11
2. Ranking of Performances We start with an example to explain the problem under consideration and to develop and analyze a mathematical model for it. Consider a league of four teams 1, 2, 3, 4, which play several games against each other Teams 1 and 2 play three games against each other, two of which are won by team 2 and one by team 1. Team 2 loses in its only game against team 3 and team 1 wms m the game it plays against team 4. Teams 3 and 4 play two games against each other, both of which are won by team 4. The digraph in figure 1 gives a graph representation of this scenario together with the results of each game. Teams l and 3 do not play against each other. Similarly, teams 2 and 4 do not play against each other. The vertices in the digraph indicate the teams participating in the event. Any two vertices are adjacent to each other, that is they have an arc between them, if there is a game played between them. The head of each directed arc represents the loser in that game and the vertex at the tail of that arc represents the winner in the game. The digraph is said to be strongly connected, if there is a path from each of the nodes to every other node. 4
PAGE 12
1 3 Figure 1. Digraph representing the games played between teams I. 2. 3 and 4 The graph shown above represents the whole scenario being studied here. It is a directed graph or a digraph of the games played between these teams, viz., 1, 2, 3, 4. There is a cycle 1--> 4 --> 3 --> 2 --> 1 in the graph shown above. Thus team 1 beats team 4 and team 4 beats team 3, which beats team 2, which in turn beats team 1. We can see that there exists a path from every node to every other node in the graph above. So it is a strongly connected graph. The whole scenario can be represented using a score matrix. An entry su (1:::; i,j:::; n), is positive, if object i scores over object J those many times. Thus sj, will represent the number of times that object j is found to be superior to object i. Any of the situations such as no comparison made, or no win for object over object j counts as a zero in (i,j)th place in the score matrix. Ties have not been considered for the sake of simplicity. All diagonal entries are zero since an object cannot be compared 5
PAGE 13
with itself. A score matrix corresponding to the graph shown above, is given below. 1 2 3 4 1 0 1 0 1 S=2 2 0 0 0 3 0 1 0 0 4 0 0 2 0 We now proceed to build a generalized mathematical model, which would reflect the characteristics of the problem in mathematical terms, thereby enabling us to compute a sequence of real numbers, which in turn, would give the ranking of these n objects in the data set. 6
PAGE 14
3. Mathematical Model We wish to rank every object based on its performance relative to the performance of the others at the same time. Let us start with some assumptions before proceeding to develop the model. We assume that the outcome of each pairwise comparison is independent of other comparisons. We associate a variable w, known as weight, with object i ( 1 :> i :> 11) for judging its merit. Let X, be the random variable representing the random error in the performance associated with object i. So the value of w, +X, will determine the position of the ith object m the data w, set. The vector w = gives the weight associated with each of the 11 objects. Object i would beat object j, if w, +X,> w1 +Xi. Object j would beat object i, if w, +X,< w1 +Xi. Hence Prob[Object Prob[X, -Xi >wi -w,] scores over Object j ] = Prob[ w, +X,> wi +Xi] = We assume that X1 for all!:> i :> 11, are independently and identically distributed as random normal variables. It follows that X1 -Xi is also a normal random variable. Without loss of generality, we assume that X,-Xi for all pairs in the set, is independently and identically distributed with mean 0 and standard deviation 1. Hence ( x,-Xi)N(O,l), for all 1 :> i,j 9t. The Prob[ X,-Xi> wi-w;] can be calculated using the normal distribution function as follows: 7
PAGE 15
Prob[X,xj > W;w,] We denote this quantity as F( w,-wJ The error function associated with variable X is represented as F(x), where X I F(x)== L -.J'ii/';{dt. Figure 2 gives a plot of F(x). F(X) -4 -2 0 X Figure 2. Plot of Normal C.D.F 2 4 The value of the F(x) can be evaluated using Simpson's rule. The computer program to do this computation is given in Appendix B. 8
PAGE 16
We use this to calculate the probability of the outcome for the problem referred to in Figure 1. First we define the variables W,, for team i to give a measure of the merit of each team. The reader may recall that we referred to them as weights Thus, Team l has weight w,, Team 2 has weight w,, Team 3 has weight wJ, Team 4 has weight w,, Prob[Team l beats Team 2] = Prob[ w, +X, > w2 +X,] = Prob[ X, -X2>w2 -w,] Team 1 beats 4 if w, w, +X, -X,> o, That is, if X,-X, >w4 -w,, Team l loses to 4 if w, w4 +X, -X, < o, That is, if X, -X, < w, -w, Similarly, Team 2 beats Team 3 if X,-X,> w,w2 Team 2 loses to Team 3 if X,-X,< w,-w,, and so on. Therefore, Prob[team l beats team 2 & team 2 beats team 1 twice & team 3 beats team 2 & team 1 beats team 4 & team 4 beats team 3 twice ] Team 1 and team 2 play three games against each other. Since the are order of the win-lose sequence is not important here, there ( 3 ) ways in which team 2 can win any two out of those three games. 9
PAGE 17
= G )Prob [team I beats team 2] Prob [team 2 beats team I twice] Prob [team 3 beats 2] Prob [team 1 beats team 4] Prob [team 4 beats team 3 twice] = Prob [team I beats team 2] Prob [team 2 beats team 1]' Prob [team 1 beats team 4] Prob [team 3 beats team 2] Prob [team 4 beats team 3]2 Having devised a way to compute probabilities of the outcome of comparisons, we proceed further to construct a function of the combined outcome. Since object i is superior to object j on sij occasions, and since the outcome of each comparison is independent of each other, the probability is given by F(w,wj f'. Thus the joint probability that the outcomes are as given by the result in each comparison, is given by the likelihood function, n n so (w)= IliJ 1) F(w, -wj) Here 1J represents the product of the i=l j=l i;t.j s,1>D combinatorial terms corresponding to the outcomes of games between each pair of teams. This likelihood function represents the likelihood of the scenario given by the digraph and its accompanying score matrix Without loss of generality, we put a constraint .2:, w, = 0 on the weights It can be seen that i=l n n (w+ aT)= Illl11 F(w, + a+-(wj +a)) = (w), for some constant a E R. i=l j=l i;t) S;j>O ll Therefore, the constraint I w. = 0 enables the umque estimation ; = 1 l of the weights. We take natural logarithms of the likelihood function, since taking logarithm simplifies the likelihood function considerably. 1 0
PAGE 18
Thus the problem can be defined mathematically as follows. maximize n n s,l n (w)=I1I1TJF(w;-wj) subject to the constraint L,w;=O. i=:d j=! i=! s,1>0 Without loss of generality, the positive constant T) can be substituted by 1, since maximizing T) F( w;-wj) is equivalent to maximizing F( w;-wj ). Since taking the natural logarithm simplifies the function, we redefine the model as " maximize In ( w) = L, L.su In F( w; -w;) subject to the constraint L, W; = 0. i=l j=l i=l s,1>0 We denote r(x)=lnF(x), and !Jl(w)=ln(w). It can be seen that ERF: R (0.1), : R, (0,1) and !p: R, ( -oo,O). Now the above model can be restated as, 11 n n maximize !p ( w) = L, L, sur( w; -wj) subject to the constraint L, W; = 0. i=I i=l i=l i*j :vii >0 We now use this model to construct a likelihood function for the digraph in Figure I. (."ii) = F(w,-w2 ) F( w1 -w4 ) (F(w2 -w,))2 -F( w3 w,) (F(w4 -w3))2 4 subject to the constraint L, w; = 0. i=l Therefore, the problem can be redefined after taking the logarithms as follows: max1m1ze ln((v)) = r(w, -w2)+r(w, -w4)+2r(w2 -w,)+r(w3 -w,)+2r(w4 -w3 ) 4 subject to the constraint L, w; = 0. 1=1 That is equivalent to, maximize 1 1
PAGE 19
4 subject to the constraint 2:, w; = 0. Here w = refers to the 1=1 vector with its ith entry equal to weight assigned to the jth team. We wish to evaluate the 11xl vector w, which would maximize the likelihood function defined above. This means the linear system of 11 equations :W. cp(v) = 6, (1 $ i $ 11), needs to be solved. Newton's method can be used to calculate the vector w iteratively. Suppose we wish to solve V(w) = 6 for the function V:R"-+ R". Newton's method for the n-dimensional case gives, wk+I = w, -C/w, v ( w,) r v ( w,)' where w, is the nxl vector obtained in the kth iteration. Since we took the logarithm of ( w) to make the function linear, we take the first and second order derivatives of cp( w,) to restate the equation as (cp (w,)). Let us call these two matrices of first and second derivatives as matrix A""" and vector respectively, for the sake of simplicity. We have to compute A""" and b,"'' and then solve for the system of 11 equations to get the vector representing the change in the weights in the (k + l)th iteration. This vector is then subtracted from w, to get w,+,. By our previous definition, r(x) = ln(F(x)). Differentiating with respect to x, we have, r'(x)= F'(x) F(x) Differentiating with respect to x once more, we have, 1 2
PAGE 20
4 subject to the constraint L w; = 0. Here w = refers to the l=:d vector with its ith entry equal to weight assigned to the jth team. We wish to evaluate the nxl vector w, which would maximize the likelihood function defined above. This means the linear system of n equations = 6, (1:;; i:;; n), needs to be solved. Newton's aw; method can be used to calculate the vector w iteratively. Suppose we wish to solve V(w) = 6 for the function V:R" -'t R". Newton's method for the n-dimensional case gives, w,+, = w, (a aw, V ( w,) r V ( w,), where ,"V, is the nxl vector obtained in the kth iteration. Since we took the logarithm of 4>(w) to make the function linear, we take the first and second order derivatives of cp(w,) to restate the equation as w,+, = w, (
PAGE 21
:. r"(x)= F"(x)F(x)-\F'(x))2 (F(x )) Hence the first derivative of q:>(w) with respect to w; gtves '( ) '( )f 1< .. < cp w i a wi wi-wj J!;;:sjir wj-wi or -l,jn .. su>O sp>O The ith entry in the vector b,u, represents the first derivative of q:>( w) with respect "(-) _a 'q:> (w) to w,. Taking the second derivatives, we get, q:> w iia w2 Is/'( w,wj) + Is/"( wjw,),Jor IS i,j S n. j=l j=l s;1>0 sp>O "( -) "( -) "( ) "( ) fi I< .. < . qJ w iJq> w 1,.--sur wiw1 s11r w1 w, or _l,J n, l =F J Applying this to the digraph in Figure 1, we get, q:>(w) = ln(rfi(w)) = r(w, -w2)+r(w1 -w4)+2r(w2 -w1)+r(w3 -w2)+2r(w4 -w3 ) a a q:> = r'(w,-w,) + r'(w,-w.)-2r'(w,-w,) w, Differentiating with respect to w" a 'q:> "C l "C l 2 "C l -a 2 = r w, w2 + r w1 -w4 + r w2 w, w, Differentiating with respect to w2 a'q:> "( ) 2"( ) a a = -r w1 -w2 -r w2 -w1 w2 wl Differentiating with respect to a'q:> w,, aw3aw, Differentiating with respect to w., a'q:> a w.a w, 0 = -r"(w,w4 ) Similarly, Differentiating q:>( w) with respect to w2 we get; aq:> =-r'(w1-w2)+2r'(w2 -w,)-r'(w3 -w2 ) aw, 1 3
PAGE 22
and differentiating
PAGE 23
4. Solution to the Model Let us examine how a solution to the model as defined earlier, can be obtained. We prove two results in order to define the special characteristics displayed by the matrix A= cp;(x) corresponding to a digraph D. As seen earlier, all the entries m the matrix A are a function of the second derivative of r(x) = ln(F(x)). Thus the value of r"(x) plays an important role in the structure of A. r"(x)=.!!_F'(x) = F"(x)F(x)-(F'(x))2 dx F(x) (F(x))2 It can be seen that F(x) gives a real value between 0 and 1. Hence (F(x))2 is always a real positive value in (0,1) for any x in (-oo,+oo). Thus the numerator of r"(x) decides the sign of r"(x). The next result shows that the value of F"(x)F(x)-(F'(x))2 and hence r"(x) is always negative. Lemma 1: F"(x)F(x)-(F'(x))2 <0, Proof: As defined earlier, we have F(x)= J / e-'Adt -y 2JT Taking the first and second derivative of F(x) with respect to. x, we have, F '( ) 1 _,,; d F"( ) 1 -2x _,,; x =--e 1 an x =----e 1 = -J2ii -J2ii 2 xF'(x). ..... ( 4.1) Using the equation (4.1), we have, F"(x)F(x)-(F'(x))2 = F'(x){-xF(x)-F'(x)} F'(x) is always positive for any x in (-oo,+oo). F(x) --7 0 as x --7 -oo 1 5 ......... (4.2) (4.3)
PAGE 24
L F'( ) L' l zm x = zm =e 2 =0 x-d<><> x.....;<><> \/ 21C """"' (4.4) We now examine the behavior of the expression (4.2). We have already seen that F'(x) > 0, for all x. Consider h(x) = -xF(x)F'(x). The derivative with respect to. x gives h'(x)=-xF'(x)-F(x)-F"(x). Since we have shown earlier that F"(x) = -xF'(x), the two terms cancel out giving h'(x)=-F(x). This value would always be negative. We can conclude from this that h(x) is a decreasing function. Now, Lim f(x) = Lim-xF(x)-Lim F'(x) = Lim F(x) -0, because Lim F'(x) = 0. X-f-00 X-4-<><> x.....;-<><> ( -Yx) X--t-<><> Applying L'Hospital's rule, we get, Lim F(x) =Lim F'(x) =Lim x' Applying L'Hospital's rule once x.....;-"" (-Yx) x->-"" x-2 x---Jo-oo -!21i more, we get, xLf.'!!, / .;;, = ;I, = 2( -f27i)e .;;, = 4n: F'(x) = 0. Thus {21ie x {2n e we have proved that the limit of h(x) goes to zero as x --7 -=. f(x) is also a decreasing function. Thus the value of h(x) must always be negative. Now, F"(x)F(x)( F'(x))2 = F'(x)h(x ), and F'(x) > O,for all x. Hence the product of h(x), a negative value, with F'(x) will give a negative value. We can conclude from this that F"(x)F(x)-(F'(x))2 would give a negative value for all x. 0 Figures 3, 4 and 5 give plots of the functions r(x), r'(x), r"(x), respectively. 1 6
PAGE 25
r(X) vs X ,_ -----+--3 -2 0 2 3 -4 -8 Figure 3. Plot of r(X) ---------------------r'(X) vs X Y=r'(X) 4 T 2 -!-3 -2 -1 -4 ,_ Y=-X ----------Figure 4. Plot of r'(X) 3 ,---------------------------------r"(X) vs. X ------+---" -3 -2 1 0 3 Y=r"(X) Y=-1 ---------Figure 5. Plot of r"(X) Theorem 1: Matrix A for a strongly connected digraph is a symmetric negative semidefinite matrix with rank n -1. Any 1 7
PAGE 26
vector m the null space of matrix A 1s of the form a 1, where a is a real number. Proof: J' Let A= JW2 ij =-a,, for 1 :s; i :s; n, and Ia)> 0 for all! :s; i :s; n. Thus matrix A is jo:d j:t:.i diagonally dominant. Consider a vector x = x, m the null space of matrix A. Let x, b e x, the max1mum of all x, for 1 :'> j :'> n. Hence Multiplying the kth row of A by .X, we get Since the digraph is strongly connected, and not all the xjs on the right hand side are zero. Using triangular inequality, we have, 1 8
PAGE 27
:.I a"ll x,I:S la"llx11+1a,llx21+ .... +la,,_,llx,_11+1a,,+111 x,+11+ ... +Ia,, llx, I Since x, is the maximum of all x,, where I::; i::; n, we have, Ia,, II x, IS laklllx,l+l a,211x, II+ .... +la,,_,llx, l+la,,+111x, 1+ ... +Ia,, llx, I = lx,l(akl+ .... +a,,_, + akk+1+ .... +a,,). Now consider a set M s {1,2, ..... n} such that if i EM then x, = x,. If M = {1,2, ..... ,n} then we get x, = x,, x2 = x,, .... ,x, = x,. If not, there is at least one element, say j, in {1,2, .... ,n}-M. Since the digraph D is strongly connected, there exists a path from every i EM to J. Therefore, we can find at least one element i EM such that beats j. Similarly, there must exist a path from j to every i EM. This implies that au > 0. But aii =L,aij. So -aux1 :s; ailxi+ ....... +ai1 1xi + aii+Jx;+ ..... +a1nxi = -ai1x1 j=l j,oi Therefore, x1 = x, gives ailxl + a12X2 + .... +aik-Jxk-1 + aik+Jxk+J + ... +ainx11 = ailxk + ai2xk + ... +aik-lxk + aik+Jxk + ... +ainxk. But this would imply that aiixj = aux, = aux,, since au> 0. :. xj = x, =; j EM. This is contradictory to our earlier assumption that jE{!,2, .... ,n}-M. Therefore M={l,2, ..... ,n} must be true. But x, = x, ViE M. That proves that x1 = x,, x2 = x,. .... ,x, = x,, where, x, x = IS in the null space of matrix A. 1 9
PAGE 28
We can see that the product of any row of matrix A with the vector 1 is zero. Thus 1 is in the null space of matrix A. This means the dimension of the null space of matrix A is at least 1. Since we just proved that all entries in any null vector of matrix A are equal, A' i=O, implies i = a1, for some a E R. Therefore, dim(nullspace(A))::; 1. So dim(nullspace(A)) = 1. Thus we can conclude that the rank of matrix A must be n -l. We wish to prove next that matrix A is negative semidefinite. Matrix A would be negative semidefinite if and only if all its eigenvalues are nonpositive. Since the rank of matrix A is n -l, one of its n eigenvalues is zero. Ay = ;l.,y, where A is the eigenvalue of matrix A associated with eigenvector y Therefore, (A-M,)y =0. Now if A > 0, then the subtraction of positive terms from the main diagonal terms of matrix A would make the main diagonal terms of (A-M,) smaller than the sum of the corresponding off-diagonal terms. But (A-M,) is diagonally dominant. Therefore, A > 0 cannot be true. That means the eigenvalues of matrix A cannot be positive. Ay = ;l.,y =;A ::; 0. This would be true only if A is negative semidefinite. Hence matrix A is negative semidefinite with rank n-1 for a strongly connected graph. 20
PAGE 29
5. Computation of the Solution The linear system of n equations can be solved iteratively by Gauss-Seidel method to get a quickly converging solution to the problem. Using Gauss-Seidel method of iterations we get, i-1 'I:' I k J 'I:' I k-1 J b L...Jaijwj L..JaUwj + 1 wl'l = i=' i=i+l where wl'l is the value of the ith I I aii component of the vector r"v obtained in the kth iteration. w1 1 is accepted as a reasonable approximation to the solution if llw1 1 -w1 ,_, 111 1 1 S: TOL, where TOL represents the desired accuracy. llw We are able to get a convergence to a solution using the Gauss Seidel method due to the negative semidefiniteness of matrix A. The program 'GAUSS does this iterative computation. The reader is referred to the program listing in Appendix A. The values of w for the digraph in Figure 1. were as follows: 0.04128163 w= -0.04128163 -0.2806859 0.2806859 Thus teams 4 and 1 were ranked in the first and second position respectively. Team 3 finished last. Next we prove that it is possible to get a maximum likelihood estimate of these paired comparisons for the case of a strongly connected digraph. Theorem 2: There exists a umque maximum to the likelihood function
PAGE 30
Proof: We first prove that there exists a maximum solution to this problem for the case of a strongly connected digraph. Consider the n x 1 vector x. We have, I, x, = 0, because of the i=l constraint defined for the mathematical model defined m chapter 3. Let llxll_=k. Hence lx,l=k, for some i, where 1::>i::>n. We assume that x,;:: 0. The sum of the remaining n -1 variables would add to -k, and therefore their average would be -!7(11 _1 ). So there exists a minimum, say xi, among these n -1 variables, such that <-kl f 1 < "< . xi-/(n-1 ), or -J-n,J*l. :. x,-xi ;:: k-( -Jcn -1)) = n!Y(n -1). Consider r(y) =In( F(y)). Since F(y) is in (0,1) for all y, -oo::>y::>=, the value of r(y) is always negative. Applying Taylor's theorem, we have, r(y) = r(O) + r'(O)y + r"( for some (O,y ). Since Lemma 1 gives r(y)::>r(O)+r'(O)y, for ally. Since r(O) =-In(2), and r'(O)=J%, we have, r(y)<-In(2)+J%y, for all y E R. If the underlying graph is strongly connected, there exists a path from node i to node j and vice versa. Since the total number of nodes are n, the length of this path would be at most n-I. We now consider a sequence of t teams including team j, that are on the path from j to i. If team j beats team z, then t=l. Otherwise, there would be some team, whom team j beats, which in turn beats some other team, which would 'finally beat team i. For the sake of simplicity, we represent team j as } 0 and team i as },. Hence the sequence of the teams on the path 22
PAGE 31
from }0 to }, can be represented as }0 }, ---7 }2 ---7 ... ---7 },_1 ---7 }, This is a path with length t and is a subset of the whole graph. Hence t $ n -l.We now look at the function cp0(:t). n n I, I,sm,r(xm -x,) m=! 1=1 S111f>Ol-;em $s1 1 r(x1 -x1 )+s1 1 r(x. -x. )+ ...... +s. r(x -x. ), 0 1 0 l 2 h h 1:-! ft-1 it since s1 1 0 for all 0 $ q $ t-I, and r(y) < 0 for all -oo $ y $ oo. q q+l $ r(x1 -x1 ) + r(x1 -x. )+ ...... +r(x. x ), 0 I I f2 lt-1 .fr since s1 1 :2: I for all 0 $ q $ t -I. q q+J By using the upper bound on r(x) that we derived earlier, we find that, cp0(i) $ (-In(2)+J%(xj0 -xj,))+ .... +(-lu(2)+J%h., -xj,)). :. cp0(x)$-tlu(2)+ rv:.(x1 -x1 +x. -x + .... +x. -x1.) IJ7;c Q I }j .f2 it-1 I $-nln(2)+ rv:.(x1 -x,. +x1 -x + .... +x. -x1 ). v 7 Jr 0 I I h h-1 I The terms on the right hand side cancel out to give, cp0(x)$-lu(2)+-JY,;'(xj, -xj,) Since xj $ -,.Ycn -I) and I/ k, we have, cp0(i) $ -nln(2)J%k/(n -I)' X-l).This gives the upper bound on cp0(i). We can prove this result in a similar fashion for the case of x, $0. cp0(i) is a linear combination of the natural logarithm of error functions. Error function F(y) is a continuous function with its range in (0,1) for all yE R .. The function ln(v) is a real valued function for any v E R, and it is continuous in the interval (0,1). Therefore, r(y) which equals ln(F(y)) is a continuous function on (0,1). Hence cp0(i) is a continuous function in R;. Let Z be a set where cp0(z) :2: cp0(6), 'dz E Z. Since 6 E Z, Z is a 23
PAGE 32
nonempty set. The upper bound that. was just proved to exist on cp0(x) is also an upper bound for cp0(6). Also cp0(x'):,; cp0(6). for all x';;; Z. We just proved the existence of an upper bound on x for a strongly connected digraph D. Therefore, cp(6)s;;cp(z)s;;-n!n(2)-.JY,;k/(n-l)l:fzEZ. Hence Z is a closed set. Finally, Z is a compact set because of the continuity of cp0 in (0,1). A continuous function attains a minimum and a maximum on a closed and compact set [8]. Using this result we can conclude that cp0 attains a maximum at some point x E Z. Thus we have proved the existence of the maximum solution to the likelihood function corresponding to a strongly connected digraph. We now prove that there IS a umque maximum solution. We know that the function cp0(x) is twice differentiable. Let there be a maximum solution to the problem at two vector points a and b. Therefore, cp;(a) = cp;(E) = 6. Using Taylor's theorem, we have, cp;(E)=cp;(a)+cp;(i')(E-a).forsomecsuchthat c=aa+(!-a)b ....... (5.!) for some a E (0, !). cp;(E)-cp;(a) = 6-6 = 6 = cp;(c)(E-a) ................. (5.2) Therefore, (E -a) is in the null space of cp;(c). Since the sum of all entries in a and in b is zero as a result of the constraint defined earlier, a and b are orthogonal to i. Hence b-a is orthogonal to 1. Therefore, F(E-a)=O. Using the result in theorem 1, we can denote b-a in terms of 1 as b-a = {31, where f3 is a scalar. :. F(f31) = 0. :. f3(F1) = f3 n = 0 f3 = 0. 24
PAGE 33
:. [,-a= 6 =>[,=a. This proves the uniqueness of the max1mum solution to this problem. 25
PAGE 34
6. Applications We have seen that this method can be used for ranking teams in an incomplete or a complete tournament. Although the case of ties has not been considered here, it could be shown that the same approach would give us a solution for a strongly connected graph, if the tied teams are awarded Yz a point each. In general, this method of ranking can be used for ranking a number of objects in a sample. If we are considering different varieties of chickens for their quality, we can make pairwise comparisons depending upon their relative merits in different attributes of interest. Then, with the probabilistic approach used above, we can rank them accordingly. Other potential uses would be for pairwise comparisons in areas such as sensory testing, especially taste testing, to consumer tests, personnel rating, and in general to the study of preference and choice behavior. The methodology presented in this paper was applied to find the ranking of the teams participating in the preplayoff games m the 1993-94 NFL football season. The rankings of the 28 participating teams are given in Table 1 in Appendix A. Table 2 in Appendix A gives the results of the playoff games and also of the Super Bowl. It can be seen from Table 2 playoff results that Kansas Chiefs were indeed better than Houston Oilers as suggested by the rankings, since they beat the Houston m the playoff games. Table 3 in Appendix A gives the schedule of the pre-playoff games for both of these teams. We can see that 26
PAGE 35
although Kansas Chiefs had 11 wins to their credit as against the 12 wins of Houston Oilers, they played against better teams. They also won against some of the better teams and thus their being ranked ahead of Houston Oilers has some justification. Nine out of the eleven playoff games gave results which agree with the ranking of teams given in Table l. A similar approach has been used by Jech [6]. That approach gives a probabilistic estimate of how much better a team is than some other team. Hence decision making is done by using the transitive relationship, that is, team A better than team B and team B better than team C implies that team A is better than team C. 27
PAGE 36
Appendix A Table 1: Results of NFL Pre-Playoff Games (1993-94 Season) Name of the Team Wins Losses Weight Rank Buffalo Bills AFC 1 2 4 0. 7750839 1 Dallas Cowboys NFC 1 2 4 0.6917094 2 Kansas City Chiefs AFC 1 1 5 0.6418724 3 Houston Oilers AFC 1 2 4 0.5756177 4 New York Giants NFC 1 1 5 0.4976263 5 Los Angeles Raiders AFC 1 0 6 0.3867019 6 Green Bay Packers NFC 9 7 0.2879164 7 Minnesota Vikings NFC 9 7 0.2781058 8 Denver Broncos AFC 9 7 0.2324088 9 Detroit Lions NFC 1 0 6 0.2276092 1 0 San Francisco 49ers NFC 1 0 6 0.2203107 1 1 Miami DolPhins AFC 9 7 0.1929059 1 2 San Diego Chargers AFC 8 8 0.1433809 1 3 Philadelphia Eagles NFC 8 8 0.08746736 1 4 Pittsburgh Steelers AFC 9 7 0.06787151 1 5 New York Jets AFC 8 8 0.00514422 1 6 Chicago Bears NFC 7 9 -0.0464035 1 7 New Orleans Saints NFC 8 8 -0.0555461 1 8 Arizona Cardinals NFC 7 9 -0.1798368 1 9 Seattle Seahawks AFC 6 1 0 -0.2970637 20 Cleveland Browns AFC 7 9 -0.318633 2 1 Tampa Bay Bucaneers NFC 5 1 1 -0.356672 22 Atlanta Falcons NFC 6 1 0 -0.4439348 23 Los Angeles Rams NFC 5 1 1 -0.5727481 24 New England Patriots AFC 5 1 1 -0.6100564 25 Washington Redskins NFC 4 1 2 -0.654241 26 Indianapolis Colts AFC 4 1 2 -0.7633866 27 Cincinnati Bengals AFC 3 1 3 -1.00721 28 Source: 1993-94 NFL Season 28
PAGE 37
Table 2. Results of NFL Playoff Games ( 1993-94 Season) Lower Ranked Team Higher Ranked Team Denver Broncos (AFC) LA Raiders (AFC)* Pittsburoh Steelers (AFC) Kansas City Chiefs(AFCl* Houston Oilers (AFC) Kansas City Chiefs(AFC)* LA Raiders (AFC) Buffalo Bills (AFC)* Kansas City Chiefs (AFC) Buffalo Bills (AFC)* Minnesota Vikings (NFC) New York Giants (NFC)* Detroit Lions Green Bay Packers* San Francisco 49ers (NFC)* New York Giants (NFC) Green Bay Packers (NFC) Dallas Cowboys NFC)* San Francisco 49ers (NFC) Dallas Cowboys NFC)* Dallas Cowboys(NFC)* Buffalo Bills (AFC) Note: indicates the winning team. Table 3. Pre-Playoff NFL Schedule (1993-94 Season) Team :Kansas City Chiefs Team: Houston Oilers Record: 11 Wins Rank: 3 Played Against Tampa Bav Sue. Houston Oilers Denver Broncos L.A. Raiders Cincinnati Bengals San Dieoo Chrgrs Miami Dolphins Green Bav Packrs L.A. Raiders Chicaoo Bears Buffalo Bills Seattle Seahawks Denver Broncos San Diego Chrgrs Minnesota Vikings Seattle Seahawks 5 Losses Rank Result 22 w 4 L 9 w 6 w 28 w 1 3 w 1 2 L 7 w 6 w 1 7 L 1 w 20 w 9 L 1 3 w 8 L 20 w Source: 1993-94 NFL Season 29 Record: 12 Wins Rank:4 Played Against New Or. Saints Kan.City Chiefs San Diego Chrgrs L.A. Rams Buffalo Bills New Eng. Pats Cincinnati Bengals Seattle Seahawks Cincinnati Bengals Cleveland Browns Pitts. Steelers Atlanta Falcons Cleveland Browns Pitts. Steelers San Fran. 49ers N.Y. Jets 4 Losses Rank Result 1 8 L 3 w 1 3 L 24 L 1 L 25 w 28 w 20 w 28 w 2 1 w 1 5 w 23 w 21 w 1 5 w 1 1 w 1 6 W'
PAGE 38
Appendix B !******************************************************************** The main program calls the function input() to read the data. The solution is obtained by Newton's method. The Matrices used in Newton's method are initialized using the function initialize(). The function normal_area() is called to calculate normal area function, r(X), r'(X) and r"(X). The computed matrices Anxn and bnxl are checked by the function check(). The function gauss_seidel() is called to do Gauss-Seidel iterations. The convergence of the solution is checked by computing the L2 norm, for which the function 12_norm() is called. *******************************************************************/ #include #include FILE *in, *out; void input(); void initialize(); void normal _area(); void 12_norm(); void check(); void check_sum(); void gauss_seidel(); double val, x, q, PI ; double r ,old_total_r ,new _total_r ,r 1 ,r2,a[ 100 ][1 OO],b [ 1 00], old_x[1 OO],new _x[l OO],norm, gaussold_x[l 00], gaussnew _x[ 1 00]; int no_ var,no_games,iteration,incidence[ 1 00] [ 1 OO],row _sum,column_sum, main_counter=O; void main() { int k,i,j; 30
PAGE 39
PI = 3.14159265358979; while(main_counter == 0) { printf("\nlteration %d\n" ,iteration); if(iteration == 0) { ) in = fopen("in.dat", "r"); out = fopen("out.dat" ,"w"); q = sqrt(2.0*PI); input(); for(i=O;iO) { x = fabs(old_x[i)-old_xU)); val = old_x[i]-old_xUJ; normal_area(); new _total_r += r*incidence[iJU]; a[i]UJ += -incidence[iJU]*r2; aUJ[i] += -incidence[iJU]*r2; a[i][i) += incidence[i]UJ*r2; aUJUJ += incidence[i]UJ*r2; 3 1
PAGE 40
} b[i] += incidence[iJU]*rl; bUJ += -incidence[iJU]*rl; if(fabs(new _total_r-old_total_r )< l.Oe-3) { } main_counter= 1; printf("\n The Solution has converged to its maximum\n"); printf("\n o/od The value of the likelihood function=o/olf", i teration,new _total_r); for(i=O;i
PAGE 41
void input() !**************************************************************** Program to read the input file The program also validates the data. ******************************************************************/ { int k,i,j,sum; fscanf(in, "%d" ,&no_ var); for(k=O;k
PAGE 42
for(j=O;j
PAGE 43
int i,no_inter,j; double a,u ,h,integral,derive_l ,deri ve_2; static int count; integral = derive_! = derive_2 =0.0; no_inter=O; if (x > l.Oe-6) { h = sqrt(sqrt((6.0e-10*q))); h I= sqrt(sqrt(x)); if(h > x) { } else { } no_inter = 2; no_inter= (int)(x/h); if(no_inter % 2 != 0) ( no_inter+= 1; else { no_inter += 2; } h = x/(double)(no_inter); integral = 1.0/q; integral += (exp(-(x*x/2.0)))/q; for(i=l; i
PAGE 44
{ integral += 0.5; } else { integral = 0.5 integral; } else { integral = 0.5; } printf ("\n The area under the normal curve for x= o/oe IS o/olf\n", val, integral); r = log(integral); derive_! = ( 1.0/q)*exp( -l.O*val *val/2.0); derive_2 = -l.O*derive_l *val; r1 = derive_l!integral; r2 = (derive_2*integral derive_l *derive_ I )/(integral*integral); return; !******************************************************************** Program to compute Lz norm for an iteration. *********************************************************************! void l2_norm() { } int k; double sum =0.0; norm = 0.0; for(k=O;k
PAGE 45
/************************************************************ This program checks if the sum of all entries inside box 1 is zero. *************************************************************/ void check() { } int k,i,j; double sum=O.O; for(k=O; k
PAGE 46
} } for(j=O;j inf_norm) { inf_norm = fabs(gaussnew _x[i]); } if( old_inf_norm < new _inf_norm) { old_inf_norm=new _inf_norm; } gaussold_x[i] = gaussnew_x[i]; do_ c heck=old_i nf _n orm/inf _norm; if(do_check
PAGE 47
!****************************************************************** Program to identify two highest ranked teams in each of the two football conferences ********************************************************************! void check_sum() int i,imax=O,second_imax=O,league=l; double max=O.O,second_max=O.O, var_sum=O.O; for(i=O; imax) { } second_max = max; max = new _x[i]; second_imax=imax; imax = i; else if(new _x(i] > second_max) { } second_max = new _x[i]; second_imax = 1; printf("\n The first ranked team for league %d is %d", league,imax); printf("\n The weight associated with team %d is %lf\n", imax,max); printf("\n The second ranked team for league %d is %d", league,second_imax); printf("\n The weight associated with team %d is %1f\n", second_imax, second_max); league += 1; imax=second_imax=O; max=second_max=O .0; for(i=no_ var/2;imax) 39
PAGE 48
} { } second_max=max; max=new_x[i]; imax=i; else if( new _x[i] > second_max) { } second_max = new _x[i]; second imax =t; printf("\n The first ranked team for league o/od is o/od", league,imax); printf("\n The weight associated with team o/od is o/olf\n" ,imax,max); } printf("\n The second ranked team in league o/od is o/od", league, second_imax); printf("\n The weight associated with team o/od is o/olf\n", second_imax, second_max); printf("\n The sum of all variables = %If", var_sum); return; 40
PAGE 49
References 1. Burden, R. A., Faires, J. D. (1989). Numerical Analysis. PWS KENT Publishing Company. 2. David, H. A. (1988). The method of paired comparisons. Oxford University Press, New York. 3. Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Hartel. 4. Fechner, G. T. (1965). Elements of Psychophysics, Vol. 1, trans!. H. E. Adler. New York: Holt, Rinehart & Winston. 5. Genest, C., LaPointe, F., Drury, S.W. (1993). On a proposal of Jensen for the analysis of ordinal pairwise preferences using Saaty's eigenvector scaling method. Journal of Mathematical Psychology 37, 002-038(1993). 6. Jech, T., (1983). The Ranking of Incomplete Tournaments: A Mathematician's Guide to Popular Sports. American Mathematical Monthly Vol. 90, No. 4, 246-264, April 1993. 7. Noether, G.E., (1960). Remarks about a paired Comparison Model. Psychometrika, Vol 25, No.4. 8. Rosenlicht, M., (1968) Introduction to Analysis. Dover Publications, Inc., New York. 9. Saaty, T.L., (1977). A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology, 15, 234-281. 10. Saaty, T.L., (1980). The analytic hierarchy process. New York, NY : McGraw-Hill. 4 1
PAGE 50
11. Saaty, T.L., (1986). Axiomatic foundation of the analytic hierarchy process. Management Science, 32, 841-855. 12. Saaty, T.L., (1990). Eigenvector and logarithmic least squares. European Journal of Operational Research, 48, 156-160. 13. Thurstone, L. L. (1927a). A Law of comparative judgment. Psycho!. Rev. 34, 273-86. 14. Thurstone, L. L. (1927b). Psychophysical Analysis. Amer. J. Psycho!. 38, 368-89. 15. Thurstone, L. L. (1927c). The method of paired comparisons for social values. J. Abnorm. Soc. Psycho!. 21, 384-400. 42