FACTOR ANALYSIS OF SIMULATED EQUILIBRIA
by
Joseph M. Conny
B.S., State University of New York at Binghamton, 1978
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado at Denver in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Chemistry
This thesis for the Master of Science degree by
Joseph M. Conny
has been approved for the
Department of Chemistry
by
Conny, Joseph M. (M.S., Chemistry)
Factor Analysis of Simulated Equilibria
Thesis directed by Professor Adjoint Robert R. Meglen
Factor analysis was performed on simulated data
matrices that represented the sampling of aqueous complex
equilibria. Using a modification of the program COMPLEX,
data vectors were generated as a function of pH and the
concentrations of selected reactants. Data pretreatment
(log transformation and autoscaling) and factor modelling
were accomplished using the statistical program ARTHUR.
Factor modelling involved eigenextraction and Kaiser's
Varimax rotation. Scree plots and cumulative frequency
plots of principal component variance were used to
determine the number of principal components that
accounted for relevant variance.
For a data matrix representing the sampling of a
single equilibrium, results showed that abstract factors
are chemically interpretable as the expected associations
of chemical species in equilibrium. Factor models showed
that the associations of chemical species are affected
little by the type of distribution (log normal or linear
normal) of the reactants. By analysing data matrices
with the same pH mean but different pH ranges,
nonlinearity in the data was shown to affect factor
modelling. Associations of the chemical species were
shown to change substantially with different values of
the pH mean. In addition, factor models showed that a
ligand base with a low pKb (high pKa conjugate acid)
formed complexes that were more sensitive to slight pH
changes than those complexes containing a ligand base
with a high pKb (low pKa conjugate acid).
The factor model of an equilibrium data matrix
containing a slack variable showed that the slack
variable behaves independently of the remaining
variables.
Equilibrium concentrations were perturbed to
simulate random normal measurement error in the data
matrix. Factors were chemically interpretable with
moderate amounts of error (10%).
Factor analysis was performed on an error-perturbed
data matrix containing four groups of data
vectors. These groups of data vectors represented the
sampling of four equilibria. Groups differed by the
types of reactants held constant and the types of
reactants that varied with normal distributions. Results
showed that data vectors in each equilibrium can be
identified from factor score plots.
DEDICATION
For Beth, who understood the need to keep
going. For Julie, who helped me understand the need
to stop.
ACKNOWLEDGEMENT
I would like to thank Dr. Robert R. Meglen for
allowing me to pursue an idea for a research project
which he harbored for several years. Without his
support during times of frustration and uncertain reward,
this project would likely have remained little more than
an idea for a few more years.
CONTENTS
CHAPTER
1. INTRODUCTION
1.1 Theoretical Background
2. EXPERIMENTAL SECTION
2.1 Generation of Equilibrium Data
2.1.1 Generation of Equilibrium Data With Error
2.2 Statistical Procedures
2.2.1 Data Pretreatment
2.2.2 Factor Analysis
3. RESULTS AND DISCUSSION
3.1 Factor Models of Single Equilibria Without Error
3.1.1 The Cd-NH3-en Equilibrium at pH 8.39 +/- 0.1
3.1.1.1 Principal Component Rotation
3.1.1.2 Three Factor Model
3.1.1.3 Comparison of the Two Factor Model With the Three Factor Model
3.1.1.4 Comparison of Factor Models Based on Matrices With a Multivariate Log Normal Distribution and the Multivariate Linear Normal Distribution
3.1.2 Analysis of the Cd-NH3-en Equilibrium Over Various pH Ranges
3.1.3 Analysis of the Cd-NH3-en Equilibrium at pH 5.85 +/- 0.1 and pH 10.93 +/- 0.1
3.1.3.1 The pH 5.85 +/- 0.1 Matrix Factor Model
3.1.3.2 The pH 10.93 +/- 0.1 Matrix Factor Model
3.1.4 Analysis of the Cd-NH3-en Equilibrium Matrix With a Slack Variable
3.1.5 Analysis of the Cd-Zn-Fe-NH3-py-en Equilibrium
3.1.5.1 The pH 6.0 +/- 0.1 Matrix Factor Model
3.1.5.2 The pH 7.0 +/- 0.1 Matrix Factor Model
3.1.5.3 The pH 8.0 +/- 0.1 Matrix Factor Model
3.1.5.4 The pH 9.0 +/- 0.1 Matrix Factor Model
3.1.5.5 The pH 10.0 +/- 0.1 Matrix Factor Model
3.2 Factor Models of Single Equilibria With Error
3.2.1 Comparison of Factor Models of an Error Perturbed Data Matrix and a Nonperturbed Data Matrix
3.2.2 The Effect of Increasing Amounts of Error
3.3 Factor Model of Multiple Equilibria With Error
3.3.1 Factors From the Multiple Equilibria Model
3.3.2 Visualization of Separate Equilibria
4. CONCLUSION
BIBLIOGRAPHY
APPENDIX
A. FORTRAN77 Program COMPLEX6
B. FORTRAN77 Program COMPLEX7
C. BASIC Program RNDERR
D. BASIC Program RNDVAR
E. BASIC Program ARTTAB
TABLES
Table
2.1 Logarithms of Cumulative Stability Constants and pKa's Used for Data Simulations
3.1 Chemical Species in the Cd-NH3-en Equilibrium Model
3.2 Principal Component Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1
3.3 Factor Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1
3.4 Loadings and Variance Contributions From the Two Factor Model of the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1
3.5 Eigenvalues, Loadings and Variance Contributions From the Three Factor Model of the Log Normal and Linear Normal Data Matrices
3.6 Portions of the Correlation Matrices From the Log Normal and Linear Normal Data Matrices
3.7 Eigenvalues Before and After Rotation of Five Factors From Cd-NH3-en Equilibrium Matrices Over Various pH Ranges
3.8 Comparison of Factor Loadings From the Three Factor Models of the Cd-NH3-en Equilibrium Data Matrices Over Various pH Ranges
3.9 Factor Loadings From the Two Factor Models of the sig(pH)=0.3 and sig(pH)=0.4 Data Matrices
3.10 Principal Component Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix at pH 5.85 +/- 0.1
3.11 Principal Component Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix at pH 5.85 +/- 0.1 Following Deletion of Cd^2+ and NH4^+
3.12 Eigenvalues Before and After Rotating Five Principal Components From the Cd-NH3-en Equilibrium Matrix at pH 10.93 +/- 0.1
3.13 Factor Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix at pH 10.93 +/- 0.1
3.14 Eigenvalues Before and After Rotating Five Principal Components From Matrices With and Without a Slack Variable
3.15 Factor Loadings and Variance Contributions From the Three Factor Model of the Cd-NH3-en Equilibrium Matrix (pH 8.39 +/- 0.1) Containing a Slack Variable
3.16 Chemical Species in the Cd-Zn-Fe-NH3-py-en Equilibrium Model
3.17 Factor Loadings From the Cd-Zn-Fe-NH3-py-en Equilibrium Matrix at pH 6.0 +/- 0.1
3.18 Factor Loadings From the Cd-Zn-Fe-NH3-py-en Equilibrium Matrix at pH 7.0 +/- 0.1
3.19 Factor Loadings From the Cd-Zn-Fe-NH3-py-en Equilibrium Matrix at pH 8.0 +/- 0.1
3.20 Factor Loadings From the Cd-Zn-Fe-NH3-py-en Equilibrium Matrix at pH 9.0 +/- 0.1
3.21 Factor Loadings From the Cd-Zn-Fe-NH3-py-en Equilibrium Matrix at pH 10.0 +/- 0.1
3.22 Comparison of Factor Eigenvalues From All Cd-Zn-Fe-NH3-py-en Equilibrium Data Matrices
3.23 Comparison of Factor Loadings and Variance Contributions From the Error Perturbed and Unperturbed Cd-NH3-en Equilibrium Matrices
3.24 Relative Standard Deviations of Log Transformed Variables in the Cd-NH3-en Equilibrium Matrix
3.25 Factor Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix With Error and a Slack Variable
3.26 Factor Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix Containing 15% Error
3.27 Factor Loadings and Variance Contributions From the Cd-NH3-en Equilibrium Matrix Containing 20% Error
3.28 Chemical Species in the Multiple Equilibrium Model
3.29 Reactant Concentration Parameters for Categories of Data Vectors From the Multiple Equilibrium Matrix
3.30 Factor Loadings From the Multiple Equilibrium Matrix
3.31 Factor Loadings of Acids and Bases From Category Three of the Multiple Equilibrium Model
FIGURES
Figure
1. FORTRAN Program COMPLEX
2. Algorithm for Generating Numbers From a Random Normal Distribution
3. Scree Plot for the Cd-NH3-en Equilibrium Data Matrix at pH 8.39 +/- 0.1
4. Comparison of Scree Plots of Rotated and Unrotated Principal Components From the Cd-NH3-en Equilibrium Data Matrix at pH 8.39 +/- 0.1
5. Cumulative Frequency Plot of Principal Component Variance From the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1
6. Cumulative Frequency Plot of Variance From the First Five Principal Components of the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1
7. Scree Plots for the Cd-Zn-Fe-NH3-py-en Equilibrium Data Matrices
8. Comparison of Scree Plots for the Cd-NH3-en Equilibrium Data Matrix Perturbed With 10% Error and the Unperturbed Data Matrix
9. Comparison of Eigenvalues From Four Factors as Principal Components From the Error Perturbed Data Matrix Are Sequentially Rotated
10. Change in Eigenvalue of the Third Principal Component From the Error Perturbed Data Matrix Containing a Slack Variable as Principal Components Are Sequentially Rotated
11. Change in Eigenvalues of the First Five Principal Components as Error Increases
12. Comparison of Scree Plots of Rotated and Unrotated Principal Components From the Multiple Equilibria Data Matrix
13. Cumulative Frequency Plot of Variance From the First Eight Principal Components of the Multiple Equilibria Data Matrix
14. Factor Score Plot for Factor 2 vs. Factor 1 From the Multiple Equilibria Data Matrix
15. Factor Score Plot for Factor 4 vs. Factor 3 From the Multiple Equilibria Data Matrix
16. Factor Score Plot for Factor 5 vs. Factor 1 From the Multiple Equilibria Data Matrix
CHAPTER 1
INTRODUCTION
The use of factor analysis for a variety of
chemical problems has increased dramatically in the last
ten years. These applications include analytical signal
resolution, determination of the number of species in a
mixture, elucidation of reaction mechanisms and the study
of chemistries in environmental systems from large data
bases.[1-4] Even with this increase in interest among
chemists, factor analytical techniques remain
controversial. Part of the controversy stems from
skepticism of the capability of correctly interpreting
complicated data from a reduced set of composite
variables without prior knowledge of the
interrelationships in the data. Another part of the
controversy stems from a general lack of understanding of
the way measurement error affects the factor model.
Since its initial use in chemistry, factor
analysis has been widely used in the determination of
co-eluting species in liquid chromatography and GC-MS,
signal resolution in FT-IR and the determination of
fragmentation patterns in mass spectrometry.[5-8] More
recently, factor analysis has been applied to the study
of reaction mechanisms, kinetics and the determination of
species in chemical equilibrium. For example, Gilbert,
et al., studied the kinetics and mechanism of the x-ray
decomposition of KC10^.[9] Cox, et al., used factor
analysis of Raman spectra to determine the existence of
specific species and the degree of hydration during the
ionization of aqueous H2SO4.[10] Reeves, et al., used
factor analysis of visible spectra to determine formation
constants of nickel(II) complexes.[11]
In all of the references mentioned above, real
chemical data generated by a variety of instruments were
involved. Therefore, the suitability of factor
analytical techniques for studying systems in equilibrium
was assumed. In this research, data matrices of
equilibrium concentrations were generated by computer
from known complex equilibria to simulate a real data
base. Factor models of simulated chemical data have been
studied previously, but data from simulated detection
were employed, not simulated concentrations.[12] In
addition, the simulated data were used to study signal
resolution, not the chemical system itself. By using
computer generated concentrations, abstract factor models
can be scrutinized to see if they agree with what is
known about an equilibrium. In this way, we address the
concern of whether abstract factor models are viable
tools for analyzing the interrelationships in chemical
equilibria. Data bases composed of equilibrium
concentrations are appropriate for this purpose because
relationships among variables in an equilibrium are
inherently nonlinear. In contrast, abstract factor
models are based on linear combinations of variables from
the data matrix.[13] Jochum and Kowalski have stated
that nonlinear techniques such as least-squares
projection should be included with factor analysis when
studying chemical data.[14]
In abstract factor analysis of real chemical
data, indeterminate error associated with sampling and
detection is assumed to be an integral part of chemical
measurement. It is often difficult to determine how much
error is in the data. As a consequence, it is often
difficult to determine how much damage error causes to
the interpretation of a factor model. One of the first
studies to determine the criterion for choosing factors
that are negligibly affected by analytical error was
presented by Duewer, et al.[15] Malinowski further
defined the types of analytical error that affect factor
models.[16,17] One of the tools he used for testing his
theory of error was an artificial data matrix. However,
this matrix was not a simulated chemical data base such
as detector responses or concentrations from a chemical
equilibrium. In the present work, the effect of error was
studied by comparing abstract factor models of simulated
equilibrium data without error and abstract factor models
of simulated equilibrium data with error.
Often, chemical data bases contain
interrelationships that are described as multiple
equilibria. These types of data are commonly
encountered in the factor analysis of environmental
samples.[18-20] One group of samples in the data matrix
is separate from other groups because the data reveal a
separate equilibrium. For example, the sample group may
be taken from a single lake or aquifer where a single
aqueous equilibrium exists. Other groups of samples may
represent other lakes or aquifers where different
equilibria exist. In the final part of this research,
data matrices where groups of data vectors represented
separate equilibria were pooled to examine the ability of
factor analysis to uncover separate chemical
characteristics. In this way, factor models of multiple
equilibria were compared with the known
interrelationships in the data matrix.
The objective of this part of the research was to
demonstrate the feasibility of creating a simulated
multiple equilibria data matrix as a data base for
testing different analytical methodologies involving
factor models as well as testing the interpretive
abilities of the factor analyst himself. Currie has
shown through the use of simulated test data evaluation
that data interpretation among different laboratories is
seriously flawed by bias.[21] Currie suggests that the
scientific judgement and intuition which mislead the
analyst could be controlled through the use of powerful
modelling procedures such as factor analysis.
Simulated aqueous complex equilibria were
designed so that a variety of different types of species
were present. Since metals were included, the
equilibrium concentrations of all species (free metal and
base ions along with complexes and acids) were highly
dependent on pH. Therefore, there were two ways of
making one "sample" in the data matrix different from
another "sample". First, the pH could vary. Second, the
total concentration of metal reactants and ligand
reactants could vary. In each matrix, pH and the
concentration of one or more selected reactants were
varied simultaneously. The purpose for simultaneously
varying pH and reactant concentrations was to simulate
the sampling of natural equilibria where neither the pH
nor concentration of reactants is expected to remain
constant.
1.1 Theoretical Background
Factor analysis involves the extraction of
eigenvectors and eigenvalues from the variance-covariance
matrix or correlation matrix. Eigenvectors are referred
to as principal components. The variance-covariance or
correlation matrix is used because these matrices contain
information concerning proportional interactions among
the variables. In practice, the correlation matrix is
preferable to the variance-covariance matrix. The reason
for this preference will be discussed in a later section.
Eigenvectors (principal components) are chosen so
that the following eigenfunction is satisfied:
(1) $[A][X] = \lambda [X]$
In this equation, [A] is the variance-covariance or
correlation matrix, [X] is a column vector and $\lambda$
is a constant. The column vector [X] is the eigenvector
and $\lambda$ is the eigenvalue. Equation (1) is solved by
rearranging it so that the right side is equal to zero:
(2) $([A] - \lambda [I])[X] = 0$
Here, [I] is the identity matrix. The eigenvalue,
$\lambda$, must be chosen so that the determinant
of the first term in equation (2) is equal to zero:
(3) $\left| [A] - \lambda [I] \right| = 0$
Expanding the determinant yields a polynomial equation in
$\lambda$ whose order, and therefore number of roots
($\lambda$'s), is equal to the number of columns (or rows)
in the variance-covariance matrix or correlation matrix.
Thus, the number of eigenvalues is equal to the number of
variables in the data matrix.
After the eigenvalues are determined, equation
(2) is solved for the eigenvector [X]. Each eigenvalue
produces an eigenvector.
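The eigenextraction in equations (1) to (3) can be illustrated for the smallest useful case, a 2 x 2 correlation matrix. The following is only a sketch in Python, not the routine used by ARTHUR; the function name `eigen_2x2_symmetric` and the correlation value 0.8 are assumptions chosen for illustration.

```python
import math


def eigen_2x2_symmetric(a11, a12, a22):
    """Eigenvalues and unit eigenvectors of [[a11, a12], [a12, a22]].

    Hypothetical helper: solves |A - lambda*I| = 0 in closed form.
    """
    trace = a11 + a22
    det = a11 * a22 - a12 * a12
    # Characteristic polynomial: lambda^2 - trace*lambda + det = 0
    disc = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + disc) / 2.0, (trace - disc) / 2.0]
    if abs(a12) < 1e-12:
        # Already diagonal: the eigenvectors are the coordinate axes.
        if a11 >= a22:
            return eigenvalues, [(1.0, 0.0), (0.0, 1.0)]
        return eigenvalues, [(0.0, 1.0), (1.0, 0.0)]
    eigenvectors = []
    for lam in eigenvalues:
        # First row of (A - lambda*I) x = 0 gives x = (a12, lambda - a11).
        x1, x2 = a12, lam - a11
        norm = math.hypot(x1, x2)
        eigenvectors.append((x1 / norm, x2 / norm))
    return eigenvalues, eigenvectors


# Correlation matrix of two variables with an assumed correlation r = 0.8
values, vectors = eigen_2x2_symmetric(1.0, 0.8, 1.0)
```

Two variables give a second-order polynomial and therefore two eigenvalues. For a correlation matrix they sum to the number of variables.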
Interpreted geometrically, the eigenvector is an
orthogonal vector through the swarm of data points in
hyperspace. The eigenvalue is the magnitude and the
eigenvector is the set of slope coordinates of the
vector. The eigenvector with the largest eigenvalue
represents the orthogonal vector that accounts for the
most variance in the data swarm. Conversely, the
smallest eigenvalue represents the orthogonal vector that
accounts for the least variance in the data swarm.
Each observation in the data matrix can be
transformed to an equivalent value in principal component
space where the axes are the orthogonal eigenvectors.
Each observation's score in principal component space is
a linear combination of products involving the original
variables:
(4) $Y_{ij} = a_{j1}x_{1i} + a_{j2}x_{2i} + \cdots + a_{jk}x_{ki}$
In this equation, $Y_{ij}$ is the score of the $i$th sample
on the $j$th principal component axis, $a_{j1}$ is the first
coefficient of the $j$th eigenvector, $x_{1i}$ is the value
of the first variable for the $i$th sample, $a_{jk}$ is the
$k$th eigenvector coefficient of the $j$th principal
component, and $x_{ki}$ is the value of the $k$th variable
for the $i$th sample.[22]
Each observation in a data matrix is also a
linear combination of products:
(5) $d_{ik} = s_{i1}l_{1k} + s_{i2}l_{2k} + \cdots + s_{in}l_{nk} = \sum_{j=1}^{n} s_{ij}l_{jk}$
In this equation, $s_{ij}$ indicates the score of the $i$th
sample on the $j$th principal component, and $l_{jk}$
indicates the $k$th coefficient of the $j$th eigenvector.
The complete data matrix can be shown as the product of
the scores matrix and the matrix of eigenvector
coefficients (loadings matrix):
$[D] = [S][L]$ [23]
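Equations (4) and (5) can be checked numerically with a small sketch. The Python below is illustrative only: the data values are invented, the variables are autoscaled before scoring (an assumption made here so that the correlation matrix applies), and the loadings are the known eigenvectors of any 2 x 2 correlation matrix with nonzero correlation.

```python
import math


def autoscale(column):
    """Mean-centre a column and scale it to unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


# Hypothetical data: five samples measured on two correlated variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.0]
z1, z2 = autoscale(x1), autoscale(x2)

# Eigenvectors of [[1, r], [r, 1]] for r != 0; row j holds the l_jk.
h = 1.0 / math.sqrt(2.0)
loadings = [(h, h), (h, -h)]

# Equation (4): score of sample i on principal component j.
scores = [[a[0] * z1[i] + a[1] * z2[i] for a in loadings]
          for i in range(len(z1))]

# Equation (5): d_ik = sum over j of s_ij * l_jk, i.e. [D] = [S][L].
rebuilt = [[sum(scores[i][j] * loadings[j][k] for j in range(2))
            for k in range(2)]
           for i in range(len(z1))]
```

Multiplying the scores by the loadings returns the autoscaled data, and the scores on the two components are uncorrelated.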
CHAPTER 2
EXPERIMENTAL SECTION
2.1 Generation of Equilibrium Data
Initially, data matrices were generated using a
modified version of the FORTRAN program SOLVAR. [24] As
one of the first computer programs for generating
equilibrium data, SOLVAR uses a cumbersome "trial and
error" algorithm for solving simultaneous equations.
Since its purpose is primarily to assist in the teaching
of elementary equilibria concepts, the number of
equilibrium species SOLVAR can accommodate is limited.
In addition, the iterative procedures are time-consuming.
As research progressed, it became necessary to adopt a
more versatile approach to data generation.
All factor analyzed data matrices were generated
using a modification of the FORTRAN program COMPLEX
(Figure 1).[25] This program was chosen over other
programs because it satisfied three criteria. First,
large data matrices were required for adequate testing of
factor models. Thus, the generating program had to
accommodate a large number of species. The program
COMICS, upon which COMPLEX is based, has been shown to
accommodate 195 species.[26] Second, the number of
different equilibria to be studied required that the
program accept a variety of input data for calculating
concentrations of different equilibria. Unlike
algorithms designed for instructional use [24,27],
COMPLEX does not require a unique set of simultaneous
equations for each equilibrium. COMPLEX only requires
constants: pH, total reactant concentration, logarithms
of cumulative stability constants and pKa's. Third, the
large number of total data matrices to be analysed, each
with hundreds of data vectors, required a fast and
efficient algorithm. On this account, COMPLEX proved to
be several times faster than SOLVAR for generating the
same data matrix. In addition, as a result of shortening
the number of commands to 55 from the 140 used in COMICS,
COMPLEX was shown to be faster than COMICS.[25]
 1        PROGRAM COMPLEX
 2        CHARACTER*5 TITLE(50)
 3        DIMENSION ATOT(18),Y1(18),MA(18,50),Y2(18),E(50),
 4       1A(18),AK(18),MOH(50),MH(50),TERM(50),C(50),ACALC(18),BK(18)
 5        LOGICAL FILE
 6        INQUIRE (FILE='COMPLEXDAT',EXIST=FILE)
 7        IF (FILE.EQV..TRUE.) THEN
 8        OPEN (UNIT=5,FILE='COMPLEXDAT',STATUS='OLD')
 9        CLOSE (UNIT=5,STATUS='DELETE')
10        END IF
11        OPEN (UNIT=5,FILE='COMPLEXDAT',ACCESS='SEQUENTIAL',STATUS='NEW')
12    100 FORMAT (40I2)
13    101 FORMAT (A5)
14    102 FORMAT (E11.4,F3.1)
15    103 FORMAT (F7.4)
16    104 FORMAT (22I2)
17    105 FORMAT (F5.2,I1)
18        PRINT*, 'NUMBER OF EXPERIMENTS (I2):'
19        READ (*,100) NE
20        DO 1 IK=1,NE
21        PRINT*, 'NUMBER OF REACTANTS, COMPLEXES (I2,I2):'
22        READ (*,100) M,N
23        DO 2 I=1,M
24        PRINT*, 'NAME OF REACTANT (A5):'
25        READ (*,101) TITLE(I)
26        PRINT*, 'CONCENTRATION OF REACTANT,'
27        PRINT*, 'PARAMETER (E11.4,F3.1):'
28        READ (*,102) ATOT(I),BK(I)
29        Y1(I)=ATOT(I)*0.0001
30      2 A(I)=ATOT(I)
31        DO 3 J=1,N
32        PRINT*, 'LOG STABILITY CONSTANT OF COMPLEX',J,'(F7.4):'
33        READ (*,103) E(J)
34        PRINT*, '# UNITS OF REACTANT IN COMPLEX',J,','
35        PRINT*, '# OH, H+ IN COMPLEX (I2,I2,I2):'
36      3 READ (*,104) (MA(I,J),I=1,M),MOH(J),MH(J)
37        PRINT*, 'PH (F5.2), INDEX (I1):'
38     13 READ (*,105) PH,INDEX
39        NIT=0
40        DO 4 J=1,N
41        SUM=E(J)-MH(J)*PH+MOH(J)*(PH-14.0)
42      4 TERM(J)=10.0**SUM
43     43 CONTINUE
44        DO 25 J=1,N
45     25 C(J)=TERM(J)
46        DO 5 J=1,N
47        DO 5 I=1,M
48      5 C(J)=C(J)*A(I)**MA(I,J)
49        NIT=NIT+1
50        DO 6 I=1,M
51        ACALC(I)=A(I)
52        DO 36 J=1,N
53     36 ACALC(I)=ACALC(I)+MA(I,J)*C(J)
54        AK(I)=A(I)*(ATOT(I)/ACALC(I))**(1.0/BK(I))
55      6 Y2(I)=ABS(ACALC(I)-ATOT(I))
56        IF (NIT-1999) 16,16,7
57      7 WRITE (5,205) PH
58        GO TO 11
59     16 DO 9 I=1,M
60        IF (Y1(I)-Y2(I)) 14,9,9
61     14 DO 17 IN=1,M
62     17 A(IN)=AK(IN)
63        GO TO 43
64      9 CONTINUE
65        WRITE (5,204) PH,NIT
66     11 WRITE (5,206) (I,BK(I),TITLE(I),ATOT(I),A(I),I=1,M)
67        WRITE (5,207) (TITLE(I),I=1,M)
68        DO 12 J=1,N
69     12 WRITE (5,208) C(J),E(J),MH(J),MOH(J),(MA(I,J),I=1,M)
70        IF (INDEX.NE.1) GO TO 13
71      1 CONTINUE
72    205 FORMAT (//' PH',F7.3,40X,'ITERATION DID NOT CONVERGE'/)
73    206 FORMAT (I10,F10.5,A10,2E15.3)
74    207 FORMAT (/6X,'C',5X,'LOG BETA',3X,'H',4X,'OH',18A5/)
75    208 FORMAT (2E11.3,12I5)
76    204 FORMAT (//' PH',F7.3,40X,'NUMBER OF ITERATION',I4/)
77        END
Source file: ENVIRS>JCONNY>EQUIL>COMPLEX.F77
Figure 1 FORTRAN program COMPLEX.
Compared to some of the more complicated programs
used for modelling equilibrium systems, COMPLEX provides
an additional advantage. Programs such as REDEQL [28]
utilize the Newton-Raphson method [29] for approximating
the roots of polynomials involved in the calculation of
equilibrium concentrations. In the process of finding a
closer approximation, the Newton-Raphson method
occasionally fails to converge on a value for the
root.[30] Other programs such as EQUIL [30] include a
"convergence enforcer", but this involves more
computational steps which reduce speed.
In order for COMPLEX to produce the correct set
of equilibrium concentrations, the equilibrium
concentration of each reactant must satisfy the following
relationship:
(1) $[A_i^{k+1}] = [A_i^{k}] \left( [A_i^{tot}] / [A_i^{calc}] \right)^{1/p}$
Here, $[A_i^{k}]$ is the equilibrium concentration of the
$i$th reactant to the $k$th approximation. The term
$[A_i^{tot}]/[A_i^{calc}]$ is the ratio of the given total
$i$th reactant concentration to the calculated total $i$th
reactant concentration. In the exponent, p is generally
the largest stoichiometric coefficient of the $i$th
reactant. Ginzburg [25] has shown that p can remain
constant for all reactants as long as it approximates the
largest coefficient of any reactant in any complex. In
the program, $[A_i^{k+1}]$ and $[A_i^{k}]$ are denoted by
the variables AK(I) and A(I), respectively. The terms
$[A_i^{tot}]$, $[A_i^{calc}]$ and p are denoted by
ATOT(I), ACALC(I) and BK(I), respectively. If the left
side of the above equation fails to equal the right side
within a tolerance limit, $[A_i^{calc}]$ is recalculated
iteratively after adjustments are made to the equilibrium
concentrations of the complexes. Since reactants are
either metals or bases, complexes include acids along
with actual metal-ligand complexes. The calculation in
expression (1) is carried out at line 54 in Figure 1.
Equilibrium concentrations of the complexes are
initially set equal to the exponential forms of
conditional cumulative stability constants. These
conditional constants are the cumulative stability
constants (or pKa's) adjusted for the number of H+ or OH-
components in the complex. Equilibrium concentrations
are subsequently calculated during each iteration without
the need for equations involving [H+] or [OH-]. In the
program, the log form of the cumulative stability
constant is denoted by E(J). The log form of the
conditional cumulative stability constant is denoted by
SUM and the exponential form by TERM(J). Table 2.1 is a
list of all logs of the cumulative stability constants
and pKa's used in the generation of the data
matrices.[31,32]
In order to calculate the concentration of a
complex, equilibrium concentrations of the reactants in
the complex, adjusted in expression (1) above, are used
in the following expression:
(2) $C_j = \beta_j \prod_{i=1}^{M} [A_i]^{m(i,j)}$
Here, $C_j$ is the concentration of the $j$th complex,
$\beta_j$ is the exponential form of the conditional
cumulative stability constant for the $j$th complex, M is
the number of reactants, and m(i,j) is the stoichiometric
coefficient of the $i$th reactant in the $j$th complex. In
the program, $C_j$ is determined by the variable C(J), M
by the variable M and m(i,j) by the variable MA(I,J). The
calculation shown in expression (2) is carried out at
line 48 in Figure 1.
Table 2.1 Logarithms of Cumulative Stability Constants
and pKa's Used for Data Simulations.[31,32]
Species          Constant
Cd(NH3)^2+       2.65
Cd(NH3)2^2+      4.75
Cd(NH3)3^2+      6.19
Cd(NH3)4^2+      7.12
Cd(NH3)5^2+      6.80
Cd(NH3)6^2+      5.14
Cd(OH)^+         4.17
Cd(OH)2          8.33
Cd(OH)3^-        9.02
Cd(OH)4^2-       8.62
Cd(en)^2+        5.47
Cd(en)2^2+       10.09
Cd(en)3^2+       12.09
Cd(py)^2+        1.27
Cd(py)2^2+       2.14
Cd(py)3^2+       2.27
Cd(py)4^2+       2.50
Cd(Cl)^+         1.95
Cd(Cl)2          2.50
Cd(Cl)3^-        2.60
Cd(Cl)4^2-       2.80
Zn(NH3)^2+       2.37
Zn(NH3)2^2+      4.81
Zn(NH3)3^2+      7.31
Zn(NH3)4^2+      9.46
Zn(OH)^+         4.40
Zn(OH)2          11.30
Zn(OH)3^-        14.14
Zn(OH)4^2-       17.66
Zn(en)^2+        5.77
Zn(en)2^2+       10.83
Zn(en)3^2+       14.11
Zn(py)^2+        1.41
Zn(py)2^2+       1.11
Zn(py)3^2+       1.61
Zn(py)4^2+       1.93
Zn(Cl)^+         0.430
Zn(Cl)2          0.610
Zn(Cl)3^-        0.530
Zn(Cl)4^2-       0.200
Fe(NH3)^2+       1.40
Fe(NH3)2^2+      2.20
Fe(OH)^+         5.56
Fe(OH)2          9.77
Fe(OH)3^-        9.67
Fe(OH)4^2-       8.58
Fe(en)^2+        4.34
Fe(en)2^2+       7.65
Fe(en)3^2+       9.70
Fe(py)^2+        0.600
Fe(Cl)^+         0.360
NH4^+            9.26
H(py)^+          5.31
H(en)^+          9.93
H2(en)^2+        6.85
H(CO3)^-         10.25
H2(CO3)          6.38
At this point, $[A_i^{calc}]$ is recalculated at line
53 in the $k$th iteration from the following expression:
(3) $[A_i^{calc}] = [A_i^{k}] + \sum_{j=1}^{N} m(i,j) C_j$
N is the total number of complexes having reactant i. If
$[A_i^{calc}]$ cannot converge to a value such that the
right side of expression (1) is equivalent to the left side
after 2000 iterations, a message indicates that
convergence did not occur and the program halts.
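The iteration described by expressions (1) to (3) can be sketched compactly. The Python below is an illustrative restatement under stated assumptions, not the thesis program: the function name `solve_equilibrium`, its argument names, the single exponent `p` shared by all reactants, and the relative tolerance of 0.0001 are choices made here for the example.

```python
def solve_equilibrium(a_tot, log_beta, stoich, n_h, n_oh, ph,
                      p=1.0, rel_tol=1.0e-4, max_iter=2000):
    """Fixed-point iteration patterned on expressions (1) to (3).

    a_tot[i]      total concentration of reactant i
    log_beta[j]   log cumulative stability constant (or pKa) of complex j
    stoich[i][j]  units of reactant i in complex j, m(i,j)
    n_h[j], n_oh[j]  number of H+ and OH- components in complex j
    """
    m, n = len(a_tot), len(log_beta)
    # Conditional constants: fold the pH into each stability constant.
    beta = [10.0 ** (log_beta[j] - n_h[j] * ph + n_oh[j] * (ph - 14.0))
            for j in range(n)]
    a = list(a_tot)  # first approximation: free equals total
    for iteration in range(1, max_iter + 1):
        # Expression (2): concentration of each complex.
        c = []
        for j in range(n):
            value = beta[j]
            for i in range(m):
                value *= a[i] ** stoich[i][j]
            c.append(value)
        # Expression (3): calculated total of each reactant.
        a_calc = [a[i] + sum(stoich[i][j] * c[j] for j in range(n))
                  for i in range(m)]
        if all(abs(a_calc[i] - a_tot[i]) <= rel_tol * a_tot[i]
               for i in range(m)):
            return a, c, iteration
        # Expression (1): rescale each free reactant concentration.
        a = [a[i] * (a_tot[i] / a_calc[i]) ** (1.0 / p) for i in range(m)]
    raise RuntimeError("iteration did not converge")


# Hypothetical one-reactant system: NH3 and its conjugate acid NH4+
# (pKa 9.26 from Table 2.1) at pH 9.26 with 0.01 M total ammonia.
free, complexes, n_iter = solve_equilibrium(
    a_tot=[0.01], log_beta=[9.26], stoich=[[1]], n_h=[1], n_oh=[0],
    ph=9.26)
```

At a pH equal to the pKa, the free base and its conjugate acid each take half of the total, which gives a quick check on the mass balance.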
Without changing the algorithm described above,
COMPLEX was modified to continue to produce
concentrations of equilibrium species for different
values of pH and a reactant's total concentration. In
this way, row vectors (samples with simulated measured
species) in the data matrices were generated as functions
of pH and the concentrations of reactants.
The concentration of a selected reactant and the
pH were generated by separate algorithms which produced
numbers from random normal distributions. The purpose of
this step was to generate a series of simulated measures
with normal distributions. The basic algorithm is shown
in Figure 2.
   39 RSUM=0
      DO 60 I=1,12
      RNUM=RAND$A(TIME)
      RSUM=RSUM+RNUM
   60 CONTINUE
      PH=(RSUM-6.0000)*SDEV+DMEAN
Figure 2 Algorithm for generating numbers from a random
normal distribution.
Twelve random numbers between 0 and 1.0 are produced by
the FORTRAN Library function RAND$A and then summed.
Subtracting 6.0 from the sum produces a number from an
approximately normal distribution with a population mean
of 0 and a standard deviation of 1.0. The distribution is
rescaled and shifted by first multiplying the random
normal number by the desired population standard
deviation and then adding the desired population mean.
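The same twelve-number sum can be written as a short Python sketch. Here `random.random` stands in for RAND$A, and the function name `approx_normal`, the seed and the sample size are assumptions for illustration. The sum of twelve uniform numbers has a mean of 6 and a variance of 12 x 1/12 = 1, which is why subtracting 6 gives an approximately standard normal number.

```python
import random


def approx_normal(mean, sdev, rng=random):
    """Approximate normal deviate from the sum of twelve uniform numbers."""
    rsum = sum(rng.random() for _ in range(12))
    return (rsum - 6.0) * sdev + mean


# Hypothetical use: 5000 simulated pH values centred on 8.39.
rng = random.Random(1)
ph_values = [approx_normal(8.39, 0.1, rng) for _ in range(5000)]
```

Because the sum is bounded between 0 and 12, values generated this way never fall more than six standard deviations from the mean.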
The complete programs used to generate
equilibrium data matrices are shown in the appendix.
COMPLEX6 produces data (row) vectors where the
concentration for a single reactant varies along with pH.
COMPLEX7 produces data vectors where concentrations for
multiple reactants vary along with pH.
A column vector containing normally distributed
random numbers was attached to the data matrices
using the BASIC program RNDVAR (see appendix). This
provided a slack variable. The purpose was to test a
variable which should show negligible covariance with
other variables in the data matrix. Random numbers from
a normal distribution were generated from an algorithm
similar to the one described in Figure 2. As input,
RNDVAR required the smallest and largest standard
deviations of the equilibrium concentrations as well as
the smallest and largest averages. This prevented the
slack variable from being identified as peculiar by
visual inspection of the data matrix alone.
2.1.1 Generation of Equilibrium Data With Error
Measurement error was simulated by perturbing
each concentration in the data file produced by COMPLEX6
or COMPLEX7. A percentage of error from a random normal
distribution was generated for each concentration by an
algorithm similar to the one shown in Figure 2. The
population standard deviation for the percent error
distribution was constant throughout the error data
matrices. The BASIC program RNDERR, in the appendix, was
used to produce an error perturbed data file from the
unperturbed data file.
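A minimal sketch of this perturbation step is given below. It is not the RNDERR program: the function name `perturb`, the example concentrations and the 5% standard deviation are hypothetical, the percent error is assumed to have a mean of zero, and Python's `random.gauss` is used in place of the Figure 2 algorithm.

```python
import random

def perturb(concentrations, pct_sdev, rng):
    """Apply an independent, normally distributed percent error to each value."""
    perturbed = []
    for c in concentrations:
        pct_error = rng.gauss(0.0, pct_sdev)  # percent error, mean assumed to be 0
        perturbed.append(c * (1.0 + pct_error / 100.0))
    return perturbed

rng = random.Random(7)                    # fixed seed (assumed value)
row = [1.0e-5, 2.5e-7, 4.0e-9]            # hypothetical concentrations (M)
noisy_row = perturb(row, 5.0, rng)        # 5% is an illustrative SD only
rel_errors = [(n - c) / c * 100.0 for n, c in zip(noisy_row, row)]
```

Using one standard deviation for every concentration mirrors the statement that the percent error distribution was constant throughout the error data matrices.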
For matrices containing error and a slack
variable, RNDERR was run before RNDVAR. This allowed
data files to be handled expeditiously.
2.2 Statistical Procedures
All data pretreatment and factor analytic
procedures were carried out with the statistical computer
program ARTHUR [33]. ARTHUR is written in FORTRAN77.
Prior to processing by ARTHUR, data matrices were
placed in ARTHUR compatible form by the BASIC program
ARTTAB (see appendix). In order to be ARTHUR compatible,
a data matrix file must contain specific commands, a data
format specification and data delimiters. The specific
commands were either Command Program Language (CPL) [34]
commands or commands used by ARTHUR to access various
subroutines. CPL is the language used by the Primos
operating system of the Prime 9950 computer to run
programs interactively. The data format specification
contained FORTRAN77 format specifiers. The data
delimiters, a single row of 9's for each row in a data
vector, separated the data matrix from CPL commands above
and below the data matrix.
2.2.1 Data Pretreatment
Data pretreatment consisted of converting
equilibrium concentrations to their natural logarithms,
deleting or rearranging variables in the data matrices,
joining data matrices, categorizing data vectors and
z-scoring. The following paragraphs describe the routines
used for data pretreatment.[33]
CHFEAT. CHFEAT allows the user to change
features (variables) in a data matrix. These changes can
take the form of addition, subtraction, multiplication or
division of all values in a variable column, as well as
conversion to logarithms or exponentials. In addition,
variables can be deleted or rearranged in the data
matrix. In this research, CHFEAT was used to transform
variable values to natural logarithms and to delete or
rearrange variables.
CHJOIN. Data matrices which were either too
large to generate at one time or required different sets
of input constants in COMPLEX6 (or COMPLEX7) were
combined by ARTHUR's CHJOIN routine. This routine
allows the user to combine either the data vectors (rows)
of two matrices or the variables (columns) of two
matrices. In this research, only data vectors were
combined.
CHDATA. This routine allows the user to delete
data vectors, assign category numbers to data vectors or
move data vectors between the training set and the test
set. The training vector set is used in classification
analysis as the template for categorizing members of the
test set. Since this research did not include
classification analysis, all vectors were considered as
members of the training set. CHDATA was used in this
research to assign category numbers to data vectors.
SCALE. This routine allows the user to scale
values of each variable in the data matrix. Scaling can
be applied with or without error-weighting. Methods
without error-weighting include autoscaling, range
scaling, variance normalization, and mean normalization.
Methods with error-weighting include autoscaling, mean
subtraction, and mean normalization. The mean in these
methods is actually the sample average. In this
research, autoscaling without error-weighting was used
exclusively. The following relationship (autoscaling) is
used in the calculation:
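$$X'_{ij} = \frac{X_{ij} - \bar{X}_i}{\left[\sum_{j} \left(X_{ij} - \bar{X}_i\right)^2\right]^{1/2}}$$
In this equation $X'_{ij}$ is the scaled value of the ith
variable in the jth data vector and $\bar{X}_i$ is the
average of the ith variable. The result is a column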
of values for the ith variable which has an average of 0
and a variance of 1.0. This differs from the equation
for z-scoring where the variable average is subtracted
from each value and the result is divided by the standard
deviation. Thus, the autoscaled value differs from the
z-scored value by only a factor of $n^{1/2}$, where n is
the number of data vectors. Z-scoring (or autoscaling)
is necessary to be able to compare variables with widely
differing ranges and averages. In addition, z-scoring is
necessary to compare variables that are measured in
different units such as moles/liter, millimoles/liter and
micromoles/liter [22]. If all values were first
converted to millimoles/liter without z-scoring, the
covariance matrix would show spuriously high covariances
among variables originally measured in moles/liter. At
the same time, all variables originally measured in
micromoles/liter would produce spuriously low
covariances. When all values from the data matrix are
z-scored the variance-covariance matrix is identical to the
correlation matrix. In addition to autoscaling, the
SCALE routine tabulates simple statistics from the data
matrix (e.g. average, standard deviation, range, etc.).
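The relationship between autoscaling, z-scoring and the correlation matrix can be checked numerically. The sketch below uses NumPy on a hypothetical data matrix; it is not ARTHUR's SCALE routine. It assumes that autoscaling divides each mean-centered column by the root sum of its squared deviations and that z-scoring uses the population form of the standard deviation (`ddof=0`), in which case the two scalings differ by exactly the square root of n.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 data vectors (rows) by 3 variables on very different scales
X = rng.normal(size=(50, 3)) * np.array([1.0, 1e-3, 1e-6]) + np.array([5.0, 2e-3, 3e-6])
n = X.shape[0]

centered = X - X.mean(axis=0)
# Autoscaling (assumed form): divide by the root sum of squared deviations
autoscaled = centered / np.sqrt((centered ** 2).sum(axis=0))
# Z-scoring: divide by the standard deviation (population form assumed)
zscored = centered / X.std(axis=0, ddof=0)

# The cross-product matrix of the autoscaled columns is the correlation matrix
corr_from_autoscaled = autoscaled.T @ autoscaled
cov_of_zscored = np.cov(zscored, rowvar=False, ddof=0)
```

Both `corr_from_autoscaled` and `cov_of_zscored` should agree with `np.corrcoef(X, rowvar=False)`, regardless of the units in which each variable was originally expressed.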
2.2.2 Factor Analysis
Factor analytic procedures involved
eigenextraction, Varimax rotation and
reorthogonalization. Results from these procedures were
compared with correlations among the variables in the
data matrix. The values of each data vector projected
onto factor axes, or factor scores, were plotted to show
the separation of vector categories. The following
paragraphs describe the ARTHUR routines involved in these
procedures.[33]
KAPRIN. Principal components, or eigenvectors,
from the variance-covariance matrix were extracted using
the routine KAPRIN. Since all values were autoscaled,
principal components were extracted from the correlation
matrix. The result was a series of orthogonal
eigenvectors along with their eigenvalues. The same
number of eigenvectors and eigenvalues as variables in
the data matrix were generated. The coefficients of the
eigenvectors are referred to as component loadings. The
sum of the squares of the component loadings within an
eigenvector is equal to the eigenvalue. Since the
principal components were extracted from the correlation
matrix, component loadings measure the correlations
between principal components and the z-scored (or
autoscaled) variables.[35] In order to compare component
loadings from one principal component with those on
another, ARTHUR normalizes the loadings. The sum of the
squares of these normalized loadings is equal to 1.
Following KAPRIN, eigenvalues, normalized loadings and
the percent of variance contributed by each variable
within a principal component were tabulated by the
routine KAVECT.
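The quantities described for KAPRIN and KAVECT can be illustrated with NumPy. This is a sketch on a hypothetical data matrix and is not ARTHUR's implementation; the variable names are assumptions. It assumes that the normalized loadings are the unit-length eigenvectors and that the percent variance within a principal component is 100 times the squared normalized loading.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data matrix: 200 data vectors by 4 variables, two correlated pairs
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 0] + 0.1 * rng.normal(size=200),
                     base[:, 1], base[:, 1] + 0.1 * rng.normal(size=200)])

R = np.corrcoef(X, rowvar=False)            # correlation matrix of the variables
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]       # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]       # columns: unit-length (normalized) loadings

# Component loadings: correlations between variables and principal components
loadings = eigenvectors * np.sqrt(eigenvalues)
pct_variance_per_pc = 100.0 * eigenvalues / eigenvalues.sum()
pct_within_pc = 100.0 * eigenvectors ** 2   # each column sums to 100
```

In this sketch the squared loadings in each column of `loadings` sum to the corresponding eigenvalue, and the eigenvalues sum to the number of variables.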
KAVARI. Varimax rotation of principal components
is performed using the KAVARI routine. This procedure is
used to improve the interpretability of the factors.
This is accomplished by discarding principal components
axes which account for minimal amounts of variance in the
data matrix and maximizing the remaining variance around
the remaining axes.[22] KAVARI employs the Kaiser method
of rotation. In this iterative method, two axes are
adjusted simultaneously while the remaining axes are held
stationary. In the process, variance is distributed so
that loadings will be either high (i.e. near +/- 1) or
low (i.e. near 0) on each axis. Once all axes have been
rotated, the procedure is repeated until little change
occurs in the redistribution of the variance.[33]
KAVARI, however, does not necessarily produce orthogonal
axes. Following KAVARI, the routine KAORTH was used to
ensure orthogonality. Following KAORTH, normalized
loadings and the percent variance contributions were
tabulated by the routine KAVECT.
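For readers who wish to experiment with Varimax rotation, a compact sketch follows. It is not the KAVARI routine and it does not rotate axes two at a time as described above. Instead it uses a commonly published formulation that updates the whole rotation matrix at once through a singular value decomposition. The function name `varimax`, the iteration limit, the tolerance and the example loadings are all assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal Varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the Varimax criterion with respect to the rotation
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                     # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return (L ** 2).var(axis=0).sum()

# Hypothetical simple-structure loadings, deliberately mixed by a 30 degree rotation
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.9]])
unrotated = simple @ mix
rotated, rotation = varimax(unrotated)
```

Because the rotation matrix is orthogonal by construction, the communality of each variable is unchanged by the rotation, and no separate re-orthogonalization step is needed in this formulation.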
CORREL. Results of principal component
extraction and orthogonal rotation were compared with
correlations among the original variables. At the user's
option, ARTHUR's CORREL routine provides the
variance-covariance matrix of non-autoscaled data as well as the
correlation matrix. The correlation matrix also provides
the probability with 95% confidence that a nonzero
correlation could occur from random chance if the data
are normally distributed.
VARVAR. Factor scores were plotted with the use
of the VARVAR routine. Prior to running VARVAR, the
factor scores matrix was transposed using the routine
KATRAN.
CHAPTER 3
RESULTS AND DISCUSSION
3.1 Factor Models of Single Equilibria Without Error
The purpose of this study is to show how factor
analysis reveals that an equilibrium is made up of
component systems. These component systems, represented
as factors, are groups of closely associating variables
which we assume exist.
3.1.1 The Cd-NH3-en Equilibrium at pH 8.39 +/- 0.1
The Cd-NH3-en equilibrium contained cadmium-amine,
cadmium-hydroxyl and metal-ethylenediamine
complexes, along with the free cadmium, NH3,
ethylenediamine and their respective acids. Table 3.1 is
a list of the species involved along with the labels used
in ARTHUR.
A matrix of 200 data vectors based on the Cd-NH3-en
equilibrium was generated by simultaneously varying
the pH and the concentration of ethylenediamine as a
reactant. After all equilibrium concentrations were
converted to natural logs, the reactant ethylenediamine
and H+ concentrations exhibited linear normal
distributions. At this point the log of the reactant
ethylenediamine concentration was -11.5 +/- 2.3. The pH
mean was 8.39, representing the average of the pKa's of
the conjugate acids of NH3 and en. The pH standard
deviation was 0.1. Concentrations of the NH3 and cadmium
reactants were constant at 1.0E-5 M and
1.0E-6 M, respectively.

Table 3.1 Chemical Species in the Cd-NH3-en Equilibrium
Model.

Species        ARTHUR Name
Cd^2+          CD
Cd(NH3)^2+     CD1A
Cd(NH3)2^2+    CD2A
Cd(NH3)3^2+    CD3A
Cd(NH3)4^2+    CD4A
Cd(NH3)5^2+    CD5A
Cd(NH3)6^2+    CD6A
Cd(OH)+        CD1OH
Cd(OH)2        CD2OH
Cd(OH)3^-      CD3OH
Cd(OH)4^2-     CD4OH
Cd(en)^2+      CD1EN
Cd(en)2^2+     CD2EN
Cd(en)3^2+     CD3EN
NH3            NH3
NH4+           NH4
en             EN
H(en)+         EN1H
H2(en)^2+      EN2H
H+             H
3.1.1.1 Principal Component Rotation
As a result of eigenextraction, the first three
principal components account for 99.9% of the variance.
The first principal component has an eigenvalue of 12.56
and accounts for 62.8% of the variance. The second has
an eigenvalue of 6.07 and accounts for 30.3% of the
variance. The third has an eigenvalue of 1.36 and
accounts for 6.8% of the variance.
A list of the contributions to these principal
components is shown in Table 3.2. Normalized loadings
are shown along with the percent variance contributed by
each variable within each principal component. Each
loading is a normalized coefficient of the correlation
between the variable and the principal component.
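The percentages quoted in this section can be checked with a short calculation. The sketch assumes that, because the principal components were extracted from a correlation matrix, the eigenvalues sum to the number of variables (20 species), and that the percent variance within a principal component is 100 times the squared normalized loading. Small differences from the tabulated values reflect rounding of the printed eigenvalues and loadings.

```python
n_variables = 20                      # species listed in Table 3.1
eigenvalues = [12.56, 6.07, 1.36]     # first three principal components

# Percent of total variance carried by each principal component
pct_variance = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(pct_variance)

# Percent variance within a principal component from a normalized loading
loading_cd = 0.426                    # CD on principal component 3 (Table 3.2)
pct_within_pc = 100.0 * loading_cd ** 2
```

The same arithmetic applied to the first two eigenvalues gives the portion of variance covered by two principal components.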
The first principal component is mainly composed
of cadmium-amine and cadmium-hydroxyl complexes. These
normalized loadings range from 0.275 to 0.242. The
next highest loading, 0.215 for H2(en)^2+, marks the
boundary between the higher, more important loadings
and the lower, less important loadings. The second
principal component is mainly composed of
cadmium-ethylenediamine complexes and free
ethylenediamine, although NH4+, H+ and NH3 are major
contributors. The third principal component is largely
made up of free cadmium and single-ligand cadmium
complexes, although NH4+, H+ and NH3 are major
contributors here also.

Table 3.2 Principal Component Loadings(1) and Variance
Contributions(2) From the Cd-NH3-en Equilibrium
Matrix at pH 8.39 +/- 0.1.

Principal Component 1
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
CD2OH     0.275    (7.59)      EN2H      0.215    (4.61)
CD2A      0.274    (7.52)      CD        0.198    (3.92)
CD3A      0.273    (7.44)      EN1H      0.197    (3.89)
CD3OH     0.269    (7.25)      EN        0.177    (3.15)
CD4A      0.263    (6.92)      CD3EN     0.171    (2.92)
CD1OH     0.258    (6.66)      CD2EN     0.167    (2.79)
CD4OH     0.257    (6.61)      H         0.161    (2.60)
CD1A      0.253    (6.40)      NH3       0.161    (2.59)
CD5A      0.252    (6.37)      NH4       0.161    (2.58)
CD6A      0.242    (5.88)      CD1EN     0.152    (2.31)

Principal Component 2
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
CD2EN     0.308    (9.49)      CD6A      0.205    (4.19)
CD3EN     0.308    (9.49)      CD5A      0.181    (3.27)
NH3       0.308    (9.48)      CD4OH     0.167    (2.80)
H         0.307    (9.45)      CD4A      0.146    (2.13)
EN        0.307    (9.42)      CD3OH     0.118    (1.40)
CD1EN     0.306    (9.35)      CD1A      0.094    (0.89)
NH4       0.303    (9.13)      CD3A      0.094    (0.87)
EN1H      0.277    (7.65)      CD1OH     0.078    (0.60)
EN2H      0.243    (5.89)      CD2OH     0.040    (0.16)
CD        0.207    (4.30)      CD2A      0.014    (0.02)

Principal Component 3
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
CD        0.426    (18.2)      CD2A      0.201    (4.03)
CD1A      0.324    (10.5)      EN1H      0.188    (3.52)
CD1EN     0.324    (10.5)      CD2OH     0.167    (2.80)
CD1OH     0.306    (9.36)      EN        0.156    (2.43)
NH4       0.280    (7.84)      CD3A      0.097    (0.94)
H         0.271    (7.32)      CD6A      0.071    (0.50)
NH3       0.268    (7.20)      CD3OH     0.060    (0.36)
CD2EN     0.233    (5.44)      CD5A      0.032    (0.10)
EN2H      0.217    (4.71)      CD4A      0.022    (0.05)
CD3EN     0.206    (4.25)      CD4OH     0.012    (0.02)

(1) Loadings are normalized to 1.0.
(2) Percent of total variance within each PC.
Essentially all the variance (99.9%) in this data
matrix is reproduced by three principal components.
Knowing that far fewer than 20 principal components are
sufficient, the next step in factor analysis is to
orthogonally rotate a reduced set of principal
components. Malinowski and Howery describe the rotation
step as critical for relating abstract principal
components to chemical properties. [23] The process of
choosing a reduced set of principal components and
orthogonally rotating them transforms principal
components into factors. Although principal component
analysis is generally thought of as a part of the field
of factor analysis, principal components are different
from factors.[22] In classical analysis factors are
separated in two groups: principal factors and unique
factors.[13] Although this dichotomy is less useful when
principal components analysis involving eigenextraction
is employed, rotation allows us to emphasize the
principal factors which contain the important
information.
The question arises as to how many factors are
sufficient to explain this chemical equilibrium. While
certainly no more than three are needed, two principal
components cover a large portion (93.1%) of the variance.
Various methods have been proposed for determining the
appropriate number of factors for interpretation. These
methods can be classified as either statistically based
or graphically based. Statistical methods test the
significance of the variance associated with the principal
component in question. Graphical methods such as scree
plots [36] and cumulative frequency plots [37] allow the
analyst to visually separate the principal components
containing important information from those which contain
unimportant variance.
One of the statistical methods for determining
which principal components to retain uses a significance
test based on the F distribution. [38] This test involves
the T(M) statistic, which compares the variance
associated with the principal component in question to a
larger portion of the variance that can be considered
residual. The T(M) statistic is a ratio of the two
variances which allows it to be compared with the F
statistic at a specific significance level. A value of
T(M) greater than F indicates that the principal
component has significant variance whereas a value less
than F indicates that the principal component has
insignificant variance and can be discarded. The test is
most suitable for those principal components with
eigenvalues of approximately 1.0. Tests of this sort are
quite reliable for identifying the insignificant
principal components but occasionally fail to identify
all of the significant principal components. Thus, an
analyst may be misled into discarding a principal
component that contains valuable information. In
addition, these tests are used when the sample
distribution is known or may be assumed to be normal.
The T(M) test was performed on principal
components 2 and 3 from the Cd-NH3-en equilibrium matrix.
Results were inconclusive. The test gave a T(M) value
of 1135 for principal component 3, which is far in excess
of the corresponding F statistic of 6.04 at the 97.5%
significance level. This would indicate that the third
principal component is quite significant, but the large
value for T(M) is dubious. A similar calculation for
principal component 2 produced a T(M) value of 80.9
compared to the corresponding F statistic of 5.98. This
gives the erroneous impression that principal
component 3 is more significant than 2.
A scree plot of the first five principal
components from the Cd-NH3-en equilibrium matrix is shown
in Figure 3. Unfortunately, no sharp delineation between
large and small eigenvalues is seen. In order to improve
delineation, it is often necessary to rotate an "excess"
number of principal components. Ideally, eigenvalues of
the questionable rotated principal components will either
noticeably increase or decrease. If the eigenvalue
remains the same the rotated factor axis accounts for no
new variance while competing with fewer axes, and we
conclude that the unrotated principal component contains
no important information.
To demonstrate this effect on the scree
plot, five principal components were rotated. This is
an excessive number of principal components since 4 and
5 contribute a total variance of only 0.1%. Figure 4
reveals that the eigenvalue of the third principal
component increases more than twofold upon
rotation, while principal components 4 and 5 change
very little.
Cumulative frequency plots provide further
insight into the question of how many principal
components to rotate. Figure 5 is a cumulative frequency
plot of principal component variance from the Cd-NH3-en
equilibrium matrix. In this type of plot, data
associated with a linear portion of the plot are members
of a single normally distributed population. More than
one linear portion indicates that there is more than one
normally distributed population. In Figure 5, the two
linear portions indicate that there are two populations
of variances. One of the populations of variances is
associated with principal components 1-3 or 1-4, and the
other population is associated with the remaining
principal components. Thus, we associate the former
population with "interpretable" variance and the latter
with "unexplained" variance.

Figure 3 Scree Plot for the Cd-NH3-en Equilibrium
Data Matrix at pH 8.39 +/- 0.1.
Figure 4 Comparison of Scree Plots of Rotated and
Unrotated Principal Components From the Cd-NH3-en
Equilibrium Data Matrix at pH 8.39 +/- 0.1.
Figure 5 Cumulative Frequency Plot of Principal
Component Variance From the Cd-NH3-en Equilibrium
Matrix at pH 8.39 +/- 0.1. (Horizontal axis: percent
probability.)
Figure 6 clarifies whether variance from
principal component 4 is from the same population as
variances from principal components 1-3. Here, the plot
shows variance from only the first five principal
components. This "magnified" view of the cumulative
frequency plot shows that variance associated with
principal component 4 is not in the same population as
variances of principal components 1-3. More importantly,
however, this plot reveals that variance associated with
principal component 3 cannot be separated from variance
associated with principal components 1 and 2.
From both the cumulative frequency plots and the
plot of the change in eigenvalues as an excess number of
principal components are rotated, three principal
components are probably required to explain the chemical
interactions.
Figure 6 Cumulative Frequency Plot of Variance From the First Five Principal
Components of the Cd-NH3-en Equilibrium Matrix at pH 8.39 +/- 0.1.
(Horizontal axis: percent probability.)
3.1.1.2 Three Factor Model
Table 3.3 shows the normalized factor loadings
and percent variance from variables following the
orthogonal rotation of three principal components. The
fraction of original variance from each variable, or
communality, that is assigned to the three factors ranges
from 0.986 to 1.000. The first factor has approximately
equally high loadings from H+, NH3, and NH4+. In
addition, higher coordinated amine and hydroxyl complexes
have larger loadings than lower coordinated
complexes.
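The communality of a variable can be illustrated as the sum of its squared loadings over the retained factors. The sketch below uses a hypothetical four-variable correlation matrix and unrotated principal component loadings scaled by the square root of the eigenvalue; it does not reproduce the values in Table 3.3. The matrix `R` and the choice of two retained components are assumptions.

```python
import numpy as np

# Hypothetical correlation matrix for four variables
R = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]             # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
loadings = eigenvectors * np.sqrt(eigenvalues)    # variable-by-component correlations

k = 2                                             # number of retained components
communality = (loadings[:, :k] ** 2).sum(axis=1)  # one value per variable
full_communality = (loadings ** 2).sum(axis=1)    # equals 1 when all are retained
```

An orthogonal rotation of the retained components redistributes variance among the factors but leaves each communality unchanged.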
Factor 1 primarily reveals the close association
among NH4+, H+ and NH3. Loading signs indicate that
these species are in equilibrium: as NH3 decreases, NH4+
and H+ increase. Higher coordinated amine complexes also
associate closely with these species as indicated by the
magnitude and signs of their loadings in relation to NH3.
The order of amine complexes in this factor is from
higher coordinated to lower coordinated, although this
order does not hold below Cd(NH3)4^2+. The table shows
that Cd(NH3)^2+ has a higher loading than Cd(NH3)3^2+.
The relatively high loading of Cd^2+ along with
its sign indicates that it is quite sensitive to changes
in the NH4+/H+/NH3 system. This is expected since the
reactant cadmium concentration remains constant. Any
increase or decrease in concentrations of the amine
complexes will result in an inverse response for Cd^2+.
Factor 1 does not fully reveal the equilibrium occurring
between H+ and the higher coordinated hydroxyl
complexes, but it does identify species that are most
sensitive to changes in pH. Interpreted another way,
factor 1 indicates that pH varies among the data vectors.

Table 3.3 Factor Loadings(1) and Variance Contributions(2)
From the Cd-NH3-en Equilibrium Matrix at
pH 8.39 +/- 0.1.

Factor 1
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
NH4       0.440    (19.3)      CD3OH     0.142    (2.00)
H         0.436    (19.1)      CD3A      0.103    (1.06)
NH3       0.435    (18.9)      CD1EN     0.101    (1.02)
CD        0.334    (11.1)      EN2H      0.092    (0.84)
CD6A      0.272    (7.39)      EN1H      0.044    (0.19)
CD5A      0.235    (5.54)      CD2EN     0.044    (0.19)
CD4OH     0.216    (4.65)      CD3EN     0.027    (0.07)
CD4A      0.182    (3.32)      CD2OH     0.024    (0.06)
CD1A      0.173    (2.99)      CD2A      0.015    (0.02)
CD1OH     0.149    (2.21)      EN        0.004    (0.00)

Factor 2
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
CD1EN     0.467    (21.8)      NH4       0.084    (0.70)
CD2EN     0.403    (16.2)      CD2OH     0.078    (0.61)
CD3EN     0.383    (14.7)      H         0.074    (0.54)
EN2H      0.365    (13.3)      NH3       0.072    (0.51)
EN1H      0.357    (12.8)      CD3A      0.056    (0.31)
EN        0.345    (11.9)      CD3OH     0.043    (0.19)
CD        0.155    (2.41)      CD4A      0.031    (0.09)
CD1A      0.127    (1.62)      CD4OH     0.018    (0.03)
CD1OH     0.122    (1.48)      CD5A      0.013    (0.02)
CD2A      0.090    (0.80)      CD6A      0.001    (0.00)

Factor 3
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
CD        0.474    (22.5)      CD3A      0.149    (2.21)
CD1A      0.379    (14.3)      EN2H      0.145    (2.09)
CD1OH     0.361    (13.0)      CD3EN     0.139    (1.93)
NH4       0.259    (6.68)      EN1H      0.117    (1.38)
CD1EN     0.258    (6.64)      CD3OH     0.110    (1.22)
CD2A      0.256    (6.53)      EN        0.089    (0.79)
H         0.249    (6.22)      CD4A      0.069    (0.48)
NH3       0.247    (6.12)      CD4OH     0.033    (0.11)
CD2OH     0.222    (4.91)      CD6A      0.029    (0.09)
CD2EN     0.166    (2.76)      CD5A      0.012    (0.01)

(1) Loadings are normalized to 1.0.
(2) Percent of total variance within each factor.
When factor one is compared with principal
component one (prior to rotation) the resemblance is
minimal. In Table 3.2, NH3, NH4+ and H+ are minor
contributors to principal component one. In addition,
Cd(NH3)6^2+ has a smaller loading than Cd(NH3)^2+. Thus,
we can see that the principal component model without
rotation does not permit an accurate interpretation of
the chemical interactions.
Factor 2 (Table 3.3) is mainly composed of
ethylenediamine species: metal complexes and acids. The
precipitous drop in the magnitude of the free cadmium
loading from that of free ethylenediamine indicates that
this is truly an ethylenediamine factor. The loadings
signs do not actually reveal that free ethylenediamine
and those species containing the base are in equilibrium.
An equilibrium would be shown by opposite loading signs
between free ethylenediamine and species that contain the
base. Instead, loading signs are the same. What this
factor does indicate is that reactant ethylenediamine
concentration varies throughout the data matrix. In
other words, each data vector was generated as a function
of a different total reactant ethylenediamine
concentration. If we had generated the data matrix as a
function of total NH3 or total cadmium as a reactant, we
would expect to see a separate factor for NH3 or cadmium.
The fact that Cd(en)^2+ has a larger loading than
ethylenediamine itself probably indicates that this
complex is most sensitive to changes in the reactant
ethylenediamine concentration.
Principal component two, on the other hand,
contains NH3, H+ and NH4+ interspersed with the
ethylenediamine species. Without rotation, identification
of ethylenediamine as the reactant which varies would
elude detection.
Factor 3 mainly contains free cadmium along with
cadmium complexes of low coordination. The degree of
coordination increases as loadings decrease. Therefore,
free cadmium associates more closely with lower
coordinated complexes than higher coordinated complexes.
We would expect this because, in an equilibrium, lower
coordinated complexes form from the supply of free base
and free metal in the vicinity. Cd(en)^2+ is loaded below
Cd(NH3)^2+ and Cd(OH)+ because of its high loading on
factor 2. This factor reveals little about the
equilibrium between free Cd^2+ and the cadmium complexes,
since free cadmium has the same loading sign as the
complexes.
Little difference exists between factor 3 and
principal component 3. Here is an example where rotation
does not improve the interpretation of the chemical
interactions.
This analysis shows that rotation is necessary
for interpreting two of the three factors from the
Cd-NH3-en equilibrium matrix at pH 8.39 +/- 0.1. In
addition, the interpretations agree with the expected
chemical interaction.
3.1.1.3 Comparison of the Two Factor Model With the
Three Factor Model
Table 3.4 shows normalized loadings and the
percent variance contributed by each variable within each
factor following rotation of two factors. By choosing only
two factors we have effectively ignored the relationship
between free cadmium and the lower coordinated complexes.
Factor 1 in this model reveals the association
among NH4+, H+, and NH3 as did the three factor model.
The sum of their variance contributions within the factor
is not as high (34%) as that for the three factor model
(57%), and as a result the higher coordinated amine
complexes have larger loadings.
There are two major differences in factor 1
between the three factor model and the two factor model.
First is the importance of free cadmium. In the three
factor model Cd^2+ has the fourth largest loading; in the
two factor model it has the smallest loading. Therefore,
the relationship between Cd^2+ and the higher coordinated
amine complexes is obscure in the two factor model. The
second difference is in the loading signs of the lower
coordinated amine complexes. These signs are opposite
relative to NH3 when three factors are retained; they are
the same when two factors are retained. Therefore, the
two factor model simplifies, somewhat, interpretation of
the relationship between NH3 and cadmium-amine complexes.
The relationship between NH3 and the amine complexes
depends solely on values of the factor loadings.
Information from the loading signs among these variables
becomes lost in the two factor model.

Table 3.4 Loadings(1) and Variance Contributions(2)
From the Two Factor Model of the Cd-NH3-en
Equilibrium Matrix at pH 8.39 +/- 0.1.

Factor 1
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
NH3       0.337    (11.4)      CD2A      0.192    (3.69)
H         0.337    (11.4)      CD1EN     0.129    (1.65)
NH4       0.333    (11.1)      CD2EN     0.121    (1.45)
CD6A      0.314    (9.85)      CD3EN     0.118    (1.39)
CD5A      0.303    (9.16)      EN        0.113    (1.27)
CD4OH     0.296    (8.73)      CD1OH     0.112    (1.26)
CD4A      0.284    (8.05)      CD1A      0.097    (0.94)
CD3OH     0.267    (7.12)      EN1H      0.077    (0.59)
CD3A      0.251    (6.28)      EN2H      0.040    (0.16)
CD2OH     0.212    (4.49)      CD        0.025    (0.06)

Factor 2
Variable  Loading  (% Var.)    Variable  Loading  (% Var.)
EN        0.354    (12.6)      H         0.192    (3.69)
CD3EN     0.352    (12.4)      NH4       0.188    (3.54)
CD2EN     0.350    (12.3)      CD2A      0.119    (1.42)
CD1EN     0.341    (11.6)      CD2OH     0.098    (0.95)
EN1H      0.337    (11.4)      CD6A      0.063    (0.40)
EN2H      0.316    (9.98)      CD3A      0.049    (0.24)
CD        0.277    (7.68)      CD5A      0.037    (0.14)
CD1A      0.204    (4.17)      CD3OH     0.026    (0.07)
NH3       0.193    (3.71)      CD4OH     0.023    (0.05)
CD1OH     0.192    (3.69)      CD4A      0.002    (0.00)

(1) Loadings are normalized to 1.0.
(2) Percent of total variance within each factor.
The second factor in Table 3.4 shows that
ethylenediamine species have the highest loadings as in
the three factor model. However, a major difference
remains. The two factor model does not show a clear
loading break between the ethylenediamine species and
Cd^2+. As a consequence, it is more difficult to
interpret. Another difference between the factor models
is the loading order of the ethylenediamine species,
although this appears to be minor.
3.1.1.4 Comparison of Factor Models Based on Matrices
With a Multivariate Log Normal Distribution and the
Multivariate Linear Normal Distribution
The Cd-NH3-en equilibrium matrix was analyzed
without transforming the concentrations to logarithms.
In effect, this produced a matrix with a multivariate log
normal distribution.
The purpose of comparing factor models of the log
normal distribution and the linear normal distribution
was to see whether factor interpretation differs
depending on the type of multivariate distribution. Even
though chemical species are usually distributed log
normally in natural systems, the type of distribution
should not matter because eigenextraction involves only
mathematical manipulation not statistical inference.[13]
In the process of z-scoring the data ARTHUR was
not able to detect small differences in concentrations.
As a result, ARTHUR automatically deleted the following
species: Cd(NH3)2^2+, Cd(NH3)3^2+, Cd(NH3)4^2+, Cd(NH3)5^2+,
Cd(NH3)6^2+, Cd(OH)3^- and Cd(OH)4^2-. To be able to compare
factor models, these same variables were deleted from the
log transformed data matrix.
Three factor models from both data matrices are
shown in Table 3.5 along with eigenvalues and the amount
of variance assigned to each factor.
The most obvious difference between the two
factor models is that factors 1 and 2 are switched. Less
46
Table 3.5 Eigenvalues, Loadings' and Variance
Contributions2 From the Three Factor Model of
the Log Normal and Linear Normal Data
Matrices.
Log Normal Matrix
Factor 1 Eigenvalue: 6.07
Variable Loading (% Var.) Variable Loading (% Var.
NH3 0.508 (25.8) CD1A 0.130 (1.69)
NH4 0.508 (25.8) EN2H 0.047 (0.23)
H 0.497 (24.7) CD2EN 0.047 (0.22)
CD20H 0.328 (10.7) CD3EN 0.046 (0.21)
CD1EN 0.212 (4.48) EN 0.026 (0.07)
CD 0.182 (3.30) EN1H 0.012 (0.01)
CD10H 0.163 (2.73)
Factor 2 Eigenvalue: 3.93
Variable Loading (% Var.) Variable Loading (% Var.'
CD3EN 0.577 (33.3) CD1A 0.107 (1.14)
EN1H 0.385 (14.9) CD10H 0.102 (1.04)
EN 0.385 (14.9) CD20H 0.068 (0.46)
EN2H 0.360 (12.9) NH3 0.033 (0.11)
CD1EN 0.326 (10.6) NH4 0.033 (0.11)
CD2EN 0.304 (9.24) H 0.021 (0.04)
CD 0.116 (1.35)
Factor 3 Eigenvalue: 2.66
Variable Loading (% Var.) Variable Loading (% Var.;
CD1EN 0.602 (36.2) EN 0.129 (1.65)
CD 0.412 (17.0) EN2H 0.125 (1.55)
CD3EN 0.378 (14.3) NH3 0.114 (1.31)
CD1A 0.330 (10.9) NH4 0.114 (1.30)
CD10H 0.310 (9.61) H 0.101 (1.02)
CD20H 0.180 (3.25) CD2EN 0.036 (0.13)
EN1H ' 0.133 (1.77)
^ Loadings are normalized to 1.0,.
2 Percent of total variance within each factor.
47
Table 3.5 (Continued)
Linear Normal Matrix
Factor 1 Eigenvalue: 5.13
Variable Loading (% Var.) Variable Loading (% Var.)
CD1EN 0.462 (21.4) CD1A 0.205 (4.22)
CD2EN 0.387 (15.0) CD10H 0.199 (3.95)
CD3EN 0.364 (13.3) CD20H 0.144 (2.08)
EN2H 0.340 (11.6) NH4 0.083 (0.69)
EN1H 0.332 (11.1) H 0.073 (0.53)
EN 0.321 (10.3) NH3 0.071 (0.50)
CD 0.234 (5.48)
Factor 2 Eigenvalue: 4.82
Variable Loading (% Var.) Variable Loading (% Var.)
NH4 0.543 (29.4) CD1EN 0.073 (0.53)
H 0.540 (29.2) EN1H 0.049 (0.24)
NH3 0.539 (29.1) CD10H 0.045 (0.21)
CD 0.271 (7.33) CD2EN 0.027 (0.08)
CD20H 0.147 (2.17) CD3EN 0.014 (0.02)
EN2H 0.108 (1.16) EN 0.011 (0.01)
CD1A 0.074 (0.54) 
Factor 3 Eigenvalue: 3.06
Variable Loading (% Var.) Variable Loading (% Var.)
CD 0.522 (27.2) NH4 0.113 (1.27)
CD1A 0.478 (22.9) H 0.103 (1.07)
CD10H 0.467 (21.8) NH3 0.101 (1.03)
CD20H 0.362 (13.1) EN2H 0.091 (0.82)
CD1EN 0.247 (6.09) EN1H 0.080 (0.63)
CD2EN 0.149 (2.23) EN 0.067 (0.45)
CD3EN 0.120 (1.45)
importance, though, should be placed on the order of the
factors than on the amount of variance assigned to a factor.
Take, for example, the ethylenediamine factor. This
factor from the linear normal data matrix (factor 1)
accounts for approximately 39% of the variance, while
from the log normal data matrix (factor 2) it accounts
for 30%. This disparity in variances is related to
differences in the respective correlation matrices.
Table 3.6 shows a portion of the correlation matrices
involving only the ethylenediamine species. Correlations
from the log normal data range from 0.982 to 0.361. On
the other hand, the correlations from the linear normal
data range from 0.999 to 0.956. Thus, the narrower the
range of correlations in this case, the greater the
amount of variance accounted for.
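The shares of variance quoted above can be approximated from the eigenvalues in Table 3.5. The sketch below assumes that the share of a factor is its eigenvalue divided by the sum of the three retained eigenvalues; the thesis does not state the denominator explicitly, so this is an assumption, and the function name is illustrative.

```python
def variance_shares(eigenvalues):
    """Percent of the retained variance carried by each factor."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


# Eigenvalues of the three retained factors, from Table 3.5.
log_normal = [6.07, 3.93, 2.66]
linear_normal = [5.13, 4.82, 3.06]

log_shares = variance_shares(log_normal)
linear_shares = variance_shares(linear_normal)

# The ethylenediamine factor is factor 2 (log normal) and factor 1 (linear normal).
print(f"log normal, factor 2:    {log_shares[1]:.1f}%")
print(f"linear normal, factor 1: {linear_shares[0]:.1f}%")
```

With this denominator the ethylenediamine factor carries roughly 39% of the retained variance in the linear normal model and roughly 31% in the log normal model, close to the approximate figures given in the text.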
Another difference between the two factor models
involves factor 3. From the log normal matrix,
Cd(en)2+ has a negative correlation with this factor
while Cd2+ has a positive correlation. In addition,
Cd(en)22+ has a higher loading than either Cd(NH3)2+ or
Cd(OH)+. These anomalies make this factor from the log
normal data matrix more difficult to interpret than the
factor from the linear normal data matrix.
The comparison shows that there is no fundamental
difference between log normal and linear normal factor
models. Generally, interpretation is the same even
Table 3.6 Portions of the Correlation Matrices From the
Log Normal and Linear Normal Data Matrices.
Log Normal Matrix
CD1EN CD2EN CD3EN EN EN1H EN2H
CD1EN 1.000
CD2EN 0.681 1.000
CD3EN 0.361 0.878 1.000
EN 0.618 0.982 0.947 1.000
EN1H 0.630 0.970 0.908 0.976 1.000
EN2H 0.611 0.903 0.811 0.895 0.969 1.000
Linear Normal Matrix
CD1EN CD2EN CD3EN EN EN1H EN2H
CD1EN 1.000
CD2EN 0.993 1.000
CD3EN 0.988 0.999 1.000
EN 0.977 0.995 0.998 1.000
EN1H 0.972 0.990 0.992 0.994 1.000
EN2H 0.956 0.973 0.975 0.976 0.994 1.000
though the order of the factors may be different between
the two models. However, the relative importance of a
particular factor differs between the two models, and
this may affect overall interpretation. In addition,
the linear normal model is easier to interpret than the
log normal model.
3.1.2 Analysis of the Cd-NH3-en Equilibrium Over Various
pH Ranges
Four data matrices were generated at the
following standard deviations for the pH population:
0.05, 0.20, 0.30, 0.40. These were analyzed in relation
to the pH 8.39 +/- 0.1 matrix. Varying the standard
deviation provided a means for controlling the pH range
while maintaining a normal distribution. All matrices
contained 200 data vectors.
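The way the standard deviation controls the pH window can be illustrated with a short simulation. This is a minimal sketch that assumes the pH values are drawn from a normal population and converted to hydrogen ion concentrations; it is not the modified COMPLEX program, and the function names and seed are hypothetical.

```python
import random


def sample_ph(mean, sd, n, seed=0):
    """Draw n pH values from a normal population (illustrative only)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def hydrogen_ion(ph):
    """Convert pH to hydrogen ion concentration in mol/L."""
    return 10.0 ** (-ph)


# One population of 200 pH values per standard deviation, all centered on 8.39.
populations = {sd: sample_ph(8.39, sd, 200, seed=1)
               for sd in (0.05, 0.1, 0.2, 0.3, 0.4)}
ranges = {sd: max(values) - min(values) for sd, values in populations.items()}
h_values = [hydrogen_ion(ph) for ph in populations[0.1]]

for sd, width in ranges.items():
    print(f"sd = {sd:<4}  pH range = {width:.2f}")
```

Because the mean is held at 8.39, each population samples the same equilibrium, while the width of the sampled pH window grows in proportion to the standard deviation.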
This analysis is a comparison of factor
models from an equilibrium where the size of the pH
window varies over a range of values. Since the pH mean
remains constant, each matrix is a sample of the same
equilibrium. This comparison provides insight as to the
effect on factor models of nonlinear relationships among
the variables.
The σ(pH)=0.4 data matrix was problematic in
that some values for Cd(NH3)62+ were less than 1.0E-38 M.
Since ARTHUR does not accept double precision values, the
lower end of the Cd(NH3)62+ distribution was truncated.
To avoid misleading results, this variable was deleted
prior to eigenextraction. In an effort to be consistent,
the same variable was deleted from all data matrices.
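The screening step described above can be sketched as a simple filter. The example assumes a floor of 1.0E-38 M, taken from the text with the exponent sign restored, and uses invented concentrations purely for illustration; the function names are hypothetical and do not correspond to any routine in ARTHUR.

```python
# Approximate single precision floor quoted in the text (assumed to be 1.0E-38 M).
SINGLE_PRECISION_FLOOR = 1.0e-38


def variables_below_floor(matrix, floor=SINGLE_PRECISION_FLOOR):
    """Return the labels of variables with any concentration below the floor."""
    return sorted(name for name, column in matrix.items()
                  if min(column) < floor)


def drop_variables(matrix, names):
    """Return a copy of the matrix without the named variables."""
    return {name: column for name, column in matrix.items()
            if name not in names}


# Hypothetical concentrations (mol/L), invented for illustration only.
example = {
    "CD": [1.0e-5, 9.9e-6, 1.0e-5],
    "CD6A": [3.0e-37, 8.0e-39, 2.0e-38],
}

flagged = variables_below_floor(example)
cleaned = drop_variables(example, flagged)
print(flagged, list(cleaned))
```

Applying the same list of flagged labels to every matrix keeps the variable sets identical, which is the consistency argument made in the text.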
Table 3.7 compares eigenvalues of unrotated and
rotated principal components for each matrix. Rotation
was performed on five principal components in order to
determine whether a third factor should be retained in
each case. It is clear from the table that factors 4 and
5 from each matrix contribute negligibly to the variance.
Eigenvalues of the third factor from the σ(pH)=0.05,
σ(pH)=0.1 and σ(pH)=0.2 matrices increase by a factor
of two or more compared to eigenvalues before rotation.
As a result, the third factor accounts for 24%, 17% and
15% of the remaining variance, respectively. On the
other hand, factor models from the σ(pH)=0.3 and
σ(pH)=0.4 matrices show a much smaller increase in the eigenvalue
of the third factor. Three factors are justifiably
retained for the first three matrices, whereas it is
debatable whether three factors are necessary for the
σ(pH)=0.3 and σ(pH)=0.4 matrices.
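The retention argument rests on how much the third eigenvalue grows on rotation. The following sketch tabulates that growth from the values in Table 3.7; the threshold of 2.0 simply encodes the "factor of two or more" wording, and the variable names are illustrative.

```python
# Third-factor eigenvalues (before rotation, after rotation) from Table 3.7,
# keyed by the pH standard deviation of each matrix.
third_eigenvalue = {
    0.05: (2.20, 4.58),
    0.1: (1.35, 3.23),
    0.2: (1.36, 2.79),
    0.3: (1.29, 1.90),
    0.4: (1.21, 1.37),
}

growth = {sd: after / before
          for sd, (before, after) in third_eigenvalue.items()}
doubled = [sd for sd, ratio in growth.items() if ratio >= 2.0]

for sd, ratio in growth.items():
    print(f"sd = {sd:<4}  after/before = {ratio:.2f}")
print("at least doubled:", doubled)
```

Only the three narrowest pH windows show at least a doubling, which matches the distinction drawn between the first three matrices and the two widest ones.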
For the sake of comparison, three factors were
initially retained from all matrices. Table 3.8 compares
the normalized loadings. Results show that the
σ(pH)=0.05 matrix is quite different from the other
matrices. First, there is a reversal in the order of
factors 1 and 3 compared to the other matrices.
Table 3.7 Eigenvalues Before and After Rotation of Five
Factors From Cd-NH3-en Equilibrium Matrices
Over Various pH Ranges.
Data Matrix        Eigenvalues (% Variance)
σ(pH)   Factor   Before Rotation   After Rotation
0.05    1        12.7 (66.9)       8.31 (42.8)
        2        4.09 (21.5)       6.27 (33.0)
        3        2.20 (11.6)       4.58 (24.1)
        4        3.19E-3 (0.0)     3.14E-3 (0.0)
        5        3.46E-4           3.47E-4
0.1     1        11.9 (62.4)       9.00 (47.4)
        2        5.79 (30.4)       6.73 (35.4)
        3        1.35 (7.1)        3.23 (17.0)
        4        1.78E-2 (0.1)     3.80E-2 (0.2)
        5        7.62E-5 (0.0)     7.62E-5 (0.0)
0.2     1        10.9 (57.3)       9.41 (49.5)
        2        6.70 (35.3)       6.71 (35.3)
        3        1.36 (7.2)        2.79 (14.7)
        4        6.12E-2 (0.3)     9.50E-2 (0.5)
        5        1.39E-5 (0.0)     1.39E-5 (0.0)
0.3     1        11.0 (57.6)       11.0 (56.8)
        2        6.59 (34.7)       6.06 (31.9)
        3        1.29 (6.8)        1.90 (10.0)
        4        0.169 (0.9)       0.247 (1.3)
        5        5.69E-6 (0.0)     5.69E-6 (0.0)
0.4     1        11.6 (60.8)       11.3 (59.2)
        2        6.01 (31.7)       6.06 (31.9)
        3        1.21 (6.3)        1.37 (7.2)
        4        0.227 (1.2)       0.323 (1.7)
        5        3.32E-6 (0.0)     3.32E-6 (0.0)
Table 3.8 Comparison of Factor Loadings1 From the Three
Factor Model of the Cd-NH3-en Equilibrium Data
Matrices Over Various pH Ranges.
Factor 1
Variable pH Standard Deviation
0.05 0.1 0.2 0.3 0.4
CD 0.355 0.348 0.333 0.298 0.069
CD1A 0.331 0.181 0.134 0.026 0.220
CD2A 0.301 0.017 0.050 0.158 0.278
CD3A 0.269 0.106 0.167 0.239 0.289
CD4A 0.236 0.188 0.235 0.278 0.293
CD5A 0.204 0.243 0.277 0.301 0.294
CD10H 0.327 0.156 0.099 0.017 0.234
CD20H 0.293 0.023 0.097 0.196 0.284
CD30H 0.256 0.146 0.207 0.266 0.293
CD40H 0.219 0.223 0.267 0.299 0.296
CD1EN 0.241 0.111 0.117 0.071 0.001
CD2EN 0.163 0.051 0.043 0.006 0.010
CD3EN 0.139 0.033 0.021 0.013 0.013
NH3 0.147 0.451 0.420 0.373 0.294
NH4 0.142 0.456 0.425 0.371 0.273
EN 0.095 0.000 0.017 0.046 0.019
EN1H 0.102 0.050 0.061 0.066 0.103
EN2H 0.109 0.100 0.140 0.173 0.212
H 0.145 0.453 0.423 0.378 0.296
1 Loadings are normalized to 1.0.
Table 3.8 (Continued)
Factor 2
Variable pH Standard Deviation
0.05 0.1 0.2 0.3 0.4
CD 0.101 0.161 0.161 0.157 0.136
CD1A 0.094 0.132 0.124 0.113 0.073
CD2A 0.085 0.092 0.073 0.055 0.025
CD3A 0.076 0.058 0.034 0.023 0.005
CD4A 0.066 0.031 0.009 0.004 0.004
CD5A 0.056 0.013 0.008 0.007 0.010
CD10H 0.093 0.126 0.114 0.097 0.059
CD20H 0.083 0.081 0.057 0.036 0.011
CD3OH 0.072 0.044 0.018 0.006 0.007
CD4OH 0.061 0.019 0.005 0.010 0.016
CD1EN 0.466 0.466 0.482 0.487 0.457
CD2EN 0.411 0.401 0.402 0.408 0.402
CD3EN 0.393 0.381 0.378 0.382 0.384
NH3 0.047 0.075 0.077 0.050 0.032
NH4 0.041 0.087 0.095 0.088 0.126
EN 0.359 0.343 0.331 0.335 0.349
EN1H 0.361 0.356 0.358 0.368 0.401
EN2H 0.362 0.364 0.374 0.370 0.391
H 0.044 0.077 0.079 0.057 0.041
Table 3.8 (Continued)
Factor 3
Variable pH Standard Deviation
0.05 0.1 0.2 0.3 0.4
CD 0.229 0.474 0.518 0.597 0.691
CD1A 0.151 0.381 0.396 0.410 0.365
CD2A 0.073 0.259 0.233 0.185 0.117
CD3A 0.001 0.153 0.109 0.059 0.019
CD4A 0.068 0.074 0.027 0.011 0.030
CD5A 0.126 0.018 0.027 0.053 0.059
CD1OH 0.140 0.363 0.368 0.361 0.318
CD20H 0.052 0.226 0.185 0.127 0.076
CD30H 0.029 0.115 0.061 0.012 0.015
CD4OH 0.100 0.038 0.014 0.050 0.060
CD1EN 0.074 0.257 0.271 0.270 0.220
CD2EN 0.031 0.166 0.147 0.127 0.096
CD3EN 0.018 0.138 0.111 0.085 0.058
NH3 0.535 0.242 0.248 0.217 0.169
NH4 0.533 0.253 0.267 0.254 0.336
EN 0.006 0.088 0.046 0.010 0.011
EN1H 0.020 0.116 0.094 0.079 0.066
EN2H 0.046 0.143 0.141 0.141 0.135
H 0.535 0.244 0.252 0.226 0.187
Variables with large loadings such as Cd2+, Cd(NH3)2+,
and Cd(OH)+ in factor 1 from the σ(pH)=0.05 matrix have
large loadings in factor 3 from the other matrices.
Second, factor 3 from the σ(pH)=0.05 matrix does not show the
dependence of the higher coordinated amine and hydroxyl
complexes on NH3 or H+. Factor 3 reveals only the
NH3/H+/NH4+ system. A total variance of 86% within the
factor is attributed to these three variables. The sharp
drop in the Cd2+ loading from the NH4+ loading leads us
to ascribe little importance to Cd2+. In addition to the
contrast in loadings, from the σ(pH)=0.05 matrix this
factor accounts for only 25% of the variance remaining
after rotation. From data matrices with larger pH
standard deviations, the factor involving the NH3/H+/NH4+
system accounts for 46% to 65% of the remaining variance.
Factor loadings from models of the σ(pH)=0.1
through σ(pH)=0.3 matrices reveal minor differences.
However, trends can be identified. First, loadings of
NH3, H+ and NH4+ all decrease in factor 1 from the
σ(pH)=0.1 to the σ(pH)=0.3 matrix. At the same time,
loadings of the higher coordinated amine and hydroxyl
complexes increase, while those of the lower coordinated
amine and hydroxyl complexes decrease over this range of
pH. Therefore, this trend indicates that as the pH range
increases the first factor progresses from one which
mainly emphasizes the NH3/NH4+ system to one which shows
the dependence of higher coordinated amine and hydroxyl
complexes on this system.
Anomalies are found, though, when comparing
factor 1 from the σ(pH)=0.4 matrix and the σ(pH)=0.3
matrix. First we note that the range in magnitudes of
loadings for the amine and hydroxyl complexes is much
less from the σ(pH)=0.4 matrix than from the
σ(pH)=0.3 matrix. In addition, Cd(NH3)2+ and Cd(OH)+
have higher loadings in factor 1 from the σ(pH)=0.4
matrix than in factor 1 from the σ(pH)=0.3 matrix. This
last anomaly appears to reverse the trend observed
earlier where loadings of lower coordinated complexes
decrease as the pH range increases.
The second trend is revealed in factor 3. While
this factor accounts for less overall variance, Cd2+
becomes more prominent as the pH range increases. This
is to be expected since the loading of Cd2+ in factor 1
decreases slightly as the pH range increases.
We noted earlier that a third factor may not be
necessary in models of the σ(pH)=0.3 and σ(pH)=0.4
matrices. Table 3.9 shows factor loadings from the two
factor models. We might expect factor one to have a
larger share of variance from Cd and lower coordinated
complexes than that seen in the three factor model
because we have forced variance from factor 3 into two
factors. Surprisingly, this is not the case in models
Table 3.9 Factor Loadings1 From the Two Factor Models of
the σ(pH)=0.3 and σ(pH)=0.4 Data Matrices.
Variable   Factor 1        Factor 2
σ(pH)=     0.3    0.4      0.3    0.4
CD 0.004 0.027 0.291 0.275
CD1A 0.197 0.273 0.187 0.141
CD2A 0.278 0.296 0.072 0.043
CD3A 0.300 0.294 0.009 0.004
CD4A 0.305 0.290 0.025 0.016
CD5A 0.307 0.287 0.046 0.027
CD1OH 0.219 0.280 0.165 0.128
CD20H 0.298 0.297 0.047 0.033
CD3OH 0.304 0.293 0.009 0.003
CD40H 0.307 0.289 0.039 0.021
CD1EN 0.089 0.032 0.357 0.369
CD2EN 0.081 0.025 0.378 0.385
CD3EN 0.078 0.023 0.381 0.388
NH3 0.298 0.272 0.125 0.071
NH4 0.273 0.227 0.110 0.070
EN 0.072 0.019 0.385 0.392
EN1H 0.015 0.093 0.366 0.400
EN2H 0.101 0.193 0.315 0.346
H 0.298 0.272 0.125 0.072
1 Loadings are normalized to 1.0.
from either matrix. The two factor model only confirms
what was shown in the three factor model. The first
factor from the σ(pH)=0.3 matrix shows the connection
between the NH3/H+/NH4+ system and higher coordinated
amine and hydroxyl complexes. Factor 1 from the
σ(pH)=0.4 matrix shows no connection.
The three factor model is preferred, though, to
the two factor model because it provides more information
about the lower coordinated complexes. In addition, the
separation between ethylenediamine species and the
remaining species in factor two is sharper in the three
factor model. This leads to a clearer interpretation of
the factor.
From this analysis, there appears to be an
optimal pH range where factor modelling of this
equilibrium is chemically most interpretable. At this
optimum, represented by the σ(pH)=0.2 or σ(pH)=0.3
matrices, factor modelling predicts the expected chemical
interactions. Higher coordinated amine and hydroxyl
complexes are unequivocally related in these matrices to
the NH3/H+/NH4+ system while the lower coordinated
complexes associate with a separate factor. It appears
that nonlinearity in the data has a considerable effect on
factor interpretation when the pH range is not at this
optimum.
3.1.3 Analysis of the Cd-NH3-en Equilibrium at pH 5.85
+/- 0.1 and pH 10.93 +/- 0.1
Data matrices with pH means at 5.85 and 10.93
were generated using the same reactant concentrations
from the pH 8.39 +/- 0.1 matrix. These values were
chosen because they are one pH unit below the lowest pKa
and one unit above the highest pKa, respectively. The
purpose is to show that factor models reveal different
relationships among the equilibrium species at different
pH. The number of data vectors in each matrix was 200.
3.1.3.1 The pH 5.85 +/- 0.1 Matrix Factor Model
Normalized loadings on three principal components
from the pH 5.85 +/- 0.1 matrix are shown in Table 3.10.
Three principal components account for 97% of the
variance. The third principal component is unique in
that nearly all its variance comes from Cd2+. In this
data matrix Cd2+ had a very narrow range of 0.998E-5 M to
1.000E-5 M. Therefore, a narrow range coupled with a
lack of resolution as a consequence of values with only
three significant figures caused ARTHUR to interpret this
variable as random. In addition, NH4+ was interpreted as
random due to its narrow range. As a result, principal
component four was also unique. Since unique principal
components provide no information concerning covariances,
both Cd2+ and NH4+ were deleted from the matrix. At this
point, the first two principal components accounted
Table 3.10 Principal Component Loadings1 and Variance
Contributions2 From the Cd-NH3-en Equilibrium
Matrix at pH 5.85 +/- 0.1.
Principal Component 1
Variable Loading (% Var.) Variable Loading (% Var.)
H 0.294 (8.63) NH4 0.197 (3.88)
NH3 0.294 (8.63) EN 0.135 (1.83)
CD4A 0.294 (8.63) CD3EN 0.135 (1.83)
CD3A 0.294 (8.63) CD1EN 0.135 (1.83)
CD40H 0.294 (8.63) CD2EN 0.135 (1.83)
CD20H 0.294 (8.62) EN1H 0.109 (1.19)
CD30H 0.294 (8.62) CD 0.083 (0.69)
CD2A 0.294 (8.62) EN2H 0.081 (0.66)
CD1A 0.294 (8.62)
CD1OH 0.294 (8.62)
Principal Component 2
Variable Loading (% Var.) Variable Loading (% Var.)
EN2H 0.399 (15.9) CD20H 0.098 (0.97)
EN1H 0.387 (14.9) CD2A 0.098 (0.97)
CD2EN 0.371 (13.7) CD40H 0.098 (0.96)
EN 0.371 (13.7) CD3A 0.098 (0.96)
CD3EN 0.371 (13.7) CD4A 0.098 (0.96)
CD1EN 0.371 (13.7) NH3 0.098 (0.96)
CD 0.212 (4.50) H 0.098 (0.95)
CD10H 0.099 (0.98) NH4 0.024 (0.06)
CD1A 0.098 (0.97)
CD30H 0.098 (0.97)
Principal Component 3
Variable Loading (% Var.) Variable Loading (% Var.)
CD 0.953 (90.8) CD2A 0.017 (0.03)
NH4 0.190 (3.62) CD3A 0.016 (0.03)
CD1EN 0.094 (0.89) CD30H 0.016 (0.03)
EN1H 0.094 (0.89) CD40H 0.016 (0.02)
CD2EN 0.094 (0.89) CD20H 0.016 (0.02)
CD3EN 0.094 (0.89) CD4A 0.015 (0.02)
EN 0.094 (0.89) H 0.015 (0.02)
EN2H 0.094 (0.88) NH3 0.015 (0.02)
CD1OH 0.018 (0.03)
CD1A 0.018 (0.03)
1 Loadings are normalized to 1.0.
2 Percent of total variance within each PC.
for nearly 100% of the variance.
Factor analysis of the pH 5.85 +/- 0.1 matrix is
the ideal example of one that does not require rotation
to chemically interpret the eigenvectors. Rotation in
this case leaves the loadings virtually unchanged. Table
3.11 shows normalized loadings on the first two principal
components. All cadmium amine and hydroxyl complexes
along with H+ and NH4+ are loaded equally on the first
principal component. Therefore, the higher coordinated
complexes are no more dependent on NH3, NH4+ or H+
than the lower coordinated complexes. The data matrix
shows that the concentrations of all amine complexes
along with NH3 are orders of magnitude less than those
from the pH 8.39 +/- 0.1 matrix. Therefore, the factor
model successfully reveals what is known from the
chemistry. At pH 5.85 the amount of NH3 available as a
ligand is minimal. All amine complexes, including
those with few ligands, respond closely to changes in the
NH3 concentration. Likewise, the hydroxyl complexes
respond closely to changes in H+ because the OH-
concentration at pH 5.85 is about 1/300th of the OH-
concentration at pH 8.39.
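The hydroxide comparison follows from the pH difference alone, because the ion product of water cancels in the ratio. A minimal check, with an illustrative function name:

```python
def hydroxide_ratio(ph_low, ph_high):
    """[OH-] at ph_low divided by [OH-] at ph_high.

    Since [OH-] = Kw / [H+], the ion product Kw cancels in the ratio,
    leaving 10 ** (ph_low - ph_high).
    """
    return 10.0 ** (ph_low - ph_high)


ratio = hydroxide_ratio(5.85, 8.39)
fold = 1.0 / ratio
print(f"[OH-] at pH 5.85 is about 1/{fold:.0f} of its value at pH 8.39")
```

The computed ratio is roughly 1/350, the same order as the approximate figure quoted in the text.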
The second principal component in Table 3.11
shows that H2(en)2+ responds more closely to changes in
the reactant ethylenediamine concentration than the other
equilibrium ethylenediamine species. This is expected
Table 3.11 Principal Component Loadings1 and Variance
Contributions2 from the Cd-NH3-en Equilibrium
Matrix at pH 5.85 +/- 0.1 Following Deletion
of Cd2+ and NH4+.
Principal Component 1
Variable Loading (% Var.) Variable Loading (% Var.)
H 0.302 (9.12) CD1A 0.302 (9.11)
NH3 0.302 (9.12) CD10H 0.302 (9.11)
CD4A 0.302 (9.11) EN 0.134 (1.78)
CD3A 0.302 (9.11) CD3EN 0.134 (1.78)
CD40H 0.302 (9.11) CD1EN 0.134 (1.78)
CD2A 0.302 (9.11) CD2EN 0.133 (1.78)
CD30H 0.302 (9.11) EN1H 0.107 (1.14)
CD20H 0.302 (9.11) EN2H 0.078 (0.61)
Principal Component 2
Variable Loading (% Var.) Variable Loading (% Var.)
EN2H 0.411 (16.9) CD20H 0.093 (0.86)
EN1H 0.399 (15.9) CD30H 0.093 (0.86)
CD2EN 0.383 (14.7) CD2A 0.093 (0.86)
CD1EN 0.383 (14.7) CD4OH 0.092 (0.85)
CD3EN 0.383 (14.7) CD3A 0.092 (0.85)
EN 0.383 (14.7) CD4A 0.092 (0.85)
CD10H 0.093 (0.87) NH3 0.092 (0.85)
CD1A 0.093 (0.86) H 0.092 (0.84)
1 Loadings are normalized to 1.0.
2 Percent of total variance within each PC.
because the diprotic acid is more abundant at this pH
than either the monoprotic acid or the base. If the
matrix were generated at pH 6.85 (pKa1 for the conjugate
acid of en), approximately equal loadings would be
expected for H2(en)2+ and H(en)+.
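The relative abundance argument can be illustrated with the usual distribution fractions for a diprotic acid. This sketch assumes pKa1 = 6.85 as stated above and pKa2 = 9.93, the latter inferred from the statement that pH 10.93 lies one unit above the highest pKa. It ignores metal complexation, so it describes only the free ethylenediamine species, and the function name is illustrative.

```python
def en_fractions(ph, pka1=6.85, pka2=9.93):
    """Fractions of H2(en)2+, H(en)+ and en among the uncomplexed species.

    pka2 is an assumed value inferred from the text, not a tabulated constant.
    """
    h = 10.0 ** (-ph)
    ka1 = 10.0 ** (-pka1)
    ka2 = 10.0 ** (-pka2)
    denom = h * h + h * ka1 + ka1 * ka2
    return {
        "EN2H": h * h / denom,    # diprotic acid, H2(en)2+
        "EN1H": h * ka1 / denom,  # monoprotic acid, H(en)+
        "EN": ka1 * ka2 / denom,  # free base, en
    }


at_585 = en_fractions(5.85)
at_685 = en_fractions(6.85)
print({name: round(value, 3) for name, value in at_585.items()})
print({name: round(value, 3) for name, value in at_685.items()})
```

At pH 5.85 about nine tenths of the uncomplexed ethylenediamine is present as the diprotic acid, while at pH 6.85 the diprotic and monoprotic forms are equally abundant.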
3.1.3.2 The pH 10.93 +/- 0.1 Matrix Factor Model
Eigenvalues from this matrix before and after
rotating five principal components are shown in Table
3.12. The decision to retain a third factor is justified
because the amount of variance assigned to principal
component 3 increases substantially from 13% to 22% after
rotation.
Table 3.13 shows normalized loadings for each of
the three factors. Factor 1 is largely composed of amine
and hydroxyl complexes along with Cd2+. Factor 2 is the
ubiquitous ethylenediamine factor. Factor 3 reveals the
NH3/H+/NH4+ system. The higher coordinated hydroxyl
complexes also have major loadings on factor 3.
In this factor model little covariance exists
between amine complexes and NH3. At this pH, which is
1.7 units greater than the pKa for NH4+, excess NH3 is
available for complexation. Therefore, the amine
complexes are not expected to closely respond to changes
in the NH3 concentration.
Hydroxyl complexes, on the other hand, do respond
Table 3.12 Eigenvalues Before and After Rotating Five
Principal Components From the Cd-NH3-en
Equilibrium Matrix at pH 10.93 +/- 0.1.
Eigenvalue (% Variance)
Before Rotation   After Rotation
12.7 (60.4)       9.60 (48.0)
5.33 (26.7)       5.98 (29.9)
2.58 (12.9)       4.40 (22.0)
1.85E-2 (0.1)     2.00E-2 (0.1)
5.81E-6 (0.0)     5.81E-6 (0.0)
Table 3.13 Factor Loadings1 and Variance Contributions2
from the Cd-NH3-en Equilibrium Matrix at
pH 10.93 +/- 0.1.
Factor 1
Variable Loading (% Var.) Variable Loading (% Var.)
CD2OH 0.318 (10.1) CD 0.266 (7.07)
CD3OH 0.317 (10.1) CD1EN 0.233 (5.43)
CD40H 0.299 (8.96) CD2EN 0.153 (2.34)
CD10H 0.299 (8.96) CD3EN 0.127 (1.61)
CD6A 0.271 (7.33) EN 0.077 (0.60)
CD5A 0.270 (7.29) EN1H 0.072 (0.52)
CD4A 0.269 (7.25) EN2H 0.067 (0.44)
CD3A 0.268 (7.20) NH3 0.045 (0.20)
CD2A 0.267 (7.15) H 0.042 (0.18)
CD1A 0.267 (7.11) NH4 0.042 (0.18)
Factor 2
Variable Loading (% Var.) Variable Loading (% Var.)
CD1EN 0.460 (21.1) CD1OH 0.062 (0.39)
CD2EN 0.423 (17.9) H 0.059 (0.35)
CD3EN 0.407 (16.6) NH4 0.059 (0.35)
EN 0.371 (13.8) CD6A 0.046 (0.21)
EN1H 0.362 (13.1) CD5A 0.046 (0.21)
EN2H 0.349 (12.1) CD4A 0.046 (0.21)
CD40H 0.097 (0.94) CD3A 0.045 (0.20)
CD30H 0.091 (0.83) CD2A 0.045 (0.20)
CD20H 0.080 (0.63) CD1A 0.044 (0.19)
NH3 0.068 (0.46) CD 0.044 (0.19)
Factor 3
Variable Loading (% Var.) Variable Loading (% Var.)
H 0.477 (22.8) CD10H 0.059 (0.35)
NH3 0.477 (22.8) CD 0.056 (0.32)
NH4 0.477 (22.7) CD1A 0.054 (0.29)
CD40H 0.379 (14.4) CD2A 0.052 (0.27)
CD30H 0.296 (8.75) CD3A 0.050 (0.25)
CD20H 0.184 (3.38) CD4A 0.047 (0.22)
EN 0.092 (0.84) CD5A 0.045 (0.20)
CD3EN 0.090 (0.80) CD6A 0.043 (0.18)
CD2EN 0.088 (0.77) EN1H 0.041 (0.17)
CD1EN 0.078 (0.60) EN2H 0.009 (0.01)
1 Loadings are normalized to 1.0.
2 Percent of total variance within each factor.
to changes in the H+ concentration. Along with the
NH3/H+/NH4+ system, factor 3 reveals that higher
coordinated hydroxyl complexes respond more to changes in
H+ than the lower coordinated complexes.
It is certainly not obvious why covariance
between H+ and hydroxyl complexes is apparently greater
than covariance between NH3 and amine complexes. If we
look at the overall statistics for this data matrix, the
average concentration for OH- is 8.20E-4 M while that for
NH3 is 9.73E-6 M. On the other hand, the average Cd(OH)+
concentration is 5.53E-8 M, while the average Cd(NH3)2+
concentration is 1.99E-11 M. As a result, the OH- and
Cd(OH)+ averages differ by approximately four orders of
magnitude, whereas the NH3 and Cd(NH3)2+ averages differ by
approximately six orders of magnitude. Therefore, because their
distribution averages are closer, we would expect greater
covariance between H+ and hydroxyl complexes than between
NH3 and the amine complexes.
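The gap between each ligand and its first complex is easiest to compare on a logarithmic scale. The sketch below reads the quoted averages with negative exponents (8.20E-4 M, 5.53E-8 M, 9.73E-6 M and 1.99E-11 M), on the assumption that the minus signs were lost in reproduction; the dictionary keys reuse the ARTHUR labels and the helper name is illustrative.

```python
import math

# Average concentrations (mol/L) for the pH 10.93 matrix, with the exponent
# signs restored as an assumption.
averages = {
    "OH": 8.20e-4,
    "CD1OH": 5.53e-8,
    "NH3": 9.73e-6,
    "CD1A": 1.99e-11,
}


def orders_apart(larger, smaller):
    """Separation of two concentrations in powers of ten."""
    return math.log10(larger / smaller)


hydroxyl_gap = orders_apart(averages["OH"], averages["CD1OH"])
amine_gap = orders_apart(averages["NH3"], averages["CD1A"])
print(f"OH- vs Cd(OH)+:   {hydroxyl_gap:.1f} orders of magnitude")
print(f"NH3 vs Cd(NH3)2+: {amine_gap:.1f} orders of magnitude")
```

On this reading the hydroxyl pair is separated by about 4.2 powers of ten and the amine pair by about 5.7, so the hydroxyl averages are indeed the closer of the two.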
Factor models of the pH 5.85 +/- 0.1 and pH 10.93
+/- 0.1 equilibrium matrices reveal significantly
different chemical interactions than those of the pH 8.39
+/- 0.1 data matrix. However, these chemical differences
as interpreted by the factors are consistent with the
expected chemical interactions given the relation of pH
with the pKas.
3.1.4 Analysis of the Cd-NH3-en Equilibrium Matrix With
a Slack Variable
The pH 8.39 +/- 0.1 data matrix was expanded to
contain a slack (random) variable. In addition, ten
more data vectors were added so that the ratio of data
vectors to variables was consistent with those matrices
previously analyzed. The purpose of this analysis is to
demonstrate the effect of a slack variable on the factor
model of a matrix without random variance.
Eigenvalues before and after rotation of five
factors are shown in Table 3.14 for the matrices with and
without the slack variable. The matrix without the slack
variable was also expanded to 210 data vectors to
maintain consistency. Preliminary analysis showed
essentially no difference between the n=210 and n=200
data matrices.
In the table, principal component 4 from the
slack variable matrix has an eigenvalue of approximately
one. An eigenvalue of exactly 1.0 indicates that none of
the variance within the principal component is due to
covariance. Only a single variable contributes its
variance provided all the individual variances are
normalized (autoscaled or z-scored) prior to
eigenextraction.[13] While eigenvalues greater than
one indicate detectable covariance among the variables,
eigenvalues less than one indicate that covariance
Table 3.14 Eigenvalues Before and After Rotating Five
Principal Components From Matrices With and
Without a Slack Variable.
                         Eigenvalue (% Variance)
Matrix                   Before Rotation   After Rotation
With Slack Variable      12.6 (60.0)       9.32 (44.4)
                         6.07 (28.9)       6.70 (31.9)
                         1.35 (6.4)        3.19 (15.2)
                         0.978 (4.7)       1.76 (8.4)
                         1.81E-2 (0.1)     4.20E-2 (0.2)
Without Slack Variable   12.6 (62.9)       9.48 (47.4)
                         6.07 (30.3)       7.08 (35.4)
                         1.34 (6.7)        3.40 (17.0)
                         1.82E-2 (0.1)     4.00E-2 (0.2)
                         7.77E-5 (0.0)     7.77E-5 (0.0)
amounts to less than the variance from one variable
alone. The more randomly a variable behaves in
relation to the other variables the closer a principal
component composed of that random variable approaches the
eigenvalue 1.0. Thus, the model correctly shows that at
least one variable is unrelated to all others.
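The statement that a unique variable yields an eigenvalue of 1.0 can be checked on a small correlation matrix without an eigenvalue solver. The sketch assumes two variables with a hypothetical correlation r = 0.9 and a third variable that is uncorrelated with both. For this block structure the eigenpairs are known in closed form, and the code only verifies that they satisfy R v = lambda v.

```python
def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


r = 0.9  # hypothetical correlation between two related variables
correlation = [
    [1.0, r, 0.0],
    [r, 1.0, 0.0],
    [0.0, 0.0, 1.0],  # third variable is unrelated to the other two
]

# Closed-form eigenpairs (eigenvalue, unnormalized eigenvector) for this matrix.
pairs = [
    (1.0 + r, [1.0, 1.0, 0.0]),   # shared variance of the correlated pair
    (1.0 - r, [1.0, -1.0, 0.0]),  # what is left of the pair
    (1.0, [0.0, 0.0, 1.0]),       # the unique (slack-like) variable
]

# Largest deviation of R v from lambda v for each candidate pair.
residuals = [
    max(abs(a - lam * b) for a, b in zip(matvec(correlation, vec), vec))
    for lam, vec in pairs
]
eigenvalue_sum = sum(lam for lam, _ in pairs)
print(residuals, eigenvalue_sum)
```

The correlated pair produces one eigenvalue above one and one below, while the unrelated variable keeps an eigenvalue of exactly 1.0, and the eigenvalues sum to the number of autoscaled variables.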
Also in the table, principal component 5 has an
eigenvalue approximately equal to that of the fourth
principal component from the matrix without the slack
variable. It appears that an "extra" principal component
was created for the slack variable. In reality, of
course, both matrices have the same number of principal
components. To account for the additional variance
contributed by the slack variable, variance is skimmed
away from the first three principal components. This is
shown by slightly lower eigenvalues from the slack
variable matrix compared to those from the matrix without
the slack variable. The third principal component is
affected most, principal component one the least. The
remaining principal components do not appear to
contribute. In fact, principal components 5-17 from
the slack variable matrix actually increase in eigenvalue
compared to their counterparts from the other data
matrix.
While rotation is essential for interpreting
principal components 1-3 from the slack variable matrix,
its effect is negligible on principal component 4. The
fourth principal component is correctly shown to be
unique whether rotated or not. Rotation merely served to
increase the amount of variance contributed by the slack
variable from 97% to >99%. However, the eigenvalue of
rotated principal component 4 is larger than expected for
a unique factor. The amount of variance nearly doubles
after rotation from 4.7% to 8.4%. This is likely due to
a smaller than ideal number of data vectors in the data
matrix, i.e., the result of stochastic bias. A minimal number of
data vectors would allow outlying data points to distort
the variance distribution and cause random variance to
appear nonrandom. The effect would likely magnify after
rotation because fewer axes are available to account for
this variance. Stauffer et al. have shown that percent
variance of principal components from random data
decreases to the theoretical limit for a purely random
variable as the number of data vectors in the matrix
increases.[37]
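A related effect can be seen in a much simpler setting than a full principal component analysis. The sketch below is not the procedure of the cited study. It only assumes two independent normal variables and shows that the apparent (spurious) correlation between them shrinks as the number of data vectors grows. The seed, trial count and sample sizes are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_spurious_r(n, trials=100, seed=0):
    """Average |r| between two independent random variables of length n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(pearson(x, y))
    return total / trials


small = mean_abs_spurious_r(20)
large = mean_abs_spurious_r(2000)
print(f"n = 20:   mean |r| = {small:.3f}")
print(f"n = 2000: mean |r| = {large:.3f}")
```

With few data vectors a random variable can appear weakly related to another variable, which is one way random variance may come to look nonrandom in a small matrix.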
A comparison of factor models from the two
matrices shows essentially no differences among the first
three factors if four factors are retained from the slack
variable matrix. If only three factors are retained
larger differences appear but these may be insignificant
in the overall interpretation of these factors. The
differences are attributable, of course, to the fact that
the three factors must now absorb random variance from
the slack variable. Table 3.15 shows normalized factor
loadings of the first three factors when only three
factors are retained from the slack variable matrix. In
terms of loadings, the slack variable (1RND) affects
factor 3 the most and factor 1 the least. Further
evidence of the vulnerability of factor 3 is shown in
the response of Cd2+, Cd(NH3)2+ and Cd(OH)+ on factor 1.
In factor 1, these variables are most affected by random
variance because they have the largest loadings in
factor 3.
This analysis shows that a variable which behaves
randomly affects principal components with lower
eigenvalues to a larger degree than principal components
with higher eigenvalues. This stands to reason because
the strongest correlations should be reflected in principal
components or factors with the highest eigenvalues, while
the weakest correlations should be reflected in those with
the lowest eigenvalues. As a consequence, the weakest
correlations should be most affected by imposed random
variance.
Table 3.15 Factor Loadings1 and Variance Contributions2
From the Three Factor Model of the Cd-NH3-en
Equilibrium Matrix (pH 8.39 +/- 0.1) Containing
a Slack Variable.
Factor 1
Variable Loading (% Var.) Variable Loading (% Var.)
NH4 0.438 (19.2) CD3A 0.118 (1.39)
H 0.434 (18.9) CD1EN 0.090 (0.81)
NH3 0.433 (18.8) EN2H 0.088 (0.78)
CD 0.315 (9.91) 1RND 0.084 (0.70)
CD6A 0.280 (7.82) CD2OH 0.041 (0.17)
CD5A 0.245 (6.00) EN1H 0.041 (0.16)
CD4OH 0.226 (5.12) CD2EN 0.037 (0.14)
CD4A 0.194 (3.77) CD3EN 0.021 (0.03)
CD3OH 0.155 (2.41) EN 0.007 (0.01)
CD1A 0.153 (2.34) CD2A 0.003 (0.00)
CD1OH 0.129 (1.66)
Factor 2
Variable Loading (% Var.) Variable Loading (% Var.)
CD1EN 0.467 (21.8) CD2OH 0.052 (0.28)
CD2EN 0.412 (17.0) NH4 0.048 (0.23)
CD3EN 0.394 (15.6) CD3A 0.040 (0.16)
EN2H 0.374 (14.0) H 0.039 (0.15)
EN1H 0.370 (13.7) NH3 0.037 (0.14)
EN 0.361 (13.1) CD3OH 0.032 (0.10)
1RND 0.111 (1.23) CD4A 0.025 (0.06)
CD 0.097 (0.93) CD4OH 0.017 (0.03)
CD1A 0.081 (0.66) CD5A 0.014 (0.02)
CD1OH 0.078 (0.61) CD6A 0.006 (0.00)
CD2A 0.060 (0.35)
Factor 3
Variable Loading (% Var.) Variable Loading (% Var.)
CD 0.461 (21.3) 1RND 0.164 (2.70)
CD1A 0.365 (13.3) EN2H 0.159 (2.54)
CD1OH 0.347 (12.1) CD3EN 0.153 (2.35)
CD1EN 0.270 (7.29) CD3A 0.138 (1.89)
NH4 0.257 (6.59) EN1H 0.132 (1.75)
H 0.247 (6.12) EN 0.104 (1.08)
NH3 0.243 (6.01) CD3OH 0.100 (1.00)
CD2A 0.243 (5.89) CD4A 0.060 (0.36)
CD2OH 0.209 (4.37) CD6A 0.035 (0.12)
CD2EN 0.180 (3.25) CD4OH 0.025 (0.06)
CD5A 0.005 (0.00)
1 Loadings are normalized to 1.0.
2 Percent of total variance within each factor.
3.1.5 Analysis of the Cd-Zn-Fe-NH3-py-en Equilibrium
The Cd-NH3-en equilibrium was expanded to include
Zn2+ and Fe2+ as metal reactants and pyridine as a base
reactant. In addition to all of the species from the
Cd-NH3-en equilibrium, the expanded equilibrium contained
amine, hydroxyl and ethylenediamine complexes with zinc
and iron along with free pyridine and its conjugate acid.
Table 3.16 is a list of species involved along with the
labels used in ARTHUR.
Five equilibrium data matrices were generated
with pH means of 6.0, 7.0, 8.0, 9.0 and 10.0. The pH
standard deviation for all matrices was 0.1. Each data
matrix contained 520 data vectors in order to maintain
the ratio of data vectors to variables at approximately
10:1. Data vectors were generated by varying the
reactant ethylenediamine concentration along with the H+
concentration in a manner similar to the generation of
the Cd-NH3-en equilibrium data matrices. Concentrations
of reactants Zn2+, Fe2+ and pyridine were held constant
at 1.0E-7, 1.0E-9 and 1.0E-5, respectively.
Concentrations of the remaining reactants were the same
as those in the Cd-NH3-en equilibrium data matrices.
In each data matrix the concentrations of
Cd(NH3)52+ and Cd(NH3)62+ were less than 1.0E-38. As a
consequence, these variables were deleted prior to log
transformation. As in the Cd-NH3-en equilibrium data
matrices, conversion of concentrations to natural logs
resulted in a multivariate normal distribution.
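The pretreatment described here (log transformation followed by autoscaling) can be sketched in a few lines. The example assumes natural logs and the sample standard deviation with n - 1 in the denominator; whether ARTHUR divides by n or n - 1 is not stated in the text. The concentrations are invented for illustration and the function names are hypothetical.

```python
import math
import statistics


def autoscale(values):
    """Center to zero mean and scale to unit standard deviation (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 (assumed)
    return [(v - mean) / sd for v in values]


def log_autoscale(concentrations):
    """Natural-log transform followed by autoscaling."""
    return autoscale([math.log(c) for c in concentrations])


# Hypothetical concentrations (mol/L) spanning more than an order of magnitude.
example = [2.0e-7, 5.0e-7, 1.0e-6, 3.0e-6, 8.0e-6]
scaled = log_autoscale(example)
print([round(v, 3) for v in scaled])
```

After this treatment every variable has zero mean and unit variance, so each contributes equally to the total variance before eigenextraction.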
There were two purposes for factor analyzing
these data matrices. First, we wanted to see if the
Table 3.16 Chemical Species in the Cd-Zn-Fe-NH3-py-en
Equilibrium Model.
Species       ARTHUR Label   Species       ARTHUR Label
Cd2+          CD             Zn(NH3)42+    ZN4A
Cd(NH3)2+     CD1A           Zn(OH)+       ZN1OH
Cd(NH3)22+    CD2A           Zn(OH)2       ZN2OH
Cd(NH3)32+    CD3A           Zn(OH)3-      ZN3OH
Cd(NH3)42+    CD4A           Zn(OH)42-     ZN4OH
Cd(NH3)52+    CD5A           Zn(en)2+      ZN1EN
Cd(NH3)62+    CD6A           Zn(en)22+     ZN2EN
Cd(OH)+       CD1OH          Zn(en)32+     ZN3EN
Cd(OH)2       CD2OH          Zn(py)2+      ZN1PY
Cd(OH)3-      CD3OH          Zn(py)22+     ZN2PY
Cd(OH)42-     CD4OH          Zn(py)32+     ZN3PY
Cd(en)2+      CD1EN          Zn(py)42+     ZN4PY
Cd(en)22+     CD2EN          Fe2+          FE
Cd(en)32+     CD3EN          Fe(NH3)2+     FE1A
Cd(py)2+      CD1PY          Fe(NH3)22+    FE2A
Cd(py)22+     CD2PY          Fe(OH)+       FE1OH
Cd(py)32+     CD3PY          Fe(OH)2       FE2OH
Cd(py)42+     CD4PY          Fe(OH)3-      FE3OH
Zn2+          ZN             Fe(OH)42-     FE4OH
Zn(NH3)2+     ZN1A           Fe(en)2+      FE1EN
Zn(NH3)22+    ZN2A           Fe(en)22+     FE2EN
Zn(NH3)32+    ZN3A           Fe(en)32+     FE3EN
Table 3.16
(Continued)
Species       ARTHUR Label
Fe(py)2+      FE1PY
NH3           NH3
NH4+          NH4
py            PY
H(py)+        PYH
en            EN
H(en)+        EN1H
H2(en)2+      EN2H
factor model of a multiple metal equilibrium matrix
differentiates between different types of metals. For
example, a Cd-Zn-Fe equilibrium matrix might produce three
factors mainly composed of cadmium, zinc or iron species.
Second, we wanted to see how a series of ligand bases
with widely differing pKa's would load onto factors over
a range of pH.
Figure 7 shows scree plots for the five data
matrices prior to rotation. Little ambiguity exists in
determining that three factors should be retained from
the pH 7, 8, 9, and 10 data matrices. However, the number
of factors to retain from the pH 6.0 +/- 0.1 matrix is
not so obvious. Rotating five factors from this matrix
produces little change in the eigenvalues of the first
two factors while the eigenvalue of the third factor
increases from 1.3 before rotation to 1.6 after rotation.
Since this is similar to the increase seen for a rotated
unique factor (section 1.4), it appears that only two
factors are necessary to interpret the data.
3.1.5.1 The pH 6.0 +/- 0.1 Matrix Factor Model
Table 3.17 shows normalized factor loadings from
the pH 6.0 +/- 0.1 matrix. In factor 1 the first 35
species (Zn(py)2+ through H(py)+) have loadings of
approximately equal magnitude and are closely associated
with H+. Complexes involving all three types of metals
are present, specifically complexes of NH3 and pyridine.
Pyridinium ion is also included while ammonium ion is
not. Free metals are excluded from this group along
with ethylenediamine species.

Figure 7 Scree Plots for the Cd-Zn-Fe-NH3-py-en
Equilibrium Data Matrices.

Table 3.17 Factor Loadings(1) From the Cd-Zn-Fe-NH3-py-en
Equilibrium Matrix at pH 6.00 +/- 0.1.

      Factor 1                 Factor 2
Variable    Loading      Variable    Loading
ZN1PY       0.166        EN2H        0.284
ZN2PY       0.165        EN1H        0.283
CD1PY       0.165        FE1EN       0.279
ZN3PY       0.165        FE2EN       0.279
ZN4PY       0.165        FE3EN       0.279
CD2PY       0.164        EN          0.279
ZN1OH       0.164        CD3EN       0.279
ZN1A        0.164        CD2EN       0.279
CD3PY       0.164        ZN3EN       0.279
CD4PY       0.164        ZN2EN       0.279
ZN2OH       0.164        CD1EN       0.279
CD1OH       0.164        ZN1EN       0.279
ZN2A        0.164        CD          0.178
CD1A        0.164        ZN          0.164
ZN3OH       0.164        ZN1PY       0.023
ZN3A        0.164        CD1PY       0.016
ZN4A        0.164        ZN2PY       0.016
CD2A        0.164        NH4         0.015
CD2OH       0.164        FE          0.014
ZN4OH       0.164        ZN3PY       0.013
CD3A        0.164        CD2PY       0.012
CD3OH       0.164        ZN4PY       0.012
CD4A        0.164        CD3PY       0.010
CD4OH       0.164        CD4PY       0.010
FE1A        0.164        FE1PY       0.008
FE2A        0.164        PY          0.008
FE1OH       0.164        ZN1A        0.007
FE2OH       0.164        ZN1OH       0.007
NH3         0.164        CD1OH       0.006
FE4OH       0.164        ZN2OH       0.006
FE3OH       0.164        ZN2A        0.005
H           0.164        CD1A        0.005
PY          0.164        ZN3OH       0.005
FE1PY       0.164        ZN3A        0.005
PYH         0.164        ZN4A        0.005
FE          0.148        CD2A        0.005
NH4         0.132        CD2OH       0.005
EN2H        0.067        ZN4OH       0.005
EN1H        0.050        CD3A        0.005
FE1EN       0.032        CD3OH       0.005
FE2EN       0.032        CD4A        0.005
FE3EN       0.032        CD4OH       0.005
CD2EN       0.032        FE1A        0.005
CD3EN       0.032        FE2A        0.005
ZN3EN       0.032        FE1OH       0.005
ZN2EN       0.032        FE2OH       0.005
EN          0.032        NH3         0.005
ZN1EN       0.032        FE4OH       0.005
CD1EN       0.032        FE3OH       0.005
CD          0.020        H           0.004
ZN          0.012        PYH         0.004

(1) Loadings are normalized to 1.0

The composition of factor 1 comes as no surprise
because the concentrations of all bases are relatively
low at this pH. A slight change in pH results in
concomitant changes in the equilibrium concentrations of
the bases and complexes containing these bases.
Pyridinium ion concentration is also affected by change
in pH because its concentration is on the order of the
pyridine concentration. This, of course, is due to the
fact that the pKa (5.31) is somewhat near the pH mean.
On the other hand, NH4+ concentration is not as affected
by change in pH because its concentration is orders of
magnitude greater than that of NH3. Ammonium ion's pKa
(9.26) is much greater than the pH mean.
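
The argument above can be made concrete with the Henderson-Hasselbalch relation. The sketch below, in Python, uses the two pKa values quoted in the text and a pH of 6.0; the function names are hypothetical, and the slopes assume that the total concentration of each base plus its conjugate acid is fixed, which ignores the ligand bound in metal complexes.

```python
def acid_to_base_ratio(pka, ph):
    """[BH+]/[B] from the Henderson-Hasselbalch relation."""
    return 10.0 ** (pka - ph)

def log_slopes(pka, ph):
    """Return (d log10[B]/d pH, d log10[BH+]/d pH) at fixed total base.

    Assumes [B] + [BH+] is constant; complexation is ignored.
    """
    ratio = acid_to_base_ratio(pka, ph)
    base_fraction = 1.0 / (1.0 + ratio)  # fraction present as the free base
    return 1.0 - base_fraction, -base_fraction

for name, pka in (("pyridine", 5.31), ("ammonia", 9.26)):
    ratio = acid_to_base_ratio(pka, 6.0)
    base_slope, acid_slope = log_slopes(pka, 6.0)
    print(f"{name}: [BH+]/[B] = {ratio:.3g}, "
          f"base slope = {base_slope:.3f}, acid slope = {acid_slope:.4f}")
```

Under these assumptions the conjugate acid of a base whose pKa lies far above the pH (ammonium) has a slope near zero, while the conjugate acid of a base whose pKa lies near the pH (pyridinium) changes almost in proportion to the hydrogen ion concentration.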
Ethylenediamine varies independently of the other
bases since it is varied as a reactant. As expected,
free ethylenediamine and species involving it appear in a
separate factor (factor 2).
Further investigation of this model reveals that
the free metal Fe2+ has a much higher loading than either
Cd2+ or Zn2+ in factor 1. On the other hand, Cd2+ and
Zn2+ have much higher loadings than Fe2+ in factor 2.
From these observations, Fe2+ appears more sensitive to
changes in pH than Cd2+ or Zn2+, while Cd2+ and Zn2+
appear more sensitive to changes in the reactant
ethylenediamine concentration.
Unfortunately, these conclusions cannot be
confirmed in the correlation matrix. Absolute values of
the correlations between H+ and each of the free metals
are very low, ranging from 0.012 to 0.133. Furthermore,
the correlation between H+ and Fe2+ is the lowest of the
three correlations. Since correlations between
equilibrium species and reactant ethylenediamine are
unobtainable, an alternative would be to compare
correlations involving the species with the highest
loading on factor 2, H2(en)2+. Here, also, the
correlation matrix fails to support the conclusion that
Cd2+ and Zn2+ are more sensitive than Fe2+ to changes in
the reactant ethylenediamine concentration. Although
Cd2+ and Zn2+ do have higher correlations than Fe2+ with
H2(en)2+, they are all low in magnitude (0.068, 0.133 and
0.012, respectively).
This apparent discrepancy between the factor
model and the correlation matrix does not lessen the
importance of the Cd2+, Zn2+ and Fe2+ loadings in overall
interpretation. This is merely an example where the
correlation matrix obscures the full view of the
multivariate interactions. These low correlations do
suggest, however, that a third factor extracted from the
data matrix would mainly consist of the free metals.
This is indeed the case since Zn2+ and Cd2+ contribute
95% of the variance in the third factor.
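
A statement such as "contribute 95% of the variance" can be read off a column of normalized loadings. The loadings of the third factor are not tabulated, so the sketch below, in Python, uses the leading factor 2 loadings transcribed from Table 3.17 instead. It assumes that "normalized to 1.0" means the squared loadings of a factor sum to one (the tabulated values are consistent with this reading), so that a squared loading is the fraction of the factor variance carried by that variable. The function name `variance_share` is hypothetical.

```python
# Leading factor 2 loadings transcribed from Table 3.17 (pH 6.0 +/- 0.1 matrix).
# The remaining 37 variables each have loadings of 0.023 or less and are omitted.
factor2 = {
    "EN2H": 0.284, "EN1H": 0.283, "FE1EN": 0.279, "FE2EN": 0.279,
    "FE3EN": 0.279, "EN": 0.279, "CD3EN": 0.279, "CD2EN": 0.279,
    "ZN3EN": 0.279, "ZN2EN": 0.279, "CD1EN": 0.279, "ZN1EN": 0.279,
    "CD": 0.178, "ZN": 0.164,
}

def variance_share(loadings, labels):
    """Sum of squared unit-normalized loadings for the chosen variables.

    Assumes the loading vector has unit length, so each squared loading
    is that variable's fraction of the factor variance.
    """
    return sum(loadings[label] ** 2 for label in labels)

# Every ethylenediamine-containing label includes the substring "EN".
en_species = [label for label in factor2 if "EN" in label]
free_metals = ["CD", "ZN"]

en_share = variance_share(factor2, en_species)
metal_share = variance_share(factor2, free_metals)
print(f"ethylenediamine species: {en_share:.1%} of factor 2 variance")
print(f"free Cd2+ and Zn2+:      {metal_share:.1%} of factor 2 variance")
```

Squaring removes the sign of each loading, so the shares do not depend on whether a variable loads positively or negatively on the factor.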
It is difficult to explain why Fe2+ is more
prominent in factor 1 while Cd2+ and Zn2+ are more
prominent in factor 2. One explanation may be related to
the fact that there are fewer iron complexes than either
cadmium or zinc complexes. Eleven complexes contain iron
while cadmium and zinc are each involved in 16 complexes.
With iron distributed among fewer total complexes, free
Fe2+ may be more sensitive to changes in pH than either
Cd2+ or Zn2+. Along with concentrations of the free
bases, change in pH affects the concentrations of
complexes which contain the bases as ligands. Higher
coordinated complexes are more affected than lower
coordinated complexes. Therefore, if fewer complexes are
involved, the difference between the effect on higher
coordinated complexes and the effect on lower coordinated
complexes may be less than if more complexes are
involved. If this is the case, the difference between
the effect of a change in pH on higher coordinated
complexes and the effect on the free metal itself may
also be minimal. The result would be a higher loading of
the free metal on the pH dependent factor compared to
other metals in the equilibrium that form more complexes.
3.1.5.2 The pH 7.0 +/- 0.1 Matrix Factor Model
In the pH 7.0 +/- 0.1 matrix model (Table 3.18)
the first factor involves species that are sensitive to
changes in pH and the third factor involves
ethylenediamine species. The second factor largely
consists of free metals and metal-pyridine complexes.
Hydrogen ion has a relatively low loading on this factor,
indicating that the free metals and metal-pyridine
complexes are less sensitive to changes in pH than the
major variables in factor 1. On the other hand, free
pyridine and its conjugate acid have large loadings on
factor 1 and are, therefore, sensitive to changes in
pH.
The factor model shows how the two bases,
pyridine and NH3, behave at a pH that lies between the
pKa's of the respective conjugate acids. The
concentration of pyridine is orders of magnitude greater
than that of its conjugate acid. The difference is great
enough that complexes with pyridine as a ligand respond
less to changes in pH than free pyridine itself or its
conjugate acid. On the other hand, at pH 7 ammonia
concentration is orders of magnitude less than its
conjugate acid concentration, and complexes involving the
base respond proportionately to changes in pH.
Table 3.18 Factor Loadings(1) From the Cd-Zn-Fe-NH3-py-en
Equilibrium Matrix at pH 7.00 +/- 0.1.

      Factor 1                 Factor 2                 Factor 3
Variable    Loading      Variable    Loading      Variable    Loading
H           0.208        FE1PY       0.286        ZN1EN       0.292
NH3         0.208        FE          0.285        EN2H        0.287
PYH         0.208        CD          0.266        CD1EN       0.286
FE4OH       0.207        CD1PY       0.265        ZN2EN       0.284
FE3OH       0.206        CD2PY       0.263        EN1H        0.282
FE2OH       0.205        CD3PY       0.260        ZN3EN       0.281
FE2A        0.205        ZN          0.258        CD2EN       0.281
PY          0.204        CD4PY       0.258        CD3EN       0.279
FE1A        0.202        ZN1PY       0.257        FE1EN       0.277
FE1OH       0.202        ZN2PY       0.256        FE2EN       0.277
NH4         0.200        ZN3PY       0.255        FE3EN       0.276
CD4OH       0.197        ZN4PY       0.254        EN          0.276
CD4A        0.197        ZN1A        0.135        FE1PY       0.090
CD3OH       0.193        ZN1OH       0.135        FE          0.090
CD3A        0.193        ZN1EN       0.087        CD          0.058
ZN4OH       0.190        CD1A        0.083        CD1PY       0.057
ZN4A        0.190        CD1OH       0.082        CD2PY       0.057
CD2OH       0.183        CD1EN       0.077        CD3PY       0.056
CD2A        0.182        H           0.075        CD4PY       0.055
ZN3OH       0.182        EN2H        0.075        ZN          0.044
ZN3A        0.181        PYH         0.074        ZN1PY       0.044
ZN2OH       0.164        NH3         0.074        ZN2PY       0.044
ZN2A        0.163        ZN2EN       0.073        ZN3PY       0.043
CD1OH       0.148        PY          0.072        ZN4PY       0.043
CD1A        0.147        NH4         0.072        H           0.024
FE          0.129        FE4OH       0.068        NH3         0.024
ZN1OH       0.110        ZN3EN       0.068        PYH         0.024
ZN1A        0.110        CD2EN       0.068        PY          0.023
FE1PY       0.098        EN1H        0.067        NH4         0.022
EN2H        0.067        FE3OH       0.067        FE4OH       0.022
CD          0.051        CD3EN       0.065        FE3OH       0.021
ZN          0.050        FE2OH       0.062        FE2OH       0.020
EN1H        0.047        FE2A        0.062        FE2A        0.020
ZN1PY       0.046        FE1EN       0.062        FE1A        0.016
CD1PY       0.046        FE2EN       0.060        FE1OH       0.016
ZN2PY       0.043        FE3EN       0.060        ZN1A        0.015
CD2PY       0.041        EN          0.059        ZN1OH       0.015
ZN3PY       0.040        ZN2A        0.053        CD4OH       0.014
ZN4PY       0.037        ZN2OH       0.052        CD4A        0.014
CD3PY       0.036        FE1A        0.050        ZN4OH       0.012
ZN1EN       0.033        FE1OH       0.049        ZN4A        0.012
CD4PY       0.030        CD4OH       0.031        CD1A        0.011
CD1EN       0.030        CD4A        0.031        CD3OH       0.011
