Citation
Prediction of prokaryotic optimum growth temperature based on genomic and proteomic features

Material Information

Title:
Prediction of prokaryotic optimum growth temperature based on genomic and proteomic features
Creator:
Iyer, Mallika ( author )
Place of Publication:
Denver, Colo.
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
1 electronic file (60 pages)

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Integrative Biology, CU Denver
Degree Disciplines:
Integrated science

Subjects

Subjects / Keywords:
Bacterial genetics ( lcsh )
Prokaryotes ( lcsh )
Temperature -- Physiological effect ( lcsh )
Bacterial genetics ( fast )
Prokaryotes ( fast )
Temperature -- Physiological effect ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Review:
Bacteria and Archaea are known to grow at a wide range of temperatures, from as low as -1.8°C to as high as 110°C. However, less is known about what adaptations allow them to do so. Many studies have tried to identify the genomic and proteomic adaptations that allow these organisms to grow at temperature extremes. None of these studies have led to the development of a predictor of optimum growth temperature that is accurate across a wide range of temperatures, highlighting in particular the knowledge gap in our understanding of what feature selection operates on at intermediate temperatures. Furthermore, many of these studies were performed when genomic databases were small and phylogenetically biased. In this study, we attempt to validate the correlations reported in these studies between genomic/proteomic features and optimum growth temperature on a large, phylogenetically diverse modern genomic dataset. We then use a machine learning approach to combine these features into a super-predictor of optimum growth temperature. We find that many of these features are only weakly correlated with optimum growth temperature in our expanded dataset, although the correlations are stronger at higher temperatures. However, a novel combination of features significantly outperforms all individual features in predicting optimum growth temperature with high accuracy. Finally, we extend this study to shotgun metagenomic data by calculating two of the features on metagenomic reads and testing the correlation between the feature and metagenomic sampling temperature. Our study offers new insights into the selective pressures on genomic and proteomic features that may help tune prokaryotic optimum growth temperature.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: Adobe Reader.
Statement of Responsibility:
by Mallika Iyer.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
on10066 ( NOTIS )
1006655200 ( OCLC )
on1006655200
Classification:
LD1193.L584 2017m I94 ( lcc )

Full Text
PREDICTION OF PROKARYOTIC OPTIMUM GROWTH TEMPERATURE BASED ON
GENOMIC AND PROTEOMIC FEATURES by
MALLIKA IYER
B.Sc., Savitribai Phule Pune University, 2015
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Integrated Sciences
Integrated Sciences Program
2017


This thesis for the Master of Integrated Sciences degree by
Mallika Iyer
has been approved for the Integrated Sciences Program by
Christopher S. Miller, Chair
Hai Lin
Michael Strong
Date: July 29, 2017


Iyer, Mallika (MIS, Integrated Sciences Program)
Prediction of Prokaryotic Optimum Growth Temperature Based on Genomic and Proteomic Features
Thesis Directed by Assistant Professor Christopher S. Miller
ABSTRACT
Bacteria and Archaea are known to grow at a wide range of temperatures, from as low as -1.8°C to as high as 110°C. However, less is known about what adaptations allow them to do so. Many studies have tried to identify the genomic and proteomic adaptations that allow these organisms to grow at temperature extremes. None of these studies have led to the development of a predictor of optimum growth temperature that is accurate across a wide range of temperatures, highlighting in particular the knowledge gap in our understanding of what feature selection operates on at intermediate temperatures. Furthermore, many of these studies were performed when genomic databases were small and phylogenetically biased. In this study, we attempt to validate the correlations reported in these studies between genomic/proteomic features and optimum growth temperature on a large, phylogenetically diverse modern genomic dataset. We then use a machine learning approach to combine these features into a super-predictor of optimum growth temperature. We find that many of these features are only weakly correlated with optimum growth temperature in our expanded dataset, although the correlations are stronger at higher temperatures. However, a novel combination of features significantly outperforms all individual features in predicting optimum growth temperature with high accuracy. Finally, we extend this study to shotgun metagenomic data by calculating two of the features on metagenomic reads and testing the correlation between the feature and metagenomic sampling temperature.


Our study offers new insights into the selective pressures on genomic and proteomic features that may help tune prokaryotic optimum growth temperature.


ACKNOWLEDGEMENTS
This project would not have been possible without the help and guidance of many people. I would like to thank Dr. Christopher Miller, my advisor, for mentoring me throughout the duration of the project and helping me pursue my goal of research in interdisciplinary science. I would also like to thank Dr. T. B. K. Reddy, from the Department of Energy Genomes Online Database, for providing the initial optimum growth temperature and accession information that allowed me to put together a dataset of genomes annotated with optimum growth temperatures. I would like to thank the rest of my lab members, for critiquing my work at all stages and providing useful feedback. Finally, I would like to thank my family who made it possible for me to pursue graduate studies and have supported me throughout the duration of my studies.


TABLE OF CONTENTS
CHAPTERS
I. INTRODUCTION
II. MATERIALS AND METHODS
III. RESULTS AND DISCUSSION
IV. CONCLUSIONS
REFERENCES
APPENDIX A: SUPPLEMENTARY METHODS
APPENDIX B: SUPPLEMENTARY FIGURES AND TABLES


CHAPTER I
INTRODUCTION
Prokaryotes are known to grow at a wide range of temperatures, from as low as -1.8°C (1) to as high as 110°C (2). However, it is not understood what allows them to grow across such a range. Selection for specific genomic or proteomic features might stabilize the molecular constituents of the cell, increasing their half-life or tuning their stability to be optimal at the organism's growth temperature. Although this stability cannot be measured easily, it has long been suggested that it can be inferred from genomic and proteomic sequence information, such as the GC percentage of certain genes (2) or the frequency of certain amino acids (4, 5).
Many studies have examined the relationship between various genomic and proteomic features and prokaryotic optimum growth temperature (OGT) (5-9). For example, studies examining genomic features have found that the GC content of 16S rRNA stems, 5S rRNA and tRNA correlates with OGT (6). This can be explained by the fact that GC base pairs are joined by three hydrogen bonds, which makes them more thermostable than AT base pairs, which are joined by two. Higher environmental temperatures would require greater thermostability to prevent denaturation of RNA secondary structures. GC content has also been examined in certain genes and used to predict OGT (2).
Some proteomic features have also been found to vary with optimum growth temperature. For example, Beeby et al. (10) and Jorda et al. (7) found a greater number of disulfide bonds in proteins derived from thermophilic organisms. Disulfide bonds are stabilizing covalent bonds formed between pairs of cysteine residues, and they would increase thermostability at higher temperatures. Chakravarty and Varadarajan (5) examined the structural differences between a set of mesophilic and thermophilic proteins and found that the latter have a greater number of salt bridges/ion pairs, suggesting that these non-covalent interactions also stabilize proteins at high temperature. Studies have also shown that different amino acid groups (polar, charged and hydrophobic groups) occur at different frequencies in the secondary structures of thermophilic, mesophilic and psychrophilic proteins (4, 5, 8). This could be because thermophilic proteins require increased thermostability of their secondary structures, while psychrophilic proteins require more flexibility to function properly at lower temperatures. Zeldovich et al. (9) found that the fraction of IVYWREL amino acids in a proteome correlates strongly with the OGT of the organism. This correlation was found by calculating the combined frequency of every possible combination of amino acids to determine which had the strongest correlation. Ku et al. (11) found that dipeptide frequencies could be used to predict protein melting temperature (Tm). In that study, Ku et al. (11) calculated the propensity of each dipeptide to occur in a high Tm protein and in a low Tm protein. On this basis, some dipeptides were inferred to raise protein melting temperatures and others to lower them. By calculating the frequency of occurrence of these dipeptides, an unknown protein could be classified as a "high Tm" or "low Tm" protein. The percentage of high Tm proteins in the proteome of an unknown organism could then be used to classify the organism as thermophilic or mesophilic. In their study of a structural model of metabolism in E. coli, Chang et al. (12) found that the thermostability of a few key proteins determined the OGT of E. coli. Although it is currently only possible to infer such a model for well-studied organisms like E. coli, this suggests a mechanism by which the Tm of the proteins in an organism might dynamically dictate OGT. Finally, Sawle et al. (13) and Dill et al. (14) found that the protein chain length distribution of a proteome could be used to predict the OGT of the organism. In combination with calculations of the free energy of folding of thermophilic and mesophilic proteins, the chain length distribution was successfully used to predict the growth rate at different temperatures for a set of six mesophiles and six thermophiles.
Although these studies found a relationship between various genomic/proteomic features and OGT, none of these features was found to be a good predictor of OGT across a broad range of temperatures. Furthermore, none of these studies tried combining more than one feature to predict OGT. Machine learning approaches that combine features have been found to work well for other bioinformatics problems (15, 16). For example, in the DREAM5 challenge, competing groups used different approaches to infer the transcriptional regulatory networks of E. coli, S. cerevisiae and an in silico network from gene expression data (15). Marbach et al. (15) used the DREAM5 platform to evaluate a community approach that combined the different methods used in the challenge. They found that the community predictor performed better than any individual approach. Similarly, a combination of features could be used to predict prokaryotic optimum growth temperature with higher accuracy than any individual feature.
Prediction of OGT could also be applied to the study of microbial communities. Microbial communities consist of groups of organisms with different OGTs growing together in a common environment. Community structure affects community dynamics, which in turn influence the environment (17). Many environmental factors influence community structure, and environmental temperature is a powerful one (17-19). A predictor of prokaryotic OGT could be used to predict the "optimum growth temperature" of a microbial community from features calculated on metagenomic reads. If successful, such a predictor could in turn be used to predict and understand how community structure might change with a shift in environmental temperature.
Many of the studies conducted previously to identify genomic and proteomic determinants of OGT were based on small, phylogenetically biased datasets. With the advent of next generation sequencing technologies, the number of genomes in public databases has increased exponentially (Figure 1).
[Figure 1: number of genomes vs. Year]
Figure 1 Total number of genomes in the NCBI database (56) by year. The advent of next generation sequencing technology has led to an exponential increase in the number of genomes in the database.
In this study, we used a large, phylogenetically diverse dataset of genomes annotated with OGT to examine and validate previously reported correlations. We then used a machine learning approach to combine these features and predict OGT for individual genomes. We found that many of the correlations reported in earlier studies were reproduced only weakly on our expanded dataset when the full temperature range was considered. However, most features showed stronger correlations at higher temperatures. When these features were combined using a Support Vector Machine algorithm, the combined predictor outperformed all individual features in predicting OGT.


CHAPTER II
MATERIALS AND METHODS
Genomic Dataset: A table containing optimum growth temperature (OGT) data and accession numbers for publicly available genomes was obtained from the Department of Energy Genomes Online Database (GOLD) (20), courtesy of Dr. T. B. K. Reddy, on 19 October 2016 (personal communication). Each prokaryote in the table had a unique GOLD ID. GOLD IDs were further linked to one or more Genbank IDs (21). The optimum growth temperature data had either been reported in the literature or been self-reported by investigators when depositing genomes. The data included single temperatures as well as ranges. Temperature data were cleaned by replacing each temperature range with its mean. The Genbank IDs (21) provided for each GOLD ID were used to obtain Genbank/Refseq assembly accession numbers from the National Center for Biotechnology Information (NCBI) Genbank/Refseq webpages (21, 22). These assembly accessions were then used to download genomes, proteomes and fasta files of RNA sequences for each organism via the NCBI FTP site. Genbank accessions were used when Refseq accessions were not available. The total number of organisms with OGT data and downloaded genomes was ~3000.
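The range-averaging step can be sketched as follows. This is a minimal illustration that assumes raw GOLD temperature entries look like "37", "30-37" or "30 to 37". The real format of the table is not given here, so the pattern and the function name `clean_ogt` are hypothetical. A leading minus sign is read as part of the first number, so psychrophile ranges such as "-2-10" still parse.

```python
import re
from typing import Optional

# Hypothetical entry formats: "37", "30-37", "30 to 37", "-2-10".
# The first number may carry a minus sign; "-" or "to" separates range endpoints.
_TEMP_PATTERN = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*(?:(?:-|to)\s*(-?\d+(?:\.\d+)?))?\s*$"
)


def clean_ogt(raw: str) -> Optional[float]:
    """Return one OGT value in degrees C, using the mean of a range's endpoints."""
    match = _TEMP_PATTERN.match(raw)
    if match is None:
        return None  # unparseable entry, left for manual curation
    low = float(match.group(1))
    if match.group(2) is None:
        return low  # single temperature
    high = float(match.group(2))
    return (low + high) / 2
```

For example, `clean_ogt("30-37")` gives 33.5. Entries that do not match return `None`, so they can be reviewed rather than silently dropped.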
Computing Resources: All analyses were performed on a Linux scientific computing cluster. The statistical calculations were conducted using RStudio version 0.99.491.
Calculation of Non-Structural Features: Each GOLD ID in the data was treated as an individual organism. Some GOLD IDs had multiple Genbank IDs (and therefore multiple genomes/proteomes/RNA files) associated with them, representing, for example, different chromosomes or plasmids. Features were calculated by concatenating all genomes/proteomes/RNA files associated with each GOLD ID.
Calculation of IVYWREL fraction: A custom python script making use of Biopython (23) was used to combine all the fasta files (all proteomes) for each GOLD ID into one file, and then to calculate the fraction of all amino acids that were I, V, Y, W, R, E or L. Unidentified amino acids (such as those encoded by the character 'X') were left out from the calculation entirely, as they do not add any information about the IVYWREL frequency of the proteome.
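The IVYWREL calculation can be sketched in plain Python. The thesis script used Biopython. Here a tiny FASTA reader keeps the example self-contained, and `Bio.SeqIO.parse` could replace it. One assumption is labeled in the code: any character outside the 20 standard amino acids is treated as unidentified, not only 'X'.

```python
from typing import Iterable, Iterator, List

# Assumption: anything outside the 20 standard residues (X, B, Z, U, *) is "unidentified".
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
IVYWREL = frozenset("IVYWREL")


def read_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Minimal FASTA reader: yields one concatenated sequence per '>' record."""
    chunks: List[str] = []
    in_record = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                yield "".join(chunks)
            chunks, in_record = [], True
        elif line:
            chunks.append(line)
    if in_record:
        yield "".join(chunks)


def ivywrel_fraction(sequences: Iterable[str]) -> float:
    """Fraction of identified residues that are I, V, Y, W, R, E or L."""
    identified = 0
    ivywrel = 0
    for seq in sequences:
        for aa in seq.upper():
            if aa not in STANDARD_AA:
                continue  # unidentified residues are left out entirely
            identified += 1
            if aa in IVYWREL:
                ivywrel += 1
    return ivywrel / identified if identified else float("nan")
```

To pool every proteome of one GOLD ID, the open file handles can be chained, e.g. `ivywrel_fraction(read_fasta_sequences(itertools.chain(*handles))))`.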
Calculation of the High Tm Protein Percentage (HTPP): All proteomes associated with a single GOLD ID were combined for this calculation. The Tm Index (TI) weights assigned to each dipeptide by Ku et al. (11) were used in the calculation, along with the formula presented by Ku et al. (11):
$$T_m = \frac{\sum_{i=1}^{L-1} P_{\mathrm{index}}(A_i A_{i+1}) \times 9372}{L} - 398$$
(L = number of amino acids in the sequence; A_i A_{i+1} is a specific dipeptide in the sequence)
Dipeptides containing unidentified amino acids were left out entirely from the calculation as no reasonable conclusions can be made about the TI weights for such dipeptides.
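A sketch of the per-protein Tm index and the resulting HTPP follows. It uses the formula as given in this section: 9372 times the summed dipeptide weights, divided by L, minus 398. Several parts are assumptions. The weight table is a placeholder; Ku et al.'s actual values are not reproduced here. L is taken as the full sequence length, including unidentified residues. The cutoff separating "high Tm" from "low Tm" proteins is not stated in this section, so it is exposed as a hypothetical `threshold` parameter.

```python
from typing import Dict, Iterable

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def tm_index(seq: str, weights: Dict[str, float]) -> float:
    """Tm index of one protein: 9372 * (sum of dipeptide weights) / L - 398."""
    seq = seq.upper()
    length = len(seq)  # assumption: L counts every residue, including X
    if length < 2:
        raise ValueError("a protein needs at least one dipeptide")
    total = 0.0
    for i in range(length - 1):
        a, b = seq[i], seq[i + 1]
        if a not in STANDARD_AA or b not in STANDARD_AA:
            continue  # dipeptides with unidentified residues are skipped
        total += weights[a + b]  # placeholder weight table keyed by dipeptide
    return total * 9372 / length - 398


def htpp(proteins: Iterable[str], weights: Dict[str, float], threshold: float) -> float:
    """Percentage of proteins whose Tm index exceeds a (hypothetical) cutoff."""
    scores = [tm_index(p, weights) for p in proteins if len(p) >= 2]
    if not scores:
        return float("nan")
    high = sum(1 for s in scores if s > threshold)
    return 100.0 * high / len(scores)
```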
GC content of 16S rRNA stems: RNA files ending in "rna_from_genomic.fna.gz" downloaded via the NCBI FTP site were used for this calculation. Fasta header descriptions were searched for the string "16S ribosomal RNA" to identify all 16S sequences in all RNA files associated with each GOLD ID. These were then combined into a single file, which was aligned with SINA (24) to the SILVA SSU rRNA database (SSURef NR99 version 128) (25) using the following command line parameters:
sina -i intype=fasta -o outtype=fasta -ptdb /home/data/silva/SSURef_NR99_128_SILVA_07_09_16_opt.arb
The SILVA alignment also contains an annotation of the 16S rRNA helix structure, which was obtained as a string matching the full 16S alignment, courtesy of Dr. Elmar Pruesse (personal communication). This string was then used to annotate all the bases in our SINA-produced 16S sequence alignment that belonged to helical regions. If a GOLD ID had multiple 16S sequences, we calculated, for each helix position, the fraction of GC bases over ATCG bases down the alignment (so that each helix position had a GC fraction). The mean of these GC fractions was then used as the final fraction of GC bases in the helical regions of the 16S sequences for each GOLD ID.
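The column-wise averaging described above can be sketched as follows. The code takes the aligned 16S sequences of one GOLD ID and a list of helix column indices. Converting the SILVA helix annotation string into those indices is not shown, because the string's format is not described here. U is counted like T, on the assumption that the aligned output may use RNA letters. Columns with no bases for that organism are skipped.

```python
from typing import Iterable, List, Sequence

GC = frozenset("GC")
BASES = frozenset("ACGTU")  # assumption: U (RNA alphabet) is counted like T


def helix_gc_fraction(alignment: Sequence[str], helix_columns: Iterable[int]) -> float:
    """Mean of per-column GC fractions over the helix columns of an alignment.

    alignment: aligned 16S sequences of one GOLD ID, all the same length.
    helix_columns: 0-based alignment columns annotated as helical.
    """
    per_column: List[float] = []
    for col in helix_columns:
        column = [seq[col].upper() for seq in alignment]
        n_bases = sum(1 for c in column if c in BASES)
        if n_bases == 0:
            continue  # only gaps in this column for this organism
        n_gc = sum(1 for c in column if c in GC)
        per_column.append(n_gc / n_bases)
    return sum(per_column) / len(per_column) if per_column else float("nan")
```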
GC content of 5S rRNA and tRNA: All 5S sequences in all RNA files associated with each GOLD ID were identified by searching the fasta header descriptions of the RNA files mentioned above for the string "5S ribosomal RNA". These were then combined into a single file per GOLD ID, which was parsed to calculate the fraction of GC bases over ATGC bases. The GC content of tRNA was calculated in the same way, except that the fasta header descriptions were searched for the string "tRNA".
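The header search and GC calculation can be sketched as a substring filter over FASTA records followed by a GC/(A+T+G+C) count. The header layouts in the usage example below are illustrative, not copied from NCBI files.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction_for(lines: Iterable[str], keyword: str) -> float:
    """GC / (A+T+G+C) over every record whose header contains `keyword`."""
    gc = atgc = 0
    for header, seq in fasta_records(lines):
        if keyword not in header:
            continue
        for base in seq.upper():
            if base in "ATGC":
                atgc += 1
                if base in "GC":
                    gc += 1
    return gc / atgc if atgc else float("nan")
```

For example, `gc_fraction_for(lines, "5S ribosomal RNA")` and `gc_fraction_for(lines, "tRNA")` give the two features for one GOLD ID's pooled RNA lines.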
Prediction of Protein Structures: MODPIPE (26) was used to predict structures for all proteins. MODPIPE is an automated pipeline that uses MODELLER (27) to model protein structures from primary sequence and existing homologous templates. The MODPIPE runtime averaged approximately 0.5 hours per protein on our hardware. For a dataset of ~3000 bacterial proteomes containing ~3000 proteins each, the total runtime was estimated at ~5 years (when run on 100 CPUs simultaneously). Therefore, we decided to model only a subset of commonly found proteins for each organism (GOLD ID).
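The ~5-year estimate follows directly from the figures quoted above (about 3000 proteomes, about 3000 proteins each, about 0.5 h per protein, 100 CPUs in parallel):

$$\frac{3000 \times 3000 \times 0.5\ \text{h}}{100} = 45{,}000\ \text{h} \approx \frac{45{,}000}{24 \times 365}\ \text{yr} \approx 5.1\ \text{yr}$$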
Selection of the protein subset: Hidden Markov Models (HMMs) were downloaded for 107 single-copy bacterial genes (28). Since the Protein Data Bank (PDB) (29) was to be used as the source of protein structure templates in the subsequent modeling step, we decided to identify the 10 single-copy bacterial genes with the largest number of structures available in the PDB (29). To do this, we used the following hmmsearch command line from the HMMER suite of tools (version 3.1b2) (30) against a database of PDB sequences downloaded from the MODELLER website on 28 June 2016.
hmmsearch -tblout -cut_tc -o essential.out
/home/iyer/research/modtest/analyses/essential.hmm
/home/software/modpipe-2.2.0/database/PDB95/db/pdb_95.fasta
We then used the same HMMs and hmmsearch command to count how many of these ten proteins were found in each proteome in our dataset (Table 1).
Table 1 Number of proteomes containing 0-10 of the top ten proteins.
Number of proteins (x)    Number of proteomes with x proteins
0                         44
1                         1
2                         0
3                         5
4                         43
5                         162
6                         12
7                         25
8                         71
9                         379
10                        2700
Although not all ten proteins are found in every proteome, we assumed that the large number of proteomes available for training our prediction model would outweigh the information lost from the absence of some of these proteins in some proteomes.
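A tally like Table 1 can be produced from hmmsearch `--tblout` output, which is sketched below. In that tabular format, lines starting with '#' are comments, the first column is the target (protein) name and the third is the query (HMM) name. Mapping a protein name back to its proteome is dataset-specific, so `proteome_of` is a hypothetical callback. The full list of proteomes is passed separately so that proteomes with no hits are counted in the x = 0 row.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Set


def markers_per_proteome(
    tblout_lines: Iterable[str], proteome_of: Callable[[str], str]
) -> Dict[str, Set[str]]:
    """Collect, per proteome, the marker HMMs that have at least one hit."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for line in tblout_lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.split()
        target, query = fields[0], fields[2]
        found[proteome_of(target)].add(query)
    return found


def tally_marker_counts(
    hits: Dict[str, Set[str]], all_proteomes: Iterable[str], n_markers: int = 10
) -> Dict[int, int]:
    """Number of proteomes containing exactly x markers, for x = 0..n_markers."""
    counts = Counter(len(hits.get(p, set())) for p in all_proteomes)
    return {x: counts.get(x, 0) for x in range(n_markers + 1)}
```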
Modeling of protein structures and selection of best model: Since each GOLD ID had one or more proteome files associated with it, we used hmmsearch (30) to identify the best hit for each of the ten proteins across all the proteomes associated with each GOLD ID. Structures for the best hits were then modeled using the following MODPIPE commands:
~/AddSeqMP.py conf_file sequence_file
~/ModPipe.pl conf_file sequence_id hits_mode 1000
MODPIPE is a template-based protein structure prediction program, and produces multiple models for each modeled protein based on different possible templates chosen as the initial model. The Discrete Optimized Protein Energy (DOPE) score was used to select the best model (31). The DOPE score is an atomic distance-dependent statistical potential and lower DOPE scores correspond to better models (31). The model with the lowest DOPE score (31) was selected as the best model for each protein and was used for the calculation of the structural features.
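The two selection steps can be sketched as a max and a min. This section does not define "best hit", so the sketch assumes it means the highest hmmsearch bit score across all proteomes of a GOLD ID. DOPE scores are assumed to have been read from the MODPIPE output already; parsing that output is not shown.

```python
from typing import Dict, Iterable, Tuple


def best_hits(hits: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """For each marker HMM, keep the target protein with the highest bit score.

    hits: (query_hmm, target_protein, bit_score) tuples pooled across every
    proteome of one GOLD ID.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][0]:
            best[query] = (score, target)
    return {query: target for query, (_score, target) in best.items()}


def best_model(dope_scores: Dict[str, float]) -> str:
    """Return the model identifier with the lowest (best) DOPE score."""
    return min(dope_scores, key=lambda model_id: dope_scores[model_id])
```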
Annotation of Secondary Structures and Exposed Residues for Best Models: DSSP (32) was used to annotate secondary structures in the predicted protein structures via the PDB.DSSP module in Biopython (23, 33). DSSP uses the code B for isolated beta bridges and E for extended beta strands. These were treated as a single secondary structure type (beta sheets/strands). DSSP uses the codes T and S for turns and bends, respectively. These were treated as another single secondary structure type (loops). Helices were identified by the DSSP codes H, I and G. A residue was considered exposed if its relative solvent accessibility was greater than 0.05 (34). We used the Wilke values (34) for each residue's maximum accessible surface area, which is used to compute relative solvent accessibility.
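The grouping of DSSP codes and the exposure cutoff can be sketched as below. The input is assumed to be `(amino_acid, dssp_code, relative_asa)` triples. Biopython's `Bio.PDB.DSSP` (which accepts `acc_array="Wilke"`) provides these as the second, third and fourth entries of each per-residue tuple. Residues with code '-' fall into no structural class. A relative ASA of 'NA' is treated as not exposed, which is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

HELIX = frozenset("HGI")  # alpha, 3-10 and pi helices
STRAND = frozenset("BE")  # beta bridge and extended strand
LOOP = frozenset("TS")  # turns and bends


def structure_class(dssp_code: str) -> Optional[str]:
    """Map a DSSP code to 'helix', 'strand', 'loop' or None (unassigned)."""
    if dssp_code in HELIX:
        return "helix"
    if dssp_code in STRAND:
        return "strand"
    if dssp_code in LOOP:
        return "loop"
    return None


def is_exposed(relative_asa: Union[float, str, None], cutoff: float = 0.05) -> bool:
    """Exposed if relative solvent accessibility is strictly above the cutoff."""
    if relative_asa is None or relative_asa == "NA":
        return False  # assumption: missing accessibility counts as buried
    return float(relative_asa) > cutoff


def partition_residues(
    residues: Iterable[Tuple[str, str, Union[float, str, None]]]
) -> Dict[str, List[str]]:
    """Sort residues into helix/strand/loop lists plus a separate exposed list."""
    groups: Dict[str, List[str]] = {"helix": [], "strand": [], "loop": [], "exposed": []}
    for aa, ss, rsa in residues:
        cls = structure_class(ss)
        if cls is not None:
            groups[cls].append(aa)
        if is_exposed(rsa):
            groups["exposed"].append(aa)
    return groups
```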
Calculation of Amino Acid Group Frequencies in Secondary Structures and Exposed Residues: A custom script using the Biopython PDB module (23, 33) was used to calculate the frequencies of the following amino acid groups in helices, beta sheets/strands, loops/bends and exposed residues:
Positively charged amino acids: lysine, arginine and histidine
Negatively charged amino acids: aspartate and glutamate
Polar amino acids: glutamine, asparagine, serine, threonine and cysteine
Hydrophobic amino acids: alanine, valine, leucine, isoleucine, methionine, phenylalanine, tyrosine and tryptophan
Frequencies for proline and glycine were calculated individually, as they confer rigidity and flexibility, respectively, on protein structures. Unidentified amino acids (such as those encoded by the character 'X') were excluded entirely from the calculations, as they provide no information about the frequencies of the groups defined above.
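Given the residues that fall in one region (for example all helix residues of a model), the group frequencies can be sketched as shares of identified residues. The sketch uses the groups listed above and takes the hydrophobic group to be A, V, L, I, M, F, Y, W; lysine is already in the positively charged group. Proline and glycine are their own single-member groups.

```python
from typing import Dict, FrozenSet, Iterable

GROUPS: Dict[str, FrozenSet[str]] = {
    "positive": frozenset("KRH"),
    "negative": frozenset("DE"),
    "polar": frozenset("QNSTC"),
    "hydrophobic": frozenset("AVLIMFYW"),
    "proline": frozenset("P"),
    "glycine": frozenset("G"),
}
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def group_frequencies(residues: Iterable[str]) -> Dict[str, float]:
    """Fraction of identified residues in a region that belong to each group."""
    identified = [aa.upper() for aa in residues if aa.upper() in STANDARD_AA]
    n = len(identified)
    if n == 0:
        return {name: float("nan") for name in GROUPS}
    return {
        name: sum(1 for aa in identified if aa in members) / n
        for name, members in GROUPS.items()
    }
```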
Disulfide bond richness: Disulfide bond richness was defined as the number of disulfide bonds per residue in the proteome. Two cysteine residues were considered to form a disulfide bond if their sulfur atoms were within 8 Å of each other (35). A custom script using Bio.PDB (23, 33) was used to calculate this for the predicted structures. The number of disulfide bonds was calculated for each best model of a given organism and divided by the number of residues in the model. These fractions were then averaged across all ten proteins of the organism.
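The distance criterion and per-organism averaging can be sketched with plain coordinates. Each model is assumed to be given as its cysteine SG (sulfur) coordinates plus its residue count; with Biopython these could be read as `residue["SG"].coord`. The description above does not say how a cysteine within range of two partners was handled. As an assumption, this sketch counts every qualifying pair.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def count_disulfides(sg_coords: Sequence[Coord], cutoff: float = 8.0) -> int:
    """Count cysteine pairs whose SG atoms lie within `cutoff` angstroms."""
    return sum(1 for a, b in combinations(sg_coords, 2) if math.dist(a, b) <= cutoff)


def disulfide_richness(
    models: Sequence[Tuple[Sequence[Coord], int]], cutoff: float = 8.0
) -> float:
    """Mean over an organism's models of (disulfide bonds / residues in model)."""
    per_model = [
        count_disulfides(sg, cutoff) / n_res for sg, n_res in models if n_res > 0
    ]
    return sum(per_model) / len(per_model) if per_model else float("nan")
```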


Salt bridge richness: Salt bridge richness was defined as the number of salt bridges per residue in each protein. We used the program VMD (version 1.9.3) (36) to calculate the number of salt bridges in each model. The commands used were:
mol new type pdb
package require saltbr
saltbr -sel [atomselect top all] -upsel yes -frames 0:1:0 -ondist 4.0 -comdist none -writefiles no
The number of salt bridges was divided by the number of residues in the model, and the fractions calculated were then averaged across all proteins per GOLD ID.
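For readers without VMD, the sketch below gives a rough pure-Python approximation of the oxygen-nitrogen distance criterion that the saltbr plugin applies with `-ondist 4.0` and `-comdist none`. It is an approximation, not a reproduction of the plugin. The side-chain atom names follow PDB conventions, and including histidine as a basic residue is an assumption. Each acidic/basic residue pair is counted at most once. A second helper shows the per-residue averaging described above.

```python
import math
from typing import Iterable, List, Set, Tuple

# (residue number, residue name, atom name, xyz coordinates)
Atom = Tuple[int, str, str, Tuple[float, float, float]]

# Assumed side-chain atom selections (approximation of acidic O / basic N atoms).
ACIDIC_O = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_N = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}, "HIS": {"ND1", "NE2"}}


def count_salt_bridges(atoms: Iterable[Atom], cutoff: float = 4.0) -> int:
    """Count acidic/basic residue pairs with any O-N distance <= cutoff (angstroms)."""
    acidic: List[Tuple[int, Tuple[float, float, float]]] = []
    basic: List[Tuple[int, Tuple[float, float, float]]] = []
    for res_id, res_name, atom_name, xyz in atoms:
        if atom_name in ACIDIC_O.get(res_name, ()):
            acidic.append((res_id, xyz))
        elif atom_name in BASIC_N.get(res_name, ()):
            basic.append((res_id, xyz))
    pairs: Set[Tuple[int, int]] = set()
    for acid_id, o_xyz in acidic:
        for base_id, n_xyz in basic:
            if math.dist(o_xyz, n_xyz) <= cutoff:
                pairs.add((acid_id, base_id))  # one bridge per residue pair
    return len(pairs)


def salt_bridge_richness(per_model: Iterable[Tuple[int, int]]) -> float:
    """Mean of (salt bridges / residues) over (n_bridges, n_residues) per model."""
    fractions = [n_br / n_res for n_br, n_res in per_model if n_res > 0]
    return sum(fractions) / len(fractions) if fractions else float("nan")
```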
Machine learning integration of features: We used two support vector machine approaches, as implemented in scikit-learn (version 0.15.0) (37), to combine the features and predict OGT. We implemented Support Vector Regression and Support Vector Classification using a nested cross-validation approach.
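A minimal sketch of nested cross-validation for Support Vector Regression is shown below. An inner grid search tunes hyperparameters on each outer training split only, and the outer folds estimate prediction error. It uses the current `sklearn.model_selection` API; in scikit-learn 0.15 the equivalents lived in `sklearn.grid_search` and `sklearn.cross_validation`. The RBF kernel, feature scaling, parameter grid and fold counts are illustrative assumptions. The thesis's own settings are described in its appendix A, not here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def nested_cv_svr(X: np.ndarray, y: np.ndarray, seed: int = 0) -> np.ndarray:
    """Outer-fold mean absolute errors (degrees C) of a grid-tuned SVR."""
    inner = KFold(n_splits=3, shuffle=True, random_state=seed)
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.1, 0.01]}  # illustrative
    # The inner loop picks C and gamma using only each outer training split.
    search = GridSearchCV(model, grid, cv=inner, scoring="neg_mean_absolute_error")
    scores = cross_val_score(search, X, y, cv=outer, scoring="neg_mean_absolute_error")
    return -scores  # scikit-learn reports negated errors
```

The classification variant follows the same pattern with `SVC` on binned OGT labels and a classification metric such as accuracy.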
This was done for our entire dataset and for two subsets of the dataset corresponding to temperatures <40°C (low temperature subset) and >40°C (high temperature subset). We also applied these methods to a subsampled dataset containing a random subset of organisms from each 10°C bin from 0-100°C. Detailed methods can be found in appendix A (supplementary methods).
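The per-bin subsampling can be sketched as follows. The number of organisms drawn per bin is not stated in this section, so `per_bin` is a hypothetical parameter. Bins are assumed to be half-open, [0, 10), [10, 20), ..., [90, 100). A fixed seed makes the draw reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subsample_by_bin(
    organisms: Sequence[Tuple[str, float]],
    per_bin: int,
    bin_width: float = 10.0,
    low: float = 0.0,
    high: float = 100.0,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Draw up to `per_bin` random (id, OGT) entries from each temperature bin."""
    bins: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for org_id, ogt in organisms:
        if low <= ogt < high:
            bins[int((ogt - low) // bin_width)].append((org_id, ogt))
    rng = random.Random(seed)
    sample: List[Tuple[str, float]] = []
    for idx in sorted(bins):
        members = bins[idx]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample
```

Capping each bin evens out how many organisms come from each temperature range, so that heavily sampled mesophile temperatures do not dominate the fit.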
Metagenomic dataset: We used a subset of the samples generated by the Tara Oceans project (38). Supplementary Table S1 (appendix B) lists the accession information and temperature metadata for the samples we used (39, 40). The SRA accession numbers were used to download reads from NCBI's Sequence Read Archive (41) using the following prefetch (version 2.8.0) and fastq-dump (version 2.8.0) commands from the SRA toolkit (42):
prefetch
fastq-dump --gzip --skip-technical --readids --dumpbase --split-files --clip
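The download step can be scripted by building one `prefetch` and one `fastq-dump` call per run accession, as sketched below. The fastq-dump options match the ones listed above, written with the SRA Toolkit's double-dash prefixes. The accession in the usage note is a hypothetical placeholder. Only `run_all` actually executes anything, so the command lists can be inspected first.

```python
import subprocess
from typing import List

FASTQ_DUMP_FLAGS = [
    "--gzip", "--skip-technical", "--readids", "--dumpbase", "--split-files", "--clip",
]


def download_commands(accession: str) -> List[List[str]]:
    """Commands that fetch one SRA run and convert it to gzipped FASTQ."""
    return [["prefetch", accession], ["fastq-dump", *FASTQ_DUMP_FLAGS, accession]]


def run_all(accessions: List[str]) -> None:
    """Run the commands for every accession, stopping at the first failure."""
    for acc in accessions:
        for cmd in download_commands(acc):
            subprocess.run(cmd, check=True)
```

For instance, `download_commands("SRR0000000")` shows the exact argument lists that would be run for a placeholder accession.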


Annotating genes in the reads: We converted the fastq files to fasta format (only the .1 fastq files were used). FragGeneScan+ (43) was used to annotate genes:
FGS+ -s -o -w 0 -r -t illumina_5 -p 8 -m 12288 -e 1 -d 0
The protein sequence files produced by FragGeneScan+ were used to calculate the features.
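The FASTQ-to-FASTA conversion that precedes gene calling can be sketched as a streaming converter. It assumes unwrapped four-line FASTQ records, which is how fastq-dump writes them. Each record's header and sequence are kept and the quality line is dropped.

```python
from typing import Iterable, Iterator, List


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert 4-line FASTQ records into FASTA header/sequence line pairs."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, _quality = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            yield ">" + header[1:]
            yield seq
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

Gzipped files written with `--gzip` can be streamed through it with `gzip.open(path, "rt")`.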
Calculation of IVYWREL on metagenomic reads: Some of the samples produced by the Tara Oceans project (38) had more than one SRA accession associated with them. We combined the reads from all accessions for a given sample and calculated the fraction of amino acids that were I, V, Y, W, R, E or L. Unidentified amino acids were excluded entirely from the calculation.
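Pooling several SRA accessions into one sample before computing the IVYWREL fraction can be sketched as below. The mapping from accession to sample is assumed to come from a metadata table such as Supplementary Table S1. Both input dictionaries are hypothetical stand-ins. Stop symbols or other non-standard characters in the predicted protein fragments are excluded along with 'X'.

```python
from collections import defaultdict
from typing import Dict, Iterable

IVYWREL = frozenset("IVYWREL")
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def ivywrel_by_sample(
    proteins_by_accession: Dict[str, Iterable[str]], sample_of: Dict[str, str]
) -> Dict[str, float]:
    """IVYWREL fraction per sample, pooling the proteins of all its accessions."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for accession, proteins in proteins_by_accession.items():
        sample = sample_of[accession]
        for protein in proteins:
            for aa in protein.upper():
                if aa not in STANDARD_AA:
                    continue  # unidentified residues and stop symbols excluded
                totals[sample] += 1
                if aa in IVYWREL:
                    hits[sample] += 1
    return {s: hits[s] / totals[s] for s in totals if totals[s]}
```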
Calculation of the HTPP from metagenomic reads: We combined the reads from all accessions for a given sample. The percentage of high Tm proteins was calculated on these reads using the same method used for complete proteomes.


CHAPTER III
RESULTS AND DISCUSSION
We curated a dataset of genomes (and associated inferred proteomes and RNA sequences) annotated with optimum growth temperature (OGT) based on data available in the Genomes Online Database (GOLD) (44) and NCBI's Genbank (21)/Refseq (22) repositories. We then used this dataset to examine and validate correlations previously reported in the literature between OGT and various genomic and proteomic features.
Fraction of IVYWREL amino acids in the proteome: Zeldovich et al. (45) showed that the fraction of IVYWREL amino acids in the proteome of an organism is correlated with its optimum growth temperature (Pearson r = 0.93). We calculated the fraction of amino acids that were I, V, Y, W, R, E or L across all the available NCBI proteomes for each organism. The IVYWREL fraction showed an overall correlation with OGT (r = 0.75), with a stronger correlation for the data points above 40°C (r = 0.84) (Figure 2).
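The full-range, below-40°C and above-40°C correlations reported throughout this chapter can be computed with a plain Pearson coefficient, as sketched below. This sketch assumes that organisms with an OGT of exactly 40°C belong to neither subset, following the "<40" and ">40" notation used in the Methods.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_range(
    ogt: Sequence[float], feature: Sequence[float], split: float = 40.0
) -> Dict[str, float]:
    """r for all organisms, for OGT < split and for OGT > split."""
    below = [(t, f) for t, f in zip(ogt, feature) if t < split]
    above = [(t, f) for t, f in zip(ogt, feature) if t > split]
    return {
        "all": pearson_r(ogt, feature),
        "below": pearson_r([t for t, _ in below], [f for _, f in below]),
        "above": pearson_r([t for t, _ in above], [f for _, f in above]),
    }
```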
[Figure 2: scatter plots, panels (a)-(c), of the fraction of IVYWREL amino acids in the proteome vs. Optimum Growth Temperature (°C); panel (a) r = 0.75]
Figure 2 Fraction of IVYWREL amino acids in the proteome vs. OGT. The IVYWREL fraction shows a strong correlation to OGT over the entire temperature range (a). The correlation is weaker at temperatures below 40°C (b) and stronger at temperatures above 40°C (c).
In their original publication, Zeldovich et al. (45) calculated the correlation between OGT and all possible combinations of amino acids and found that the IVYWREL combination had the strongest correlation. Using an updated genomic dataset, we found this to be one of our strongest individual predictors of OGT. Although no biophysical explanation was provided for this, the IVYWREL set of amino acids contains the largest hydrophobic amino acids (I, V, Y, W, L) and the largest positively charged (R) and negatively charged (E) amino acids. It is possible that this combination maximizes both hydrophobic interactions and charge interactions in the proteome at higher temperatures, leading to increased stability. Future studies could explore the correlations between OGT and other amino acid combinations (such as all charged amino acids: RKHDE), which could have a stronger biophysical basis.
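A brute-force version of the combination search can be sketched as follows. A subset's frequency in a proteome is the sum of its members' frequencies. Each subset is then ranked by Pearson r against OGT. A full search over all 2^20 subsets is possible but slow in pure Python, so this sketch caps the subset size with `max_size`, which is an assumption of the sketch. `subset_r` scores a single hand-picked set such as RKHDE. Per-proteome compositions are assumed to be dictionaries from amino acid to frequency.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subset_values(compositions: Sequence[Dict[str, float]], subset: Sequence[str]) -> List[float]:
    """Combined frequency of the subset's amino acids in each proteome."""
    return [sum(comp.get(aa, 0.0) for aa in subset) for comp in compositions]


def subset_r(compositions: Sequence[Dict[str, float]], ogt: Sequence[float], subset: str) -> float:
    """Correlation with OGT for one chosen subset, e.g. "RKHDE"."""
    return pearson_r(subset_values(compositions, subset), ogt)


def rank_subsets(
    compositions: Sequence[Dict[str, float]], ogt: Sequence[float],
    max_size: int = 3, top: int = 5,
) -> List[Tuple[str, float]]:
    """Top subsets (up to max_size residues) ranked by Pearson r with OGT."""
    results: List[Tuple[str, float]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(AMINO_ACIDS, size):
            values = subset_values(compositions, combo)
            if max(values) == min(values):
                continue  # zero variance: r is undefined
            results.append(("".join(combo), pearson_r(values, ogt)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top]
```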
GC content of 16S rRNA stems, 5S rRNA and tRNA: Galtier and Lobry (6) found that the fraction of GC bases in the helical regions of 16S rRNA stems, 5S rRNA and tRNA in an organism is correlated with optimum growth temperature (OGT). GC base pairs are joined by 3 hydrogen bonds, while AT base pairs are joined by 2. Therefore, an increased GC fraction could confer additional stability on RNA secondary structures at higher temperatures via more hydrogen bonds per base pair. However, when we calculated these fractions for the organisms in our dataset, we found only weak correlations for all three features (r = 0.41, 0.47 and 0.24 for 16S rRNA stems, 5S rRNA and tRNA, respectively). Again, these correlations were stronger at higher temperatures (OGT >40°C; r = 0.76, 0.75 and 0.52). This is consistent with the idea that additional stability is selected for only at increased temperatures. Below a certain growth temperature, there would be little selection pressure to maintain fewer hydrogen bonds in RNA secondary structures. The dataset used by Galtier and Lobry (6) ranged from 51 to 165 organisms, while ours ranged from 3104 to 3276 organisms, which could account for the discrepancies between the results of the two studies.
[Figure 3: six scatter plots, panels (a)-(f), of GC content vs. Optimum Growth Temperature (°C)]
Figure 3 Correlation between GC content of rRNA and tRNA and OGT. (a) The GC content of 16S rRNA stems is weakly correlated with OGT across the entire range of temperatures (r = 0.41). (b) This correlation is much stronger at temperatures above 40°C (r = 0.76). (c) The GC content of 5S rRNA is correlated with OGT across the entire range of temperatures (r = 0.47). (d) This correlation is much stronger at temperatures above 40°C (r = 0.75). (e) The GC content of tRNA is very weakly correlated with OGT when the entire range of temperatures is considered (r = 0.24). (f) The correlation is stronger when only temperatures above 40°C are considered (r = 0.52).
Dipeptide frequency: Ku et al. (11) found that the frequency of different dipeptides in a protein could serve as a marker of its melting temperature. In that study, Ku et al. (11) examined the frequency of occurrence of all dipeptides in a set of 16 high Tm proteins and in a set of 19 low Tm proteins. Based on its occurrence in each set, each dipeptide was assigned a weight. By calculating the frequency of occurrence of each dipeptide in a protein with unknown Tm, they were then able to classify the protein as high Tm or low Tm with 100% accuracy. The percentage of high Tm proteins in a proteome (HTPP) was then used to predict whether a prokaryote was mesophilic or thermophilic. Furthermore, another study by Chang et al. (12) showed that the growth rate of E. coli depended on the melting temperature of a few key metabolic enzymes. This suggests that the Tm of the proteins in a proteome, if it can be inferred from primary sequence for even a few key proteins, could serve as an indicator of the optimum growth temperature of a prokaryote.
We implemented the same computation as Ku et al. (11) to determine the correlation between the HTPP of a proteome and the OGT of the prokaryote. When applied to our larger, modern dataset, HTPP showed only a weak correlation with OGT (r = 0.38). The correlation was very weak below 40 °C (r = 0.11) and stronger at higher temperatures (r = 0.61), suggesting that HTPP would be a good indicator of OGT only at higher temperatures (Figure 4).
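The analysis pattern used here and throughout this chapter (a correlation over the whole dataset, then separately below and above 40 °C) can be sketched as follows. The sketch assumes Pearson's r, and it assigns organisms with an OGT of exactly 40 °C to the high temperature group; both choices are assumptions for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def split_correlations(
    ogt: list[float], feature: list[float], cutoff: float = 40.0
) -> dict[str, float]:
    """r over all organisms, below `cutoff`, and at or above `cutoff` (°C)."""
    low = [(t, f) for t, f in zip(ogt, feature) if t < cutoff]
    high = [(t, f) for t, f in zip(ogt, feature) if t >= cutoff]
    return {
        "all": pearson_r(ogt, feature),
        "below": pearson_r([t for t, _ in low], [f for _, f in low]),
        "above": pearson_r([t for t, _ in high], [f for _, f in high]),
    }


# Toy data: no clear trend below 40 °C, a perfect trend above it.
ogt = [10, 20, 30, 50, 60, 70]
feature = [3, 1, 2, 50, 60, 70]
print(split_correlations(ogt, feature))  # 'below' is -0.5, 'above' is 1.0
```

The toy data illustrate how a strong high temperature trend can dominate the whole-range coefficient even when the low temperature subset shows no positive relationship.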
Figure 4 Correlation between the High Tm Protein Percentage (HTPP) and OGT. (a) The correlation between HTPP and OGT is weak when the entire range of temperatures is considered (r = 0.38). This could be because of the data points below 40 °C, which show a very weak correlation when considered separately (b; r = 0.11). However, the data points above 40 °C show a much stronger correlation, with r = 0.61.
The proteins used by Ku et al. (11) in their analysis conformed to specific characteristics, such as having a single transition state during thermal denaturation. Furthermore, they used an empirically derived formula based on this dataset to predict the Tm of unknown proteins, and we used the same formula in our analysis. However, the proteins in our dataset do not necessarily share these characteristics, which could explain why the correlation fails to hold up on a more diverse dataset with different proteins of interest.
Prediction of protein structures: Many of our proteomic features required protein structures. We used MODPIPE (26) to model structures for ten proteins in every proteome in our dataset (see Methods). These ten proteins belong to a larger set of single-copy proteins found in 95% of all bacterial genomes (28), and they are the proteins in this set with the highest number of homologs deposited as 3D structures in the Protein Data Bank (29). These proteins are therefore highly conserved and can be considered an important subset of the proteins found in a typical bacterial proteome, which makes them suitable candidates for detecting adaptations to growth at different temperatures across a wide range of proteomes. Table 2 lists the names and TIGRFAM IDs of these ten proteins.
Table 2 The ten modeled proteins and their TIGRFAM IDs. *The Pfam ID is listed for phosphoglycerate kinase (PGK) because no TIGRFAM was available for it.
PROTEIN NAME                                    TIGRFAM ID
serine tRNA-ligase                              TIGR00414
dephospho-CoA kinase                            TIGR00152
preprotein translocase SecA subunit             TIGR00963
histidine tRNA-ligase                           TIGR00442
DNA polymerase III beta subunit                 TIGR00663
protein RecA                                    TIGR02012
signal recognition particle-docking protein     TIGR00064
tyrosine tRNA-ligase                            TIGR00234
DNA-directed RNA polymerase, beta' subunit      TIGR02386
phosphoglycerate kinase                         PF00162.17*
Many of the ten modeled proteins are required for DNA replication and maintenance or for RNA synthesis, which are crucial functions in all cells. Three different tRNA ligases are also included, and together they span both classes of tRNA ligases: the Tyr tRNA ligase belongs to class I, while the His and Ser tRNA ligases belong to class II. The three ligases are not structurally identical and thus add to the structural diversity of the set of ten proteins. The Tyr tRNA ligase has a different structure from the His and Ser tRNA ligases, and although the latter two both belong to class II, they share only a similar core domain structure. We therefore used all three ligases as separate proteomic indicators of OGT.
Amino acid frequencies in secondary structures and exposed residues:
Studies by Cambillau and Claverie (4), Metpally et al. (8), and Chakravarty and Varadarajan (5) have suggested that amino acid frequencies, particularly in the secondary structures of proteins, differ between mesophilic and extremophilic prokaryotes. For example, Cambillau and Claverie (4) found an increase in Lys and Glu and a decrease in Gln in thermophilic proteins compared to mesophilic proteins.
We examined the frequencies of positively and negatively charged amino acids in helices, beta sheets/strands and loops. The frequency of these residues in helices increases with OGT (Figure 5a and 5b), suggesting that electrostatic stabilization of helices is one mode of thermostability adopted by proteins at high temperature. We also found a decrease in polar residues with increasing OGT in helices and beta sheets (Figure 5c and 6c), similar to the results of the two previous studies (4, 5), although no such trend was seen in loops (Figure 7c).
We examined the frequencies of glycine and proline residues separately, as glycine confers increased conformational flexibility while proline confers increased rigidity. Metpally et al. (8) found that, compared to mesophilic proteins, psychrophilic proteins favor glycine residues and avoid proline residues. However, in our analyses, we found no striking trends relating glycine or proline frequencies in any of the secondary structures to OGT, except for glycine in helices (Figure 5e). This could suggest that the frequency of these residues alone does not play an important role in determining the stability or flexibility of proteins, or that organisms use these adaptations in combination with other, more prevalent adaptations.
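A minimal sketch of how per-structure residue-group fractions like those in Figures 5-7 could be computed from a protein sequence and its per-residue DSSP (32) secondary-structure string. The mapping of DSSP codes to helices, beta strands and loops and the residue groupings below are assumptions made for illustration, not the exact definitions used in this study; the feature names follow the labels used for the machine learning features (e.g. HELICES_POS_CHARGED, LOOPS_G).

```python
from collections import defaultdict

# Hypothetical mapping of DSSP 8-state codes to three structural classes;
# any code not listed (T, S, P, "-", " ") is treated as a loop.
SS_CLASS = {"H": "HELICES", "G": "HELICES", "I": "HELICES", "E": "BETA", "B": "BETA"}

# Hypothetical residue groups; the groupings used in the thesis may differ.
GROUPS = {
    "POS_CHARGED": set("KR"),
    "NEG_CHARGED": set("DE"),
    "POLAR": set("STNQ"),
    "HYDROPHOBIC": set("AVILMFW"),
    "G": {"G"},
    "P": {"P"},
}


def group_fractions(sequence: str, dssp: str) -> dict[str, float]:
    """Fraction of each residue group within helices, beta strands and loops.

    `sequence` and `dssp` are aligned, one character per residue.
    """
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must have the same length")
    by_class: dict[str, list[str]] = defaultdict(list)
    for aa, code in zip(sequence.upper(), dssp):
        by_class[SS_CLASS.get(code, "LOOPS")].append(aa)
    features = {}
    for ss_class, residues in by_class.items():
        for group, members in GROUPS.items():
            hits = sum(aa in members for aa in residues)
            features[f"{ss_class}_{group}"] = hits / len(residues)
    return features


print(group_fractions("KEAGPS", "HHHE--"))
```

For an organism-level value, such fractions would still need to be pooled or averaged over the ten modeled proteins; which aggregation to use is another assumption.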
Figure 5 Correlation between the fraction of amino acid groups in helices and OGT. The fractions of positively charged (a) and negatively charged (b) amino acids increase with increasing OGT. (c) The fraction of polar amino acids decreases with increasing OGT. (d) There is no strong trend between the fraction of hydrophobic amino acids and OGT. (e) The fraction of glycine residues decreases with increasing OGT. (f) No trend is observed for the fraction of proline residues.
Figure 6 The fraction of amino acid groups in beta sheets vs. OGT. The fraction of positively charged amino acids (a) increases with OGT, while negatively charged amino acids (b) show no trend. The fraction of polar amino acids (c) decreases strongly with increasing OGT, and the fraction of hydrophobic amino acids (d) increases with increasing OGT. The fractions of glycine (e) and proline (f) residues show no trend with OGT.
Figure 7 Fraction of amino acid groups in loops vs. OGT. No trend is observed for the fractions of positively charged amino acids (a), negatively charged amino acids (b), polar amino acids (c), hydrophobic amino acids (d), glycine residues (e) or proline residues (f).
Cambillau and Claverie (4) also found that the amino acid composition of surface (exposed) residues differs with OGT: compared to mesophiles, hyperthermophiles showed an increase in the water-accessible surface of charged amino acids and a decrease in that of polar amino acids.
Chakravarty and Varadarajan (5) also found an increase in charged residues at the surface of thermophilic proteins. We observed similar trends: the frequencies of positively and negatively charged amino acids among exposed residues increase with OGT, while the frequency of polar residues decreases (Figure 8a, 8b and 8c).
Hydrophobic residues did not show strong trends with temperature in the secondary structures of proteins, except in beta sheets, where they appear to increase with temperature (Figure 6d). Surprisingly, they also showed no trend among exposed residues (Figure 8d). It has been suggested that thermophilic proteins are more tightly packed, which could account for their increased thermostability (46, 47). Increased packing would lead to a stronger hydrophobic effect, which we expected to see as a decrease in the fraction of exposed hydrophobic residues with increasing OGT (47).
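The exposed-residue fractions discussed above could be computed roughly as follows, assuming each residue's relative solvent accessibility (RSA, its solvent-accessible area divided by a maximum accessible area such as those of Tien et al. (34)) has already been calculated. The 0.25 RSA cutoff and the residue groups are hypothetical choices for this sketch.

```python
# Hypothetical relative solvent accessibility (RSA) cutoff at or above which a
# residue counts as exposed; RSA = accessible area / maximum accessible area.
EXPOSED_RSA_CUTOFF = 0.25

# Hypothetical residue groups (same caveats as for the secondary-structure groups).
SURFACE_GROUPS = {
    "POS_CHARGED": set("KR"),
    "NEG_CHARGED": set("DE"),
    "POLAR": set("STNQ"),
    "HYDROPHOBIC": set("AVILMFW"),
}


def exposed_group_fractions(
    sequence: str, rsa: list[float], cutoff: float = EXPOSED_RSA_CUTOFF
) -> dict[str, float]:
    """Fraction of each residue group among exposed residues (RSA >= cutoff)."""
    if len(sequence) != len(rsa):
        raise ValueError("need one RSA value per residue")
    exposed = [aa for aa, value in zip(sequence.upper(), rsa) if value >= cutoff]
    if not exposed:
        # No exposed residues: report zeros rather than dividing by zero.
        return {f"SURFACE_ACC_{g}": 0.0 for g in SURFACE_GROUPS}
    return {
        f"SURFACE_ACC_{g}": sum(aa in members for aa in exposed) / len(exposed)
        for g, members in SURFACE_GROUPS.items()
    }


print(exposed_group_fractions("KDSAL", [0.6, 0.3, 0.5, 0.1, 0.05]))
```

Because the hydrophobic fraction depends directly on where the cutoff is placed, the absence of a trend for exposed hydrophobic residues may be worth re-checking at more than one cutoff.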
Anomalous data points: The plots for positively and negatively charged residues in helices and exposed regions, and for negatively charged residues in loops, showed a set of data points that were separated from the general trend of the other data (Figure 5a, 5b, 7b, 8a, 8b). We traced these points back to their GOLD IDs and found that most of the corresponding organisms were halophiles. Halophiles living in high salt concentrations experience different selective pressures on their proteins, and our data suggest that these organisms have adapted to maintain different electrostatic properties in their proteins that stabilize them in highly saline environments (48). For example, some halophiles, such as Halobacterium and Halococcus, adopt a 'high salt-in' mechanism of adaptation: they accumulate high intracellular salt concentrations, and their proteins are enriched in acidic amino acids relative to basic amino acids (49). The anomalous data points in our results reflect this trend.
Figure 8 Fraction of exposed/surface amino acid groups vs. OGT. The fraction of exposed positively charged (a) and negatively charged (b) amino acids increases with increasing OGT, while the fraction of polar residues (c) decreases. The fraction of exposed hydrophobic residues (d) shows no trend with OGT.
Disulfide bond richness: We calculated the number of disulfide bonds in the set of ten proteins for each organism. This number was normalized by protein chain length, as longer proteins have more opportunities to form disulfide bonds than shorter proteins. Disulfide bonds are strong covalent bonds that stabilize protein structures. Beeby et al. (10) and Jorda et al. (7) found that thermophiles had a greater fraction of disulfide bonds, and Beeby et al. (10) found that these disulfide bonds could even be found in intracellular proteins. However, we found no trend between disulfide bond richness and optimum growth temperature (Figure 9). This could be because of the different metrics used to define disulfide bond richness: Beeby et al. (10) found that Cys-Cys residue pairs were in closer proximity than expected and considered this an indication of increased disulfide bonding, whereas Jorda et al. (7) calculated the fraction of cysteine residues in a protein that are disulfide bonded.
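The per-residue disulfide metric described above could be computed from modeled structures roughly as follows, assuming the coordinates of each cysteine's SG atom have already been extracted (for example from Biopython's `Bio.PDB` atoms, whose `coord` attribute holds the position). The geometric distance cutoff and the pooled normalization over all ten chains are assumptions for illustration, not necessarily the procedure used here.

```python
import math
from itertools import combinations

# Hypothetical S-S distance cutoff in angstroms; a disulfide bond is ~2.05 Å long,
# so the cutoff leaves some room for modeling error.
SS_BOND_CUTOFF = 2.5

Coord = tuple[float, float, float]


def count_disulfides(sg_coords: list[Coord], cutoff: float = SS_BOND_CUTOFF) -> int:
    """Count cysteine SG-SG pairs within `cutoff` of each other.

    Assumes each cysteine has at most one partner within the cutoff.
    """
    return sum(1 for a, b in combinations(sg_coords, 2) if math.dist(a, b) <= cutoff)


def normalized_disulfides(proteins: list[tuple[int, list[Coord]]]) -> float:
    """Disulfide bonds per residue, pooled over (chain_length, SG coordinates) pairs."""
    total_bonds = sum(count_disulfides(sg) for _, sg in proteins)
    total_residues = sum(length for length, _ in proteins)
    return total_bonds / total_residues


# Two toy 100-residue chains: one bonded Cys pair in the first, none in the second.
chain_a = (100, [(0.0, 0.0, 0.0), (2.05, 0.0, 0.0), (10.0, 10.0, 10.0)])
chain_b = (100, [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
print(normalized_disulfides([chain_a, chain_b]))  # 1 bond / 200 residues
```

This geometric count corresponds most closely to the fraction-of-bonded-cysteines view of Jorda et al. (7); a proximity-based metric like that of Beeby et al. (10) would instead compare observed Cys-Cys distances with an expectation.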
Figure 9 Correlation between disulfide bond richness and OGT. The number of disulfide bonds per residue in the proteome of an organism shows no correlation with the organism's OGT.
Salt bridge richness: Chakravarty and Varadarajan (5) found that thermophilic proteins form more salt bridges than their mesophilic counterparts, and that many of these salt bridges are located in helices at the protein surface. We calculated the number of salt bridges per residue in the ten modeled proteins for each organism and found that the fraction of salt bridges increases with temperature, although the trend is not strong (r = 0.35; Figure 10a).
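A salt bridge count of this kind is often based on a distance criterion between charged side-chain atoms. The sketch below assumes a 4.0 Å oxygen-nitrogen cutoff between the carboxyl oxygens of Asp/Glu and the side-chain nitrogens of Lys/Arg (His excluded), counts each acidic-basic residue pair once, and takes atoms as simple tuples from a single chain; these are illustrative assumptions rather than the exact criteria used in this study.

```python
import math

# Side-chain atoms assumed to take part in salt bridges (His is excluded here).
ACIDIC_O = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC_N = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}
SALT_BRIDGE_CUTOFF = 4.0  # angstroms; an assumed O-N distance cutoff

# (residue number, residue name, atom name, xyz) for atoms of a single chain
Atom = tuple[int, str, str, tuple[float, float, float]]


def count_salt_bridges(atoms: list[Atom], cutoff: float = SALT_BRIDGE_CUTOFF) -> int:
    """Count acidic-basic residue pairs with any O-N distance within `cutoff`."""
    acidic = [(res_id, xyz) for res_id, res, name, xyz in atoms if (res, name) in ACIDIC_O]
    basic = [(res_id, xyz) for res_id, res, name, xyz in atoms if (res, name) in BASIC_N]
    pairs = {
        (a_id, b_id)
        for a_id, a_xyz in acidic
        for b_id, b_xyz in basic
        if math.dist(a_xyz, b_xyz) <= cutoff
    }
    return len(pairs)  # a residue pair counts once, however many atom pairs are close


def salt_bridges_per_residue(atoms: list[Atom], n_residues: int) -> float:
    """Salt bridges normalized by chain length."""
    return count_salt_bridges(atoms) / n_residues


toy_atoms = [
    (1, "ASP", "OD1", (0.0, 0.0, 0.0)),
    (1, "ASP", "OD2", (1.0, 0.0, 0.0)),
    (2, "LYS", "NZ", (3.5, 0.0, 0.0)),
    (3, "ARG", "NH1", (20.0, 0.0, 0.0)),
]
print(count_salt_bridges(toy_atoms))  # Asp1-Lys2 only
```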
Figure 10 Correlation between salt bridge richness and OGT. (a) The number of salt bridges per residue in a proteome increases with OGT (r = 0.35). (b) This trend is largely absent below 40 °C (r = 0.24), but can be seen above 40 °C (c; r = 0.42).
Salt bridge stability is affected by many factors (50), and salt bridge formation has been found to be both stabilizing (51) and destabilizing (52). However, Elcock (53) showed that salt bridges are stabilizing at higher temperatures because of a decrease in the desolvation penalty. This could explain the increase in the number of salt bridges with temperature seen in our study.
Machine learning integration of features: We integrated all the features described above using two machine learning approaches: support vector regression (SVR) and support vector classification (SVC). The entire pipeline was run on each feature individually and then on all features combined. R² scores were reported for the SVR (Figure 11 and supplementary Figure S1, Appendix B), and accuracy scores (the fraction of correctly classified organisms) were reported for the SVC (Figure 12). The combined SVR predictor outperformed all individual features when applied to the low temperature dataset (Figure 11a), the high temperature dataset (Figure 11b), the complete dataset (Figure 11c) and the subsampled dataset (Figure 11d). The negative R² values seen for some features mean that simply predicting the mean OGT would be a better estimator than those features. The combined predictor performed best on the high temperature and complete datasets. This was expected for the high temperature dataset, as most of the individual features showed stronger correlations at higher temperatures. Because the SVR improved on the individual features in both the low and high temperature datasets, the imbalance of the data in favor of lower temperatures does not appear to have biased the performance of the SVR on the entire dataset. Figure 11e shows the OGT predicted by the SVR on the complete dataset vs. the actual OGT. The correlation of 0.85 indicates that the SVR predicts OGT with high accuracy across the range of temperatures.
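A minimal scikit-learn (37) sketch of the per-feature versus combined comparison described above. It is not the pipeline used in this study: the RBF kernel, default SVR hyperparameters, standardization of both the features and the OGT target, and 5-fold shuffled cross-validation with the mean R² across folds are all assumptions, and the data below are synthetic.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_model() -> TransformedTargetRegressor:
    """RBF-kernel SVR on standardized features, with the OGT target standardized too."""
    return TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        transformer=StandardScaler(),
    )


def r2_by_feature(
    X: np.ndarray, y: np.ndarray, names: list[str], n_splits: int = 5, seed: int = 0
) -> dict[str, float]:
    """Mean cross-validated R² for each single feature and for all features combined."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {
        name: cross_val_score(make_model(), X[:, [j]], y, cv=folds, scoring="r2").mean()
        for j, name in enumerate(names)
    }
    scores["COMBINED"] = cross_val_score(make_model(), X, y, cv=folds, scoring="r2").mean()
    return scores


# Synthetic demo: OGT depends on two features, so neither alone explains it well.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 2))
y_demo = 40 * X_demo[:, 0] + 40 * X_demo[:, 1] + rng.normal(0.0, 2.0, 200)
print(r2_by_feature(X_demo, y_demo, ["FEATURE_A", "FEATURE_B"]))
```

With `scoring="r2"`, a score of 0 on a held-out fold corresponds to always predicting that fold's mean OGT, and a negative score means the model did worse than that constant prediction, which is how negative bars in a per-feature R² chart can be read.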
[Plots: bar charts of R² for the SVR trained on each individual feature and on all features combined (panel titles include "SVR on <40C data", "SVR on subsampled data" and "SVR on all temp data with rbf kernel"), and a scatter plot of predicted vs. actual OGT (°C).]
Figure 11 Support Vector Regression. An SVR was trained on each individual feature and on all features combined. The combined predictor outperformed all individual predictors on the low temperature data (<40 °C, a), the high temperature data (>40 °C, b) and the entire dataset (c), but did not perform well on the subsampled data (d). A plot of the OGT predicted by the SVR on the complete dataset vs. the actual OGT shows a correlation of 0.85 (e).
The least improvement in performance was seen on the subsampled dataset (Figure 11d). This dataset was generated by selecting 14 organisms from each 10 °C bin between 0 and 90 °C and 11 organisms from the 90-100 °C bin, for a total of 137 organisms. The organisms in the 30-40 °C bin showed the greatest variation in every feature examined. The organisms selected for the subset could therefore have had very different values for each feature, which could explain why most individual features have negative R² values on this dataset and why the combined predictor did not perform well on it.
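The subsampling scheme described above can be sketched as follows. The bin edges (left-closed 10 °C bins, with 100 °C included in the last bin), sampling without replacement and the fixed random seed are assumptions, and organisms outside 0-100 °C are simply ignored; the quotas (14 per bin up to 90 °C, 11 for 90-100 °C) and their total of 137 come from the text.

```python
import random
from collections import defaultdict
from typing import Optional

# 14 organisms from each 10 °C bin between 0 and 90 °C, plus 11 from 90-100 °C.
QUOTAS = {b: 14 for b in range(9)}
QUOTAS[9] = 11


def bin_of(ogt: float) -> Optional[int]:
    """Index of the 10 °C bin (0 = 0-10 °C, ..., 9 = 90-100 °C), or None if outside."""
    if 0 <= ogt < 90:
        return int(ogt // 10)
    if 90 <= ogt <= 100:
        return 9
    return None


def subsample_by_bin(organisms: list[tuple[str, float]], seed: int = 0) -> list[str]:
    """Pick organism IDs at random, without replacement, to fill each bin's quota."""
    rng = random.Random(seed)
    bins: dict[int, list[str]] = defaultdict(list)
    for org_id, ogt in organisms:
        b = bin_of(ogt)
        if b is not None:
            bins[b].append(org_id)
    chosen: list[str] = []
    for b, quota in QUOTAS.items():
        if len(bins[b]) < quota:
            raise ValueError(f"bin {b} has only {len(bins[b])} organisms, needs {quota}")
        chosen.extend(rng.sample(bins[b], quota))
    return chosen


print(sum(QUOTAS.values()))  # 137
```

Because only 14 organisms are drawn from the highly variable 30-40 °C bin, different seeds can produce noticeably different subsets, so repeating the draw several times would show how sensitive the subsampled R² values are to the particular sample.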
[Plots for Figure 12 (Support Vector Classification): bar charts of the accuracy score for each individual feature and for all features combined (COMBINED), with panel titles including "SVC on >40C data", "SVC on all temp data" and "SVC on subsampled data". Features shown: BETA_G, BETA_HYDROPHOBIC, BETA_NEG_CHARGED, BETA_P, BETA_POLAR, BETA_POS_CHARGED, F_IVYWREL, HELICES_G, HELICES_HYDROPHOBIC, HELICES_NEG_CHARGED, HELICES_P, HELICES_POLAR, HELICES_POS_CHARGED, HTPP, LOOPS_G, LOOPS_HYDROPHOBIC, LOOPS_NEG_CHARGED, LOOPS_P, LOOPS_POLAR, LOOPS_POS_CHARGED, NORMALIZED_DISULFIDE_BONDS, NORMALIZED_SALT_BRIDGES, SURFACE_ACC_HYDROPHOBIC, SURFACE_ACC_NEG_CHARGED, SURFACE_ACC_POLAR, SURFACE_ACC_POS_CHARGED, rRNA_16S_GC, rRNA_5S_GC and tRNA_GC.]
Figure 12 Support Vector Classification. An SVC was trained on each individual feature and on all features combined. The combined predictor did not perform well below 40 °C (a), above 40 °C (b), on the complete dataset (c) or on the subsampled data (d).
The same analyses were performed using an SVC to classify each organism into a 10 °C temperature bin. However, the combined predictor did not perform much better than the individual features on any dataset (Figure 12a, b, c, d); on the subsampled dataset, it was even outperformed by four individual features. The accuracy score used here is the fraction of organisms that were correctly classified, and it does not reflect the ordinal nature of the bins. For example, if an organism with an OGT of 45 °C were classified in the 60-70 °C bin, the error would count the same as classifying it in the 30-40 °C bin, even though the latter is much closer. This could explain why the combined predictor apparently did not perform well on any dataset.
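The limitation can be made concrete by comparing plain accuracy with an ordinal-aware alternative, here the mean absolute difference between true and predicted bin indices. This is a suggested metric, not one used in this study; the helper names are hypothetical, and the 10 °C bin width follows the text. Applying scikit-learn's `mean_absolute_error` to bin indices would give the same number.

```python
def to_bin(ogt: float, width: float = 10.0) -> int:
    """Index of the OGT bin of the given width (0 = 0-10 °C, 4 = 40-50 °C, ...)."""
    return int(ogt // width)


def accuracy(true_bins: list[int], pred_bins: list[int]) -> float:
    """Fraction of organisms placed in exactly the right bin."""
    return sum(t == p for t, p in zip(true_bins, pred_bins)) / len(true_bins)


def mean_abs_bin_error(true_bins: list[int], pred_bins: list[int]) -> float:
    """Average number of bins between predicted and true bin; respects bin order."""
    return sum(abs(t - p) for t, p in zip(true_bins, pred_bins)) / len(true_bins)


# An organism with OGT 45 °C (bin 4), misclassified into 60-70 °C or 30-40 °C.
true = [to_bin(45)]
far, near = [to_bin(65)], [to_bin(35)]
print(accuracy(true, far), accuracy(true, near))  # 0.0 0.0
print(mean_abs_bin_error(true, far), mean_abs_bin_error(true, near))  # 2.0 1.0
```

Accuracy scores both mistakes identically, while the ordinal error ranks the adjacent-bin mistake as half as bad, which is closer to how a temperature prediction would be judged in practice.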
To our knowledge, a comprehensive combination of genomic and proteomic features has not previously been used to predict OGT. Zheng et al. (3) used the GC fraction of four genes to predict OGT for prokaryotes, building their predictor from the four genes' GC content with an SVM. Their predictor had a prediction accuracy of 84.09%, but it classified organisms only as mesophilic or thermophilic/hyperthermophilic, not into finer temperature bins. Moreover, when we used HMMs to search for these genes in our genomes, we found that most of our genomes did not contain all four genes; Table 3 lists the number of genomes containing zero, one, two, three and all four genes. This feature therefore could not be used on our dataset and is less generalizable to the prediction of OGT across a broad range of organisms.
Table 3 Number of genomes with 0, 1, 2, 3 and all 4 of the genes identified by Zheng et al. (3).
NUMBER OF GENES (x)    NUMBER OF GENOMES WITH x GENES
0                      159
1                      1468
2                      943
3                      666
4                      207
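Assuming each genome is counted in exactly one row of Table 3, the counts can be totaled to quantify how rarely the complete gene set occurs: only about 6% of the 3,443 genomes contain all four genes.

```python
# Counts from Table 3: number of genomes containing x of the four Zheng et al. genes.
genomes_with_x_genes = {0: 159, 1: 1468, 2: 943, 3: 666, 4: 207}

total = sum(genomes_with_x_genes.values())
frac_all_four = genomes_with_x_genes[4] / total
print(total, round(frac_all_four, 3))  # 3443 0.06
```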
It is important to note that our predictor was developed without knowledge of the phylogeny of the organisms used. Adding phylogeny to the feature set could further improve the accuracy of predictions.
Calculation of IVYWREL and HTPP on metagenomic reads: We used a subset of the samples produced by the Tara Oceans project (38) as our metagenomic dataset. The IVYWREL fraction and HTPP were calculated on the protein-coding regions of the reads. However, no correlation was found between either of these features and the sampling temperature (Figure 13).
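As a sketch, assuming each sample's reads have already been translated into protein fragments by a read-level gene caller, the IVYWREL fraction of a sample (the combined fraction of Ile, Val, Tyr, Trp, Arg, Glu and Leu described by Zeldovich et al. (45)) could be pooled over all fragments as below. Pooling over all residues, rather than averaging per read, and ignoring non-standard symbols are assumptions of this sketch.

```python
# Residues whose combined fraction Zeldovich et al. (45) linked to OGT.
IVYWREL = set("IVYWREL")
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def ivywrel_fraction(protein_fragments: list[str]) -> float:
    """Pooled IVYWREL fraction over predicted protein fragments.

    Only the 20 standard residues are counted, so stop symbols ('*') and
    ambiguous residues ('X') are ignored.
    """
    residues = [aa for frag in protein_fragments for aa in frag.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("no standard amino acids found")
    return sum(aa in IVYWREL for aa in residues) / len(residues)


print(ivywrel_fraction(["IVYW", "AAAA*", "X"]))  # 0.5
```

Because short reads yield short fragments, pooling over all residues in a sample keeps the estimate from being dominated by the noisy fractions of individual fragments.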
The sampling temperatures of the samples used were all in the 0-30 °C range, and neither the IVYWREL fraction nor HTPP was a good predictor of OGT in the mesophilic range. This could explain the lack of correlation in the metagenomic data. When an SVR was used to predict OGTs for organisms below 40 °C, only the combined predictor had a positive R² value. However, many of the individual features cannot easily be calculated on metagenomic reads. This suggests that a combination of individual predictors trained on the sequencing reads, or on assembled data, would likely be the best way to predict metagenomic sampling temperature from shotgun sequencing data. Information on the taxonomic abundance in the samples has also been shown to be a good predictor of sampling temperature (54), and adding this feature might further boost the accuracy of our sampling temperature or even optimum growth temperature predictions.
Figure 13 Fraction of IVYWREL (a) and HTPP (b) vs. sampling temperature. No correlation was observed for either feature (r = 0.014 and r = -0.15, respectively).
CHAPTER IV
CONCLUSIONS
Many studies have tried to understand the genomic and proteomic determinants of prokaryotic optimum growth temperature, and they have found correlations between various features and OGT (4-6, 8, 35). However, these studies were published between 1997 and 2009 and used small datasets. In this study, we attempted to validate these correlations on an updated, larger dataset. We also used a machine learning approach to combine all the individual features to predict OGT, which, to our knowledge, has not been tried previously.
We found that many of the correlations held only weakly across the entire range of temperatures, but were stronger when only high temperatures were considered. None of the individual features were good predictors of OGT in the 0-40 °C range. The combined SVR predictor outperformed all individual features in every temperature range, highlighting the utility of such an approach. However, its prediction accuracy in the low temperature range was much lower than in the high temperature range. This highlights the gap in our understanding of the selection pressures that act at temperatures below 40 °C. Most of the studies mentioned previously examined features that allow organisms to grow at high temperature extremes (4-6, 35), based on the idea that genomic and proteomic adaptations act to stabilize macromolecules at higher temperatures. Few studies have examined low temperature adaptations (8, 55).
Therefore, future studies could aim to further our understanding of what allows prokaryotes to grow at low temperatures and, in particular, what adaptations distinguish mesophiles from each other. This would improve our ability to predict OGTs in this range and contribute to our understanding of macromolecular stability.
REFERENCES
1. A. Torstensson et al., Physicochemical control of bacterial and protist community composition and diversity in Antarctic sea ice. Environ. Microbiol. 17, 3869-3881 (2015).
2. P. Ji, W. J. Rhoads, M. A. Edwards, A. Pruden, Impact of water heater temperature setting and water use frequency on the building plumbing microbiome. ISME J. 11, 1318-1330 (2017).
3. H. Zheng, H. Wu, Gene-centric association analysis for the correlation between the guanine-cytosine content levels and temperature range conditions of prokaryotic species. BMC Bioinformatics. 11 Suppl 1, S7 (2010).
4. C. Cambillau, J. M. Claverie, Structural and genomic correlates of hyperthermostability. J. Biol. Chem. 275, 32383-32386 (2000).
5. S. Chakravarty, R. Varadarajan, Elucidation of Factors Responsible for Enhanced Thermal Stability of Proteins: A Structural Genomics Based Study. Biochemistry. 41, 8152-8161 (2002).
6. N. Galtier, J. R. Lobry, Relationships Between Genomic G+C Content, RNA Secondary Structures, and Optimal Growth Temperature in Prokaryotes. J. Mol. Evol. 44, 632-636 (1997).
7. J. Jorda, T. O. Yeates, Widespread disulfide bonding in proteins from thermophilic archaea. Archaea. 2011 (2011), doi:10.1155/2011/409156.
8. R. P. R. Metpally, B. V. B. Reddy, Comparative proteome analysis of psychrophilic versus mesophilic bacterial species: Insights into the molecular basis of cold adaptation of proteins. BMC Genomics. 10, 11 (2009).
9. I. N. Berezovsky, K. B. Zeldovich, E. I. Shakhnovich, Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput. Biol. 3, 0498-0507 (2007).
10. M. Beeby et al., The genomics of disulfide bonding and protein stabilization in thermophiles. PLoS Biol. 3, 1549-1558 (2005).
11. T. Ku et al., Predicting melting temperature directly from protein sequences. Comput. Biol. Chem. 33, 445-450 (2009).
12. R. L. Chang et al., Structural systems biology evaluation of metabolic thermotolerance in Escherichia coli. Science. 340, 1220-1223 (2013).
13. L. Sawle, K. Ghosh, How do thermophilic proteins and proteomes withstand high temperature? Biophys. J. 101, 217-227 (2011).
14. K. A. Dill, K. Ghosh, J. D. Schmit, Physical limits of cells and proteomes. Proc. Natl. Acad. Sci. 108, 17876-17882 (2011).
15. D. Marbach et al., Wisdom of crowds for robust gene network inference. Nat Methods. 9, 796-804 (2012).
16. P. Larsson, M. J. Skwark, B. Wallner, A. Elofsson, Assessment of global and local model quality in CASP8 using Pcons and ProQ. Proteins Struct. Funct. Bioinforma. 77, 167-172 (2009).
17. R. Mackelprang et al., Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature. 480, 368-371 (2011).
18. C. E. Sharp et al., Humboldt's spa: microbial diversity is controlled by temperature in geothermal environments. ISME J. 8, 1166-1174 (2014).
19. K. D. Kohl, J. Yahn, Effects of environmental temperature on the gut microbial communities of tadpoles. Environ. Microbiol. 18, 1-18 (2016).
20. S. Mukherjee et al., Genomes OnLine Database (GOLD) v.6: Data updates and feature enhancements. Nucleic Acids Res. 45, D446-D456 (2017).
21. D. A. Benson et al., GenBank. Nucleic Acids Res. 41, 36-42 (2013).
22. K. D. Pruitt, T. Tatusova, D. R. Maglott, NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, 61-65 (2007).
23. P. J. A. Cock et al., Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25, 1422-1423 (2009).
24. E. Pruesse, J. Peplies, F. O. Glockner, SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 28, 1823-1829 (2012).
25. E. Pruesse et al., SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188-7196 (2007).
26. https://salilab.org/modpipe/
27. A. Sali, T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815 (1993).
28. M. Albertsen et al., Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533-538 (2013).
29. H. M. Berman et al., The protein data bank. Nucleic Acids Res. 28, 235-242 (2000).
30. HMMER Development Team, HMMER User Guide, 120 (2015).
31. M. Yi Shen, A. Sali, Statistical potential for assessment and prediction of protein structures. Protein Sci., 2507-2524 (2006).
32. W. Kabsch, C. Sander, Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 22, 2577-2637 (1983).
33. T. Hamelryck, B. Manderick, PDB file parser and structure class implemented in Python. Bioinformatics. 19, 2308-2310 (2003).
34. M. Z. Tien, A. G. Meyer, D. K. Sydykova, S. J. Spielman, C. O. Wilke, Maximum allowed solvent accessibilities of residues in proteins. PLoS One. 8 (2013), doi:10.1371/journal.pone.0080635.
35. M. Beeby et al., The Genomics of Disulfide Bonding and Protein Stabilization in Thermophiles. PLoS Biol. 3, e309 (2005).
36. W. Humphrey, A. Dalke, K. Schulten, VMD: Visual Molecular Dynamics. J. Mol. Graph. 14, 33-38 (1996).
37. F. Pedregosa et al., Scikit-learn: Machine Learning in Python. 12, 2825-2830 (2012).
38. E. Karsenti et al., A holistic approach to marine Eco-systems biology. PLoS Biol. 9, 7-11 (2011).
39. Tara Oceans Consortium, Coordinators; Tara Oceans Expedition, Participants (2014): Registry of selected samples from the Tara Oceans Expedition (2009-2013). doi:10.1594/PANGAEA.840721
40. S. Chaffron, L. Guidi, F. d'Ovidio, S. Speich, S. Audic, S. De Monte, D. Iudicone, M. Picheral, S. Pesant, Tara Oceans Consortium Coordinators, Tara Oceans Expedition Participants (2014): Environmental context of selected samples from the Tara Oceans Expedition (2009-2013). doi:10.1594/PANGAEA.840718
41. R. Leinonen, H. Sugawara, M. Shumway, The sequence read archive. Nucleic Acids Res. 39, 2010-2012 (2011).
42. Sequence Read Archive Submissions Staff. Using the SRA Toolkit to convert .sra files into other formats. In: SRA Knowledge Base [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK158900/
43. D. Kim et al., FragGeneScan-plus for scalable high-throughput short-read open reading frame prediction. 2015 IEEE Conf. Comput. Intell. Bioinforma. Comput. Biol. (CIBCB 2015), 1-8 (2015).
44. T. B. K. Reddy et al., The Genomes OnLine Database (GOLD) v.5: A metadata management system based on a four level (meta)genome project classification. Nucleic Acids Res. 43, D1099-D1106 (2015).
45. K. B. Zeldovich, I. N. Berezovsky, E. I. Shakhnovich, Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol. 3, 0062-0072 (2007).
46. P. J. Haney et al., Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. U. S. A. 96, 3578-83 (1999).
47. T. Salminen et al., An unusual route to thermostability disclosed by the comparison of Thermus thermophilus and Escherichia coli inorganic pyrophosphatases. Protein Sci. 5, 1014-1025 (1996).
48. H. Zhang et al., A Cell-penetrating Helical Peptide as a Potential HIV-1 Inhibitor. J. Mol. Biol. 378, 565-580 (2008).
49. R. Reistad, On the composition and nature of the bulk protein of extremely halophilic bacteria. Arch. Mikrobiol. 71, 353-360 (1970).
50. S. Kumar, R. Nussinov, Salt bridge stability in monomeric proteins. J. Mol. Biol. 293, 1241-1255 (1999).
51. S. Kumar, C. J. Tsai, B. Ma, R. Nussinov, Contribution of salt bridges toward protein thermostability. J. Biomol. Struct. Dyn. 17, 79-85 (2000).
52. Z. S. Hendsch, B. Tidor, Do salt bridges stabilize proteins? A continuum electrostatic analysis. Protein Sci. 3, 211-226 (1994).
53. A. H. Elcock, The stability of salt bridges at high temperatures: implications for hyperthermophilic proteins. J. Mol. Biol. 284, 489-502 (1998).
54. S. Sunagawa et al., Structure and function of the global ocean microbiome. Science. 348, 1261359 (2015).
55. B. A. Methe et al., The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc. Natl. Acad. Sci. U. S. A. 102, 10913-10918 (2005).
56. https://www.ncbi.nlm.nih.gov/nuccore
APPENDIX A: SUPPLEMENTARY METHODS
Support Vector Regression: The data were first imputed, replacing every missing value with the mean of the corresponding feature. The imputed data were then normalized: each feature's mean was subtracted from each value, and the result was divided by that feature's standard deviation. Finally, the data were randomly split into a training set and a test set (50% each) using the train_test_split function:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_all_scaled, y, test_size=0.5, random_state=0)
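A minimal Python sketch of these preprocessing steps, assuming a NumPy feature matrix X_all whose missing values are stored as NaN and a target vector y of optimum growth temperatures. The toy values are invented for illustration, and SimpleImputer and StandardScaler are used as current scikit-learn equivalents; the original scripts may have used older APIs.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 6 organisms x 3 features (NaN = missing value)
# and a made-up optimum growth temperature (degrees C) per organism.
X_all = np.array([
    [0.40, np.nan, 1.2],
    [0.55, 0.30, np.nan],
    [0.60, 0.35, 1.8],
    [np.nan, 0.25, 1.1],
    [0.70, 0.40, 2.0],
    [0.45, 0.28, 1.3],
])
y = np.array([30.0, 45.0, 60.0, 25.0, 80.0, 37.0])

# 1. Replace each missing value with the mean of its feature (column).
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_all)

# 2. Standardize: subtract each feature's mean, divide by its standard deviation.
X_all_scaled = StandardScaler().fit_transform(X_imputed)

# 3. Random 50/50 split into training and test sets (fixed seed).
X_train, X_test, y_train, y_test = train_test_split(
    X_all_scaled, y, test_size=0.5, random_state=0
)
```

As described, the imputer and scaler are fit on the full dataset before splitting, so test-set values contribute to the feature means and standard deviations; wrapping both steps in a scikit-learn Pipeline fit only on the training set would give stricter separation if that were wanted.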
A grid search was performed on the training set using 10-fold cross-validation and R² as the scoring function, over the following hyperparameter space:
For the high-temperature dataset and the subsampled dataset:
Kernel: RBF; gamma: 1 × 10⁻³ and 1 × 10⁻⁴; C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
Kernel: linear; C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
For the low-temperature dataset:
Kernel: RBF; gamma: 1 × 10⁻³ and 1 × 10⁻⁴; C: 0.001, 0.01, 0.1, 1
Kernel: linear; C: 0.001, 0.01, 0.1, 1
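These grids can be written in the list-of-dicts form accepted by scikit-learn's GridSearchCV, which also makes the size of each search easy to check. This is a sketch; the helper name svr_param_grid is hypothetical.

```python
from sklearn.model_selection import ParameterGrid

GAMMAS = [1e-3, 1e-4]                          # 1 x 10^-3 and 1 x 10^-4
C_FULL = [0.001, 0.01, 0.1, 1, 10, 100, 1000]  # high-temperature and subsampled datasets
C_LOW = [0.001, 0.01, 0.1, 1]                  # low-temperature dataset


def svr_param_grid(c_values):
    """Hypothetical helper: RBF sub-grid (gamma x C) plus linear sub-grid (C only)."""
    return [
        {"kernel": ["rbf"], "gamma": GAMMAS, "C": c_values},
        {"kernel": ["linear"], "C": c_values},
    ]


grid_high = svr_param_grid(C_FULL)
grid_low = svr_param_grid(C_LOW)

# Candidate counts: 2*7 + 7 = 21 and 2*4 + 4 = 12. With 10-fold CV each
# candidate is fit ten times (210 and 120 fits) before the final refit.
n_high = len(ParameterGrid(grid_high))
n_low = len(ParameterGrid(grid_low))
```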
The best estimator selected by the grid search was then fit and scored on the test set using 10-fold cross-validation, yielding ten R² values.
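A sketch of this search-then-evaluate procedure on synthetic data (the data, seed, and number of features are assumptions, not the thesis dataset). The grid search sees only the training half; the selected configuration is then scored on the test half with cross_val_score, which refits a clone of the estimator on each of the ten folds and returns one R² value per fold.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data: 200 "organisms", 4 features, OGT-like target (deg C).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 50 + 8 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=3.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

C_VALUES = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": C_VALUES},
    {"kernel": ["linear"], "C": C_VALUES},
]

# Grid search on the training half only: 10-fold CV, R^2 as the selection score.
search = GridSearchCV(SVR(), param_grid, cv=10, scoring="r2")
search.fit(X_train, y_train)

# Score the selected configuration on the held-out half with 10-fold CV,
# giving one R^2 value per fold (ten in total).
test_r2 = cross_val_score(search.best_estimator_, X_test, y_test, cv=10, scoring="r2")
print(search.best_params_, test_r2.mean())
```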
For the complete dataset:
To reduce runtime, two separate scripts were run, one using an RBF kernel and one using a linear kernel.
The data were first imputed, replacing every missing value with the mean of the corresponding feature. The imputed data were normalized by subtracting each feature's mean and dividing by its standard deviation. The data were then randomly split into a training set and a test set (50% each) using the train_test_split function.
The grid search was performed using 10-fold cross-validation and R² as the scoring function, over the following hyperparameter space:
RBF script: gamma: 1 × 10⁻³ and 1 × 10⁻⁴; C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
Linear script: C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
The best estimator was then fit and scored on the test set using 10-fold cross-validation, yielding ten R² values.
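Running one search per kernel, as described for the complete dataset, could look like the following sketch. The run_kernel_search helper and the synthetic data are hypothetical; because the two searches share no state, each could be launched as its own script or job. The text does not say how the two kernels' results were compared, so the sketch simply collects both.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVR

C_VALUES = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
# One independent grid per kernel; each search can run separately.
KERNEL_GRIDS = {
    "rbf": {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": C_VALUES},
    "linear": {"kernel": ["linear"], "C": C_VALUES},
}


def run_kernel_search(kernel, X_train, y_train, X_test, y_test):
    """Hypothetical helper: grid-search one kernel, then score it on the test set."""
    search = GridSearchCV(SVR(), KERNEL_GRIDS[kernel], cv=10, scoring="r2")
    search.fit(X_train, y_train)
    test_r2 = cross_val_score(search.best_estimator_, X_test, y_test, cv=10, scoring="r2")
    return search.best_params_, test_r2


# Synthetic stand-in data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(160, 3))
y = 40 + 6 * X[:, 0] + 4 * X[:, 2] + rng.normal(scale=2.0, size=160)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

results = {k: run_kernel_search(k, X_train, y_train, X_test, y_test) for k in KERNEL_GRIDS}
```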
Support Vector Classification: The dataset was split into a training set and a test set (50% each) using the StratifiedKFold function; only one of the resulting splits was used in the subsequent steps.
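A sketch of a stratified 50/50 split that keeps a single fold, assuming StratifiedKFold with n_splits=2 (which a 50% split implies) and hypothetical binary labels. Shuffling with a fixed seed is also an assumption, since the text does not state it.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
# Hypothetical temperature classes, e.g. 0 = mesophile, 1 = thermophile.
y = np.array([0] * 24 + [1] * 16)

# Two stratified folds give a 50/50 split that preserves class proportions.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
# Keep only the first split, as described above.
train_idx, test_idx = next(skf.split(X, y))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```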
A grid search was performed using stratified 5-fold cross-validation (StratifiedKFold) and accuracy as the scoring function, over the following hyperparameter space:
For the low-temperature dataset:
Kernel: RBF; gamma: 1 × 10⁻³ and 1 × 10⁻⁴; C: 0.001, 0.01, 0.1, 1
Kernel: linear; C: 0.001, 0.01, 0.1, 1
For the high-temperature dataset, the complete dataset and the subsampled dataset:
Kernel: RBF; gamma: 1 × 10⁻³ and 1 × 10⁻⁴; C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
Kernel: linear; C: 0.001, 0.01, 0.1, 1, 10, 100, 1000
The best estimator was then fit and scored on the test set using stratified 5-fold cross-validation, and ten accuracy scores were calculated.
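A sketch of the classification workflow on synthetic two-class data (make_classification stands in for the real temperature classes), using the low-temperature grid to keep runtime short. Note that a single stratified 5-fold pass returns five scores; the text does not specify how ten accuracy scores were obtained, so this sketch does not try to reproduce that count.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic two-class stand-in data (illustrative, not thesis data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 50/50 stratified split; keep only the first of the two folds.
outer = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
train_idx, test_idx = next(outer.split(X, y))

# Low-temperature grid from the text.
C_LOW = [0.001, 0.01, 0.1, 1]
param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": C_LOW},
    {"kernel": ["linear"], "C": C_LOW},
]

# Grid search on the training half with stratified 5-fold CV and accuracy scoring.
search = GridSearchCV(SVC(), param_grid, cv=StratifiedKFold(n_splits=5), scoring="accuracy")
search.fit(X[train_idx], y[train_idx])

# Score the selected model on the held-out half with stratified 5-fold CV.
test_acc = cross_val_score(search.best_estimator_, X[test_idx], y[test_idx],
                           cv=StratifiedKFold(n_splits=5), scoring="accuracy")
```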
APPENDIX B: SUPPLEMENTARY FIGURES AND TABLES
[Figure S1 image not reproduced. Panel title: "SVR on all temp data w/ linear kernel"; x-axis: features; y-axis: R² score.]
Figure S1. R² score for each individual feature and for the combined predictor. The combined predictor outperforms every individual feature, although the IVYWREL fraction performs comparably.
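The comparison in Figure S1 can be illustrated with a sketch that scores a linear-kernel SVR on each feature alone and on all features together. The synthetic target, which depends on three features, is an assumption chosen so that no single feature explains it fully; this is not a reproduction of the thesis's per-feature procedure.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in: the target depends on three features, so no single
# feature can explain it fully (illustrative, not thesis data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 50 + 8 * X[:, 0] - 5 * X[:, 1] + 4 * X[:, 2] + rng.normal(scale=3.0, size=200)

model = SVR(kernel="linear")

# Mean 10-fold R^2 using one feature at a time ...
single_r2 = [
    cross_val_score(model, X[:, [j]], y, cv=10, scoring="r2").mean()
    for j in range(X.shape[1])
]
# ... versus all features combined.
combined_r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
```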
Table S1. Metagenomic dataset information: sample label, sample ID, sample accession, SRA run accession, and sampling temperature (°C).
SAMPLE_LABEL SAMPLE_ID SAMPLE_ACC SRA_ACC TEMP
TARA_004_DCM_0.22-1.6 TARA_X000000368 ERS487936 ERR598950 16.240096
TARA_004_DCM_0.22-1.6 TARA_X000000368 ERS487936 ERR599095 16.240096
TARA_004_SRF_0.22-1.6 TARA_Y200000002 ERS487899 ERR598955 20.5232
TARA_004_SRF_0.22-1.6 TARA_Y200000002 ERS487899 ERR599003 20.5232
TARA_007_SRF_0.22-1.6 TARA_A200000113 ERS477931 ERR315857 23.82415
TARA_009_DCM_0.22-1.6 TARA_X000001036 ERS488147 ERR594315 16.143033
TARA_009_DCM_0.22-1.6 TARA_X000001036 ERS488147 ERR594329 16.143033
TARA_009_SRF_0.22-1.6 TARA_X000000950 ERS488119 ERR594288 24.5021
TARA_009_SRF_0.22-1.6 TARA_X000000950 ERS488119 ERR594316 24.5021
TARA_009_SRF_0.22-1.6 TARA_X000000950 ERS488119 ERR594317 24.5021
TARA_018_DCM_0.22-1.6 TARA_S200000501 ERS488346 ERR599073 18.357443
TARA_018_DCM_0.22-1.6 TARA_S200000501 ERS488346 ERR599092 18.357443
TARA_018_DCM_<-0.22 TARA_A100000172 ERS488354 ERR594352 18.357443
TARA_018_SRF_0.22-1.6 TARA_A100000164 ERS488330 ERR598993 21.4384
TARA_018_SRF_0.22-1.6 TARA_A100000164 ERS488330 ERR599140 21.4384
TARA_018_SRF_<-0.22 TARA_A100000171 ERS488340 ERR594358 21.4384
TARA_022_SRF_<-0.22 TARA_S200002703 ERS488448 ERR594378 17.040533
TARA_022_SRF_<-0.22 TARA_S200002703 ERS488448 ERR594406 17.040533
TARA_023_DCM_0.22-1.6 TARA_E500000081 ERS477998 ERR315859 15.696917
TARA_023_DCM_0.22-1.6 TARA_E500000081 ERS477998 ERR315860 15.696917
TARA_023_DCM_<-0.22 TARA_X000001382 ERS478007 ERR594408 15.696917
TARA_023_SRF_0.22-1.6 TARA_E500000075 ERS477979 ERR315858 17.635625
TARA_023_SRF_0.22-1.6 TARA_E500000075 ERS477979 ERR315861 17.635625
TARA_025_DCM_0.22-1.6 TARA_E500000331 ERS488509 ERR599094 15.153159
TARA_025_DCM_0.22-1.6 TARA_E500000331 ERS488509 ERR599153 15.153159
TARA_025_DCM_<-0.22 TARA_E500000305 ERS488518 ERR594375 15.153159
TARA_025_SRF_0.22-1.6 TARA_E500000178 ERS488486 ERR598951 18.341542
TARA_025_SRF_0.22-1.6 TARA_E500000178 ERS488486 ERR599043 18.341542
TARA_025_SRF_<-0.22 TARA_E500000318 ERS488499 ERR594396 18.341542
TARA_030_DCM_0.22-1.6 TARA_A100001011 ERS478040 ERR318618 18.966866
TARA_030_DCM_0.22-1.6 TARA_A100001011 ERS478040 ERR318619 18.966866
TARA_030_DCM_0.22-1.6 TARA_A100001011 ERS478040 ERR318620 18.966866
TARA_030_DCM_0.22-1.6 TARA_A100001011 ERS478040 ERR318621 18.966866
TARA_030_DCM_<-0.22 TARA_X000001388 ERS478052 ERR594405 18.966866
TARA_030_SRF_0.22-1.6 TARA_A100001015 ERS478017 ERR315862 20.452725
TARA_030_SRF_0.22-1.6 TARA_A100001015 ERS478017 ERR315863 20.452725
TARA_031_SRF_0.22-1.6 TARA_A100001388 ERS488545 ERR598969 25.060908
TARA_031_SRF_0.22-1.6 TARA_A100001388 ERS488545 ERR599106 25.060908
TARA_031_SRF_<-0.22 TARA_A100001391 ERS488558 ERR594401 25.060908
TARA_031_SRF_<-0.22 TARA_A100001391 ERS488558 ERR594410 25.060908
TARA_032_DCM_0.22-1.6 TARA_A100001037 ERS488599 ERR599061 26.0308
TARA_032_DCM_0.22-1.6 TARA_A100001037 ERS488599 ERR599097 26.0308
TARA_032_DCM_<-0.22 TARA_A100001518 ERS488613 ERR594360 26.0308
TARA_032_SRF_0.22-1.6 TARA_A100001035 ERS488569 ERR599041 25.823117
TARA_032_SRF_0.22-1.6 TARA_A100001035 ERS488569 ERR599116 25.823117
TARA_032_SRF_0.22-1.6 TARA_A100001035 ERS488569 ERR599155 25.823117
TARA_032_SRF_<-0.22 TARA_A100001515 ERS488589 ERR594393 25.823117
TARA_033_SRF_0.22-1.6 TARA_A100001234 ERS488621 ERR599049 27.32825
TARA_033_SRF_0.22-1.6 TARA_A100001234 ERS488621 ERR599134 27.32825
TARA_034_DCM_0.22-1.6 TARA_B100000029 ERS488685 ERR598975 27.575658
TARA_034_DCM_0.22-1.6 TARA_B100000029 ERS488685 ERR599111 27.575658
TARA_034_DCM_<-0.22 TARA_R100000008 ERS488701 ERR594390 27.575658
TARA_034_SRF_0.1-0.22 TARA_Y100000004 ERS488658 ERR594328 27.59735
TARA_034_SRF_0.22-1.6 TARA_B100000003 ERS488649 ERR598959 27.59735
TARA_034_SRF_0.22-1.6 TARA_B100000003 ERS488649 ERR598991 27.59735
TARA_034_SRF_<-0.22 TARA_R100000005 ERS488673 ERR594368 27.59735
TARA_034_SRF_<-0.22 TARA_R100000005 ERS488673 ERR594370 27.59735
TARA_036_DCM_0.22-1.6 TARA_B100000035 ERS488747 ERR598974 25.370083
TARA_036_DCM_0.22-1.6 TARA_B100000035 ERS488747 ERR599028 25.370083
TARA_036_DCM_<-0.22 TARA_R100000030 ERS488757 ERR594402 25.370083
TARA_036_SRF_0.1-0.22 TARA_Y100000015 ERS488722 ERR594334 25.5802
TARA_036_SRF_0.22-1.6 TARA_Y100000022 ERS488714 ERR598966 25.5802
TARA_036_SRF_0.22-1.6 TARA_Y100000022 ERS488714 ERR599143 25.5802
TARA_036_SRF_<-0.22 TARA_R100000027 ERS488737 ERR594369 25.5802
TARA_037_MES_0.1-0.22 TARA_Y100000310 ERS488773 ERR594290 11.985711
TARA_037_MES_0.1-0.22 TARA_Y100000310 ERS488773 ERR594345 11.985711
TARA_037_MES_0.22-1.6 TARA_B100000315 ERS488769 ERR599031 11.985711
TARA_037_MES_0.22-1.6 TARA_B100000315 ERS488769 ERR599062 11.985711
TARA_038_DCM_0.22-1.6 TARA_B100000073 ERS488830 ERR598949 25.488758
TARA_038_DCM_0.22-1.6 TARA_B100000073 ERS488830 ERR599082 25.488758
TARA_038_DCM_<-0.22 TARA_R100000084 ERS488836 ERR594386 25.488758
TARA_038_DCM_<-0.22 TARA_R100000084 ERS488836 ERR594399 25.488758
TARA_038_MES_0.1-0.22 TARA_Y100000296 ERS488853 ERR594312 14.935498
TARA_038_MES_0.22-1.6 TARA_Y100000294 ERS488849 ERR599109 14.935498
TARA_038_MES_0.22-1.6 TARA_Y100000294 ERS488849 ERR599167 14.935498
TARA_038_SRF_0.1-0.22 TARA_Y100000289 ERS488803 ERR594330 26.23015
TARA_038_SRF_0.22-1.6 TARA_Y100000287 ERS488799 ERR599102 26.23015
TARA_038_SRF_0.22-1.6 TARA_Y100000287 ERS488799 ERR599158 26.23015
TARA_038_SRF_<-0.22 TARA_R100000081 ERS488813 ERR594374 26.23015
TARA_038_SRF_<-0.22 TARA_R100000081 ERS488813 ERR594400 26.23015
TARA_039_DCM_0.22-1.6 TARA_B100000085 ERS488916 ERR599145 26.806383
TARA_039_DCM_<-0.22 TARA_R100000482 ERS488929 ERR594363 26.806383
TARA_039_DCM_<-0.22 TARA_R100000482 ERS488929 ERR594397 26.806383
TARA_039_MES_0.1-0.22 TARA_Y100000034 ERS488945 ERR594346 15.571793
TARA_039_MES_0.22-1.6 TARA_Y100000031 ERS488936 ERR599037 15.571793
TARA_039_MES_0.22-1.6 TARA_Y100000031 ERS488936 ERR599172 15.571793
TARA_039_SRF_0.1-0.22 TARA_Y100000033 ERS488879 ERR594327 26.816392
TARA_039_SRF_<-0.22 TARA_R100000479 ERS488892 ERR594356 26.816392
TARA_039_SRF_<-0.22 TARA_R100000479 ERS488892 ERR594365 26.816392
TARA_041_DCM_0.22-1.6 TARA_B100000287 ERS489074 ERR598977 27.104162
TARA_041_DCM_0.22-1.6 TARA_B100000287 ERS489074 ERR599053 27.104162
TARA_041_DCM_<-0.22 TARA_R100000458 ERS489084 ERR594366 27.104162
TARA_041_DCM_<-0.22 TARA_R100000458 ERS489084 ERR594367 27.104162
TARA_041_DCM_<-0.22 TARA_R100000458 ERS489084 ERR594371 27.104162
TARA_041_DCM_<-0.22 TARA_R100000458 ERS489084 ERR594372 27.104162
TARA_041_DCM_<-0.22 TARA_R100000458 ERS489084 ERR594373 27.104162
TARA_041_SRF_0.1-0.22 TARA_Y100000052 ERS489047 ERR594295 29.08805
TARA_041_SRF_0.22-1.6 TARA_B100000282 ERS489043 ERR599011 29.08805
TARA_041_SRF_0.22-1.6 TARA_B100000282 ERS489043 ERR599074 29.08805
TARA_041_SRF_<-0.22 TARA_R100000455 ERS489059 ERR594384 29.08805
TARA_042_DCM_0.22-1.6 TARA_B100000131 ERS489134 ERR599013 27.704835
TARA_042_DCM_0.22-1.6 TARA_B100000131 ERS489134 ERR599130 27.704835
TARA_042_DCM_<-0.22 TARA_R100000152 ERS489148 ERR594413 27.704835
TARA_042_SRF_0.22-1.6 TARA_B100000123 ERS489087 ERR599075 29.978933
TARA_042_SRF_0.22-1.6 TARA_B100000123 ERS489087 ERR599141 29.978933
TARA_042_SRF_<-0.22 TARA_R100000149 ERS489113 ERR594398 29.978933
TARA_042_SRF_<-0.22 TARA_R100000149 ERS489113 ERR594403 29.978933
TARA_045_SRF_0.22-1.6 TARA_B100000161 ERS489236 ERR599045 30.495017
TARA_045_SRF_0.22-1.6 TARA_B100000161 ERS489236 ERR599054 30.495017
TARA_046_SRF_<-0.22 TARA_R100000406 ERS489285 ERR594376 30.124333
TARA_048_SRF_0.1-0.22 TARA_Y100000114 ERS489324 ERR594314 29.8169
TARA_048_SRF_0.22-1.6 TARA_B100000242 ERS489315 ERR599019 29.8169
TARA_048_SRF_0.22-1.6 TARA_B100000242 ERS489315 ERR599138 29.8169
TARA_052_DCM_0.22-1.6 TARA_B100000214 ERS489585 ERR599002 24.87709
TARA_052_DCM_0.22-1.6 TARA_B100000214 ERS489585 ERR599016 24.87709
TARA_052_DCM_<-0.22 TARA_R100000234 ERS489603 ERR594394 24.87709
TARA_052_SRF_0.22-1.6 TARA_B100000212 ERS489529 ERR599098 27.867083
TARA_052_SRF_0.22-1.6 TARA_B100000212 ERS489529 ERR599139 27.867083
TARA_056_MES_0.22-3 TARA_B100000378 ERS489727 ERR599112 6.1433
TARA_056_SRF_0.22-3 TARA_B000000609 ERS489712 ERR599057 27.32185
TARA_057_SRF_0.22-3 TARA_B000000565 ERS489733 ERR599058 26.964183
TARA_058_DCM_0.22-3 TARA_B000000557 ERS489846 ERR599026 25.293572
TARA_062_SRF_0.22-3 TARA_B000000532 ERS489877 ERR599012 25.078983
TARA_064_DCM_0.1-0.22 TARA_Y100000401 ERS490010 ERR594324 22.237532
TARA_064_DCM_0.22-3 TARA_B100000405 ERS490002 ERR598972 22.237532
TARA_064_DCM_0.22-3 TARA_B100000405 ERS490002 ERR599023 22.237532
TARA_064_DCM_0.22-3 TARA_B100000405 ERS490002 ERR599025 22.237532
TARA_064_DCM_<-0.22 TARA_R100000315 ERS490026 ERR594385 22.237532
TARA_064_MES_0.22-3 TARA_B100000408 ERS489987 ERR599021 7.54715
TARA_064_MES_0.22-3 TARA_B100000408 ERS489987 ERR599164 7.54715
TARA_064_SRF_0.22-3 TARA_B100000401 ERS489917 ERR598970 22.164483
TARA_064_SRF_0.22-3 TARA_B100000401 ERS489917 ERR599088 22.164483
TARA_064_SRF_0.22-3 TARA_B100000401 ERS489917 ERR599150 22.164483
TARA_064_SRF_<-0.22 TARA_R100000322 ERS489943 ERR594392 22.164483
TARA_065_DCM_0.1-0.22 TARA_Y100000361 ERS490089 ERR594291 21.809275
TARA_065_DCM_0.22-3 TARA_B000000441 ERS490085 ERR598990 21.809275
TARA_065_DCM_0.22-3 TARA_B000000441 ERS490085 ERR599018 21.809275
TARA_065_DCM_0.22-3 TARA_B000000441 ERS490085 ERR599110 21.809275
TARA_065_DCM_<-0.22 TARA_R100001198 ERS490120 ERR594382 21.809275
TARA_065_DCM_<-0.22 TARA_R100001198 ERS490120 ERR594414 21.809275
TARA_065_MES_0.22-3 TARA_B000000460 ERS490065 ERR598960 8.3215
TARA_065_MES_0.22-3 TARA_B000000460 ERS490065 ERR599034 8.3215
TARA_065_SRF_0.1-0.22 TARA_Y100000356 ERS490035 ERR594320 21.81569
TARA_065_SRF_0.22-3 TARA_B000000437 ERS490029 ERR598979 21.81569
TARA_065_SRF_0.22-3 TARA_B000000437 ERS490029 ERR599146 21.81569
TARA_065_SRF_<-0.22 TARA_R100001230 ERS490053 ERR594359 21.81569
TARA_065_SRF_<-0.22 TARA_R100001230 ERS490053 ERR594361 21.81569
TARA_066_DCM_0.22-3 TARA_B000000477 ERS490163 ERR598982 15.01455
TARA_066_DCM_0.22-3 TARA_B000000477 ERS490163 ERR599107 15.01455
TARA_066_DCM_<-0.22 TARA_R100000908 ERS490180 ERR594389 15.01455
TARA_066_SRF_0.22-3 TARA_B000000475 ERS490124 ERR598973 15.032708
TARA_066_SRF_0.22-3 TARA_B000000475 ERS490124 ERR599068 15.032708
TARA_066_SRF_0.22-3 TARA_B000000475 ERS490124 ERR599173 15.032708
TARA_066_SRF_<-0.22 TARA_R100000900 ERS490142 ERR594362 15.032708
TARA_067_SRF_0.22-3 TARA_B100000497 ERS490183 ERR598994 12.833708
TARA_067_SRF_0.22-3 TARA_B100000497 ERS490183 ERR599144 12.833708
TARA_067_SRF_0.22-0.45 TARA_Y100000389 ERS490192 ERR594313 12.833708
TARA_067_SRF_0.45-0.8 TARA_Y100000385 ERS490193 ERR594325 12.833708
TARA_067_SRF_<-0.22 TARA_R100000951 ERS490204 ERR594395 12.833708
TARA_067_SRF_<-0.22 TARA_R100000951 ERS490204 ERR594404 12.833708
TARA_068_DCM_0.22-3 TARA_B100000482 ERS490296 ERR599017 16.780772
TARA_068_DCM_0.22-3 TARA_B100000482 ERS490296 ERR599056 16.780772
TARA_068_DCM_0.22-3 TARA_B100000482 ERS490296 ERR599103 16.780772
TARA_068_DCM_0.22-0.45 TARA_Y100000748 ERS490303 ERR594294 16.780772
TARA_068_DCM_0.45-0.8 TARA_Y100000746 ERS490304 ERR594348 16.780772
TARA_068_DCM_<-0.22 TARA_R100000995 ERS490320 ERR594415 16.780772
TARA_068_MES_0.22-3 TARA_B100000470 ERS490230 ERR598947 6.956633
TARA_068_MES_0.22-3 TARA_B100000470 ERS490230 ERR599131 6.956633
TARA_068_MES_0.45-0.8 TARA_Y100000758 ERS490238 ERR594302 6.956633
TARA_068_SRF_0.22-3 TARA_B100000475 ERS490265 ERR599129 16.83115
TARA_068_SRF_0.22-3 TARA_B100000475 ERS490265 ERR599171 16.83115
TARA_068_SRF_0.22-3 TARA_B100000475 ERS490265 ERR599174 16.83115
TARA_068_SRF_0.22-0.45 TARA_Y100000741 ERS490272 ERR594318 16.83115
TARA_068_SRF_0.45-0.8 TARA_Y100000739 ERS490273 ERR594297 16.83115
TARA_068_SRF_<-0.22 TARA_R100000988 ERS490285 ERR594391 16.83115
TARA_070_MES_0.22-3 TARA_B100000446 ERS490373 ERR599044 4.156283
TARA_070_MES_0.22-3 TARA_B100000446 ERS490373 ERR599149 4.156283
TARA_070_MES_0.22-0.45 TARA_Y100000782 ERS490380 ERR594299 4.156283
TARA_070_MES_0.22-0.45 TARA_Y100000782 ERS490380 ERR594308 4.156283
TARA_070_MES_0.45-0.8 TARA_Y100000780 ERS490382 ERR594331 4.156283
TARA_070_MES_<-0.22 TARA_R100001039 ERS490388 ERR594407 4.156283
TARA_070_SRF_0.22-3 TARA_B100000459 ERS490327 ERR599135 19.775817
TARA_070_SRF_0.22-3 TARA_B100000459 ERS490327 ERR599165 19.775817
TARA_070_SRF_0.22-0.45 TARA_Y100000768 ERS490336 ERR594349 19.775817
TARA_070_SRF_0.45-0.8 TARA_Y100000766 ERS490337 ERR594335 19.775817
TARA_070_SRF_<-0.22 TARA_R100001015 ERS490346 ERR594353 19.775817
TARA_072_DCM_0.22-3 TARA_B100000427 ERS490476 ERR599133 24.092317
TARA_072_DCM_0.22-3 TARA_B100000427 ERS490476 ERR599137 24.092317
TARA_072_DCM_<-0.22 TARA_R100001082 ERS490494 ERR594379 24.092317
TARA_072_MES_0.22-3 TARA_B100000508 ERS490507 ERR599005 4.610525
TARA_072_MES_0.22-3 TARA_B100000508 ERS490507 ERR599048 4.610525
TARA_072_MES_<-0.22 TARA_R100001086 ERS490522 ERR594388 4.610525
TARA_072_SRF_0.22-3 TARA_B100000424 ERS490433 ERR598984 25.0249
TARA_072_SRF_0.22-3 TARA_B100000424 ERS490433 ERR599105 25.0249
TARA_072_SRF_<-0.22 TARA_R100001079 ERS490452 ERR594364 25.0249
TARA_076_DCM_0.22-3 TARA_B100000519 ERS490597 ERR599040 21.613933
TARA_076_DCM_0.22-3 TARA_B100000519 ERS490597 ERR599148 21.613933
TARA_076_DCM_0.22-0.45 TARA_Y100000817 ERS490602 ERR594321 21.613933
TARA_076_DCM_0.45-0.8 TARA_Y100000814 ERS490603 ERR594298 21.613933
TARA_076_DCM_<-0.22 TARA_R100001129 ERS490610 ERR594355 21.613933
TARA_076_MES_0.22-3 TARA_B100000749 ERS490633 ERR599000 4.685912
TARA_076_MES_0.22-3 TARA_B100000749 ERS490633 ERR599154 4.685912
TARA_076_MES_0.45-0.8 TARA_Y100000815 ERS490643 ERR594333 4.685912
TARA_076_SRF_0.22-3 TARA_B100000513 ERS490542 ERR599010 23.3484
TARA_076_SRF_0.22-3 TARA_B100000513 ERS490542 ERR599126 23.3484
TARA_076_SRF_0.22-0.45 TARA_Y100000816 ERS490547 ERR594310 23.3484
TARA_076_SRF_0.45-0.8 TARA_Y100000813 ERS490548 ERR594286 23.3484
TARA_076_SRF_<-0.22 TARA_R100001126 ERS490557 ERR594354 23.3484
TARA_078_DCM_0.22-3 TARA_B100000530 ERS490691 ERR599046 19.287683
TARA_078_DCM_0.22-3 TARA_B100000530 ERS490691 ERR599101 19.287683
TARA_078_DCM_0.22-0.45 TARA_Y100000996 ERS490696 ERR594336 19.287683
TARA_078_DCM_0.45-0.8 TARA_Y100000994 ERS490697 ERR594303 19.287683
TARA_078_MES_0.22-3 TARA_B100000745 ERS490714 ERR599124 5.7955
TARA_078_MES_0.22-3 TARA_B100000745 ERS490714 ERR599159 5.7955
TARA_078_MES_0.45-0.8 TARA_Y100001001 ERS490722 ERR594289 5.7955
TARA_078_SRF_0.22-3 TARA_B100000524 ERS490659 ERR599006 19.853083
TARA_078_SRF_0.22-3 TARA_B100000524 ERS490659 ERR599022 19.853083
TARA_078_SRF_0.22-0.45 TARA_Y100000992 ERS490664 ERR594340 19.853083
TARA_078_SRF_0.45-0.8 TARA_Y100000991 ERS490665 ERR594332 19.853083
TARA_078_SRF_<-0.22 TARA_R100001224 ERS490676 ERR594411 19.853083
TARA_082_DCM_0.22-3 TARA_B100000767 ERS490928 ERR599027 6.971806
TARA_082_DCM_0.22-3 TARA_B100000767 ERS490928 ERR599122 6.971806
TARA_082_DCM_<-0.22 TARA_R100000544 ERS490953 ERR594409 6.971806
TARA_082_SRF_0.22-3 TARA_B100000768 ERS490885 ERR599009 7.321
TARA_082_SRF_0.22-3 TARA_B100000768 ERS490885 ERR599035 7.321
TARA_084_SRF_0.22-3 TARA_B100000780 ERS491001 ERR598945 1.84374
TARA_084_SRF_0.22-3 TARA_B100000780 ERS491001 ERR599059 1.84374
TARA_085_DCM_0.22-3 TARA_B100000795 ERS491095 ERR599104 -0.784051
TARA_085_DCM_0.22-3 TARA_B100000795 ERS491095 ERR599121 -0.784051
TARA_085_DCM_<-0.22 TARA_R100001377 ERS491107 ERR594377 -0.784051
TARA_085_MES_0.22-3 TARA_B100000809 ERS491110 ERR599008 0.42244
TARA_085_MES_0.22-3 TARA_B100000809 ERS491110 ERR599125 0.42244
TARA_085_SRF_0.22-3 TARA_B100000787 ERS491044 ERR599090 0.67084
TARA_085_SRF_0.22-3 TARA_B100000787 ERS491044 ERR599176 0.67084
TARA_093_DCM_0.22-3 TARA_B100001059 ERS491463 ERR598965 16.395725
TARA_093_SRF_0.22-3 TARA_B100001063 ERS491421 ERR599064 18.013592
TARA_094_SRF_0.22-3 TARA_B100001057 ERS491492 ERR599050 21.130913
TARA_096_SRF_0.22-3 TARA_B100000989 ERS491525 ERR598967 23.79795
TARA_098_DCM_0.22-3 TARA_B100001029 ERS491740 ERR599042 20.137131
TARA_098_DCM_0.22-3 TARA_B100001029 ERS491740 ERR599079 20.137131
TARA_098_MES_0.22-3 TARA_B100001013 ERS491767 ERR599071 8.063246
TARA_098_MES_0.22-3 TARA_B100001013 ERS491767 ERR599085 8.063246
TARA_098_SRF_0.22-3 TARA_B100001027 ERS491699 ERR599093 25.14755
TARA_098_SRF_0.22-3 TARA_B100001027 ERS491699 ERR599120 25.14755
TARA_099_SRF_0.22-3 TARA_B100000886 ERS491804 ERR599024 23.787692
TARA_100_DCM_0.22-3 TARA_B100000965 ERS491874 ERR599081 20.63862
TARA_100_DCM_0.22-3 TARA_B100000965 ERS491874 ERR599113 20.63862
TARA_100_SRF_0.22-3 TARA_B100000963 ERS491836 ERR599063 25.249967
TARA_100_SRF_0.22-3 TARA_B100000963 ERS491836 ERR599163 25.249967
TARA_100_SRF_0.22-3 TARA_B100000963 ERS491836 ERR599169 25.249967
TARA_102_DCM_0.22-3 TARA_B100000902 ERS492012 ERR598962 19.557984
TARA_102_DCM_0.22-3 TARA_B100000902 ERS492012 ERR599007 19.557984
TARA_102_DCM_0.22-3 TARA_B100000902 ERS492012 ERR599168 19.557984
TARA_102_MES_0.22-3 TARA_B100000953 ERS491980 ERR599055 9.142475
TARA_102_MES_0.22-3 TARA_B100000953 ERS491980 ERR599128 9.142475
TARA_102_MES_0.22-3 TARA_B100000953 ERS491980 ERR599132 9.142475
TARA_102_SRF_0.22-3 TARA_B100000900 ERS491938 ERR598943 24.941942
TARA_102_SRF_0.22-3 TARA_B100000900 ERS491938 ERR598978 24.941942
TARA_109_DCM_0.22-3 TARA_B100000927 ERS492177 ERR598952 26.525285
TARA_109_DCM_0.22-3 TARA_B100000927 ERS492177 ERR599065 26.525285
TARA_109_DCM_0.22-3 TARA_B100000927 ERS492177 ERR599108 26.525285
TARA_109_DCM_<-0.22 TARA_R100001510 ERS492198 ERR594357 26.525285
TARA_109_DCM_<-0.22 TARA_R100001510 ERS492198 ERR594380 26.525285
TARA_109_DCM_<-0.22 TARA_R100001510 ERS492198 ERR594381 26.525285
TARA_109_DCM_<-0.22 TARA_R100001510 ERS492198 ERR594383 26.525285
TARA_109_DCM_<-0.22 TARA_R100001510 ERS492198 ERR594387 26.525285
TARA_109_MES_0.22-3 TARA_B100000929 ERS492205 ERR598971 11.267967
TARA_109_MES_0.22-3 TARA_B100000929 ERS492205 ERR599067 11.267967
TARA_109_SRF_0.22-3 TARA_B100000925 ERS492145 ERR598997 27.6163
TARA_109_SRF_0.22-3 TARA_B100000925 ERS492145 ERR599118 27.6163
TARA_109_SRF_<-0.22 TARA_R100001509 ERS492160 ERR594412 27.6163
TARA_110_DCM_0.22-3 TARA_B100001113 ERS492264 ERR599014 21.80499
TARA_110_MES_0.22-3 TARA_B100001079 ERS492294 ERR599020 10.213385
TARA_110_SRF_0.22-3 TARA_B100001109 ERS492228 ERR599039 23.867142
TARA_111_DCM_0.22-3 TARA_B100000579 ERS492357 ERR598961 19.864927
TARA_111_MES_0.22-3 TARA_B100000586 ERS492381 ERR599086 10.850148
TARA_111_SRF_0.22-3 TARA_B100000575 ERS492321 ERR599077 22.76754
TARA_112_DCM_0.22-3 TARA_B100000945 ERS492445 ERR598957 22.230474
TARA_112_MES_0.22-3 TARA_B100000949 ERS492471 ERR599072 5.731142
TARA_112_SRF_0.22-3 TARA_B100000941 ERS492408 ERR598954 24.236317
TARA_122_DCM_0.1-0.22 TARA_Y100001973 ERS492703 ERR594284 24.67165
TARA_122_DCM_0.22-3 TARA_B100000700 ERS492699 ERR598948 24.67165
TARA_122_DCM_0.22-0.45 TARA_Y100001970 ERS492704 ERR594304 24.67165
TARA_122_DCM_0.45-0.8 TARA_Y100001968 ERS492705 ERR594301 24.67165
TARA_122_MES_0.1-0.22 TARA_Y100001951 ERS492683 ERR594309 7.153937
TARA_122_MES_0.22-3 TARA_B100000678 ERS492680 ERR598999 7.153937
TARA_122_MES_0.22-3 TARA_B100000678 ERS492680 ERR599033 7.153937
TARA_122_MES_0.22-3 TARA_B100000678 ERS492680 ERR599083 7.153937
TARA_122_MES_0.22-3 TARA_B100000678 ERS492680 ERR599096 7.153937
TARA_122_MES_0.22-0.45 TARA_Y100001949 ERS492684 ERR594305 7.153937
TARA_122_MES_0.45-0.8 TARA_Y100001947 ERS492685 ERR594322 7.153937
TARA_122_SRF_0.1-0.22 TARA_Y100001972 ERS492646 ERR594292 26.54279
TARA_122_SRF_0.22-3 TARA_B100001115 ERS492642 ERR598992 26.54279
TARA_122_SRF_0.22-0.45 TARA_Y100001980 ERS492647 ERR594307 26.54279
TARA_122_SRF_0.45-0.8 TARA_Y100001978 ERS492648 ERR594306 26.54279
TARA_123_MIX_0.1-0.22 TARA_Y100001963 ERS492781 ERR594293 22.08542
TARA_123_MIX_0.22-3 TARA_B100000686 ERS492778 ERR598956 22.08542
TARA_123_MIX_0.22-3 TARA_B100000686 ERS492778 ERR598998 22.08542
TARA_123_MIX_0.22-3 TARA_B100000686 ERS492778 ERR599117 22.08542
TARA_123_MIX_0.22-3 TARA_B100000686 ERS492778 ERR599157 22.08542
TARA_123_MIX_0.22-0.45 TARA_Y100001960 ERS492782 ERR594337 22.08542
TARA_123_MIX_0.45-0.8 TARA_Y100001956 ERS492783 ERR594319 22.08542
TARA_123_SRF_0.22-3 TARA_B100000683 ERS492733 ERR599160 26.574917
TARA_123_SRF_0.22-0.45 TARA_Y100001958 ERS492738 ERR594326 26.574917
TARA_123_SRF_0.45-0.8 TARA_Y100001954 ERS492739 ERR594347 26.574917
TARA_124_MIX_0.1-0.22 TARA_Y100001938 ERS492866 ERR594285 25.195762
TARA_124_MIX_0.22-3 TARA_B100000676 ERS492863 ERR598988 25.195762
TARA_124_MIX_0.22-3 TARA_B100000676 ERS492863 ERR599084 25.195762
TARA_124_MIX_0.22-3 TARA_B100000676 ERS492863 ERR599089 25.195762
TARA_124_MIX_0.22-3 TARA_B100000676 ERS492863 ERR599161 25.195762
TARA_124_MIX_0.22-0.45 TARA_Y100001936 ERS492867 ERR594343 25.195762
TARA_124_MIX_0.45-0.8 TARA_Y100001934 ERS492868 ERR594338 25.195762
TARA_124_SRF_0.1-0.22 TARA_Y100001937 ERS492818 ERR594287 26.516693
TARA_124_SRF_0.22-3 TARA_B100000674 ERS492821 ERR588857 26.516693
TARA_124_SRF_0.22-3 TARA_B100000674 ERS492814 ERR599036 26.516693
TARA_124_SRF_0.22-3 TARA_B100000674 ERS492814 ERR599069 26.516693
TARA_124_SRF_0.22-3 TARA_B100000674 ERS492814 ERR599080 26.516693
TARA_124_SRF_0.22-3 TARA_B100000674 ERS492814 ERR599151 26.516693
TARA_124_SRF_0.22-0.45 TARA_Y100001935 ERS492819 ERR594311 26.516693
TARA_124_SRF_0.45-0.8 TARA_Y100001933 ERS492820 ERR594296 26.516693
TARA_125_MIX_0.22-3 TARA_B100001123 ERS492926 ERR599156 23.677375
TARA_125_SRF_0.1-0.22 TARA_Y100000592 ERS492892 ERR594344 26.778917
TARA_125_SRF_0.22-3 TARA_B100001121 ERS492888 ERR599066 26.778917
TARA_125_SRF_0.22-3 TARA_B100001121 ERS492888 ERR599091 26.778917
TARA_125_SRF_0.22-3 TARA_B100001121 ERS492888 ERR599114 26.778917
TARA_125_SRF_0.22-3 TARA_B100001121 ERS492888 ERR599119 26.778917
TARA_125_SRF_0.22-0.45 TARA_Y100000590 ERS492893 ERR594339 26.778917
TARA_125_SRF_0.45-0.8 TARA_Y100000588 ERS492894 ERR594323 26.778917
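Several SRA runs in Table S1 share a sample accession and therefore a sampling temperature. The following is a minimal sketch that parses rows in this whitespace-separated layout and groups run accessions by sample; the function name runs_by_sample is hypothetical, and how runs were aggregated in the actual analysis is not specified here. The rows are copied from the table above.

```python
from collections import defaultdict

# A few rows copied from Table S1 (whitespace-separated columns).
TABLE_S1 = """\
SAMPLE_LABEL SAMPLE_ID SAMPLE_ACC SRA_ACC TEMP
TARA_004_DCM_0.22-1.6 TARA_X000000368 ERS487936 ERR598950 16.240096
TARA_004_DCM_0.22-1.6 TARA_X000000368 ERS487936 ERR599095 16.240096
TARA_007_SRF_0.22-1.6 TARA_A200000113 ERS477931 ERR315857 23.82415
TARA_085_DCM_<-0.22 TARA_R100001377 ERS491107 ERR594377 -0.784051
"""


def runs_by_sample(text):
    """Hypothetical helper: map sample accession -> temperature (deg C) and SRA runs."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    samples = defaultdict(lambda: {"temp": None, "runs": []})
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        entry = samples[row["SAMPLE_ACC"]]
        entry["temp"] = float(row["TEMP"])  # same value for every run of a sample
        entry["runs"].append(row["SRA_ACC"])
    return dict(samples)


samples = runs_by_sample(TABLE_S1)
```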

PAGE 12

* R '2"3%-&!44 0"%-&4",$!").!0-%2*.$ 1DEMK>:* $E>EQ*MO?>KBK*Q@MY?A*?DKOD@ME*EBK6D@F*?M*OB6C>:C7*<=<>C<6CD*QDEMKDF*YEDL*N@MK*?AD* $DO<@?KDE?*MN*#ED@Q7* 1DEMKDF*(EC>ED*$:MEX 9*#<:A*O@MS<@7M?D*>E*?AD*?<6CD* APBD*1(8$*%$9*1(8$*%$F*YD@D*NB@?AD@*C>ESDL*?M*MED M@*KM@D*1DE6KBK*Q@MY?A* ?DKOD@F*D>?AD@*@DOM@?DL*>E*?AD*C>?D@E=DF?>Q?*QDEMKDF;*EDL*F>EQCD* ?DKOD@EQ*?AD*KDLDL9*'AD*1DE6LDL*NM@* D<:A*1(8$*%$*YD@D*BFDL*?M*M6?<>E*1DE6ME*EBK6D@F* N@MK*?AD* )MEM?D:AEMCMQ7*W )&4% X 1DE6MEF*YD @D*?ADE*BFDL*?M*LMYECMCDF*MN*")-*FDPBDE:DF NM@*D<:A* M@QFK =><*?AD*)&4%*+'! F>?D 9 1DE6MEF*YD@D*BFDL*YADE*"DNFDP*<::DFF>MEF*YD@D*EM?*<=<>C<6CD9* 'AD*?M?FKF*Y>?A*(1'*LEQ*" DFMB@:DF T -CC* EB\* F:>DE?>N>:* :MKOB?>EQ*:CBF?D@9*'AD*F?F?>:MEF*YD@D*:MEL B:?DL*BF>EQ*"5?BL>M*=D@F>ME* H9VV9aVI9 &ME*MN* )ME Z 5?@B:?B@E*?AD* LEL>=>LBFK9*5 MK D*1(8$*%$F*AOCD*1DE6OCD*QDEMKDF]O@M?DMKD F]")-*N>CDFX*?A*>?*@DO@DFDE?>EQ;*NM@*

PAGE 13

* [ D\NND@DE?*:A@MKMFMKDF*M@*OCLF9 + DEQ* CDF*?A*D<:A*1(8$*%$9 &ME*MN*%_.2"#8*N@<:?>MET -*:BF?MK* O7?AME* F:@>O?* KEQ*BFD*MN* 4>MO7?AME W "# X YED*CDF*WE?M*MED*N>CD;*ME*MN*EM*<:>LF*?ALDE?>N>DL*EM*<:>LF*WFB:A*ME*DE?>@DC7;*ENM@KME* <6MB?* ?AD %_.2"#8*N@DPBDE:7 MN*?AD*O@M?DMKD 9 &ME*MN*?AD*3>QA*'K*!@M?D>E*!D@:DE??A*<*F>EQCD*1(8$*%$*YD@D*:MK6>EDL*NM@*?A>F*:ME9*'AD*' K*%ELD\* W'%X*YD>QA?F*QEDL*NM@*D<:A*L>ODO?>LD*67*,B*D?*E*?AD*:ME;* ?A*?AD*NM@KBC<* O@DFDE?DL*67*,B*D?*EM*<:>LF*>E*?AD*FDPBDE:D >F*<*FOD:>N>:*L>ODO?>LD*>E*?AD*FDPBDE:DX $>ODO?>LDF*:ME?<>E>EQ*BE>LDE?>N>DL*EM*<:>LF*YD@D* CDN?*MB ?*DE?>@DC7*N@MK* ?AD*:ME*MEF*:QA?F*NM@* FB:A*L>ODO?>LDF9 1&*:ME?DE?*MN*Ie5*@")-*F?DKFT ")-*N>CDF*DEL>EQ*>E*g@E:9NE<9Qch* LMYECM<*?AD*)&4%*+'!*YD@D*BFDL*NM@*?A>F*:ME 9* + O?>MEF* YD@D*FD<@:ADL* NM@*?AD*F?@>EQ*gIe5*@>6MFMKLDE?>N7* CDF*?A*D<:A*1(8$*%$9*'ADFD*YD@D*?ADE* :MK6>EDL*>E?M*<*F>EQCD*N>CD ;*YA>:A* YQEDL* Y>?A*5%)-* W "$ X ?M*?AD*5%8_-*5 50*@")-*

PAGE 14

* V LME*IG[X* W "% X BF>EQ*?AD*NMCCMY>EQ*:MKKED O<@ -intype=fas ta o -outtype=fasta ptdb /home/data/silva/SSURef_NR99_128_SILVA_07_09_16_opt.arb 'AD*5%8_-*QEKDE?*EF* ME*MN* ?AD*Ie5*@")-*ADC>\* F?@B:?B@D;*YA>:A*YEDL*EQ* KEQ*?AD* NBCC*Ie5*QEKDE?* :MB@?DF7* $@9*#CK<@*!@BDFFD WOD@FME:MEX 9*'A>F* F?@>EQ*YE*MB@* 5%)Z O@MLB:DL* Ie5*FDPBDE:D*QEKDE?*?A:MEF9* %N*<*1(8$*%$*AOCD*Ie5*FDPBDE:DF;*N M@*D<:A*ADC>\*OMF>?>ME;*YD* :ME*MN*1&*6QEKDE?*WFM*?A\*OMF>?>ME*AMEX9*'AD*KDMEF*YEME*MN*1&*6E*?AD*ADC>:MEF*MN*?A D Ie5*FDPBDE:D F*OD@*1(8$* %$ 9 1&*:ME?DE?*MN*J5*@")* EQ*?AD* @")-*N>CD*KDE?>ME DL*<6M=D*NM@* NO?>MEF*NM@*?AD*F?@>EQ*gJ5*@>6MFMKLDE?>N>DL*CDF*?A*D<:A*1(8$*%$9*'ADFD*YD@D*?ADE* :MK6>EDL*>E?M*<*F>EQCD*N>CD OD@*1(8$*%$ ;*YA>:A*YME*M N*1&*6E*<* F>K>C<@*Y<7;*D\:DO?*?AO?>MEF*YD@D*O<@FDL*NM@*?AD*F?@>EQ* g?")-9h !@DL>:?>ME* MN*!@M?D>E*5?@B:?B@DFT /($!%!#* W "' X Y:?* F?@B:?B@DF*NM@*EF9*'A>F*>F*ODC>ED*?AE*F?@B:?B@DF N@MK*O@>K<@7*FDPBDE:D F?>EQ*AMKMCMQMBF* ?DKOCKD*YKE ME*MB@*A<@LY<@D 9*+M@*<*LE>EQ* KEF*D<:AX;*?AD*?M?KD*Y
PAGE 15

* IH 7D<@F*WYADE*@BE*ME*IHH*&!0F*F> KBC?LDL*?M*KMLDC*MEC7* <*FB6FD?*MN*:MKKMEC7*NMBEL*O@M?D>EF*NM@*D<:A*M@QFK W1(8$*%$X 9 3404.:1/9-/;-:<4-=5/:419-7>?74:@ 3>LLDE*/<@SM=*/MLDCF*W3//FX*YD@D* LMYECM EQCD Z :MO7*6<:?D@>E:D*?AD*!@M?D>E*$E*F?@B:?B@D*?DKOCE*?AD* NMCCMY>EQ*KMLDC>EQ*F?DO;*YD*LD:>LDL*?M*>LDE?>N7*IH*F>EQCD*:MO7*6<:?D@>C<6CD*>E*?AD !$4* W "& X 9*%E*M@LD@*?M*LM*?A>F;*YD* BFDL*?AD*NMCCMY>EQ*AKKFD<@:A*:MKKED*N@MK ?AD*3//D@*FB>?D*MN*?MMCF W =D@F>ME*`9I6G X W #( X EF?*<*L?D* ME*G[*UBED*GHIe9 hmmsearch tblout cut_tc o essential.out /home/iyer/research/modtest/analyses/essential.hmm /home/software/modpipe 2.2.0/database/PDB95/db/pdb_95.fasta 2D*?ADE*BFDL*?AD*FEF*<@D*NMBEL*>E*MB@*LB?:! ; )8AB:9!EF*<@D*EM?*NMBEL*>E*?* Y< F*C<6CD*NM@*?@<>E>EQ*MB@*O@DL>:?>ME*KMLDC*YMBCL*

PAGE 16

* II MB?YD>QA*?AD*>ENM@KME*CMF?*N@MK*?AD*C<:S*MN*FMKD*MN*?ADFD*O@M?D>EF*>E*?ADFD* O@M?DMKDF9 A/B4019C-/;-=5/:419-7:5>.:>547 89B-7404.:1/9-/;-?47:-D/B40 @ 5>E:D*D<:A* 1(8$*%$* ACD F*?A*>?;*YD*BFDL*AKKFD<@:A* W #( X ?M* >LDE?>N7*?AD*6DF?*A>?*NM@*D<:A*MN*?AD*?DE*O@M?D>EF*<:@MFF*?A*D<:A*1(8$*%$9*5?@B:?B@DF*NM@*?AD*6DF?*A>?F*YD@D*?ADE*KMLDCDL*BF>EQ*?AD* NMCCMY>E Q*/($!%!#*:MKK -sequence _file ~/ ModPipe.pl -conf_file -sequence_id -hits_mode 1000 /($!%!#* >F*<*?DKOCE*F?@B:?B@D*O@DL>:?>ME*O@MQ@OCD*KMLDCF*NM@*D<:A KMLDCDL*O@M?D>E*6NND@DE?*OMFF>6CD* ?DKOCE>?>F:@D?D*(O?>K>cDL*!@M?D>E*#ED@Q7* W $(!# X F:M@D*YF*:* L>F?F?>:?A*?AD*CMYDF?*$(!#*F:M@D* W #! X YE*ME*MN*?AD*F?@B:?B@ME*MN*5 D :MEL<@7*5 ?@B:?B@DF LBDFh NM@*4DF?*/ MLDCFT $55!* W #" X YE*O@DL>:?DL*O@M?D>E* F?@B:?B@DF;*=><*?AD*!$49 $55!* KMLBCD* >E*4>MO7?AME W "# ;* ## X 9*$55!*BFDF* ?AD* : MLDF* 4 =DC79*'ADFD*YD@D*?@DEQCD FD:MEL<@7*F?@B:?B@D ?7OD 9*$55!*BFDF*?AD*:MLDF* '* =DC79*'ADFD*YD@D*?@DEQCD* FD:MEL<@7*F?@B:?B@D ?7OD WCMMOFX 9* 3DC>:DF*YD@D*>LDE?>N>DL*67*?AD*$55!*:MLDF 3;*%*LBD*YLD@DL*

PAGE 17

* IG D\OMFDL >N >?F*@DC=D*FMC=DE?*<::DFF>6>C >?7*YCSD*=ED*D<:A*@DF>LBDkF*<::DFF>6CD*FB@N<:D*<@D<9 &M E*MN*-K>EM*-:>L*1@MBO*+@DPBDE:>DF*>E*5D:MEL<@7*5 ?@B:?B@DF LBDFT -*:BF?MK*F:@>O?*BF>EQ*?AD*4>MO7?AME*!$4*KMLBCD* W "# ;* ## X YDF*MN*?AD*NMCCMY>EQ*EM*<:>L*Q@MBOF*>E*ADC>:D F;*6D?<* FADD?F]F?@LBDF T !MF>?>=DC7*:A<@QDL*EM*<:>LFT*C7F>ED;*<@Q>E>ED*F?>L>ED )DQ= DC7*:A<@QDL*EM*<:>LFT*EM*<:>LFT QCB?ED;*ED;*FD@>ED;*?A@DME>ED;* ED 37L@MOAM6>:*EM*<:>LFT ED;*=ED;*C7F>ED;*>FMCDB:>ED;*KD?A>ME>ED;* OADE7CED;*?7@MF>ED;*DF*NM@*O@MC> ED*ED*YD@D*:EL>=>LBDF*MN* @>Q>L>?7*6>C>?7*@DFOD:?>=DC7*?M*O@M?D>E*F?@B:?B@DF9* 0E>LDE?>N>DL*EM*<:>LF*WFB:A*QEM@DL* :MKOCD?DC7*N@MK*?AD*:MEF;* LD*EM*>ENM@KME*<6MB?*?AD* N@DPBDE:>DF*MN*?AD*Q@MBOF*LDN>EDL*<6M=D9 $>FBCN>LD*6MEL*@>:AEDFFT $>FBCN>LD*6MEL*@>:AEDFF*YEDL*FBCN>LD*6MELF*OD@*@DF>LBD*>E*?AD*O@M?DMKD9*'YM*:7F?D>ED* @DF>LBDF YD@D* :MEF>LD@DL*?M*NM@K*<*L> FBCN>LD*6MEL*>N*?AD>@*FBCNB@*F?O?*BF>EQ*4>M9!$4 W "# ;* ## X YF NM@* O@DL>:?DL*F?@B:?B@DF 9*'AD*EBK6D@*MN*L>FBCN>LD*6MELF*Y=DE*M@QFK*=>LDL*67*?AD*EBK6D@*MN*@DF>LBDF*>E*?AD*KMLDC9* 'ADFD*N@<:?>MEF*YD@D*?ADE*<=D@EF*NM@*?AD*M@QFK9*

PAGE 18

* I` 5LQD*@>:AEDFFT 5LQD*@>:AEDFF*YEDL*LQDF*OD@*@DF>LBD*>E* D<:A O@M?D> E9*2D* BFDL*?AD*O@MQ@ME*I9V9` X W #' X ?M*:LQDF*>E*D <:A*KMLDC9*'AD*:MKK type pdb package require saltbr saltbr sel [atomselect top all] upsel yes frames 0:1:0 ondist 4.0 comdist none writefiles no 'AD* EBK6D@* MN*FLQDF* Y=>LDL*67*?AD*EBK6D@*MN*@DF>LBDF*>E*?AD*KMLDC ; MEF*:EF OD@*1(8$*%$ 9 /<:A>ED*CD<@E>EQ*>E?DQ@ME*MN*NDED*KOCDKDE?DL*67*F:>S>? Z CD<@E* W=D@F>ME*H9IJ9H X* W #) X ?M* :MK6>ED*?AD*ND:?*(1'9* 2D* >KOCDKDE?DL*5BOOM@?*_D:?M@*"DQ@DFF>ME* N>:ME;*BF>EQ*<*EDF?DL*:@MFF Z NMCL*=LME*F*Y@D*LEQ* ?M*?DKOD@QA*?DKOD@DL*?ADFD*KD?AMLF*?M*<*FB6FE>EQ*<* @FKF*N@MK*D<:A*IH M &*6>E*N@MK*H Z IHH M &9 $D?<>CDL*KD?AMLF*:E*\*-*WFBOOCDKDE?<@7*KD?AMLFX9 /D?:*L\*4X*C>F?F*?AD*<::DFF>ME* >ENM@KME*ME*EBK6D@F*YD@D*BFDL*?M*LMYECM=D W $! X ;*BF>EQ*?AD*NMCCMY>EQ*O@DND?:A* W=D@F>ME*G9[9HX* ME* G9[9HX* :MKK?* W $" X T* prefetch fastq dump -gzip -skip technical -readids -dumpbase -split files -clip

PAGE 19

* Ia -EEM?EQ*QDEDF*>E*?AD*@DCDF*?M*NCDF* YD@D*BFDLX9*+@ o < scratch file> w 0 r < train dir> t illumina_5 p 8 m 12288 e 1 d 0 'AD*O@M?D>E*FDPBDE:D*N>CDF*O@MLB:DL*67*+ @ME*MN*%_.2"#8*ME*KD?:*@DMEF* ?A*>?9*2D*:MK6>EDL*?AD*@DMEF*NM@*<*Q>=DE*FME*MN*EM*<:>LF*?ALDE?>N> DL* EM*<:>LF*YD@D*>QEM@DL*:MKOCD?DC7*N@MK*?AD*:ME9 &ME*MN*?AD*3'!!*N@MK*KD?:*@DEDL*?AD*@DME F NM@*<*Q>=DE*FQA*'K*O@M?D>EF*YEQ*?AD*F
PAGE 20

* IJ '2"3%-&!444 &-$5,%$!").!.4$'5$$4*) 2D* :B@END@@DL* O@M?DMKDF*?A*MO?>KBK*Q@MY?A*?DKOD@C<6CD*>E*?AD*1DEMKD*(EC>ED*$?M@>DF9 2D*?ADE*BFDL*?A>F*LED*LMEF*O@D=>MBFC7*NMBEL*>E*?AD*C>?D@MBF*QDEMK>: :*NDME*MN*%_.2"#8*EM*<:>LF*>E*?AD*O@M?DMKDT fDCLM=>:A*D?*ME*MN*%_.2"#8*EM*<:>LF*>E*?AD*O@M?DMKD*MN*FK*>F* :M@@DC?A*>?F*MO?>K BK*Q@MY?A*?DKOD@ME*MN*EM*<:>LF*?AC<6CD*)&4%* O@M?DMKDF*NM@*D<:A*M@QFK9 %?*YME*MN*%_.2"#8*FAMYDL*ME*Y>?A*(1' W@*l*H9RJX ;*Y>?A*<*F?@MEQD@*:M@@DCME*NM@*?AD*LE?F*<6M=D*aH M &* W@*l*H9[aX*W+>QB@D*G X 9 * !"#"$%&' $%() $%*+ $%*' $%*, $+' '$&' -$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?"@ABCDEF < : 16789:!O 19>P=6<@!A6@P6FD!6@!=E:!K9<=:
PAGE 21

* Ie * * %E*?AD>@*M@>Q>E:ME;* fDCLM=>:A*D?*ME* 6D?YDDE*(1'*6CD*:MK6>EMEF*MN*EM*<:>LF*EME*AME9* 2D*NMBEL*?A>F*?M*6D*MED*MN*MB@* F?@MEQDF?*>EL>=>LB:?M@F*MN*(1'*BF>EQ*:*LMOA7F>:ME*YLDL*NM@*?A>F;*>?*:EM*<:>LF* :ME?<>EF*?AD*C<@QDF?*A7L@MOAM6>:*EM*<:>LF*W%;*_;*.;* !"#"$%&' $%() $%*$ $%*' &$'$ ($*$ +,-./0/"1!23-4"56/,6!7-0!6" 2 8" 9!7:-.2;"2<"=>?@ABC !"#"$%&' $%()* $%'$$ $%'+* $%'*$ $%')* '$ ,$ &$ -$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?"@ABCDEF 6 : 16789:! O 19>P=6<@!A6@P6FD!6@!=E:!K9<=:ME*FAMYF*<*F?@MEQ*:M@@DCME*?M*(1'*M=D@*?AD*DE?>@D*?DKOD@ME*>F*YD
PAGE 22

* IR 2;*8X*?>=DC7*:A<@Q DL*W"X*=DC7*:A<@QDL*W# X*EM*<:>LF9 %?*>F*OMFF>6CD*?AF*:MK6>EME*K<\>K>cDF*6M?A*A7L@MOAM6>: >E?D@<:?>MEF E?D@<:?>MEF >E*?AD*O@M?DMKD*QAD@*?DKOD@EQ*?M*>E:@DC>?79* 3MYD=D@;*NB?B@D*F?BL>DF*:MBCL*MEF*6D?YDDE* (1'*EM*<:>LF*:MK6>EMEF*WFB:A*EM*<:>LFT*",3$#X ;* YA>:A*:MBCL*A<=D*<*F?@MEQD@*6>MOA7F>:F9* 1&*:ME?DE?*MN*Ie5*@")-*F?DKF;*J5*@")-*D@*ME*MN*1&*6E*?AD*ADC>:MEF*MN*Ie5*@")-*F?DKF;*J5*@")-* E*FK*>F*:M@@DC?A* MO?>KBK*Q@MY?A*?DKOD@@ F* <@D* 6MELDL* Y>?A*`*A7L@MQDE*6MELF;* YA>CD*-'*6@ F*<@D* C>ESDL*Y>?A*G* A7L@MQDE*6MELF 9*'AD@DNM@D;*E:@DME*:MBCL*:MEND@*?>MEC>?7*?M*")-*FD:MEL<@7*F?@B:?B@DF*QAD@*?DKOD@<*KM@D*A7L@MQDE* 6MELF*OD@*6@ 9*3MYD=D@;*YADE*Y D*:MEF*NM@*?AD*M@Q FKF* >E*MB@*LME*NM@*E;*?ADFD*:M@@DCMEF*YD@D*F?@MEQD@*QAD@*?DKOD@F?DE?*Y>?A*?AD*>LD<*?A?>MEC>?7*>F*MEC7* FDCD:?DL*NM@*Y>?A*>E:@DF*C>??CD*FDCD:?>ME*O@DFFB@D*?M* K<>E?<>E*NDYD@*A7L@MQDE*6MELF*>E*")-*FD:MEL<@7*F?@B:?B@D*6DCMY*<*:D@?<>E*Q@MY?A* ?DKOD@D@*FKF;*YA>CD*MB@*LFKF;*YA>:A*:MBCL* <::MBE?*NM@*?AD*L>F:@DODF*>E*?AD*@DFBC?F*MN*?AD*?YM*F?BL>DF9

PAGE 23

* I[ * $ >ODO?>LD*N@DPBDE:7T ,B*D?*NND@DE?* L>ODO?>LDF* >E*<*O@M?D>E* :MBCL*FD@=D*?F*KDC?>EQ*?DKOD@F* F?BL7;*,B*D?*EDL*?AD*N@DPBDE:7*MN*M::B@@DE:D*MN*EMEF*MN* L>ODO?>LDF*>E*<* FD?* MN* Ie* A>QA*'K*O@M?D>EF*E*<* FD?* MN*IV CMY*'K*O@M?D>EF9*4@*M::B@@DE:D*>E*D<:A*FD?;*D<:A*L>ODO?>LD*YQEDL*<*YD>QA?9*47*:EQ* !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) $%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) $%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) $%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) 
$%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: < 6 L : !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) $%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&' $%($ $%() $%*$ $%*) $%+$ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%(? $%() $%*$ $%*) $%+$ &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%&( $%)$ $%)) $%?$ $%?) $%($ $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%() $%)) $%?$ $%?) $%($ $%() &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%,& $%& $%) $%? $%( $%* $,) )$() '$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: !"#"$%), $%) $%? $%( $%* &$?$ *$'$$ -./0121"3!45/6"781.8!9/2!8" 4 :" ;!9"3: D N 16789:! T '<99:?>=6<@!B:=L::@!/'!P<@=:@=!@F!*/%J! WF* YD?A*(1'*<:@MFF*?AD*DE?>@D*@F*:M@@DCME*>F*KB:A*F?@MEQD@* ?A*@*l* H9Re9*W:X*'AD*1&*:ME?DE?*MN*J5*@")-*>F*:M@@DC?A*(1'*<:@MFF*?AD*DE?>@D*@F*:M@@DCME*>F*KB:A*F?@MEQD@*F* =D @7 YD?A*(1'*YADE* ?AD*DE?>@D*@F*:MEF>LD@DL W@*l*H9GaX 9*WLX*'AD*:M@@DCME*>F*F?@MEQD@* YADE*MEC7*?DKOD@LD@DL W@*l*H9JGX 9

PAGE 24

* IV ?AD* N@DPBDE:7*MN*M::B@@DE:D*MN*D<:A*L>ODO?>LD*>E*<*O@M?D>E*Y>?A*BESEMYE*'K;*?AD7* YD@D*?ADE*<6CD*?M*:CN7*?AD*O@M?D>E*QA*'K*M@*CMY*'K ;*Y>?A*IHHr*<::B@<:7 9* 'AD*OD@:DE?QA*'K*O@M?D>EF*>E*<*O@M?DMKD* W3'!!X* Y:?* YAD?AD@* <* O@MS<@7 M?D*YC>:*M@*?AD@KMOA>C>:9* +B@?AD@KM @ D;*>E*?*YEQ*?DKOD@:*DEc7KDF9*'A>F*FBQQDF?F*?AEF*>E*<*O@M?DMKD ;*>N*OMFF>6CD*?M*>END@*N@MK*O@>K<@7*FDPBDE:D*NM@*D=DE*<* NDY*SD7*O@M?D>EF; : MBCL*FD@=D*EL>:KBK*Q@MY?A*?DKOD@KOCDKDE?DL*?AD*FME ED*?AD* :M@@DCME*6D?YDDE*?AD*3'!!*MN*<*O@M?DMKD*DL*?M* MB@*C<@QD@ ; KMLD@E* LME*YME*YQAD@*?DKOD@EQ*?AF*YMBCL*6D*<*QMML* >EL>:QAD@*?DKOD@QB@D*aX 9 * < !"#"$%&' $%& $%( $%) $%* $%+ $,) )$+) -$$ ./0" 1 2" 3044 16789:!U!'<99:?>=6<@!B:=L::@!=E:!267E!%A!39<=:6@!3:9P:@=>7:!>@F!*/%J

PAGE 25

* GH * * 6 : 16789:! U '<99:?>=6<@!B:=L::@!=E:!267E!%A!39<=:6@!3:9P:@=>7:!>@F!*/%J! WME*6D?YDDE*3'!!*F*YD@D*@F*:MEF>LD@DL9*'A>F*:MBCL*6D*6D:E?F*6DCMY*aH M &;* YA >:A*FAMY*<*=D@7*YDME*YADE*:MEF>LD@DL*FDO<@E?F*<6M=D*aH M &*FAMY*<*KB:A*F?@MEQD@*:M@@DCME;*Y>?A*@*lH9eI9 !"#"$%&& $%' $%( $%) $%* &$+$ '$($ ,-." / 0" 1.22 !"#"$%&' $%( $%) $%* $%& $%+ )$ &$ ,$ '$$ -./" 0 1" 2/33

PAGE 26

* GI 'AD*O@M?D>EF*BFDL*67*,B*D?*E*?AD>@*F*:MENM@KDL*?M*FOD:>N>:* :A<@<:?D@>F?>:F;*FB:A*EQ*<*F>EQCD*?@?>ME*F?EQ*?AD@KME9* +B@?AD@ KM@D ;*?AD7*BFDL*@>:=DL*NM@KBC<*6F*L:?*?AD*'K*MN*EF9*2D*BFDL*?AD*FE*MB@*F9* 3MYD=D@;*?AD*O@M?D>EF*>E*MB@*LC7*:MENM@K*?M*?AD*FF?>:F*E*?AD* M@>Q>EF*:MBCL*D\OC<>E*YA7*?AD*:M@@DCME* N<>CF*?M*AMCL*BO*EF?*<*KM@D*L>=D@FD*L?A*L>NND@DE?*O@M?D>EF*MN*>E?D@DF? 9 !@DL>:?>ME*MN*O@M?D>E*F?@B:?B@DFT /:*ND@DL* O@M?D>E*F? @B:?B@DF9*2D*BFDL* /($!%!# E"'F ?M*KMLDC*F?@B:?B@DF*NM@*?DE*O@M?D>EF*>E*E*MB@* LEF <@D* >E*<*C<@QD@*FD?*MN* F>EQCD*:MO7*O@M?D>EF*NMBEL*>E* VJ r*MN*EF* >E*?A>F*FD?*?AQADF?* EBK6D@*MN* AMKMCMQF*LDOM F>?DL*E* F?@B:?B@DF >E*?AD*!@M?D>E*$EF <@D*A>QAC7* :MEFD@=DL*LD@DL*?M*6D*KOM@?EF*NMBEL*>E* <*?7O>:?<6CD* :LEQ*MEF*?M*Q@MY?A*NND@DE?* ?DKOD@LD*@F?F*?AD*EEF9 %>B?:! O ,6D=!@F!%4/&1"0!4.DJ! s'AD*!+-/*%$*>F*C>F?DL*NM@* !1, C<6CD*NM@*>?9 !"('#%)*)-/# '%1"+-/*%$ FD@>ED* ?")Z C>QEE*?@? '%1"HHVe` A>F?>L>ED*?")Z C>Q? '%1"HHee`* O@M?D>E*"D:** '%1"HGHIG* F>QE?>ME*O<@?>:CD Z LM:S>EQ*O@M?D>E '%1"HHHea ?7@MF>ED*?")Z C>Q@D:?DL*")-*OMC7KD@? '%1"HG`[e OAMFOAMQC7:D@E
PAGE 27

* GG /EF* YD*KMLDCDL* <@D*@DPB>@DL*NM@*$)-* @DOC>:ME]K<>E?DEF;*YA>:A*<@D*:@B:>MEF*>E*NND@DE?*?")-*C>QF?DL9*'ADFD*?A@DD*C>QQLDE?>:=D@F>?7*>E*?AD*FD?*MN* ?DE* O@M?D>EF 9*'7@*?")-*C>QNND@DE?*F?@B:?B@D*?AF*?")-*C>QQK>C<@*:M@D*LMK<>E*F?@ B:?B@D9* 'ABF;*YD* :AMFD*?M*BFD Q:*>EL>:EM*<:>L*N@DPBDE:>DF*>E*FD:MEL<@7*F?@B:?B@DF LBDF T 5?BL>DF*:MELB:?DL* 67*&CCD* W $ X ;*/D?OEM*<:>LF*N@DPBDE:>DF;*O<@?>:BC<@C7*>E* FD:MEL<@7*F?@B:?B@DF*MN*O@M?D>EF;*L>NND@*6D?YDDE*KDFMOA>C>:*C>:* O@MS<@7M?DF9 +M@*D\ CCD W $ X NMBEL*E:@DE*87F*E*1CE*>E*?AD@KMOA>C>:*O@M?D>EF;*YADE*:MKO<@DL*?M*KDFMOA>C>:* O@M?D>EF9 2D*D\EDL*?AD*N@DPBDE:7*MN*OMF>?>=DC7*:A<@QDL* =DC7*:A<@QDL* EM*<:>LF*>E*ADC>:DF;*6D?<*FADD?F]F?@LBDF*YE:@DE*ADC>:DF*Y>?A* E:@DE* (1' W+>QB@D*J*<*F*FBQQDF?F*?A:*F?<6>C>cME*MN*ADC>:DF*>F*MED*KMLD*MN* ?AD@KMF?<6>C>?7*EF QA*?DKOD@E*OMC<@*@DF>LBDF*>E*?A* >E:@DQB@D* J:;*e:*K>C<@*?M*?AD*@DFBC?F*NMBEL*>E*?AD*?YM*O@D=>MBF*F?BL>DF W $ ;* % X 9 2D*D\EDL*?AD* N@DPBDE:7*MN*QC7:>ED*ED*@DF>LBDF FDO<@ED* :MEND@F* >E:@DME6>C>?7*YA>CD*O@MC>ED* :MEND@F* >E:@DQ>L>?7 9*/D?OED*@DF>LBDF*<@D*O@DND@@DL*>E*OF7:A@MOA>C>:*

PAGE 28

* G` O@M?D>EF;*ED*@DF>LBDF*<@D*<=M>LDL;*< F*:MKO<@DL*?M*KDFMOA>C>:*O@M?D>EF9* 3MYD=D@;*>E*MB@*S>EQ*?@DELF* @DCEQ QC7>E:D*M@*O@MC>ED* N@DPBDE:>DF* >E*ED*>E*ADC>:DFX9* 'A>F*:MBCL*FBQQDF?*?ALBDF*KOM@?E*LD?D@K>E>EQ*?AD*F? <6>C>?7]NCD\>6>C>?7*MN*O@M?D>EF M@*?AFKF* BFD*?ADFD*MEF* >E*:MK6>EME*Y>?A*M?AD@*KM@D*N@DPB DE?*MEF 9 * !"#"$%&' $%()* $%(*$ $%(+* $%)$$ $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3>/./?7@A" ";58!B7C"!7>/C17> !"#"$%&) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="<7B8./?7@A" ";58!B7C"!7>/C17> !"#"D$%E) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3@8!"!7>/C17> !"#"D$%$$& $%E$ $%E* $%*$ $%** $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53F/;"!7>/C17> !"#"D$%&E $%$) $%$E $%$G $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "B@A;/<7"!7>/C17> !"#"D$%($ $%$( $%$) $%$& $%$E $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-!3@/<7"!7>/C17> !"#"$%&' $%()* $%(*$ $%(+* $%)$$ $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3>/./?7@A" ";58!B7C"!7>/C17> !"#"$%&) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="<7B8./?7@A" ";58!B7C"!7>/C17> !"#"D$%E) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3@8!"!7>/C17> !"#"D$%$$& $%E$ $%E* $%*$ $%** $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53F/;"!7>/C17> !"#"D$%&E $%$) $%$E $%$G $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "B@A;/<7"!7>/C17> !"#"D$%($ $%$( $%$) $%$& $%$E $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-!3@/<7"!7>/C17> !"#"$%&' $%()* $%(*$ $%(+* $%)$$ $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3>/./?7@A" ";58!B7C"!7>/C17> !"#"$%&) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="<7B8./?7@A" ";58!B7C"!7>/C17> !"#"D$%E) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3@8!"!7>/C17> 
!"#"D$%$$& $%E$ $%E* $%*$ $%** $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53F/;"!7>/C17> !"#"D$%&E $%$) $%$E $%$G $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "B@A;/<7"!7>/C17> !"#"D$%($ $%$( $%$) $%$& $%$E $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-!3@/<7"!7>/C17> !"#"$%&' $%()* $%(*$ $%(+* $%)$$ $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3>/./?7@A" ";58!B7C"!7>/C17> !"#"$%&) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="<7B8./?7@A" ";58!B7C"!7>/C17> !"#"D$%E) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3@8!"!7>/C17> !"#"D$%$$& $%E$ $%E* $%*$ $%** $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53F/;"!7>/C17> !"#"D$%&E $%$) $%$E $%$G $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "B@A;/<7"!7>/C17> !"#"D$%($ $%$( $%$) $%$& $%$E $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-!3@/<7"!7>/C17> !"#"$%&' $%()* $%(*$ $%(+* $%)$$ $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3>/./?7@A" ";58!B7C"!7>/C17> !"#"$%&) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="<7B8./?7@A" ";58!B7C"!7>/C17> !"#"D$%E) $%($ $%(* $%)$ $%)* $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3@8!"!7>/C17> !"#"D$%$$& $%E$ $%E* $%*$ $%** $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53F/;"!7>/C17> !"#"D$%&E $%$) $%$E $%$G $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "B@A;/<7"!7>/C17> !"#"D$%($ $%$( $%$) $%$& $%$E $)* *$+* ($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-!3@/<7"!7>/C17> 6 < 6 : L N D < 16789:! V '<99:?>=6<@!B:=L::@!C9>P=6<@!A6@P6F!79<8KD!6@!E:?6P:D!>@F! */%J! 
'AD*N@<:?>ME*MN*OMF>?>=DC7*:A<@QDL*W=DC7*:A<@QDL*W6X*EM*<:>LF* >E:@D?A* >E:@DEQ*(1' 9 W:X*'AD*N@<:?>ME*MN*OMC<@*EM*LD:@D?A* >E:@DEQ*(1'9*WLX*'AD@D*>F*EM*F?@MEQ*?@DEL*6D?YDDE*?AD*N@<:?>ME*MN*A7L@MOAM6>:* EM*<:>LF*ME*MN*QC7:>ED*@DF>LBDF*LD:@D?A*>E:@DEQ* (1' 9 WNX*)M*?@DEL*>F*M6FD@=D L*NM@*?AD*N@<:?>ME*MN*O@MC>ED*@DF>LBDF9

PAGE 29

* Ga * !"#"$%&' $%$( $%$) $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?"/5@101A9BC" "=7:!D9E"!9@1E39@ !"#"$%$+ $%$F $%$) $%*$ $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?">9D:01A9BC" "=7:!D9E"!9@1E39@ !"#"G$%,' $%$, $%*$ $%*, $%+$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/5B:!"!9@1E39@ !"#"$%(+ $%,$ $%,, $%F$ $%F, $%-$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "7CE!5/75H1="!9@1E39@ !"#"G$%$$%$+ $%$( $%$F $%$) $%*$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "DBC=1>9"!9@1E39@ !"#"G$%** $%$$ $%$+ $%$( $%$F $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/!5B1>9"!9@1E39@ !"#"$%&' $%$( $%$) $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?"/5@101A9BC" "=7:!D9E"!9@1E39@ !"#"$%$+ $%$F $%$) $%*$ $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?">9D:01A9BC" "=7:!D9E"!9@1E39@ !"#"G$%,' $%$, $%*$ $%*, $%+$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/5B:!"!9@1E39@ !"#"$%(+ $%,$ $%,, $%F$ $%F, $%-$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "7CE!5/75H1="!9@1E39@ !"#"G$%$$%$+ $%$( $%$F $%$) $%*$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "DBC=1>9"!9@1E39@ !"#"G$%** $%$$ $%$+ $%$( $%$F $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/!5B1>9"!9@1E39@ !"#"$%&' $%$( $%$) $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?"/5@101A9BC" "=7:!D9E"!9@1E39@ !"#"$%$+ $%$F $%$) $%*$ $%*+ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?">9D:01A9BC" "=7:!D9E"!9@1E39@ !"#"G$%,' $%$, $%*$ $%*, $%+$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/5B:!"!9@1E39@ !"#"$%(+ $%,$ $%,, $%F$ $%F, $%-$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "7CE!5/75H1="!9@1E39@ !"#"G$%$$%$+ $%$( $%$F $%$) $%*$ $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "DBC=1>9"!9@1E39@ !"#"G$%** $%$$ $%$+ $%$( $%$F $+, ,$-, *$$ ./01232"4!5607"892/9!:03!9" 5 ;" "5?" "/!5B1>9"!9@1E39@ : L D N 16789:! W %E:!C9>P=6<@!A6@P6F!79<8KD!6@!B:=>!DE::=D!SDJ!*/%J! 
'AD* N@<:?>ME*MN*OMF>?>=DC7*:A<@QDL*EM*<:>LF*WE:@D?A* (1'*YA>CD* EDQ=DC7*:A<@QDL*EM*<:>LF*W6X*FAMY*EM*?@DEL9*'AD*N@<:?>ME*MN*OMC<@*EM* <:>LF*W:X*LD:@D?A*>E:@DEQ*(1'*:*EM*<:>LF* WLX*>E:@D?A*>E:@DEQ*(1'9*'AD*N@<:?>ME*MN*QC7:>ED*WDX*ED*WNX* @DF>LBDF*FAMY EM*?@DEL*Y>?A*(1'9 < 6

PAGE 30

* GJ * &CCD* W $ X NND@DE:DF Y>?A*L>NND@>EQ*(1' >E*?AD* EM*<:>L*N@DPBDE:>DF*MN*FB@N<:D*@DF>LBDF]D\OMFDL*@DF>LBDF9*'AD7*NMBEL*E:@DE*?AD*Y6CD*FB@N<:D*MN*:A<@QDL*EM*<:>LF*EM*<:>LF*>E*A7OD@?AD@KMOA>CDF;*YADE*:MKO<@DL*?M*KDFMOA>CDF9* !"#"$%&' $%'$ $%'( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;"+1<-,-=5>?" "936!@5A"!5<-A/5< !"#"B$%$( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;":5@6,-=5>?" "936!@5A"!5<-A/5< !"#"B$%'( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+1>6!"!5<-A/5< !"#"$%'$ $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "3?A!1+31D-9"!5<-A/5< !"#"B$%$C $%'$ $%'( $%&$ $%&( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "@>?9-:5"!5<-A/5< !"#"B$%$' $%$&( $%$($ $%$)( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+!1>-:5"!5<-A/5< !"#"$%&' $%'$ $%'( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;"+1<-,-=5>?" "936!@5A"!5<-A/5< !"#"B$%$( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;":5@6,-=5>?" "936!@5A"!5<-A/5< !"#"B$%'( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+1>6!"!5<-A/5< !"#"$%'$ $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "3?A!1+31D-9"!5<-A/5< !"#"B$%$C $%'$ $%'( $%&$ $%&( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "@>?9-:5"!5<-A/5< !"#"B$%$' $%$&( $%$($ $%$)( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+!1>-:5"!5<-A/5< : L D N !"#"$%&' $%'$ $%'( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;"+1<-,-=5>?" "936!@5A"!5<-A/5< !"#"B$%$( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;":5@6,-=5>?" 
"936!@5A"!5<-A/5< !"#"B$%'( $%'( $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+1>6!"!5<-A/5< !"#"$%'$ $%&$ $%&( $%C$ $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "3?A!1+31D-9"!5<-A/5< !"#"B$%$C $%'$ $%'( $%&$ $%&( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "@>?9-:5"!5<-A/5< !"#"B$%$' $%$&( $%$($ $%$)( $&( ($)( '$$ *+,-./."0!12,3"45.+5!6,/!5" 1 7" 8!69,-1:"1;" "+!1>-:5"!5<-A/5< < 6 16789:! X 19>P=6<@!A6@P6F!79<8KD!6@!?<F*M6FD@=DL* NM@*?AD*N@<:?>MEF*MN*OMF>?>=DC7*:A<@QDL*EM*<:>LF*W=DC7*:A<@QDL*EM* <:>LF*W6X;*OMC<@*EM*<:>LF*W:X;*A7L@MOAM6>:*EM*<:>LF*WLX;*QC7:>ED*@DF>LBDF*WDX* M@*O@MC>ED*@DF>LBDF*WNX9*

PAGE 31

* Ge &AE:@DE*:A<@QDL*@DF>LBDF*E*?AD@KMOA>C>:*O@M?D>EF9 2D*M6FD@=DL*F>K>C<@*?@DELF9*'AD*N@DPBDE:7*MN* OMF>?>=DC7*:A<@QDL*=DC7*:A<@QDL*EM*<:>LF*:MEF?>?B?>EQ*D\OMFDL* @DF>LBDF*>E:@D?A*(1' ;*YA>CD*?AD*N@DPBDE:7*MN*OMC<@*@DF>LBDF*LD:@D:*@DF>LBDF*L>L*EM?*FAMY*?A*?DKOD@E* ?AD*FD:MEL<@7*F?@B:?B@DF*MN*O@M?D>EF;*D\:DO?*>E*6D?<*FADD?F*YAD@D*?AD7*E:@D?A*?DKOD@F>EQC7;*?AD7*L*EM?*FAMY*E*?AD* N@<:?>ME*MN*D\OMFDL*@DF>LBDF9* %?*AC>:*O@M?D>EF*FAMY* Q@DEQ;*YA>:A*:MBCL*<::MBE?*NM@*?AD>@*>E:@DC>?7 W $' ;* $) X 9* 'A>F*>E:@DEQ*YMBCL CD:*DNND:?;*YA>:A*YD* D\OD:?DL*?M*FDD*E*?AD*N@<:?>ME*MN*D\OMFDL*A7L@MOAM6>:*@DF>LBDF W $) X 9* G9/D80/>7-B8:8-=/19:7@ 'AD*N>QB@DF*NM@*OMF>?>=DC7*:A<@QDL*=DC7* :A<@QDL*@DF>LBDF*>E*ADC>:DF*MEF;*=DC7*:A<@QDL*@DF>LBDF*>E* CMMOF*FAMYDL*<*FD?*MN*LE?F*?AE*?AD*OCM?F W+>QB@D*J<;*J6;*R6;*[<;*[6X 9*2D*?@<:DL*?ADFD*OM>E?F*6<:S* ?M*?AD>@*1(8$*%$F*FKF*KEQ*BO*?ADFD*OM>E?F* YD@D*AC>:*M@QFKF9*3CDF*C>=>EQ*>E*A>QA*FMEF*D\OD@ >DE:D* L>NND@DE?*FDCD:?>=D*O@DFFB@DF*ME*?AD>@*O@M?D>EF;*FKF*A<=D*E?<>E*L>NND@DE?*DCD:?@MF?:*O@MOD@?>DF*>E*?AD>@* O@M?D>EF*?AC>cD*?ADK*>E*A>QAC7*FED*DE=>@MEKDE?F W $* X 9* +M@*D\?*ACDF;*FB:A* D 7 QA*FEk*KD:AFK*MN*ME9*'ADFD*M@QFKF*<::BKBCQA* :ME:DE?@ME*MN*FE*?AD>@*:DCCF*@*O@ M?D>EF* YD@D*NMBEL*?M*6D*DE@>:ADL >E* <:>L>:*EM*<:>LF;*:*EM*<:>LF W $& X 9* 'A>F*?@DEL*>F*E*MB@*@DFBC?F;* E?F9

PAGE 32

* GR * $>FBCN>LD*6MEL*@>:AEDFFT 2D*:FBCN>LD*6MELF*>E*?AD* FD?*MN*?DE*O@M?D>EF*NM@*D<:A*M@QFK9*'A>F*EBK6D@*YcDL*67*O@M?D>E*:A<>E* CDEQ?A;*EF* A<=D*KM@D*MOOM@?BE>?7*?M*NM@K L>FBCN>LD*6MELF*? AEF9*$>FBCN>LD*6MELF*<@D*F?@MEQ*:M=C>cD*O@M?D>E F?@B:?B@DF9*4DD67*D?*CDF*AME*MN* L>FBCN>LD*6MELF 9* 4DD67*D?*FBCN>LD*6MELF* 6 < !"#"$%&' $%() $%(* $%)$ $%)& $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3>/./?7@A";58!B7C"!7>/C17> !"#"$%)D $%(+ $%)$ $%)+ $%E$ $%E+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "<7B8./?7@A";58!B7C"!7>/C17> !"#"F$%&$ $%($ $%(+ $%)$ $%)+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3@8!"!7>/C17> !"#"F$%$G $%&$ $%&+ $%+$ $%++ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53H/;"!7>/C17> !"#"$%&' $%() $%(* $%)$ $%)& $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3>/./?7@A";58!B7C"!7>/C17> !"#"$%)D $%(+ $%)$ $%)+ $%E$ $%E+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "<7B8./?7@A";58!B7C"!7>/C17> !"#"F$%&$ $%($ $%(+ $%)$ $%)+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3@8!"!7>/C17> !"#"F$%$G $%&$ $%&+ $%+$ $%++ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53H/;"!7>/C17> !"#"$%&' $%() $%(* $%)$ $%)& $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3>/./?7@A";58!B7C"!7>/C17> !"#"$%)D $%(+ $%)$ $%)+ $%E$ $%E+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "<7B8./?7@A";58!B7C"!7>/C17> !"#"F$%&$ $%($ $%(+ $%)$ $%)+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3="-3@8!"!7>/C17> !"#"F$%$G $%&$ $%&+ $%+$ $%++ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53H/;"!7>/C17> !"#"$%&' $%() $%(* $%)$ $%)& $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "-3>/./?7@A";58!B7C"!7>/C17> !"#"$%)D $%(+ $%)$ $%)+ $%E$ $%E+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "<7B8./?7@A";58!B7C"!7>/C17> !"#"F$%&$ $%($ $%(+ $%)$ $%)+ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" 
:!8;./3<"3="-3@8!"!7>/C17> !"#"F$%$G $%&$ $%&+ $%+$ $%++ $)++$ '+($$ ,-./010"2!34.5"670-7!8.1!7" 3 9" :!8;./3<"3=" "5AC!3-53H/;"!7>/C17> L L : 16789:! Y 19>P=6<@!P:!>A6@P6F!79<8KD!SDJ!*/%J! 'AD* N@<:?>ME*MN*D\OMFDL*OMF>?>=DC7*:A<@QDL*W=DC7*:A<@QDL*W6X*EM* <:>LF*>E:@D?A*>E:@DEQ*(1';*YA>CD*?AD*N@<:?>ME*MN*OMC<@*@DF>LBDF*W:X* LD:@DME*MN*D\OMFDL*A7L@MOAM6>:*@DF>LBDF WLX FAMYF*EM*?@DEL* Y>?A*(1'9

PAGE 33

* G[ :MBCL* D=DE* 6D*NMBEL*>E*>E?@<:DCCBC<@*O@M?D>EF;*3MYD=D@;*YD*NMBEL*EM*?@DEL*>E* L>FBCN>LD* 6MEL*@>:AEDFF*KBK*Q@MY?A*?DKOD@F*:MBCL*6D*6D:NND@DE?*KD?@>:F*BFDL*NM@*LDN>E>EQ*L>FBCN>LD*6MEL*@>:AEDFF9*4DD67*D?*K>?7*MN*&7F Z & 7F*@DF>LDF*YLD@DL*?A>F*?M*6D*EL>:ME*MN*>E:@DFBCN>LD*6MEL>EQ9 UM@L<*D?*ME*MN* :7F?D>ED*@DF>LBDF*>E*<*O@M ?D>E*?AFBCN>LD*6MELDL9* 16789:! \ '<99:?>=6<@!B:=L::@!F6D8?C6F:!B<@F!96PE@:DD!>@F!*/%J 'AD*EBK6D@*MN* L>FBCN>LD*6MELF*OD@*@DF>LBD*>E*?AD*O@M?DMKD*MN*FK* FAMYF*EM*:M@@DCME* Y>?A*?AD*M@QFKkF*(1'9 5LQD*@>:AEDFFT &AC>:*O@M?D>EF*NM@K*KM@D*FLQDF*?A@*KDFMOA>C>:*:MBE?D@O<@?F9* 'AD>@*F?BL7*LQDF*YD@D*CM:E*ADC>:DF;*E9* 2D*:LQDF*OD@*@DF>LBD*>E*?AD* ?DE KMLDCDL O@M?D>EF*NM@*D<:A*M@QFK*ME*MN*FLQDF* >E:@D?A*?DKOD@F*EM?*F?@MEQ W@*l*H9`J X 9 r = 0.10 0.000 0.001 0.002 0.003 0.004 0 25 50 75 100 Optimum Growth Temperature o C Number of disulfide bonds/residue

PAGE 34

* GV * * * 5LQD*F?<6>C>?7*>F*?*AME*MN*FLQDF*:C>c>EQ* W %! X C>c>EQ W %" X 9* r = 0.42 0.03 0.04 0.05 0.06 0.07 40 60 80 100 Optimum Growth Temperature o C Number of salt bridges/residue < r = 0.35 0.03 0.04 0.05 0.06 0.07 0 25 50 75 100 Optimum Growth Temperature o C Number of salt bridges/residue r = 0.24 0.03 0.04 0.05 0.06 0.07 10 20 30 40 Optimum Growth Temperature o C Number of salt bridges/residue 6 : 16789:! ;M '<99:?>=6<@!B:=L::@!D>?=!B96F7:!96PE@:DD!>@F!*/%J! WLQDF*OD@*@DF>LBD*>E*<*O@M?DMKD* >E:@D?A*(1'9*W6X*'A>F* ?@DEL*>F*EM?*M6FD@=DL*6DCMY*aH M &*W@*l*H9GaX;*6B?*:
PAGE 35

* `H 3MYD=D@;* >?*YQAD@*?DKOD@LQDF*<@D* F?<6>C>c>EQ;*LBD*?M*<*LD:@DE*?AD*LDFMC=ME*ODEF*YMBCL*D\OC<>E*?AD* ?@DEL*FDDE*>E*MB@*F?BL7*MN*E:@DLQDF*QAD@* ?DKOD@E D*CD<@E>EQ*>E?DQ@ME*MN*NDE?DQ@6DL*<6M=D*BF>EQ*?YM*K<:A>ED*CD<@E>EQ*ME* N>:ME9*'AD*DE?>@D*O>ODC>ED*YEL>=>LBEDL9*" G F:M@DF*YD @D*@DOM@?DL*NM@*?AD*5_"* W+ >QB@D II QB@D*5I;*\*4X* ME* MN*<::B@N>DL*M@QF KFX*YD@D*@DOM@?DL*NM@*?AD*5_&*W+ >QB@D IG X9* 'AD 5_"* MB?OD@NM@KDL*EL>=>LBDL*?M*?AD*CMY*?DKOD@< ?B@D*LQB@D*IGQA*?DKOD@QB@D*IG6X;*:MKOCD?D*LQB@D*IG:X* QB@D*IGLX9* 'AD*EDQ=D*" G =F*<*6D??D@*DF?>KEDL*O@DL>:?M@*OD@NM@KDL*6DF? ME*?AD*A>QA*?DKOD@F*YQA*?DKOD@EL>=>LBMEF*QAD@*?DKOD@KO@M =DKDE?*M=D@*>EL>=>LBE*6M?A*?AD* CMY* QA* ?DKOD@?*:K6E*N<=M@*MN* CMYD@*?DKOD@L*EM?*@DLB:D]6>DL* ?M*?AD*DE?>@D*LQB@D*IID*FAMY F*O@DL>:?DL*(1'*W:?DL*67*?AD*5_"*ME* ?AD*:MKOCD?D*LME*>F*NMBEL*?M*6D*H9[J;*>EL>:EQ* ?AQA*<::B@<:7*>E*O@DL>:?>EQ*(1'*<:@MFF*?AD*@
PAGE 36

* `I * * * !"#$ "#" "#$ "#% &'()*+,-*++-.,/-+0*(/,1 0,23+,&-4 5,6*-/ &'()*+,-*++-071(840853+ 2884&-/ 2884&-4 0,23+,&-.,/-+0*(/,1 .8(9*23:,1-13&'2)31,-58.1& 5,6*-.,/-+0*(/,1 5,6*-4 0,23+,&-071(840853+ 2884&-482*( 2884&-.,/-+0*(/,1 2884&-071(840853+ ;(.*-<&-/+ .8(9*23:,1-&*26-5(31/,& 0,23+,&-482*( 2884&-48&-+0*(/,1 5,6*-071(840853+ 0644 0,23+,&-48&-+0*(/,1 &'()*+,-*++-482*( 5,6*-48&-+0*(/,1 0,23+,&-/ &'()*+,-*++-48&-+0*(/,1 5,6*-482*( =(.*-/+ ;(.*->?&-/+ )-3@7A(,2 +8953.,1 ),*6'(,& ( B &@(CDECF$"+CGH=H < 6 !"#$ !"#% !"#& "#" '()*+,-.+,,./-0.,1+)0-2 3445'.54'.,1+)0-2 1-36,-'.54'.,1+)0-2 1-36,-'.5 *.6789)-3 :-;+./-0.,1+)0-2 1-36,-'./-0.,1+)0-2 /4)<+36=-2.26'(3*62-.:4/2' 1-36,-'.182)4514:6, 3445'./-0.,1+)0-2 :-;+.543+) :-;+.54'.,1+)0-2 1;55 '()*+,-.+,,.543+) 1-36,-'.543+) 3445'.0 >)/+.?'.0, '()*+,-.+,,.54'.,1+)0-2 :-;+.182)4514:6, :-;+.5 >)/+.@$'.0, 3445'.5 3445'.182)4514:6, '()*+,-.+,,.182)4514:6, 3445'.543+) :-;+.0 A)/+.0, 1-36,-'.0 /4)<+36=-2.'+3;.:)620-' ,4<:6/-2 *-+;()-' ) & '7)BCDBE%",BFGAG 16789:! ;; $8KK<9=!Q:P=<9!&:79:DD6<@J

PAGE 37

* `G * * * !"! !"# !"$ !"% &''()*+,-.'(+'/01 &''()*('&2. +3&013)*+,-.'(+'/01 &''()*4 +3&013)*( &''()*(')*1+2.43/352*634*1+2.436'.72&083-*-0)9&:0-3*/'6-) /352*( &''()*634*1+2.43&''()*( )9.:213*211*('&2. +3&013)*634*1+2.43)9.:213*211*634*1+2.43/352*4 )9.:213*211*+,-.'(+'/01 +3&013)*('&2. +5(( /352*(')*1+2.436'.72&083-*)2&5*/.0-43) ;.62*<)*41 +3&013)*(')*1+2.43;.62*=%)*41 +3&013)*4 )9.:213*211*(')*1+2.43/352*+,-.'(+'/01 >.62*41 /352*('&2. :*0?,@.3& 1'7/063:3259.3) # )?.ABCADEEA>FGHAID>DAJK>LA;MNAOF;CFE L : D !" !# $ %&'()* +,-.(/01&2)2034/502&)%,+23 %&'()+&6)78(-6&2 34-5(7&)(77)+&6)78(-6&2 /,,*3)* 8&/07&3)892-,*8,%07 8&/07&3)* /,,*3)6 %&'()6 /,,*3)+&6)78(-6&2 34-5(7&)(77)892-,*8,%07 8&/07&3)+&6)78(-6&2 8'** /,,*3)*,3)78(-6&2 %&'()*,3)78(-6&2 /,,*3)892-,*8,%07 8&/07&3)6 +,-.(/01&2)3(/')%-026&3 /,,*3)*,/(%&'()892-,*8,%07 8&/07&3)*,3)78(-6&2 :-+();3)67 34-5(7&)(77)*,3)78(-6&2 8&/07&3)*,/(%&'()*,/(34-5(7&)(77)*,/(<-+()67 :-+()=>3)67 5)0?9@-&/ 7,.%0+&2 5&('4-&3 # 3?-ABCADEFDGHIJKLALG
PAGE 38

* `` * 16789:! ;T $8KK<9=!Q:P=<9!&:79:DD6<@J 2D*>E?DQ@EL>=>LBEQ* <*5_"*KD?AML9*'AD*:MK6>EDL*O@DL>:?M@*MB?OD@NM@KDL*EL>=>LB:?M@F*YADE* DL*?M*CMY*?DKOD@QA*?DKOD@@D*LL*EM?*OD@NM@K*YDCC*M E*?AD*FB6F:?DL*(1'*W:?DL*67*?AD*5_"*ME*?AD*:MKOCD?D*L ME*MN H9[J WDX 9 'AD*CDKO@M=DKDE?*>E*OD@NM@KQB@D*IILX9*'A>F*LEQ*Ia*M@QFKF*N@MK*D<:A*IH M &* 6>E*@EQ*N@MK*H Z VH M &*FKF*N@MK*VH Z IHH M &;*@DFBC?>EQ*>E*<*?M?FKF9*'AD*M@QFKF*>E*?AD*`H Z aH M &*6>EF*FAMYDL*?AD*Q@DME*NM@* D<:A*MN*?AD*NDEDL9*'AD*M@QFKF*?A NND@DE?*=:A*:MBCL*D\OC<>E*YA7* KMF?*MN*?AD*>EL>=>LB=D*" G =E*?A>F*OCM?;*EDL*O@DL>:?M@*N<>CDL*?M*OD@NM@K*YDCC*ME*?A>F*L
PAGE 39

* `a * * * * * * !"#! !"#$ !"%! !"%$ &'()*+ &'()*,-./01,0&23 &'()*4'+*3,)/+'. &'()*1 &'()*105)/ &'()*106*3,)/+'. 7*28-9/'5 ,'523'6*+ ,'523'6*,-./01,0&23 ,'523'6*4'+*3,)/+'. ,'523'6*1 ,'523'6*105)/ ,'523'6*106*3,)/+'. ,(11 50016*+ 50016*,-./01,0&23 50016*4'+*3,)/+'. 50016*1 50016*105)/ 50016*106*3,)/+'. 40/:)52;'.*.26<572.'*&04.6 40/:)52;'.*6)5(*&/2.+'6 6#6*+3 =/4)*$6*+3 ?/4)*+3 30:&24'. 7')(1.+,-/ =1.+,?'6,-/ 0*33 0)45/)6,:,5@7A1)4 691:+/),+//,386,/0+1-)2 /8;(5.)2 :)+*91)6 +BBC=DBEF6BG=H 6@/FGIFJ%!/FKD>D < 6 16789:! ;O $8KK<9=!Q:P=<9!'?>DD6C6P>=6<@J

PAGE 40

* `J * * * * !"## !"$! !"$# !"%! &'()*+ &'()*,-./01,0&23 &'()*4'+*3,)/+'. &'()*1 &'()*105*3,)/+'. ,'623'5*+ ,'623'5*,-./01,0&23 ,'623'5*4'+*3,)/+'. ,'623'5*1 ,'623'5*105*3,)/+'. ,(11 60015*+ 60015*,-./01,0&23 60015*4'+*3,)/+'. 60015*1 60015*106)/ 60015*105*3,)/+'. 40/7)628'.*.2596:2.'*&04.5 40/7)628'.*5)6(*&/2.+'5 59/:)3'*)33*,-./01,0&23 59/:)3'*)33*4'+*3,)/+'. 59/:)3'*)33*105*3,)/+'. ;/4)*#5*+3 ,'623'5*106)/ 59/:)3'*)33*106)/ ;/4)*<$5*+3 &'()*106)/ =/4)*+3 :*2>-?/'6 307&24'. :')(9/'5 )@@A;B@CD5@E;F 5>3DEGDBHHD=FIJDKB=B !"! !"# !"$ !"% !"& !"' ()*+,)-.(/0123(24+, *223-.5)6.,(716)0 *223-.3 -8197,).7,,.32-.,(716)0 4):7.6 ()*+,)-.5)6.,(716)0 -8197,).7,,.5)6.,(716)0 *223-.6 521;7*+<)0.0+-8*9+0).4250*223-.32-.,(716)0 4):7.3 4):7.(/0123(24+, =157.6, 521;7*+<)0.-7*:.41+06)()*+,)-.6 *223-.32*71 ()*+,)-.3 -8197,).7,,.32*71 ()*+,)-.32-.,(716)0 -8197,).7,,.(/0123(24+, 4):7.32-.,(716)0 (:33 4):7.5)6.,(716)0 ()*+,)-.32*71 >157.'-.6, ,2;4+5)0 *223-.(/0123(24+, >157.#?-.6, 4):7.32*71 9.+@/A1)* 9)7:81)7BBC>DBEF-BG>H -@,FGIFJCKJDLMNHOFOD=D : L 16789:! ;U $8KK<9=!Q:P=<9!'?>DD6C6P>=6<@J! 2D*>E?DQ@EL>=>LBEQ*<*5_"*KD?AML9*'AD*:MK6>EDL*O@DL>:?M@*L>L*EM?*OD@NM@K*YDCC* 6DCMY*aH M &*W
PAGE 41

* `e AD*FEQ*<*5_&*?M*:CN7*D<:A*M@QFK* >E?M*<*IH M &*?DKOD@E9*3MYD=D@;*?AD*:MK6>EDL*O@DL>:?M@*L>L*EM?*OD@NM@K* KB:A*6D??D@*?AEL>=>LBQB@D*IG<;*6;*:;*LX9*%E*N<:?;* YADE*DL*?M*?AD FB6F?*YEL>=>LBME*MN*M@QFKF*?AN>DL9*3MYD=D@;*?A>F*KD?@>:*LMDF*EM?*@DNCD:?*?AD*M@L>EEF9*+M@* D\N*FK*Y>?A*N>DL*>E*?AD*eH Z [H M &;*>?*YMBCL* 6D*:MEF>LD@DL*E:M@@D:?*N7>EQ*>?*>E*?AD*`H Z aH M &*6>E9*'A>F*:MBCL*D\OC<>E* YA7*?AD*:MK6>EDL*O@DL>:?M@*L*EM?*OD@NM@K*YDCC*ME*=D*:MK6>EME*MN*NDDL*O@D=>MBFC7*>E* ?AD*O@DL>:?>ME*MN*(1'*67*QDEMK>:]O@M?DMK>:*NDME*MN*NMB@*QDEDF*?M*O@DL>:?*(1'*NM@*O@MS<@7M?DF;*BF>EQ*<* 5_/*?M*6B>CL*?AD>@*O@DL>:?M@*N@MK*?AD*NMB@*QDEDFk*1&* :ME?DE?9 'AD>@*O@DL>:?M@*A:?>ME*<::B@<:7*MN*[a9HVr9*3MYD=D@;*?AD>@*O@DL>:?M@*YN7* M@QFKF*?AD@*KDFMOA>C>:*M@*?AD@KM]A7OD@?AD@KMOA>C>:;*E?M*N>ED@* ?DKOD@EF9*3MYD=D@;*YADE*YD*BFDL*3//F*?M*N>EL*?ADFD*QDEDF*>E*MB@* QDEMK DF;*YD*NMBEL*?AL*EM?*A<=D*F?F*?AD*L>F?@>6B?>ME*MN*?AD*EBK6D@*MN*QDEMKDF*Y>?A*cD@M;*MED;*?YM;*?A@DD* F*NDF* CDFF*QDED@ c<6CD*?M*O@DL>:?>ME*MN*(1'*<:@MFF*<*6@MFKF9*

PAGE 42

* `R %>B?:! T )8AB:9!@F!>??!U!?J! G # H J )0/4#"*(+*1#)#5*W\X )0/4#"*(+* 1#)(/#5*2%'3* \*1#)#5 H IJV I Iae[ G Va` ` eee a GHR %? >F*>KOM@?:?M@*Y?AMB?*SEMYCDLQD* MN*?AD*O A7CMQDE7*MN*?AD*M@QFKF*BFDL9*LL>EQ*OA7CMQDE7*?M*?AD*NDKO@M=D*?AD*<::B@<:7*MN*O@DL>:?>MEF 9 &ME*MN*%_.2"#8*:*@D:* LE*:ML>EQ*@DQ>MEF*YD@D*BFDL*?M*:ME* ME*Y?AD@*MN*?ADFD*NDEQ*?DKOD@QB@D* I` X9 'AD*FEQ*?DKOD@E*?AD*H Z `H M &*@ME*:?M@ F*NM@* KDFMOA>C>:*(1'F9*'A>F* :MBCL D\OC<>E*?AD*C<:S*MN*:M@@DCME*NMBEL*NM@*?AD* KD?:*L:?*(1'F*NM@*M@QFKF*6DCMY* aH M &;*MEC7*?AD*:MK6>EDL*O@DL>:?M@*A?>=D*" G =EL>=>LBC7*O@DL>:?DL*ME*KD?:*@DF*FBQQDF?F* ?AE*M@LD@*?M*O@DL>:?*KD?:*FEQ*?DKOD@EQ*LEME*MN*>EL>=>LB:?M@F*?@<>EDL*ME*?AD*FDPBDE:>EQ L:?M@9 0F>EQ*>ENM@KME*ME*?AD* ?<\MEMK>:*<6BELE*?AD*F:?M@*MN* FEQ*?DKOD@EQ*?A>F*NDQA?*NB@?AD@*6MMF?*?AD*<::B@<:7*MN*

PAGE 43

* `[ MB@*FEQ*?DKOD@KBK*Q@MY?A*?DKOD@:?>MEF9 * * !"#"$%$&' $%& $%( $%) $%' $%* $%+ &$ ($ )$ ,-./0123"45./5!-67!5" 8 9" :!-;6182"8<"=>?@ABC !"#"$%&'( %&') %&*% %&*+ '% *% ,% -./01234"56/06!.78!6" 9 :" ;5<< < 6 16789:! ;V 19>P=6<@!H!>@F!2%33!GBH!SDJ!D>AK?6@7! =:AK:9>=89:J! )M*:M@@DCMEF*YD@D*M6FD@=DL*NM@*D>?AD@*ND
PAGE 44

* `V '2"3%-&!4Q '*)',5$4*)$ /DF*A<=D*6DDE*:MELB:?DL*?M*?@7*:*:*LD?D@K>E:*MO?>KBK*Q@MY?A*?DKOD@MEF*6D?YDDE*=<@>MBF*NDDF*YD@D*:MELB:?DL*J Z IH*7D<@F*E*F>cD9*%E* ?A>F*F?BL7;*YD*LMEF*ME*ED*CD<@E>EQ*ED*?AD*@DFBC?F*MN*EL>=>LB:?*(1';*YA>:A*ADL*O@D=>MBFC79* 2D*NMBEL*?AMEF*YD@D*MEC7*YDDL* ?M*?AD*DE?>@D*@QA*?DKOD@LD@DL9*)MED*MN*?AD*>EL>=>LB:?M@F*MN*(1'*>E*?AD* H Z aH M &*@EDL*O@DL>:?M@*MB? Z OD@NM@KDL*EL>=>LBQAC> QA?>EQ*?AD*B?>C>?7*MN* FB:A*DL*?M*?AD*CMY*?DKOD@:?>ME*<::B@<:7*YQA*?DKOD@F*A>QAC>QA?F*?AD*QE*MB@*BELD@F?EQ*MN*YAME*O@DFFB@DF*<:?*DF*KDE?>MEDL* O@D=>MBFC7*D\EDL*NDFKF*?M*Q@MY*QA*?DKOD@LD<*?A :*:* MEF*YMBCL*<:?*?M*F?<6>C>cD*K<:@MKMCD:BCDF*QAD@* ?DKOD@DF*A<=D*D\EDL* CMY*?DKOD@MEF* W ;* %% X 9* 'AD@DNM@D;*NB?B@D*F?BL>DF*:MBCL*6D*<>KDL*EQ*MB@*BELD@F?EQ*MN* YAE*O<@?>:BC<@;*YAMEF*L>F?>EQB>FA*KDFMOA>CDF*N@MK*D<:A*M?AD@9*'A>F*YMBCL*>KO@M=D*MB@*<6>C>?7*

PAGE 45

* aH ?M*O@DL>:?*(1'F*>E*? A>F*@6B?D*?M*MB@* BELD@F?EQ*MN* K<:@MKMCD:BC<@*F?<6>C>?79

PAGE 46

* aI &-1-&-)'-$ I9* -9*'M@F?DEFFME* 4:-80, ;*!A7F>:M:ADK>:F?* :MKKBE>?7*:MKOMF>?>ME*=D@F>?7*>E*-E?<@: ?>:*FD<*>:D9* +9615/9,-A1.5/?1/0, ;X ;*`[eV d `[[I*WGHIJX9 G9* !9*U>;*29*U9*"AMEQ*CL>EQ*OCBK6>EQ* K>:@M6>MKD9* I3A+-J, ;; ;*I`I[ d I``H*WGHIRX9 `9* 39*fADEQ;*39*2B;*1DED Z :DE?@>:*ME*F*NM@*?AD*:M@@DCME*6D?YDDE* ?AD*QBED Z :7?MF>ED*:ME?DE?*CD=DCF*?>MEF*MN* O@MS<@7M?>:*FOD:>DF9* KAL-K1/19;/5D8:1.7 9* ;;!$8KK?!; ;*5R*WGHIHX9 a9* &9*&CCD;*5?@B:? B@:*:M@@DCC>?79* J,-K1/0,-L<4D, OXV ;*`G`[` d `G`[e*WGHHHX9 J9* 59*&ALME*MN*+<:?M@F*"DFOMEF>6CD*NM@* #EAC>?7*MN*!@M?D>EF T -*5 ?@B:?B@:F*4D@;*U9*"9*8M6@7;*"DCMEFA>OF*4D?YDDE*1DEMK>:*1q&*&ME?DE?;*")-* 5D:MEL<@7*5?@B:?B@DF;*KE*!@MS<@7M?DF9* J,A/0,-+6/0, UU ;*e`G d e`e*WIVVRX9 R9* U9*UM@L<;*'9*(9*.DLDFO@DFBCN>LD*6MEL>EQ*>E*O@M?D>EF*N@MK* ?AD@KMOA>C>:*<@:ATIH9IIJJ]GHII]aHVIJe9 [9* "9*!9*"9*/D?O=D*O@M?DMKD*F*MN* OF7:A@MOA>C>:*=D@FBF*KDFMOA>C>:*6<:?D@>DFT*%EF>QA?F*>E?M*?AD*KMCD:BC<@* 6F*MN*:MCL*ME*MN*O@M?D>EF9* KAL-N49/D1.7 9* ;M ;*II*WGHHVX9 V9* %9*)9*4D@DcM=FS7;*,9*49*fDCLM=>:A;*#9*%9*5A:A;*!MF>?>=D*=D* LDF>QE*>E*F?<6>C>?7*ME*MN*EEF9* OP/3-L/D=>:,K1/0, T ;*HaV[ d HJHR*WGHHRX9 IH9* /9*4DD67* 4:-80, ;*'AD*QDEMK>:F*MN*L>FBCN>LD*6MEL>EQ*E*F?<6>C>cME*>E* ?AD@KMOA>CDF9* OP/3-K1/0, T ;*IJaV d IJJ[*WGHHJX9 II9* '9*,B* 4:-80, ;*!@DL>:?>EQ*KDC?>EQ*?DKOD@@D:?C7*N@MK*O@M?D>E*FDPBDE:DF9* L/D=>:,-K1/0,-L<4D, TT ;*aaJ d aJH*WGHHVX9 IG9* "9*89*&AMCMQ7*D=ME*MN*KD?<6MC>:* ?AD@KM?MCD@E*#F:AD@>:A><*:MC>9* 3.1,-EQ4R-S/5T2-QSF 9* TUM ;*IGGH d IGG`* WGHI`X9 I`9* 89*5C>:*O@M?D>EF*?AF?QA*?DKOD@
PAGE 47

* aG Ia9* ,9*-9*$>CC;*,9*1AMFA;*U9*$9*5:AK>?;*!A7F>:K>?F*MN*:DCCF*FLMK*MN*:@MYLF*NM@*@M6BF?*QDED*ED?YM@S*>END@DE:D9* Q8:A4:?7*>E*&-5![*BF>EQ*!:MEF*.:,-U>9.:,K1/19;/5D8, XX ;*IeR d IRG*WGHHVX9 IR9* "9*/<:SDCO@:*F*MN* <*OD@K:@M6>?7*@D=DL*@DFOMEFD*?M*?A54 9* >FS>@P:!<@ ;*`e[ d RI* WGHIIX9 I[9* &9*#9*5A<@O* 4:-80, ;*3BK6MCL?kF*FO:@M6>=D@F>?7*>F*:ME?@MCCDL*67* ?DKOD@E*QDM?AD@K@MEKDE?F9* I3A+-J, Y ;*IIee d Ra*WGHIaX9 IV9* ,9*$9*,MAC;*U9*.@MEKDE?:@M6>?>DF*MN*?ED*$.041.-G.1B7 V47, UV ;*$aae d $aJe*WGHIRX9 GI9* $9*-9*4DEFME* 4:-80, ;*1DE4.041.-G.1B7-V47, U; ;*`e d aG*WGHI`X9 GG9* ,9*$9*!@B>??;*'9*'O?F*EF9* Q>.041.-G.1B7-V47, TV ;*eI d eJ*WGHHRX9 G`9* !9*U9*-9*&M:S* 4:-80, ;*4>MO7?AMET*+@DDC7*<=<>C<6CD*!7?AME*?MMCF*NM@*:MKOB?MEMCMQ7*M>ENM@K:F9* K1/19;/5D8:1.7 9* OV ;*IaGG d IaG`*WGHHVX9 Ga9* #9*!@BDFFD;*U9*!DOC>DF;*+9*(9*1Cw:SED@;*5% )-T*-::B@QA Z ?A@MBQAOB?*KBC?>OCD* FDPBDE:D*QEKDE?*MN*@>6MFMK=D*MEC>ED*@DFMB@:D*NM@*PB?7*:AD:SDL* QEDL*@>6MFMK6CD*Y> ?A*-"49* Q>.041.-G.1B7V47, TV ;*RI[[ d RIVe*WGHHRX9 Ge9* A??OFT]]FC<69M@Q]KMLO>OD] GR9* -9*5;*'9*89*4CBELDCC;*&MKO<@=D*O@M?D>E*KMLDCC>EQ*67*FFN<:?>ME*MN FOE?F9* J,-A/0,-K1/0, OTU ;* RRV d [IJ WIVV`X 9 G[9* /9*-C6D@?FDE* 4:-80, ;*1DEMKD*FDPBDE:DF*MN*@<@D;*BE:BC?B@DL*6<:?D@><*M6?<>EDL*67* L>NND@DE?>EE>EQ*MN*KBC?>OCD*KD?
PAGE 48

* a` GV9* 39*/9*4D@KE*L.041.-G.1B7-V47, OY ;*G`J d GaG* WGHHHX9 `H9* 39*$D=DCM OKDE?*'DLD;*IGH*WGHIJX9 `I9* /9*.>*5ADE ;*-9*5;*5?F?>::?>ME*MN* O@M?D>E*F?@B:?B@DF9* O5/:419-3.1, ;*GJHR d GJGa*WGHHeX9 `G9* 29*,<6F:A;*&9*5:?>ME<@7*MN*O@M?D>E*FD:MEL<@7*F?@B:?B@DT*!?>ME*MN*A7L@MQDE Z 6MELDL*::S;*!$4*N>CD*O<@FD@*KOCDKDE?DL* >E*!7?AME9* K1/19;/5D8:1.7 9* ;\ ;*G`H[ d G`IH*WGHH`X9 `a9* /9*f9*'>DE;*-9*19*/D7D@;*$9*,9*57L7SM=<;*59*U9*5O>DCKCSD;*/<\>KBK* 6>C>?DF*MN*@DF>LBDF*>E*O @M?D>EF9* OP/3-W94 9* Y WGHI`X;* LM>TIH9I`RI]^MB@E:F*MN*$>FBCN>LD*4MEL>EQ*E*5?<6>C>cME*>E* 'AD@KMOA>CDF9* OP/3-K1/0, T ;*D`HV*WGHHJX9 `e9* 29*3BKOA@D7;*-9*$FB:F9* J,-A/0,N58=<, ;U ;*`` d `[*WIVVeX9 `R9* +9*!DL@DQMF<* 4:-80, ;*5:>S>? Z CD<@ET*/<:A>ED*8D<@E>EQ*>E*!7?AME9* ;O ;*G[GJ d G[`H* WGHIGX9 `[9* #9*,<@FDE?>* 4:-80, ;*-*AMC>F?>:*ED*#:M Z F7F?DKF*6>MCMQ79* OP/3-K1/0, \ ;*R d II*WGHIIX9 `V9* '<@ <*(:DBK;*&MM@L>E?>ME;*!<@?>:>OF?@7*MN*FDCD:?DL*F?>ME*WGHHV Z GHI`X9*LM>TIH9IJVa]!-)1-#-9[aHRGI aH9* 59* &AL>;*+9*Lt(=>L>M;*59*5OD>:A;*59*-BL>:;*59*$D*/ME?D;*$9*%BL>:ME D;*/9* !>:AD@BK*&MM@L>E?>ME*!<@?>:>O@MEKDE??>ME*WGHHV Z GHI`X9* LM>TIH9IJVa]!-)1-#-9[aHRI[ aI9* "9*8D>EMEDE;*39*5BQ=D9* Q>.041.G.1B7-V47, T\ ;*GHIH d GHIG*WGHIIX9 aG9* 5DPBDE:D*"D=D*5B6K>FF>MEF*5?EQ*?AD*5"-*'MMCS>?*?M*:ME=D@?* 9F@<*N>CDF*>E?M*M?AD@*NM@KMEM?D:AEMCMQ7*%ENM@KME*W05Xx*GHII Z 9 -=<>C<6CD* N@MKT*A??OFT]]YYY9E:6>9ECK9E>A9QM=]6MMSF])4,IJ[VHH]

PAGE 49

* aa a`9* $9*,>K* 4:-80, ;*+@QA Z ?A@MBQAOB?*FAM@? Z @DEQ*N@:?>ME9* "(!%-I+++-L/9;,-L/D=>:,-I9:400,-K1/19;/5D 8,-L/D=>:,K1/0,-LIKLK-"(!% ;*I d [*WGHIJX9 aa9* '9*49*,9*"DLL7* 4:-80, ;*'AD*1DEMKDF*(E8>ED*$N>:ME9* Q>.041.-G.1B7-V47, UT ;*$IHVV d $IIHe*WGHIJX9 aJ9* ,9*49*fDCLM=>:A;*%9*)9*4D@DcM=FS7;*#9*%9*5A:A;*!@M?D>E*EC>:*ME9* OP/3-L/D=>:,-K1/0, T ;*HHeG d HHRG* WGHHRX9 ae9* !9*U9*3ME*FME*MN*O@M?D>E* FDPBDE:DF*N@MK*KD FMOA>C>:*C>:*/D?ADF9* O5/.,-Q8:0,-G.8B,-3.1,-X,-3,-G, \W ;*`JR[ d [`*WIVVVX9 aR9* '9*5EDE 4:-80, 2 -E*BEBFBC>?7*L>F:CMFDL*67*?AD* :MKO<@>FME*MN* Y<45D>7-:<45D/=<10>7 EM@Q:* O7@MOAMFOAEQ*3DC>:LD*6>?M@9* J,-A/0,-K1/0, TXY ;*JeJ d J[H*WGHH[X9 aV9* "9*"D>F??>ME*E*MN*D\?@DKDC7* AC>:*6<:?D@><9* G5.<, A1T5/?1/0, X; ;*`J` d `eH*WIVRHX9 JH9* 59*,BK<@;*"9*)BFF>EM=;*5LQD*F?<6>C>?7*>E*KMEMKD@>:*O@M?D>EF9* J,-A/0,-K1/0, O\T ;*IGaI d JJ*WIVVVX9 JI9* 59*,BK<@;*&9*U9*'F<>;*49*/<;*"9*)BFF>EM=;*&ME?@>6B?>ME*MN*F LQDF*?MY<@L* O@M?D>E*?AD@KMF?<6>C>?79* J,-K1/D/0,-3:5>.:,-ZM9, ;X ;*RV d [J*WGHHHX9 JG9* f9*59*3DELF:A;*49*'>LM@;*$M*FLQDF*F?<6>C>cD*O@M?D>EFu*-*:ME?>EBBK* DCD:?@MF?:*F9* O5/:419-3.1, T ;*GII Z GGe*WIVVaX9 J`9* -9*39*#C:M:S;*'AD*F?<6>C>?7*MN*FLQDF*QA*?DKOD@KOC>:MEF*NM@* A7OD@?AD@KMOA>C>:*O@M?D>EF9* J/>5980-/;-A/04.>085-K1/0/CM, OYU_O ;*a[V Z JHG* WIVV[X9 Ja9* 59*5BEME*MN*?AD*QCM6:@M6>MKD9* 3.149.4 TU Y ;*IGeI`JV d IGeI`JV*WGHIJX9 JJ9* 49*-9*/D?AD* 4:-80, ;*'AD*OF7:A@MOA>C>:*C>NDF?7CD*<*OF7:A@D@7?A@:*:* 9ECK9E>A9QM=]EB::M@D

PAGE 50

* aJ "33-).4`!"_!$533,-0-)%"&R! 0-%2*.$ 5BOOM@?*_D:?M@*"DQ@DFF>MET 'AD*L@F?*>KOB?DL*FM*FF>EQ*=?A*?AD*KDKOB?DL*LcDL*W?AD* KD=>LDL*67*?AD*F?MEX9*'AD*L?*> E?M*<*?@<>E>EQ*FD? EQ*?AD*?@<>Em?DF?mFOC>?*NBE:?>ME9* train_test_split(X_all_scaled, y, test_size=0.5, random_state=0) -*Q@>L*FD<@:A*YE>EQ*FD?*BF>EQ*IH Z NMCL*:@MFF*=LME EQ*NBE:?> ME ;*Y>?A*?AD*NMCCMY>EQ*A7OD@O<@QA*?DKOD@ED<@****&T*H9HHI;*H9HI;*H9I;*I;*IH;*IHH;*IHHH +M@*?AD*CMY* ?DKOD@ED<@****&T*H9HHI;*H9HI;*H9I;*I 'AD*6DF?*DF?>KL*FD<@:A*Y?*ME*?AD*?DF?*FD?*BF>EQ*IH Z NMCL*:@MF F*=LME O?F*YD@D*@BE;*MED*BF>EQ*<*"4+*SD@EDC*EQ*<*C>ED<@*SD@EDC9** 'A>F*YKD9 'AD*L@F?*>KOB?DL*FM*FF>EQ*=?A*?AD*KDKOB?DL*LcDL*W?AD*KD=>LDL*67*?AD*F?MEX9*'AD*L?*>E ?M*<*?@<>E>EQ*FD?;*?DF?*FD?*WJHr*D<:AX;*BF>EQ*?AD*?@<>Em?DF?mFOC>?* NBE:?>ME9* 'AD*Q@>L*FD<@:A*YEQ*IH Z NMCL*:@MFF*=LME* EQ* NBE:?>ME;* Y>?A*?AD*NMCCMY>EQ*A7OD@O<@ED<@****&T*H9HHI;*H9HI;*H9I;*I;*IH;*IHH;*IHHH 'AD*6DF?*DF?>K? ?M*?AD*?DF?*FD?;*Y>?A*IH Z NMCL*:@MFF*=LME N>:MET 'AD*L?*>E?M*<*?@<>E>EQ*FD?*EQ*?AD*5?@N>DL,+MCL*NBE:?>ME9*(EC7*MED*NMCL*Y
PAGE 51

* ae -*Q@>L*FD<@:A*YEQ*J Z NMCL*5?@N>DL,+MCL* :@MFF*=LME EQ*NBE:?>ME ;*M=D@*?AD*NMCCMY>EQ* A7OD@O<@ED<@****&T*H9HHI;*H9HI;*H9I;*I +M@*?AD*A>QA*?DKOD@ED<@****&T*H9HHI;*H9HI;*H9I;*I;*IH;*IHH;*IHHH 'AD*6DF?*DF?>K?*?M*?AD*?DF?*FD?;*BF>EQ*J Z NMCL*5 ?@N>DL,+MCL*:@MFF* =LME*
PAGE 52

* aR "33-).4`!#_!$533,-0-)%"&R! 14/5&$!").!%"#,-$ * * * !"! !"# !"$ !"% &'()*+,-./0/,12+3,/.04'&/1 +''51067/('56'4,8 +''5105'+*( +''5105'1086*(9./ 4.:*0&.9086*(9./ 6.+,8.105 ;(&*098 <(&*0=1098 +''5109 +''510&.9086*(9./ 6.+,8.1067/('56'4,8 <(&*0>%1098 12(3*8.0*8805'+*( 6.+,8.105'1086*(9./ 6.+,8.105'+*( +''5105 12(3*8.0*88067/('56'4,8 4.:*05 6.+,8.10&.9086*(9./ 4.:*09 12(3*8.0*880&.9086*(9./ 6:55 &'()*+,-./01*+:04(,/9.1 4.:*05'1086*(9./ 12(3*8.0*8805'1086*(9./ 6.+,8.109 4.:*067/('56'4,8 4.:*05'+*( 30,?7@(.+ 8')4,&./ 3.*:2(.1 ( # 1?(ABCADEEA;FGHAID;DAJKAELCFDPE!6@F6S6F8>?!C:>=89: >@F!C<9!=E:!PEDL*O@DL>:?M@*MB?OD@NM@KF*EL>=>LBME*>F*:MKO<@<6CD9*

PAGE 53

* a[ -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?334?@A#?3B66 C 5B9 +=&=?D33333379; %&-4;:<79 %&&8<;<83 59B6433<9 +=&=?334?@A#?3B66 C 5B9 +=&=?D33333379; %&-4;:<79 %&&8<<3<8 59B6433<9 +=&=?334?-&)?3B66 C 5B9 +=&=?E633333336 %&-4;:;<< %&&8<;<88 63B8676 +=&=?334?-&)?3B66 C 5B9 +=&=?E633333336 %&-4;:;<< %&&8<<337 63B8676 +=&=?33:?-&)?3B66 C 5B9 +=&=?=633333557 %&-4::<75 %&&758;8: 67B;6458 +=&=?33B?:!$;! 0:=>7:@=>D:=!6@C<9A>=6<@ J! 5ME;*5"-*<::DFF>ME*EQ* ?DKOD@ENM@KME9

PAGE 54

* aV +GHIJ'-5'KLMNOPB -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?375?-&)?F C 3B66 +=&=?=5333357<5 %&-4;;88; %&&8<4435 68B393<3; +=&=?375?-&)?F C 3B66 +=&=?=5333357<5 %&-4;;88; %&&8<4453 68B393<3; +=&=?376?@A#?3B66 C 5B9 +=&=?=53333537: %&-4;;8<< %&&8<<395 69B373; +=&=?376?@A#?3B66 C 5B9 +=&=?=53333537: %&-4;;8<< %&&8<<3<: 69B373; +=&=?376?@A#?F C 3B66 +=&=?=53333585; %&-4;;957 %&&8<4793 69B373; +=&=?376?-&)?3B66 C 5B9 +=&=?=533335378 %&-4;;89< %&&8<<345 68B;6755: +=&=?376?-&)?3B66 C 5B9 +=&=?=533335378 %&-4;;89< %&&8<<559 68B;6755: +=&=?376?-&)?3B66 C 5B9 +=&=?=533335378 %&-4;;89< %&&8<<588 68B;6755: +=&=?376?-&)?F C 3B66 +=&=?=533335858 %&-4;;8;< %&&8<47<7 68B;6755: +=&=?377?-&)?3B66 C 5B9 +=&=?=533335674 %&-4;;965 %&&8<<34< 6:B76;68 +=&=?377?-&)?3B66 C 5B9 +=&=?=533335674 %&-4;;965 %&&8<<574 6:B76;68 +=&=?374?@A#?3B66 C 5B9 +=&=?$53333336< %&-4;;9;8 %&&8<;<:8 6:B8:898; +=&=?374?@A#?3B66 C 5B9 +=&=?$53333336< %&-4;;9;8 %&&8<<555 6:B8:898; +=&=?374?@A#?F C 3B66 +=&=?&53333333; %&-4;;:35 %&&8<47<3 6:B8:898; +=&=?374?-&)?3B5 C 3B66 +=&=?E533333334 %&-4;;98; %&&8<476; 6:B8<:78 +=&=?374?-&)?3B66 C 5B9 +=&=?$533333337 %&-4;;94< %&&8<;<8< 6:B8<:78 +=&=?374?-&)?3B66 C 5B9 +=&=?$533333337 %&-4;;94< %&&8<;<<5 6:B8<:78 +=&=?374?-&)?F C 3B66 +=&=?&533333338 %&-4;;9:7 %&&8<479; 6:B8<:78 +=&=?374?-&)?F C 3B66 +=&=?&533333338 %&-4;;9:7 %&&8<47:3 6:B8<:78 +=&=?379?@A#?3B66 C 5B9 +=&=?$533333378 %&-4;;:4: %&&8<;<:4 68B7:33;7 +=&=?379?@A#?3B66 C 5B9 +=&=?$533333378 %&-4;;:4: %&&8<<36; 68B7:33;7 +=&=?379?@A#?F C 3B66 +=&=?&533333373 %&-4;;:8: %&&8<4436 68B7:33;7 +=&=?379?-&)?3B5 C 3B66 +=&=?E533333358 %&-4;;:66 %&&8<4774 68B8;36 +=&=?379?-&)?3B66 C 5B9 +=&=?E533333366 %&-4;;:54 %&&8<;<99 68B8;36 +=&=?379?-&)?3B66 C 5B9 +=&=?E533333366 %&-4;;:54 %&&8<<547 68B8;36 +=&=?379?-&)?F C 3B66 +=&=?&53333336: %&-4;;:7: %&&8<479< 68B8;36 +=&=?37:?#%-?3B5 C 3B66 +=&=?E533333753 %&-4;;::7 %&&8<46<3 55B<;8:55 +=&=?37:?#%-?3B5 C 3B66 +=&=?E533333753 %&-4;;::7 %&&8<4748 55B<;8:55 
+=&=?37:?#%-?3B66 C 5B9 +=&=?$533333758 %&-4;;:9< %&&8<<375 55B<;8:55 +=&=?37:?#%-?3B66 C 5B9 +=&=?$533333758 %&-4;;:9< %&&8<<396 55B<;8:55 +=&=?37;?@A#?3B66 C 5B9 +=&=?$5333333:7 %&-4;;;73 %&&8<;<4< 68B4;;:8; +=&=?37;?@A#?3B66 C 5B9 +=&=?$5333333:7 %&-4;;;73 %&&8<<3;6 68B4;;:8; +=&=?37;?@A#?F C 3B66 +=&=?&5333333;4 %&-4;;;79 %&&8<47;9 68B4;;:8; +=&=?37;?@A#?F C 3B66 +=&=?&5333333;4 %&-4;;;79 %&&8<47<< 68B4;;:8; +=&=?37;?#%-?3B5 C 3B66 +=&=?E5333336<9 %&-4;;;87 %&&8<4756 54B<784<; +=&=?37;?#%-?3B66 C 5B9 +=&=?E5333336<4 %&-4;;;4< %&&8<<53< 54B<784<; +=&=?37;?#%-?3B66 C 5B9 +=&=?E5333336<4 %&-4;;;4< %&&8<<59: 54B<784<; +=&=?37;?-&)?3B5 C 3B66 +=&=?E5333336;< %&-4;;;37 %&&8<4773 69B67358 +=&=?37;?-&)?3B66 C 5B9 +=&=?E5333336;: %&-4;;:<< %&&8<<536 69B67358 +=&=?37;?-&)?3B66 C 5B9 +=&=?E5333336;: %&-4;;:<< %&&8<<58; 69B67358 +=&=?37;?-&)?F C 3B66 +=&=?&5333333;5 %&-4;;;57 %&&8<47:4 69B67358 +=&=?37;?-&)?F C 3B66 +=&=?&5333333;5 %&-4;;;57 %&&8<4433 69B67358

PAGE 55

* JH +GHIJ'-5'KLMNOPB -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?37
PAGE 56

* JI +GHIJ'-5'KLMNOPB -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?396?-&)?3B66 C 7 +=&=?$333333876 %&-4;<;:: %&&8<<356 68B3:;<;7 +=&=?394?@A#?3B5 C 3B66 +=&=?E533333435 %&-4<3353 %&&8<4764 66B67:876 +=&=?394?@A#?3B66 C 7 +=&=?$533333438 %&-4<3336 %&&8<;<:6 66B67:876 +=&=?394?@A#?3B66 C 7 +=&=?$533333438 %&-4<3336 %&&8<<367 66B67:876 +=&=?394?@A#?3B66 C 7 +=&=?$533333438 %&-4<3336 %&&8<<368 66B67:876 +=&=?394?@A#?F C 3B66 +=&=?&533333758 %&-4<3369 %&&8<47;8 66B67:876 +=&=?394?#%-?3B66 C 7 +=&=?$53333343; %&-4;<<;: %&&8<<365 :B84:58 +=&=?394?#%-?3B66 C 7 +=&=?$53333343; %&-4;<<;: %&&8<<594 :B84:58 +=&=?394?-&)?3B66 C 7 +=&=?$533333435 %&-4;<<5: %&&8<;<:3 66B5944;7 +=&=?394?-&)?3B66 C 7 +=&=?$533333435 %&-4;<<5: %&&8<<3;; 66B5944;7 +=&=?394?-&)?3B66 C 7 +=&=?$533333435 %&-4;<<5: %&&8<<583 66B5944;7 +=&=?394?-&)?F C 3B66 +=&=?&533333766 %&-4;<<47 %&&8<47<6 66B5944;7 +=&=?398?@A#?3B5 C 3B66 +=&=?E533333795 %&-4<33;< %&&8<46<5 65B;3<6:8 +=&=?398?@A#?3B66 C 7 +=&=?$333333445 %&-4<33;8 %&&8<;<<3 65B;3<6:8 +=&=?398?@A#?3B66 C 7 +=&=?$333333445 %&-4<33;8 %&&8<<35; 65B;3<6:8 +=&=?398?@A#?3B66 C 7 +=&=?$333333445 %&-4<33;8 %&&8<<553 65B;3<6:8 +=&=?398?@A#?F C 3B66 +=&=?&5333355<; %&-4<3563 %&&8<47;6 65B;3<6:8 +=&=?398?@A#?F C 3B66 +=&=?&5333355<; %&-4<3563 %&&8<4454 65B;3<6:8 +=&=?398?#%-?3B66 C 7 +=&=?$333333493 %&-4<3398 %&&8<;<93 ;B7658 +=&=?398?#%-?3B66 C 7 +=&=?$333333493 %&-4<3398 %&&8<<374 ;B7658 +=&=?398?-&)?3B5 C 3B66 +=&=?E533333789 %&-4<3378 %&&8<4763 65B;589< +=&=?398?-&)?3B66 C 7 +=&=?$33333347: %&-4<336< %&&8<;<:< 65B;589< +=&=?398?-&)?3B66 C 7 +=&=?$33333347: %&-4<336< %&&8<<549 65B;589< +=&=?398?-&)?F C 3B66 +=&=?&533335673 %&-4<3387 %&&8<478< 65B;589< +=&=?398?-&)?F C 3B66 +=&=?&533335673 %&-4<3387 %&&8<4795 65B;589< +=&=?399?@A#?3B66 C 7 +=&=?$3333334:: %&-4<3597 %&&8<;<;6 58B35488 +=&=?399?@A#?3B66 C 7 +=&=?$3333334:: %&-4<3597 %&&8<<53: 58B35488 +=&=?399?@A#?F C 3B66 +=&=?&533333<3; %&-4<35;3 %&&8<47;< 58B35488 +=&=?399?-&)?3B66 C 7 +=&=?$3333334:8 
%&-4<3564 %&&8<;<:7 58B376:3; +=&=?399?-&)?3B66 C 7 +=&=?$3333334:8 %&-4<3564 %&&8<<39; 58B376:3; +=&=?399?-&)?3B66 C 7 +=&=?$3333334:8 %&-4<3564 %&&8<<5:7 58B376:3; +=&=?399?-&)?F C 3B66 +=&=?&533333<33 %&-4<3546 %&&8<4796 58B376:3; +=&=?39:?-&)?3B66 C 7 +=&=?$5333334<: %&-4<35;7 %&&8<;<<4 56B;77:3; +=&=?39:?-&)?3B66 C 7 +=&=?$5333334<: %&-4<35;7 %&&8<<544 56B;77:3; +=&=?39:?-&)?3B66 C 3B48 +=&=?E5333337;< %&-4<35<6 %&&8<4757 56B;77:3; +=&=?39:?-&)?3B48 C 3B; +=&=?E5333337;8 %&-4<35<7 %&&8<4768 56B;77:3; +=&=?39:?-&)?F C 3B66 +=&=?&533333<85 %&-4<3634 %&&8<47<8 56B;77:3; +=&=?39:?-&)?F C 3B66 +=&=?&533333<85 %&-4<3634 %&&8<4434 56B;77:3; +=&=?39;?@A#?3B66 C 7 +=&=?$5333334;6 %&-4<36<9 %&&8<<35: 59B:;3::6 +=&=?39;?@A#?3B66 C 7 +=&=?$5333334;6 %&-4<36<9 %&&8<<389 59B:;3::6 +=&=?39;?@A#?3B66 C 7 +=&=?$5333334;6 %&-4<36<9 %&&8<<537 59B:;3::6 +=&=?39;?@A#?3B66 C 3B48 +=&=?E533333:4; %&-4<3737 %&&8<46<4 59B:;3::6

PAGE 57

* JG +GHIJ'-5'KLMNOPB -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?39;?@A#?3B48 C 3B; +=&=?E533333:49 %&-4<3734 %&&8<474; 59B:;3::6 +=&=?39;?@A#?F C 3B66 +=&=?&533333<<8 %&-4<3763 %&&8<4458 59B:;3::6 +=&=?39;?#%-?3B66 C 7 +=&=?$5333334:3 %&-4<3673 %&&8<;<4: 9B<89977 +=&=?39;?#%-?3B66 C 7 +=&=?$5333334:3 %&-4<3673 %&&8<<575 9B<89977 +=&=?39;?#%-?3B48 C 3B; +=&=?E533333:8; %&-4<367; %&&8<4736 9B<89977 +=&=?39;?-&)?3B66 C 7 +=&=?$5333334:8 %&-4<3698 %&&8<<56< 59B;7558 +=&=?39;?-&)?3B66 C 7 +=&=?$5333334:8 %&-4<3698 %&&8<<5:5 59B;7558 +=&=?39;?-&)?3B66 C 7 +=&=?$5333334:8 %&-4<3698 %&&8<<5:4 59B;7558 +=&=?39;?-&)?3B66 C 3B48 +=&=?E533333:45 %&-4<36:6 %&&8<475; 59B;7558 +=&=?39;?-&)?3B48 C 3B; +=&=?E533333:7< %&-4<36:7 %&&8<46<: 59B;7558 +=&=?39;?-&)?F C 3B66 +=&=?&533333<;; %&-4<36;8 %&&8<47<5 59B;7558 +=&=?3:3?#%-?3B66 C 7 +=&=?$533333449 %&-4<37:7 %&&8<<344 4B5896;7 +=&=?3:3?#%-?3B66 C 7 +=&=?$533333449 %&-4<37:7 %&&8<<54< 4B5896;7 +=&=?3:3?#%-?3B66 C 3B48 +=&=?E533333:;6 %&-4<37;3 %&&8<46<< 4B5896;7 +=&=?3:3?#%-?3B66 C 3B48 +=&=?E533333:;6 %&-4<37;3 %&&8<473; 4B5896;7 +=&=?3:3?#%-?3B48 C 3B; +=&=?E533333:;3 %&-4<37;6 %&&8<4775 4B5896;7 +=&=?3:3?#%-?F C 3B66 +=&=?&53333537< %&-4<37;; %&&8<443: 4B5896;7 +=&=?3:3?-&)?3B66 C 7 +=&=?$53333348< %&-4<376: %&&8<<578 5
PAGE 58

* J` +GHIJ'-5'KLMNOPB -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?3:9?-&)?3B48 C 3B; +=&=?E533333;57 %&-4<384; %&&8<46;9 67B74;4 +=&=?3:9?-&)?F C 3B66 +=&=?&533335569 %&-4<388: %&&8<4784 67B74;4 +=&=?3:;?@A#?3B66 C 7 +=&=?$533333873 %&-4<39<5 %&&8<<349 5
PAGE 59

* Ja +GHIJ'-5'KLMNOPB' -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?533?-&)?3B66 C 7 +=&=?$533333<97 %&-4<5;79 %&&8<<597 68B64<<9: +=&=?533?-&)?3B66 C 7 +=&=?$533333<97 %&-4<5;79 %&&8<<59< 68B64<<9: +=&=?536?@A#?3B66 C 7 +=&=?$533333<36 %&-4<6356 %&&8<;<96 5
PAGE 60

* JJ +GHIJ'-5'KLMNOPB' -=#*>%?>=$%> -=#*>%?,@ -=#*>%?=AA -&=?=AA +%#* +=&=?566?#%-?3B48 C 3B; +=&=?E533335<4: %&-4<69;8 %&&8<4766 :B587<7: +=&=?566?-&)?3B5 C 3B66 +=&=?E533335<:6 %&-4<6949 %&&8<46<6 69B846:< +=&=?566?-&)?3B66 C 7 +=&=?$533335558 %&-4<6946 %&&8<;<<6 69B846:< +=&=?566?-&)?3B66 C 3B48 +=&=?E533335<;3 %&-4<694: %&&8<473: 69B846:< +=&=?566?-&)?3B48 C 3B; +=&=?E533335<:; %&-4<694; %&&8<4739 69B846:< +=&=?567?#,D?3B5 C 3B66 +=&=?E533335<97 %&-4<6:;5 %&&8<46<7 66B3;846 +=&=?567?#,D?3B66 C 7 +=&=?$5333339;9 %&-4<6::; %&&8<;<89 66B3;846 +=&=?567?#,D?3B66 C 7 +=&=?$5333339;9 %&-4<6::; %&&8<;<<; 66B3;846 +=&=?567?#,D?3B66 C 7 +=&=?$5333339;9 %&-4<6::; %&&8<<55: 66B3;846 +=&=?567?#,D?3B66 C 7 +=&=?$5333339;9 %&-4<6::; %&&8<<58: 66B3;846 +=&=?567?#,D?3B66 C 3B48 +=&=?E533335<93 %&-4<6:;6 %&&8<477: 66B3;846 +=&=?567?#,D?3B48 C 3B; +=&=?E533335<89 %&-4<6:;7 %&&8<475< 66B3;846 +=&=?567?-&)?3B66 C 7 +=&=?$5333339;7 %&-4<6:77 %&&8<<593 69B8:4<5: +=&=?567?-&)?3B66 C 3B48 +=&=?E533335<8; %&-4<6:7; %&&8<4769 69B8:4<5: +=&=?567?-&)?3B48 C 3B; +=&=?E533335<84 %&-4<6:7< %&&8<474: 69B8:4<5: +=&=?564?#,D?3B5 C 3B66 +=&=?E533335<7; %&-4<6;99 %&&8<46;8 68B5<8:96 +=&=?564?#,D?3B66 C 7 +=&=?$5333339:9 %&-4<6;97 %&&8<;<;; 68B5<8:96 +=&=?564?#,D?3B66 C 7 +=&=?$5333339:9 %&-4<6;97 %&&8<<3;4 68B5<8:96 +=&=?564?#,D?3B66 C 7 +=&=?$5333339:9 %&-4<6;97 %&&8<<3;< 68B5<8:96 +=&=?564?#,D?3B66 C 7 +=&=?$5333339:9 %&-4<6;97 %&&8<<595 68B5<8:96 +=&=?564?#,D?3B66 C 3B48 +=&=?E533335<79 %&-4<6;9: %&&8<4747 68B5<8:96 +=&=?564?#,D?3B48 C 3B; +=&=?E533335<74 %&-4<6;9; %&&8<477; 68B5<8:96 +=&=?564?-&)?3B5 C 3B66 +=&=?E533335<7: %&-4<6;5; %&&8<46;: 69B8599<7 +=&=?564?-&)?3B66 C 7 +=&=?$5333339:4 %&-4<6;65 %&&8;;;8: 69B8599<7 +=&=?564?-&)?3B66 C 7 +=&=?$5333339:4 %&-4<6;54 %&&8<<379 69B8599<7 +=&=?564?-&)?3B66 C 7 +=&=?$5333339:4 %&-4<6;54 %&&8<<39< 69B8599<7 +=&=?564?-&)?3B66 C 7 +=&=?$5333339:4 %&-4<6;54 %&&8<<3;3 69B8599<7 +=&=?564?-&)?3B66 C 7 +=&=?$5333339:4 %&-4<6;54 %&&8<<585 69B8599<7 
+=&=?564?-&)?3B66 C 3B48 +=&=?E533335<78 %&-4<6;5< %&&8<4755 69B8599<7 +=&=?564?-&)?3B48 C 3B; +=&=?E533335<77 %&-4<6;63 %&&8<46<9 69B8599<7 +=&=?568?#,D?3B66 C 7 +=&=?$533335567 %&-4<6<69 %&&8<<589 67B9::7:8 +=&=?568?-&)?3B5 C 3B66 +=&=?E5333338<6 %&-4<6;<6 %&&8<4744 69B::;<5: +=&=?568?-&)?3B66 C 7 +=&=?$533335565 %&-4<6;;; %&&8<<399 69B::;<5: +=&=?568?-&)?3B66 C 7 +=&=?$533335565 %&-4<6;;; %&&8<<3<5 69B::;<5: +=&=?568?-&)?3B66 C 7 +=&=?$533335565 %&-4<6;;; %&&8<<554 69B::;<5: +=&=?568?-&)?3B66 C 7 +=&=?$533335565 %&-4<6;;; %&&8<<55< 69B::;<5: +=&=?568?-&)?3B66 C 3B48 +=&=?E5333338<3 %&-4<6;<7 %&&8<477< 69B::;<5: +=&=?568?-&)?3B48 C 3B; +=&=?E5333338;; %&-4<6;<4 %&&8<4767 69B::;<5: