Citation
Identification of proteins by detecting post-translational modifications

Material Information

Title:
Identification of proteins by detecting post-translational modifications
Creator:
Nanjundaswamy, Archana
Publication Date:
Language:
English
Physical Description:
ix, 57 leaves : ; 28 cm

Subjects

Subjects / Keywords:
Mass spectrometry ( lcsh )
Post-translational modification ( lcsh )
Mass spectrometry ( fast )
Post-translational modification ( fast )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 53-57).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Archana Nanjundaswamy.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
63787690 ( OCLC )
ocm63787690
Classification:
LD1193.E52 2004m N36 ( lcc )

Full Text
IDENTIFICATION OF PROTEINS BY
DETECTING POST-TRANSLATIONAL MODIFICATIONS
by
Archana Nanjundaswamy
B. E., Bangalore University, India, 2000
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2004


This thesis for the Master of Science
degree by
Archana Nanjundaswamy
has been approved
Ellen Gethner
Tom Altman
Ilkyeun Ra
06 16

Date


Nanjundaswamy, Archana (M.S., Computer Science)
Identification of Proteins by Detecting Post-Translational Modifications
Thesis directed by Professor Krzysztof J. Cios
ABSTRACT
Mass Spectrometry (MS) is the first-choice methodology for the
identification of proteins. MS analysis of proteins depends on an
ionization technique that creates ions from biomolecules. MS
measures the mass-to-charge ratio, giving the molecular weight and the
pattern of peptides derived from the protein. MS is therefore a general
method for detecting all modifications that change the molecular weight of
the peptides. To identify the proteins, the molecular weights of the
peptides are subjected to Peptide Mass Mapping, which is a means of
identifying proteins by comparing observed masses with the predicted masses
of digested proteins contained in a database (e.g., using Mascot or MSFIT).
However, this method does not always explain all the observed masses; a few
unmatched masses can arise as a result of contaminating proteins,
unexpected cleavages, or common chemical modifications. Post-translational
modifications, or chemical modifications, are the consequence of covalent
linkage of a chemical group to specific amino acids of the protein. A few
common post-translational modifications are phosphorylation, acetylation
and glucosylation. The aim of this thesis is to develop a software solution
that enhances the ability to identify proteins that have undergone
post-translational modifications and that explains the peptide masses which
did not match the database.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
Krzysztof J. Cios


ACKNOWLEDGEMENTS
I would like to take this opportunity to thank my advisor Dr. Krzysztof (Krys)
Cios for his encouragement, support, guidance and comments.
I would also like to thank Allison Gehrke, Srdjan Askovic, and Dr. Mark
Duncan for helping me in understanding the biological knowledge, and my
committee members Drs. Tom Altman, Ilkyeun Ra and Ellen Gethner.


CONTENTS
Figures
Tables
CHAPTER
1. INTRODUCTION
2. BACKGROUND
   Proteome
   Proteins and Amino Acids
   Mass Spectrometry
   Gel Electrophoresis
      One-Dimensional Gel Electrophoresis
      Two-Dimensional Gel Electrophoresis
      Comparison between 1-D and 2-D Gel Electrophoresis
   Protein Identification Using Peptide Mass Fingerprinting
   Mascot Database/Program
3. POST-TRANSLATIONAL MODIFICATIONS
   Phosphorylation
   Acetylation
   Glucosylation (Glycation)
   Deamidation
   Hydroxylation
   O-GlcNAc
4. PROCEDURE
5. RESULTS
   Input Data
   Implementation
   Results after Implementation
6. FUTURE WORK AND CONCLUSION
APPENDIX
A. AMINO ACIDS
BIBLIOGRAPHY


FIGURES
Figure
2.1. The Structure of Amino Acid
2.2. Perseptive Biosystems Voyager DE-PRO MALDI-TOF
2.3. Mass Spectrometry
2.4. 1-D Gel Electrophoresis
2.5. 2-D Gel Electrophoresis
2.6. Mascot Search Results Page 1
2.7. Mascot Search Results Page 2
3.1. Phosphorylation
3.2. Acetylation
3.3. Glucosylation
3.4. Deamidation
3.5. Hydroxylation
3.6. O-GlcNAc
5.1. Mascot Search Screen
5.2. Results HTML File
5.3. Results Peptide Sequence
5.4. Results Unmatched Masses
5.5. Results Unmatched Sequences
5.6. Results 1
5.7. Results 2
5.8. Results 3


TABLES
Tables
2.1 Theoretical mass values of the 20 amino acids
3.1 Some common and important post-translational modifications
5.1 Some results obtained after the test against the PTMs


CHAPTER 1
INTRODUCTION
Proteomics is a scientific discipline that detects proteins associated
with a disease by means of their altered levels of expression
between control and disease states. It also helps in defining the functions and
interrelationships of proteins in an organism. It enables correlations to be
drawn between the range of proteins produced by a cell or tissue and the
initiation or progression of a disease state. Mass Spectrometry is one of the
techniques used to identify proteins. However, most protein samples
exhibit some modifications. These may be chemical modifications that occur
either co-translationally or post-translationally and modify the functions
of the proteins. Modifications may change both chemical and
physical properties of the proteins, such as their behavior, mass and
ionization efficiency. Modifications include proteolytic cleavage,
oxidation of some amino acids, and cross-linking events.
Post-Translational Modifications (PTMs) are processing events that
change the properties of a protein by proteolytic cleavage or by addition of a
modifying group to one or more amino acids. The goal of the project is to
detect such modifications and enhance the ability to identify the proteins.
[M Mann et al., 2003]


The process of identifying the proteins can be outlined as follows:
> Complex protein mixtures are separated by physical techniques such
as two-dimensional gel electrophoresis.
> The protein is removed from the gel and digested using an enzyme, usually
trypsin.
> The masses of the tryptic fragments are experimentally determined by mass
spectrometry.
> For each known protein (i.e., all entries in the protein database) the
computer performs an in silico digestion to yield the set of predicted
masses for each entry.
> The mass sequences are submitted to Mascot. The experimentally determined
molecular weights for each protein are compared to the predicted
molecular weights of every protein in the database to determine the best
match.
> This returns positive protein IDs and a set of unmatched sequences.
> Possible protein IDs are sorted based on a number of criteria including how
well the predicted molecular weights match the measured molecular
weights, average mass errors and number of peptides matching.
> The unidentified masses are tested against the PTMs to check whether
they belong to the positively identified protein. This gives a
justification for the unidentified masses and supports the identification
of the proteins.
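The in silico digestion step in the outline above can be sketched as follows. This is only an illustration, not the implementation used in the thesis: the function name `tryptic_digest` and the parameter `max_missed_cleavages` are hypothetical, and the cleavage rule assumed is the one Mascot reports for trypsin in Figure 2.7 (cut on the C-terminal side of K or R unless the next residue is P).

```python
def tryptic_digest(sequence, max_missed_cleavages=0):
    """Return the peptides produced by an idealized trypsin digestion.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    if not sequence:
        return []
    # Positions at which the chain is cut (index of the residue after the cut).
    cut_points = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(sequence))

    peptides = []
    for start in range(len(cut_points) - 1):
        # Allow up to max_missed_cleavages skipped cut sites per peptide.
        for missed in range(max_missed_cleavages + 1):
            end = start + 1 + missed
            if end < len(cut_points):
                peptides.append(sequence[cut_points[start]:cut_points[end]])
    return peptides


# Made-up sequence: K-R is cut, R-P is not, K-L is cut.
print(tryptic_digest("AKRPGKLR"))  # ['AK', 'RPGK', 'LR']
```

Allowing one missed cleavage, as in the search parameters of Figure 2.6, additionally yields the peptides that span a skipped cut site.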


CHAPTER 2
BACKGROUND
Proteomics enables correlations to be drawn between the range of
proteins produced by a cell or tissue and the initiation or progression of a
disease state.
Proteome
One of the challenges in proteomics is to identify and quantify the
protein complement of a cell or organism, called the proteome. The proteome
is very dynamic, and many protein types can arise from a single sequence. The
collection of proteins in a cell type is called the cellular proteome. The
proteome for an organism can be defined as the complete set of proteins from
all of the various cellular proteomes. The term proteome has also been used to
refer to the collection of proteins in certain sub-cellular biological systems.
Proteins and Amino Acids
Proteins are the end products of genes and are the molecules that
perform all functions in cells and tissues. Changes in proteins in cells and
tissues can result in disease. That is why we often compare normal and disease
tissues to determine which proteins are present. The differences in protein
expression that are observed can lead to better understanding of the disease,
which in turn leads to improvements in diagnostics and therapy. Proteins are
composed of 20 different amino acids that are joined together in a string. The
average size of a protein is 316 amino acids. Most commonly, proteins are
represented using single-letter codes in a string that gives the order in
which the amino acids are bound together. In reality, proteins are more
complicated than just a string of amino acids. An amino acid can be
represented by either a three-letter code or a single letter. For example,
alanine is represented by Ala or A, proline by Pro or P, and arginine by
Arg or R. Amino acids are organic molecules that have a -COOH group and an
-NH2 group. This is important because these groups are used when amino acids
are joined together to form a protein. -OH is lost from the -COOH group and
-H is lost from the -NH2 group; the resulting -CO- and -NH- bind together to
form what is called the peptide bond (-CO-NH-), while the OH and H go on to
form a water molecule (H2O). This is known as a water loss. [Kleinsmith,
J. L., et al., 1995]
Figure 2.1 The Structure of Amino Acid
(Source: http://www.rock.sr.unh.edu/ha_p/aminoacid.gif)


Name            Abbreviation   Average Mass (Da)   Mono-isotopic Mass (Da)
Alanine         A              71.0788             71.03711
Cysteine        C              103.1448            103.00919
Aspartic Acid   D              115.0886            115.02694
Glutamic Acid   E              129.1155            129.04259
Phenylalanine   F              147.1766            147.06841
Glycine         G              57.0520             57.02146
Histidine       H              137.1412            137.05891
Isoleucine      I              113.1595            113.08406
Lysine          K              128.1742            128.09496
Leucine         L              113.1595            113.08406
Methionine      M              131.1986            131.04049
Asparagine      N              114.1039            114.04293
Proline         P              97.1167             97.05276
Glutamine       Q              128.1308            128.05858
Arginine        R              156.1876            156.10111
Serine          S              87.0782             87.03203
Threonine       T              101.1051            101.04786
Valine          V              99.1326             99.06841
Tryptophan      W              186.2133            186.07931
Tyrosine        Y              163.1760            163.06333
Table 2.1 Theoretical mass values of the 20 amino acids
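As a worked illustration of how the residue masses in Table 2.1 combine, the sketch below computes the monoisotopic mass of a peptide as the sum of its residue masses plus one water molecule (the residues have each lost water on forming peptide bonds, so one water is added back for the free termini). The names `MONO`, `peptide_mass` and `mh_plus` are hypothetical; the water and proton masses are standard values that do not appear in the table.

```python
# Monoisotopic residue masses (Da), copied from Table 2.1.
MONO = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04786, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}
WATER = 18.01056   # monoisotopic mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in sequence) + WATER


def mh_plus(sequence):
    """Mass of the singly protonated peptide, as observed for a 1+ ion."""
    return peptide_mass(sequence) + PROTON


# Gly-Ala: 57.02146 + 71.03711 + 18.01056 = 146.06913 Da
print(round(peptide_mass("GA"), 5))
```

Because isoleucine and leucine have identical masses in the table, `peptide_mass("AIK")` and `peptide_mass("ALK")` are equal, which is one source of the isobaric peptides that mass measurement alone cannot distinguish.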


Mass Spectrometry
Mass Spectrometry (MS) allows for the analysis of peptides and proteins.
It depends on an ionization technique that creates ions from biomolecules.
MS is applied to peptides because they can be easily removed from the 2DE
gel and provide the information needed for identification. A mass
spectrometer includes three devices: an ionization device, a mass analyzer
and a detector. Common ionization techniques are Matrix-Assisted Laser
Desorption Ionization (MALDI) and electrospray ionization. The MALDI
technique has been used to identify proteins by peptide mass
fingerprinting. [Rolan S A., 1995; Aebersold R et al., 2003; Jonscher K R et
al., 1997; J R Yates, 1998]
Figure 2.2 Perseptive Biosystems Voyager DE-PRO MALDI-TOF
(Source: http://biomol.uchsc.edu/researchFacilities/MSCore/instrumentation.html)


MALDI-Time of Flight MS (MALDI-TOF MS) is the most
common technology used to identify proteins. MALDI utilizes the energy
from a laser to ionize biomolecules. The standard configuration consists of
an ion mirror (i.e., the reflector), delayed extraction and post-source
decay capability. 1-D and 2-D gel electrophoresis separate
proteins from a sample of tissues or cells. A band in a 1-D gel and a spot
in a 2-D gel each represent a protein. Trypsin is then used to digest the
protein in-gel into peptides. This sample is then spotted onto a steel MALDI
plate and subjected to MALDI-TOF MS. [Aebersold R, Mann M., 2003; Ong S et
al., 2003; Perkel J. M., 2001; G Cagney et al., 2003; Susanne C M et al., 2002]
Figure 2.3 Mass Spectrometry
(Source: http://keck.med.yale.edu/prochem/images/spectra.gif)
The masses obtained after the Mass Spectrometry are of the format
568.13036 615.36546 636.31313 767.37906 800.47471 833.39365 848.46861
853.47844 855.48474 870.55085 875.45598 917.49718 930.53735 946.51218
960.52326 975.46485 1016.5226 1021.51935 1029.56535 1056.61629
1061.53807 1072.55128 1121.58132 1124.59484 1133.58026 1220.63081
1267.60844 1281.6569 1344.71873 1390.76292 1484.78217 1518.86207
1519.3783 1544.79235 1640.87637 1684.9121 1765.89032 1792.77029
1798.91328 1905.06347
Identifying proteins from peptides is not straightforward with MALDI-TOF,
because peptide mass fingerprinting does not yield peptide sequence
information and the presence of isobaric peptides in the sample
complicates the analysis.
Gel Electrophoresis
One-Dimensional Gel Electrophoresis
One-dimensional gel electrophoresis, commonly known simply as
electrophoresis, is a technique that uses a buffer and a gel to separate
proteins using an electrical charge. The process separates the proteins
based on their size and shape. The very first versions of this technique
used fans to dry pieces of paper that had stains of the protein bands on
them. In 1960, a modified version of this technique was developed called
SDS-polyacrylamide-gel electrophoresis. This method used a gel as the matrix
through which the proteins moved. It is the common method that cell
biologists and geneticists use to view protein bands. The method became
more widely known during the trial of O.J. Simpson, when his DNA was
tested using this technique. [H W Lahm et al., 2000]
Figure 2.4 1-D Gel Electrophoresis
(Source: http://www.kendricklabs.com/lDgel.gif)
Two-Dimensional Gel Electrophoresis
O'Farrell developed two-dimensional gel electrophoresis in 1975
(Kleinsmith, 1995). It uses a procedure called isoelectric focusing, which
separates polypeptide chains depending on the surrounding pH and the charge
of the protein (negative or positive). Also important to understand is the term
SDS. SDS stands for sodium dodecyl sulfate, a negatively charged
detergent that binds to hydrophobic regions of the protein molecules, causing
them to unfold into extended polypeptide chains, release from other proteins,
and become free in the negatively charged solution (Alberts, 1994). The
proteins in two-dimensional gel electrophoresis are separated a second time
by their molecular weight, and therefore many more proteins can be seen.
The first step of two-dimensional gel electrophoresis involves
separating the proteins based on their intrinsic charge. The sample is dissolved
in a solution containing uncharged detergent (Alberts, 1994). The polypeptide
chains are then separated by a procedure called isoelectric focusing (see
above). In this process, there is a characteristic pH for each protein
called the isoelectric point. At this point the protein has no net charge
and therefore will not migrate in the electric field (Alberts, 1994). In
isoelectric focusing, the proteins are electrophoresed in a tube of
polyacrylamide gel in which a pH gradient is established by the use of
buffers (Alberts, 1994). When electrophoresed, each protein moves to the
position in the gradient that corresponds to its isoelectric point. This
concludes the first part of the two-part process.
In the second part of the process, the proteins in the gel are
separated again. However, this time they are electrophoresed in a direction at
right angles to that of the first step. Now SDS, as explained
above, is added and the proteins are separated according to their size
(molecular weight). At the end of this process, all of the proteins should
be accounted for. However, any proteins that remain unresolved will be
those that have identical size and identical isoelectric point on the gel.
This is a rare situation and is detected by using different staining
procedures. Autoradiography can also be used if the protein samples were
initially labeled with a radioisotope (Alberts). This concludes the methods
for two-dimensional gel electrophoresis. [Stephen J F, 2001; Patterson S et
al., 1995; W Wan et al., 2003]
Figure 2.5 2-D Gel Electrophoresis
(Source: http://www.pierroton.inra.fT/genetics/2D/meganong.jpg)


Comparison between
1-D and 2-D Gel Electrophoresis
Though one-dimensional gel electrophoresis is popular and widely used
from the classroom to the laboratory, it limits the number of proteins that can
be seen because protein bands overlap. In contrast, two-dimensional gel
electrophoresis uses two different procedures to separate the proteins, and
more than 1000 proteins can be distinguished, as opposed to
only a handful with the one-dimensional method (Alberts, 1994).


Protein Identification Using
Peptide Mass Fingerprinting
An important method for identifying proteins is peptide mass
fingerprinting. A very common way to generate protein fingerprints is to use
gel electrophoresis to separate proteins, which are then excised from the
gel, digested enzymatically, and subjected to mass spectrometry. The masses
of the peptides obtained from an in-gel proteolytic digestion are
searched against the protein database (Mascot). This method does not always
explain all the observed masses; a few unmatched masses can arise
due to contamination of proteins, unexpected cleavage, or common
chemical modifications. Protein identification by peptide mass fingerprinting
has its own limitations. Sometimes small acidic proteins do not
yield sufficient peptides, and this leads to ambiguous protein identification.
Many different proteins can be identified from a single peptide sequence
because the protein complement, known as the proteome, is very dynamic and
very complex. The number of protein sub-species increases because of
post-translational modifications, including phosphorylation, glycosylation
etc. More than 250 modifications have been discovered to date. A single
protein can be modified by multiple PTMs.


Mascot Database/Program
Mascot is a search engine for identifying proteins. It searches
primary sequence databases to identify proteins when mass
spectrometry data is submitted to it. The peptides subjected to
MALDI are output as a series of peaks, and each of these peaks indicates
the molecular mass of a peptide. The molecular weights of the peptides are
subjected to Peptide Mass Mapping, which is a means of identifying proteins by
comparing observed masses with the predicted masses of digested proteins
contained in a database; Mascot is used for this process. [Perkins D N. et
al., 1999]
The search is done in the following way:
> Submit the given mass sequences to Mascot. The mass sequences are of
the form listed in the Mass Spectrometry section above.
> The program compares experimentally determined masses with the
predicted ones and finds the best fit.
Mascot search provides
> Number of mass values searched
> Number of masses matched
> Peptide sequences which the masses represent
> Number of masses unmatched
> Peptide masses that were identified
> Peptide masses that were unidentified
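The comparison of observed and predicted masses can be illustrated with a short sketch. This is not Mascot's algorithm, which scores candidate proteins probabilistically; it only shows the idea of splitting a peak list into matched and unmatched masses for one candidate protein. The function name `split_matches` is hypothetical, the default tolerance of 1 Da is taken from the search parameters in Figure 2.6, and the predicted masses in the example are made up.

```python
def split_matches(observed, predicted, tolerance=1.0):
    """Split observed masses into matched and unmatched lists.

    An observed mass counts as matched when the closest predicted
    mass lies within `tolerance` Da of it.
    """
    matched, unmatched = [], []
    for mass in observed:
        closest = min(predicted, key=lambda p: abs(p - mass), default=None)
        if closest is not None and abs(closest - mass) <= tolerance:
            matched.append((mass, closest))
        else:
            unmatched.append(mass)
    return matched, unmatched


# Three observed masses from the peak list; predicted masses are invented.
observed = [568.13036, 615.36546, 636.31313]
predicted = [615.40, 636.90, 1200.0]
matched, unmatched = split_matches(observed, predicted)
print(matched)    # [(615.36546, 615.4), (636.31313, 636.9)]
print(unmatched)  # [568.13036]
```

The unmatched list produced this way corresponds to the "No match to" line of the Mascot protein view and is the input to the PTM checks described in Chapter 3.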


Mascot Search Results
User         : Archana
Email        : archan_n@yahoo.com
Search title :
Database     : NCBInr 20040304 (1733159 sequences; 560824170 residues)
Timestamp    : 17 Apr 2004 at 20:49:32 GMT
Top Score    : 182 for gi|17390900, Ina protein [Mus musculus]
Probability Based Mowse Score
Ions score is -10*Log(P), where P is the probability that the observed match is a random event.
Protein scores greater than 75 are significant (p<0.05).
[Histogram of Probability Based Mowse Score]
1. gi|17390900 Mass: 55349 Score: 182 Expect: 1.1e-12 Queries matched: 31
   Ina protein [Mus musculus]
   gi|1168394 Mass: 55836 Score: 156 Expect: 4.4e-10 Queries matched: 29
   Alpha-internexin (Alpha-Inx) (66 kDa neurofilament protein) (Neurofilament-66) (NF-66)
   gi|34328368 Mass: 55353 Score: 154 Expect: 6.9e-10 Queries matched: 26
   internexin neuronal intermediate filament protein, alpha [Mus musculus]
   gi|1703221 Mass: 56082 Score: 131 Expect: 1.4e-07 Queries matched: 27
   Alpha-internexin (Alpha-Inx)
   gi|55622 Mass: 55541 Score: 121 Expect: 1.4e-06 Queries matched: 26
   alpha-internexin [Rattus norvegicus]
   gi|14249342 Mass: 55357 Score: 64 Expect: 0.66 Queries matched: 22
   internexin neuronal intermediate filament protein, alpha; neurofilament S (66kD); neurofilament-66,
   gi|4651181 Mass: 5030 Score: 40 Expect: 1.7e+02 Queries matched: 5
   polyprotein [Hepatitis C virus]
2. gi|42656056 Mass: 10356 Score: 52 Expect: 11 Queries matched: 8
   hypothetical protein XP_378824 [Homo sapiens]
Search Parameters
Type of search         : Peptide Mass Fingerprint
Enzyme                 : Trypsin
Mass values            : Monoisotopic
Protein Mass           : Unrestricted
Peptide Mass Tolerance : 1 Da
Peptide Charge State   : 1+
Max Missed Cleavages   : 1
Number of queries      : 53
Figure 2.6 Mascot Search Results Page 1


Mascot Search Results
Protein View
Match to: gi|17390900 Score: 182 Expect: 1.1e-12
Ina protein [Mus musculus]
Nominal mass (Mr): 55349; Calculated pI value: 5.35
NCBI BLAST search of gi|17390900 against nr
Unformatted sequence string for pasting into other applications
Taxonomy: Mus musculus
Cleavage by Trypsin: cuts C-term side of KR unless next residue is P
Number of mass values searched: 53
Number of mass values matched: 31
Sequence Coverage: 59%
Matched peptides shown in Bold Red
1 HSFGSEHYLC SASSYRKVFG DSSRLSARLS GPGGSGSFRS QSLSRSNVAS
51 TAACSSASSL GLGLAYRRLP ASDGLDLSQA AARTNEYKII RTNEKEQLQG
101 LIIDRFAVFIE KVHQLETQNR ALEAELAALR QRHAEPSRVG ELFQRELREL
151 RAQLEEASSA RAQALLERDG LAEEVQRLRA RCEEESRGRE GAERALKAQQ
201 RDVDGATLAR LDLEKKVESL LDELAFVRQV HDEEVAELLA TLQASSQAAA
251 EVDVAVAKPD LTSALREIRA QYESLAAKHL QSAEEWYKSK FAHLHEQAAR
301 STEAIRASRE EIHEYRRQLQ ARTIEIEGLR GAHESLERQI LELEERHSAE
351 VAGYQDSIGQ LESDLRNTKS EHARHLREYQ DLLHVKHALD IEIAAYRKLL
401 EGEETRFSTG GLSISGLHPL PHPSXLLPPR ILSSTASKVS SAGLSLKKEE
451 EEEEEEASKE VSKKTSKVGE GFEETLGEAV ISTKKTGKSA TEESTSSSQK
501 H
No match to: 568.13, 636.31, 833.39, 855.48, 870.55, 960.52, 1220.63, 1281.66, 1346.72, 1684.91, 1792.77, 1905.06,
[Plot: peptide mass error against Mass (Da)]
Figure 2.7 Mascot Search Results Page 2


CHAPTER 3
POST-TRANSLATIONAL MODIFICATIONS
Proteins undergo a huge number of post-translational modifications
(PTMs). PTMs, or chemical modifications, are the result of covalent linkage
of a chemical group to an amino acid in the protein. PTMs are very important
for the correct positioning of proteins in relation to the cell envelope,
replication and transcription. PTMs are often overlooked in proteome
analysis despite their visibility in 2DGE gels, because on average there may
be five modification variants. The PTMs of a protein can determine its
activity state, localization, turnover, and interactions with other
proteins. All protein PTMs are associated with either an increase or a
decrease in mass. Only a few of these modifications are reversible, for
example phosphorylation, acetylation and glucosylation.
Mass changes due to some PTMs of peptides and proteins are specified in
Table 3.1. [M. Mann et al., 2003; MR. Wilkins et al., 1999; Ficarro S.B. et
al., 2002; http://us.expasy.org; Jie Zhou et al., L.J. Jensen et al., 2002;
M J Dutt et al., 2000; Qin J et al., 1997]
The experimentally determined peptide masses are matched against the
database (Mascot). Protein modifications can create peptides whose masses
do not appear in the list of theoretical masses used in the database search.
The unmatched sequences are inspected for mass differences, relative to
the expected peptides, that correspond to a modification.
PTM type                 ΔMass (Da)   Function and notes
Phosphorylation          80           Reversible; activation/inactivation of enzyme activity, modulation of molecular interactions, signaling
Acetylation              42           Protein stability, protection of N terminus; regulation of protein-DNA interactions
O-linked Glycosylation   203          O-GlcNAc; reversible, regulatory functions
Glucosylation            162          Protein stability and protein-ligand interactions
Deamidation              1            Possible regulator of protein-protein interaction
Hydroxylation            16           Protein stability
Table 3.1 Some common and important post-translational modifications
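The inspection of unmatched masses against the shifts in Table 3.1 can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis software: `PTM_SHIFTS` holds the nominal values from the table, `explain_unmatched` and its 0.5 Da default tolerance are hypothetical, and the peptide names and masses in the example are invented. Residue-specific rules (for instance, which amino acids can carry a given modification) are deliberately left out here.

```python
# Nominal mass shifts (Da) from Table 3.1.
PTM_SHIFTS = {
    "Phosphorylation": 80.0,
    "Acetylation": 42.0,
    "O-GlcNAc": 203.0,
    "Glucosylation": 162.0,
    "Deamidation": 1.0,
    "Hydroxylation": 16.0,
}


def explain_unmatched(unmatched, predicted, tolerance=0.5):
    """Find (mass, peptide, PTM) triples where peptide mass + shift fits.

    `predicted` maps each predicted peptide sequence to its unmodified mass.
    """
    explanations = []
    for mass in unmatched:
        for peptide, peptide_mass in predicted.items():
            for ptm, shift in PTM_SHIFTS.items():
                if abs(mass - (peptide_mass + shift)) <= tolerance:
                    explanations.append((mass, peptide, ptm))
    return explanations


# Invented peptides and masses, for illustration only.
predicted = {"PEPTIDEK": 1000.0, "SAMPLER": 1200.0}
print(explain_unmatched([1080.1, 1216.2, 1500.0], predicted))
# [(1080.1, 'PEPTIDEK', 'Phosphorylation'), (1216.2, 'SAMPLER', 'Hydroxylation')]
```

Note that a 1 Da shift such as deamidation can only be separated from an ordinary match if the tolerance used here is tighter than the tolerance used in the database search.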


Phosphorylation
Phosphorylation of proteins is one of the most studied PTMs because in
a typical mammalian cell as many as one-third of proteins are phosphorylated.
Phosphorylation is a reaction in which a phosphate group is attached to a
protein. It is a common way of regulating the activity of proteins. The
phosphate group causes a structural change in the protein so that the protein
will bind or release some other molecule. Phosphorylation on serine (S),
threonine (T) and tyrosine (Y) residues is an extremely important modulator of
protein function. The ratio of phosphorylation of the three different amino
acids is approximately 1000/100/1 for serine/threonine/tyrosine. Although the
level of tyrosine phosphorylation is minor, the importance of phosphorylation
of this amino acid is profound. Therefore, there is a great need for methods
capable of accurately locating sites of phosphorylation. This modification
suppresses the ionisation of the peptide. The analysis of phosphoproteins is
not straightforward because the phosphorylated sites on the proteins vary.
Phosphorylation is heterogeneous: most phosphoproteins are phosphorylated
on more than one residue, and because of this the molecules of
one protein are not all identically phosphorylated. [D.T. McLachlin et al.,
2001; J.X. Yan et al., 1998; MR. Wilkins et al., 1999; H. Kovarova et al.,
2002; M Mann et al.]


The aim here is to identify phosphopeptides based on the characteristic
mass shift owing to loss of phosphate (80 Da or multiples of 80 Da).
Figure 3.1 Phosphorylation
(Source: http://www.gravitywaves.com/chemistry)
The masses left unmatched after the database search are checked for
phosphorylation. The mass difference due to phosphorylation is added to the
mass of each predicted peptide to check whether the peptide was
phosphorylated. If a mass reported by Mascot as unmatched agrees with one
of these calculated peptide masses, it is removed from the
unmatched list and placed in the matched list.
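The check described above can be sketched as follows. The sketch assumes, as stated earlier in this section, that only S, T and Y residues can carry a phosphate, and it tries one phosphate group per available site to cover the "multiples of 80 Da" case. The names `phospho_candidates` and `reassign_phosphopeptides`, the 0.5 Da tolerance and the example peptides and masses are all hypothetical.

```python
PHOSPHATE = 80.0  # nominal mass shift per phosphate group (Table 3.1)


def phospho_candidates(peptide, peptide_mass):
    """Masses of the peptide carrying 1..n phosphates, n = number of S/T/Y."""
    sites = sum(peptide.count(residue) for residue in "STY")
    return [(n, peptide_mass + n * PHOSPHATE) for n in range(1, sites + 1)]


def reassign_phosphopeptides(unmatched, predicted, tolerance=0.5):
    """Move unmatched masses explained by phosphorylation to a matched list."""
    newly_matched, still_unmatched = [], []
    for mass in unmatched:
        hit = None
        for peptide, peptide_mass in predicted.items():
            for n, candidate in phospho_candidates(peptide, peptide_mass):
                if abs(mass - candidate) <= tolerance:
                    hit = (mass, peptide, n)
                    break
            if hit:
                break
        if hit:
            newly_matched.append(hit)
        else:
            still_unmatched.append(mass)
    return newly_matched, still_unmatched


# Invented peptides and masses: ASTK has two sites, GLVR has none.
predicted = {"ASTK": 500.0, "GLVR": 600.0}
print(reassign_phosphopeptides([660.1, 680.0, 580.2], predicted))
# ([(660.1, 'ASTK', 2), (580.2, 'ASTK', 1)], [680.0])
```

In the example, 680.0 stays unmatched even though it is 80 Da above GLVR, because that peptide has no residue that could be phosphorylated.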


Acetylation
Acetylation is one of the commonly occurring PTMs. Acetylation
regulates many functions, including DNA recognition, protein-protein
interaction and protein stability. Acetylation of histone proteins
alters the histone-DNA interaction and results in transcriptional regulation.
Acetylation modifies the amino terminus of the protein, giving the
peptide a mass shift of 42. Proteins are modified at their N-terminus. In most
cases the initiator methionine is hydrolyzed and an acetyl group is added to the
new N-terminal amino acid. This modification results in an increase in the
ionization of the peptide. [B. Kuster et al., 1998; T. Kouzarides, 2000]
Here, all amino acids following the N-terminus, except N, K, R, H, F, W
and Y, are suspected to be modified.
Figure 3.2 Acetylation
(Source: http://www.gravitywaves.com/chemistry)


Glucosylation (Glycation)
Glucosylation is a post-translational modification in which a
nonenzymatic reaction adds a sugar aldehyde or ketone to an amino group of
a protein. Glucosylation promotes inter- and intraprotein linkages. It
changes the biological activity and affects the cross-linkage and
aggregation of proteins. Glycation modifies some of the functions of
proteins. [Dennis JW et al., 1999; Ramneek Gupta et
al.; R. Blakytny et al., 1992]
[Reaction scheme: Glucose reacting with the amino group of Lysine (in protein)]
Figure 3.3 Glucosylation
(Source: www.benbest.com/lifeext/ aging.html)


Deamidation
Deamidation is a common post-translational modification resulting in
the conversion of asparaginyl and glutaminyl residues to aspartyl and
glutamyl residues. Deamidation provides a signal for protein degradation and
therefore regulates intracellular activity. Deamidation has been observed
and characterized in a wide variety of proteins. It regulates time-dependent
biological processes.
Deamidation gives the modified peptide a mass shift of 0.98. Proteins are
modified at N (asparagine) and Q (glutamine) residues. For modification of
an N residue, however, the N needs to be followed by a G residue; i.e.,
among the proteins that have undergone deamidation and whose sites have
been characterized, glycine has far and away
been the most common (N + 1) neighboring residue. [N.E. Robinson et al.,
2001; N E Robinson, 2002]
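The site rule described above can be sketched in a few lines. The sketch assumes the reading that every Q is a candidate site while an N is a candidate only when immediately followed by G, and it uses the 0.98 Da shift quoted in the text; the function names `deamidation_sites` and `deamidated_masses` and the example peptide and mass are hypothetical.

```python
DEAMIDATION_SHIFT = 0.98  # mass shift per deamidated residue (Da), from the text


def deamidation_sites(peptide):
    """Return (index, residue) for each candidate deamidation site.

    Assumed rule: any Q qualifies; an N qualifies only if followed by G.
    """
    sites = []
    for i, residue in enumerate(peptide):
        if residue == "Q":
            sites.append((i, "Q"))
        elif residue == "N" and peptide[i + 1:i + 2] == "G":
            sites.append((i, "N"))
    return sites


def deamidated_masses(peptide, peptide_mass):
    """Candidate masses with 1..n deamidated residues, n = number of sites."""
    count = len(deamidation_sites(peptide))
    return [peptide_mass + k * DEAMIDATION_SHIFT for k in range(1, count + 1)]


# Invented peptide: N at index 1 is followed by G, N at index 4 is not.
print(deamidation_sites("ANGQNK"))  # [(1, 'N'), (3, 'Q')]
```

The candidate masses returned by `deamidated_masses` can then be compared with the unmatched masses in the same way as for the other modifications.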
[Reaction scheme: Asparagine converting to Aspartic Acid and IsoAspartic Acid]
Figure 3.4 Deamidation
(Source: http://www.ionsource.com/Card/Deamidation/deamidation.htm)
Hydroxylation
Hydroxylation is a common post-translational modification in which the
protein is modified by the attachment of at least one hydroxyl (-OH) group.
Proline and asparagine are the amino acids targeted by this
modification. Hydroxylation gives the modified peptide a mass shift of
15.99. The amino acids P, K, D and N can be modified by hydroxylation.
[P Masini et al., 2002; Andrew C. Gill et al., 2000]


Figure 3.5 Hydroxylation
(Source: http://www.gravitywaves.com/chemistry)
O-GlcNAc
In eukaryotic cells many proteins are glycoproteins, because they
contain oligosaccharide chains that are covalently linked to
amino acids. Glycosylation affects proteins in many ways: it affects
protein stability, biological activity, protein folding etc. Glycosylation
affects both intracellular and secreted proteins.
Protein glycosylation is divided into four types depending on the
linkage between the amino acid and the sugar: N-linked glycosylation,
O-linked glycosylation, C-mannosylation and GPI anchor attachment.
N-linked glycosylation takes place when the sugar is added to the amino
acid asparagine. It modifies membrane and secreted proteins and
influences protein folding. O-glycosylation occurs when the sugar is
attached to the hydroxyl group of a serine or threonine residue. O-GlcNAc
has both protein-specific and site-specific influence on the biochemistry
and metabolism of the cell. O-GlcNAc plays an important role in regulated
protein-protein interaction. [Dennis JW et al., 1999; Ramneek Gupta et al.,
Keith Vosseller et al., Kelly WG et al., 1998]
Figure 3.6 O-GlcNAc
(Source: http://bric.postech.ac.kr/webzine/content/review/indivi/2003/! l/3_figl.jpg)


CHAPTER 4
PROCEDURE
The protein is identified by matching the experimentally determined
peptide masses against the list of expected peptide masses, which is done by
submitting them to the Mascot search engine. Mascot does not check the
masses against post-translational modifications effectively; therefore,
after the Mascot search, we need to check whether the output (the unmatched
masses) has undergone any post-translational modification. If it has, we
need to remove the post-translational modification to identify the protein.
The steps that are followed to automate the whole procedure are as
follows:
1. Submit the mass sequences to Mascot, which returns a positively
   identified protein and a set of unmatched sequences.
2. Automatically retrieve the results page in HTML format and remove the
   HTML tags.
3. Parse the results page and retrieve the peptide sequences, the unmatched
   masses, and the matched masses. Copy each of these into a separate text
   file.
4. Parse the peptide sequences and remove those that are already matched by
   Mascot.
5. What remains are the unmatched masses, the matched masses, and the
   unmatched peptide sequences.
6. Subject each unmatched peptide sequence to the phosphorylation test by
   finding the S, T and Y amino acids in the predicted peptide sequence and
   reducing the molecular weight by 80. Recalculate the weight of the
   peptide sequence and check whether that mass is listed as unmatched. If a
   match is found for any of the unmatched sequences, remove that sequence
   from the unmatched sequence list and remove the corresponding mass from
   the unmatched mass list.
7. Subject the predicted peptide sequence to the acetylation test by
   finding the N amino acid. All amino acids following the N terminal,
   except N, K, R, H, F, W and Y, are suspected to be modified. Reduce the
   molecular weight of the predicted peptide sequence by 42. Recalculate the
   weight and check whether that mass is listed as unmatched. If a match is
   found, remove that sequence from the unmatched sequence list.
8. Subject the predicted peptide sequence to the glucosylation test by
   finding the N, T and K amino acids. All N, T and K residues are suspected
   to be modified. Reduce the molecular weight of the predicted peptide
   sequence by 162. Recalculate the weight and check whether that mass is
   listed as unmatched. If a match is found, remove that sequence from the
   unmatched sequence list.
9. Subject the predicted peptide sequence to the deamidation test by
   finding the N and Q amino acids. All Q residues are suspected to be
   modified, as are all N residues that are followed by a G residue. Reduce
   the molecular weight of the predicted peptide sequence by 0.98.
   Recalculate the weight and check whether that mass is listed as
   unmatched. If a match is found, remove that sequence from the unmatched
   sequence list.
10. Subject the predicted peptide sequence to the hydroxylation test by
    finding the P, K, D and N amino acids. All P, K, D and N residues are
    suspected to be modified. Reduce the molecular weight of the predicted
    peptide sequence by 16. Recalculate the weight and check whether that
    mass is listed as unmatched. If a match is found, remove that sequence
    from the unmatched sequence list.
11. Subject the predicted peptide sequence to the O-GlcNAc test by finding
    the S, T and N amino acids. All S, T and N residues are suspected to be
    modified. Reduce the molecular weight of the predicted peptide sequence
    by 203. Recalculate the weight and check whether that mass is listed as
    unmatched. If a match is found, remove that sequence from the unmatched
    sequence list.
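The tag-removal step in the list above could be sketched as follows. This is only a minimal illustration using Python's standard `html.parser` module; the helper names and the sample HTML fragment are assumptions, since the thesis does not show the actual parsing code or the exact layout of the Mascot report.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML page and drops the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_html_tags(html: str) -> str:
    """Return the text of an HTML document with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


# Hypothetical fragment in the style of a results page.
sample = "<B>Match to</B> gi|25742763 Score <B>165</B>"
print(strip_html_tags(sample))  # Match to gi|25742763 Score 165
```

The plain text produced this way would then be parsed for the peptide sequences and the matched and unmatched masses. Note that the content of script and style elements, if present, would also be kept as text by this sketch.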
The pseudocode for the algorithm that automates the detection of
post-translational modifications is as follows:
INPUT: Mass List of Proteins: P
OUTPUT: Positively identified Protein
SEQ: peptide sequences returned by Mascot
USEQ: Unmatched peptide sequences
UMass: Unmatched masses
Mass: Matched masses
For each protein p in P
    Result (HTML file) ← submit p to Mascot
    Remove the HTML tags from the file
    UMass, SEQ, Mass ← Parse the file
    USEQ ← Parse SEQ and remove the matched sequences, which
           gives the unmatched peptide sequences
    // Post Translational Modifications
    For each peptide sequence PS in the unmatched sequences USEQ
        // Phosphorylation
        For each peptide S in the unmatched peptide sequence PS
            seq ← PhosphorylationCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
        // Acetylation
        For each peptide S in the unmatched peptide sequence PS
            seq ← AcetylationCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
        // Glucosylation
        For each peptide S in the unmatched peptide sequence PS
            seq ← GlucosylationCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
        // Hydroxylation
        For each peptide S in the unmatched peptide sequence PS
            seq ← HydroxylationCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
        // Deamidation
        For each peptide S in the unmatched peptide sequence PS
            seq ← DeamidationCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
        // O-GlcNAc
        For each peptide S in the unmatched peptide sequence PS
            seq ← O-GlcNAcCheck (S)
            m ← Calculate the mass of the peptide sequence seq
            If m is one mass of UMass Then
                Mark m as found, Remove from the list of unmatched masses
            EndIf
        EndFor
    EndFor
EndFor
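For orientation, the matching loop in the pseudocode could look roughly like the Python sketch below. It is not the thesis implementation. It assumes monoisotopic MH+ masses, uses the residue rules and nominal shifts listed in the procedure, reads the acetylation rule as "the N-terminal residue is not N, K, R, H, F, W or Y", and adds the shift to the theoretical peptide mass (which is equivalent to reducing the observed mass by the shift). The 0.5 Da tolerance and all function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O for the free N- and C-termini
PROTON = 1.00728   # charge carrier, giving MH+


def contains_any(residues):
    """Build a rule that accepts peptides containing any listed residue."""
    return lambda peptide: any(r in peptide for r in residues)


# (name, nominal shift in Da, candidate rule), in the pseudocode's order.
PTM_RULES = [
    ("phosphorylation", 80.0, contains_any("STY")),
    ("acetylation", 42.0, lambda p: p[0] not in "NKRHFWY"),  # assumed reading
    ("glucosylation", 162.0, contains_any("NTK")),
    ("hydroxylation", 16.0, contains_any("PKDN")),
    ("deamidation", 0.98, lambda p: "Q" in p or "NG" in p),
    ("O-GlcNAc", 203.0, contains_any("STN")),
]


def peptide_mh(peptide):
    """Monoisotopic MH+ of an unmodified, non-empty peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def ptm_test(unmatched_peptides, unmatched_masses, tolerance=0.5):
    """Return (matches, masses still unmatched) after one pass of all tests."""
    peptides = list(unmatched_peptides)
    masses = list(unmatched_masses)
    found = []
    for name, shift, is_candidate in PTM_RULES:
        for peptide in list(peptides):
            if not is_candidate(peptide):
                continue
            target = peptide_mh(peptide) + shift
            hit = next((m for m in masses if abs(m - target) <= tolerance), None)
            if hit is not None:
                # Remove both the sequence and the mass, as in the procedure.
                found.append((peptide, name, hit))
                peptides.remove(peptide)
                masses.remove(hit)
    return found, masses


# Synthetic example: the first mass is constructed to carry an 80 Da shift.
observed = [peptide_mh("ETAEAYLGK") + 80.0, 550.12]
found, still_unmatched = ptm_test(["ETAEAYLGK", "LIGDAAK"], observed)
print(found)
print(still_unmatched)
```

The sketch tests a single modification per peptide, as the procedure does; checking several modified sites on one peptide would need an additional loop over multiples of the shift.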


CHAPTER 5
RESULTS
The MS data used here was obtained from the University of Colorado
Health Sciences Center and was used as input to our automated system.
Input Data
The data is of the form:
550.12437 568.12278 614.41047 651.37805 666.00304 870.54967
919.46884 986.52513 997.5255 1074.56884 1172.66589 1191.64913
1210.59741 1211.58869 1228.64032 1233.62776 1316.64717
1329.62588 1430.69556 1460.76334 1467.56151 1480.76014
1512.77068 1528.74657 1544.77405 1552.78448 1566.79329
1567.31225 1588.86631 1604.86754 1659.90255 1677.8113 1709.87013
1816.00509 1836.92834 1887.97864 1932.96073 1934.02413
1974.91597 1999.08935 2016.06845 2148.99738 2163.01675
2177.95542 2224.04113 2225.10851 2283.16345
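A whitespace-separated list like the one above is straightforward to read into a program. The following minimal Python sketch shows one way to do it; the function name `parse_mass_list` is hypothetical, and the thesis does not state how its own software reads the input.

```python
RAW_MASSES = """
550.12437 568.12278 614.41047 651.37805 666.00304 870.54967
919.46884 986.52513 997.5255 1074.56884 1172.66589 1191.64913
1210.59741 1211.58869 1228.64032 1233.62776 1316.64717
1329.62588 1430.69556 1460.76334 1467.56151 1480.76014
1512.77068 1528.74657 1544.77405 1552.78448 1566.79329
1567.31225 1588.86631 1604.86754 1659.90255 1677.8113 1709.87013
1816.00509 1836.92834 1887.97864 1932.96073 1934.02413
1974.91597 1999.08935 2016.06845 2148.99738 2163.01675
2177.95542 2224.04113 2225.10851 2283.16345
"""


def parse_mass_list(text: str) -> list[float]:
    """Split on any whitespace and convert each token to a float."""
    return [float(token) for token in text.split()]


masses = parse_mass_list(RAW_MASSES)
print(len(masses))  # 47
```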


Implementation
The program submits the above data to Mascot which is as follows:
[Screenshot of the MASCOT Peptide Mass Fingerprint search form. Legible
settings: Your name: Archana; Database: NCBInr; Taxonomy: All entries;
Enzyme: Trypsin; Peptide tol.: 1.0 Da. Mass values visible in the Query
field: 2365.3358, 2497.34211, 2560.24987, 2807.29641, 3931.24277,
3946.02749.]
Figure 5.1 Mascot Search Screen


The HTML results file is retrieved, which is of the form:
[Screenshot of the raw HTML results file ("Mascot Search Results: Protein
View") opened in Notepad. Legible content: Match to gi|25742763, Score 165;
heat shock 70kD protein 5 [Rattus norvegicus]; Calculated pI value: 5.07.]
Figure 5.2 Results HTML File

This file is parsed and all the information is removed except for the
following information:
  1 MKFTVVAAAL LLLCAVRAEE EDKKEDVGTV VGIDLGTTYS CVGVFKNGRV
 51 EIIANDQGNR ITPSYVAFTP EGERLIGDAA KNQLTSNPEN TVFDAKRLIG
101 RTWNDPSVQQ DIKFLPFKVV EKKTKPYIQV DIGGGQTKTF APEEISAMVL
151 TKMKETAEAY LGKKVTHAVV TVPAYFNDAQ RQATKDAGTI AGLNVMRIIN
201 EPTAAAIAYG LDKREGEKNI LVFDLGGGTF DVSLLTIDNG VFEVVATNGD
251 THLGGEDFDQ RVMEHFIKLY KKKTGKDVRK DNRAVQKLRR EVEKAKRALS
301 SQHQARIEIE SFFEGEDFSE TLTRAKFEEL NMDLFRSTMK PVQKVLEDSD
351 LKKSDIDEIV LVGGSTRIPK IQQLVKEFFN GKEPSRGINP DEAVAYGAAV
401 QAGVLSGDQD TGDLVLLDVC PLTLGIETVG GVMTKLIPRN TVVPTKKSQI
451 FSTASDNQPT VTIKVYEGER PLTKDNHLLG TFDLTGIPPA PRGVPQIEVT
501 FEIDVNGILR VTAEDKGTGN KNKITITNDQ NRLTPEEIER MVNDAEKFAE
551 EDKKLKERID TRNELESYAY SLKNQIGDKE KLGGKLSPED KETMEKAVEE
601 KIEWLESHQD ADIEDFKAKK KELEEIVQPI ISKLYGSGGP PPTGEEDTSE
651 KDEL
Figure 5.3 Results Peptide Sequence

The unmatched masses are retrieved from the HTML file and stored in
a text file.
Figure 5.4 Results Unmatched Masses

The program removes the peptide sequences that were matched by the
Mascot search engine and retrieves the unmatched peptide sequences.
MK
FTVVAAALLLLCAVR
AEEEDK
K
EDVGTVVGIDLGTTYSCVGVFK
NGR
LIGDAAK
VVEK
K
TFAPEEISAMVLTK
MK
ETAEAYLGK
QATK
DAGTIAGLNVMR
EGEK
NILVFDLGGGTFDVSLLTIDNGVFEVVATNGDTHLGGEDFDQR
VMEHFIK
LYK
K
K
TGK
DVR
K
DNR
Figure 5.5 Results Unmatched Sequences

We subject each of the unmatched sequences obtained above to
the post-translational modification tests, i.e., phosphorylation, acetylation,
glucosylation, deamidation, hydroxylation, and O-GlcNAc.
After each test the system recalculates the masses and checks them against
the
unmatched mass list. If a match is found, the peptide sequence and the
mass are removed from the respective files, and the process continues until
the check has been made for all the unmatched sequences.
Results after Implementation
Here one run of the automated identification of proteins after detecting
post-translational modifications is explained.
47 masses are given as input to Mascot. Out of these 47 masses, Mascot could
identify 28. The masses were identified as the protein gi|25742763 -
heat shock 70kD protein 5.
The following masses were not identified by Mascot: 550.12,
568.12, 666, 870.55, 1172.67, 1211.59, 1233.63, 1329.63, 1467.56, 1480.76,
1528.75, 1544.77, 1552.78, 1709.87, 1932.96, 2163.02, 2224.04, 2225.11, and
2283.16.
The unidentified masses were filtered out of the peptide sequence file
and subjected to the post-translational modification tests. After these
tests we could identify the following masses: 666, 1172.67, 1233.63,
1467.56, 1552.78, and 2225.11.
With more matching masses, we can now claim that the protein identified by
Mascot is correct.
The graphical representation of some of the results is as follows:
Figure 5.6 Results 1
Number of Masses: 51
Number of Masses Matched by Mascot: 31
Number of Unmatched Masses: 20
Number of Masses matched after the PTM test: 7

[Plot legend: Given Mass, Matched by Mascot, Unmatched by Mascot, Matched
after PTM test]
Figure 5.7 Results 2
Number of Masses: 47
Number of Masses Matched by Mascot: 28
Number of Unmatched Masses: 19
Number of Masses matched after the PTM test: 6

[Plot legend: Matched by Mascot, Unmatched by Mascot, Matched after PTM test]
Figure 5.8 Results 3
Number of Masses: 53
Number of Masses Matched by Mascot: 31
Number of Unmatched Masses: 22
Number of Masses matched after the PTM test: 7

# of Masses   Matched by Mascot   Identified Protein   Matched after PTM Test
53            31                  gi|17390900          7
47            28                  gi|25742763          6
51            31                  gi|26353794          7
47            20                  gi|40254595          9
55            27                  gi|14389431          6
Table 5.1 Some results obtained after the test against the PTMs

CHAPTER 6
FUTURE WORK AND CONCLUSION
Identification of proteins is important as it helps in detecting diseases.
Since post-translational modifications modify the activities of most
eukaryotic proteins, they play an important role in the identification of
proteins.
Challenges include obtaining large amounts of protein sequence data
from complex mixtures.
Future work includes automated submission of large numbers of masses
to the search engine. This will reduce the work of checking each mass
for the identification of proteins.
In the future we could incorporate
this software, which identifies post-translational modifications, into a
search engine that uses mass spectrometry data to identify proteins.
To conclude, we have developed an automated system which submits
the peptide masses to an existing search engine, retrieves the unmatched
masses, and checks them to see whether we can detect a few post-translational
modifications. In the future we plan to build a search engine which will
automatically take a large amount of data as input and output more accurately
identified proteins after checking against all possible post-translational
modifications.

APPENDIX A.
AMINO ACIDS
(Source: http://www.agsci.ubc.ca/courses/fnh/410/protein/l_12.htm)
Amino Acid      Abbreviation
Alanine         Ala (A)
Arginine        Arg (R)
Asparagine      Asn (N)
Aspartate       Asp (D)
Cysteine        Cys (C)
Glutamate       Glu (E)
Glutamine       Gln (Q)
Glycine         Gly (G)
Histidine       His (H)
Isoleucine      Ile (I)
Leucine         Leu (L)
Proline         Pro (P)
Serine          Ser (S)
Threonine       Thr (T)
Tryptophan      Trp (W)
[Structure column: chemical structure drawings, not recoverable from the scan]

BIBLIOGRAPHY
Aebersold, R., & Mann, M. (2003). Mass spectrometry-based proteomics.
Nature, 422 (6928), 198-207.
Andrew, C. G., Ritchie, M. A., Hunt, L. G., Steane, S.
E., Davis, K. G.,
Booking, S. P., Rhie, A. G. O., Bennett, A. D., & Hope, J. (1999).
Post-translational hydroxylation at the N-terminus of the prion protein
reveals presence of pi structure in vivo. Journal of the American Society
of Mass Spectrum, 10, 91-103.
Bernhard, K., & Mann, M. (1998). Identifying proteins and post-translational
modifications by mass spectrometry. Current Opinion Journals, 8(3),
393-400.
Blakytny, R., & Harding, J. J. (1992). Glycation (non-enzymic glycosylation)
inactivates glutathione reductase. Biochem Journal, 288 (1), 303-7.
Cagney, G., Amiri, S., Premawaradena, T., Lindo, M., & Emili, A. (2003). In
silico proteome analysis to facilitate proteomics experiments using
mass spectrometry. Proteome Science, 7(1), 5.
Cheung, W. L., Briggs, S. D., & Allis, C. D. (2000). Acetylation and
chromosomal functions. Current Opinion Journals, 72(3), 326-33.
Cios, K., Pedrycz, W., & Swiniarski, R. W. (1998). Data mining methods for
knowledge discovery. Norwell, MA: Kluwer Academic Publishers.
Creighton, T. E. (1993). Proteins: Structures and molecular properties. New
York, NY: W.H. Freeman & Company.
Dennis, J. W., Granovsky, M., & Warren, C. E. (1999). Protein glycosylation
in development and disease. Bioessays, 21 (5), 412-21.
Derek, T. M., & Brian, T. C. (2001). Analysis of phosphorylated proteins and
peptides by mass spectrometry. Current Opinion Chemical Biology,
5(5), 591-602.
Duncan, M., Fung, K., Wang, H., Yen, C., & Cios, K. (2003). Identification of
contaminants in proteomics mass spectrometry data procedure of the
computational systems bioinformatics (CBS03). IEEE Computer
Society, 409-410.
Eng, J. K., McCormack, A. L., & Yates, J. R. (1994). An approach to correlate
MS/MS data to amino acid sequences in a protein database.
Annual
Review Biochemistry, 5(11), 976-89.
Graves, J. D., & Krebs, E. D. (1999). Protein phosphorylation and signal
transduction. Subcell Biochemistry, 26(2), 115-64.
Gupta, R., & Brunak, S. (2002). Prediction of glycosylation across the human
proteome and the correlation to protein function. Science, 246(4926),
64-71.
Hanover, J. A. (2001). Glycan-dependent signaling: O-linked
N-acetylglucosamine. FASEB Journal, 15, 1865-1876.
Jensen, L. J., Gupta, R., Blom, N., Devos, D., Tamames, J., Kesmir, C.,
Nielsen, H., Staerfeldt, H. H., Rapacki, K., Workman, C., Andersen, C.
A., Knudsen, S., Krogh, A., Valencia, A., & Brunak, S. (2002).
Prediction of human protein functions from post-translational
modifications and localization features. Journal of Molecular Biology,
319(5), 1257-65.
John, J. H. (2002). Viewing molecular mechanisms of ageing through a lens.
Ageing Research Reviews, 5(1), 465-479.
Jonscher, K. R., & Yates, J. R. (1997). Matrix-assisted laser desorption
ionization/quadrupole ion trap mass spectrometry of peptides,
application to the localization of phosphorylation sites on the P protein
from Sendai virus. Journal of Biological Chemistry, 272(3), 1735-41.
Kelly, W. G., & Hart, G. W. (1998). Glycosylation of chromosomal proteins:
localization of O-linked N-acetylglucosamine in Drosophila chromatin
cell. Cell, 57(2), 243-51.
Kleinsmith, L. J., & Kish, V. M. (1995). Principles of Cell and Molecular
Biology. New York: Benjamin-Cummings Publishing Company.
Kouzarides, T. (2000). Acetylation: A regulatory modification to rival
phosphorylation? The EMBO Journal, 19(6), 1176-9.
Kukuruzinska, M. A., Bergh, M. L., & Jackson, B. J. (1997). Protein
glycosylation in yeast. The PNAS Journal, 84(8), 2145-2149.
Lahm, H. W., & Langen, H. (2000).
Mass spectrometry: A tool for the
identification of proteins separated by gels. Electrophoresis, 27(11),
2105-14.
Mann, M., & Jensen, O. N. (2003). Proteomic analysis of post-translational
modification. Nature Biotechnology, 21(3), 255-61.
Mann, M., Ong, S., Grønborg, M., Steen, H., Jensen, O. N., & Pandey, [...]
(2002). Analysis of protein phosphorylation using mass spectrome[...]
deciphering the phosphoproteome. Trends Biotechnology, 20(6), 2[...]
Masini, P., & Bernasconi, M. (2002). Ab initio simulations of hydr[...]
and dehydroxylation reactions at surfaces: Amorphous [...]
brucite. Journal of Physics, 14, 4133-4144.
Michael, J. D., & Lee, K. H. (2000). Proteomic analysis. EU[...]
818-825.
Moyer, S. C., Marzilli, L. A., Woods, A. S., Laiko, V. [...]
& Cotter, R. J. (2002). Atmospheric pres[...]
desorption/ionization (ap maldi) on a [...]
spectrometer. Journal American Societ[...]
214-283.
Ong, S., Foster, L. J., & Mann, M [...]
approaches in quantitative prote[...]