Protein Identification With Mass Spectrometry Data
Introduction
Edman degradation was the method of choice for protein identification
before affordable mass spectrometry based ID methods became available, and
it is still a very powerful technique. With Edman sequencing, amino
acids are cleaved one at a time from the N-terminus of a peptide or protein,
and each released amino acid is then chromatographed using a 20 to 50 min
HPLC gradient. Identification is based on correlating the retention time of
the eluting amino acid with a standard chromatogram. The power of the Edman
technique is that the exact sequence can often be determined, and there is
none of the confusion that arises in MS when amino acids have isobaric
masses. The Edman technique is simple, powerful, and slow, and it usually
identifies one protein at a time. On average it takes about seven cycles of
the sequencer to uniquely identify a protein in a sequence database. If you
are running a 50 min gradient, that's about six hours for one protein ID!
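If you want to check the arithmetic, or see what "isobaric" means in practice, here is a minimal sketch. The residue masses are the standard monoisotopic values rounded to five decimals. The function names and the 0.05 Da tolerance are our own illustrative choices, not part of any instrument software.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def edman_hours(cycles, gradient_min):
    """Total sequencer time in hours: one HPLC gradient per Edman cycle."""
    return cycles * gradient_min / 60.0


def close_mass_pairs(tolerance_da):
    """Return residue pairs whose masses differ by no more than tolerance_da."""
    pairs = []
    for a, b in combinations(sorted(RESIDUE_MASS), 2):
        if abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]) <= tolerance_da:
            pairs.append((a, b))
    return pairs


print(edman_hours(7, 50))       # seven cycles at 50 min each
print(close_mass_pairs(0.05))   # illustrative tolerance, an assumption
```

Seven cycles at 50 min comes to roughly 5.8 hours. At the assumed 0.05 Da tolerance the sketch flags leucine/isoleucine, which have identical masses, and lysine/glutamine, which differ by about 0.036 Da. Edman sequencing separates these residues by retention time, so the ambiguity does not arise there.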
Mass spectrometry identifies proteins from peptide masses or from the
MS/MS fragmentation of a peptide. In stark contrast to Edman sequencing,
mass spectrometry can easily ID 10-50 proteins in about 30 min! Here are a
few of the most popular MS ID techniques.
- Peptide Mass Fingerprinting: A protein is first digested
with an enzyme, and the peptide masses are then used to search a
sequence database. (1,2)
- Sequence Tag: A peptide is fragmented in a mass
spectrometer, and a short stretch of amino acids is then determined.
With this technique, the parent mass of the peptide, the sequence of the
tag, and the starting and ending masses of the tag are used to search a
sequence database. Proteins can be correlated with the
fragmentation of a single peptide using this technique. (3)
- MS/MS Peptide Identification: A peptide is
fragmented in a mass spectrometer, and the fragment ion masses are then
used to search a sequence database. Proteins can be correlated with the
fragmentation of a single peptide using this technique. (4)
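To make the first technique concrete, here is a minimal sketch of a peptide mass fingerprint search. It rests on several assumptions: the enzyme is trypsin (cleaving after K or R, but not before P), there are no missed cleavages or modifications, masses are neutral monoisotopic masses, and the score is a simple count of matched masses. The protein names and sequences are made-up toy data, and real search engines use more sophisticated scoring than this.

```python
WATER = 18.01056  # H2O added to the summed residue masses of a free peptide

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_peptides(sequence):
    """In silico trypsin digest: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance_da=0.2):
    """Count observed masses that fall within tolerance of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed_masses
    )


def rank_database(observed_masses, database, tolerance_da=0.2):
    """Rank database entries by number of matched peptide masses, best first."""
    scores = {
        name: count_matches(observed_masses, seq, tolerance_da)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy database; the sequences are invented for illustration.
TOY_DATABASE = {
    "toy_protein_A": "MKTAYIAKQRGLSDEFK",
    "toy_protein_B": "MSPEEKLLGRWTVNAK",
}

# Simulate a measurement: three peptides of toy_protein_A with a small mass error.
observed = [peptide_mass(p) + 0.05 for p in ("TAYIAK", "QR", "GLSDEFK")]
print(rank_database(observed, TOY_DATABASE))
```

The sequence tag and MS/MS approaches search the same kind of database, but they match fragment ion information from a single peptide rather than a list of intact peptide masses.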
With all of these techniques, the larger the set of peptides identified for
a given protein, the better the identification. Identification is really too
strong a term; perhaps correlation is more appropriate. For example, if you
can only identify a single peptide, then you may only be able to narrow the
search down to a family of proteins. If you can only correlate a portion of
the sequence, you can never be entirely sure whether you have identified the
protein in the database or an entirely new splice variant with an entirely
different function. Throughout the tutorial we will use the terms
identification, ID, and correlation interchangeably.
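The point about partial evidence can be illustrated with a small sketch. It computes sequence coverage (the fraction of residues explained by the identified peptides) and lists every database entry that contains a given peptide. The function names and the toy sequences are assumptions made up for this example.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in protein covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


def proteins_containing(peptide, database):
    """Names of all database entries whose sequence contains the peptide."""
    return sorted(name for name, seq in database.items() if peptide in seq)


# Hypothetical entries: two related toy sequences that share one peptide.
TOY_FAMILY = {
    "toy_isoform_1": "MKTAYIAKQRGLSDEFK",
    "toy_isoform_2": "MKTAYIAKWTVNAK",
    "toy_unrelated": "MSPEEKLLGR",
}

print(sequence_coverage(TOY_FAMILY["toy_isoform_1"], ["TAYIAK"]))
print(proteins_containing("TAYIAK", TOY_FAMILY))
```

Here a single peptide covers only 6 of 17 residues of the first toy sequence and is found in both toy isoforms, so on its own it cannot tell them apart. Adding peptides raises the coverage and narrows the list of candidates.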
If you are interested in MS ID philosophy and strategy, please continue on.
If you want to get directly to the techniques and the examples, browse back
to the table of contents and jump ahead to the techniques.
References:
1. Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc Natl Acad Sci U S A. 1993 Jun 1;90(11):5011-5.
2. Henzel WJ, Watanabe C, Stults JT. Protein identification: the origins of peptide mass fingerprinting. J Am Soc Mass Spectrom. 2003 Sep;14(9):931-42.
3. Mann M, Wilm M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem. 1994 Dec 15;66(24):4390-9.
4. Yates JR 3rd, Eng JK, McCormack AL, Schieltz D. Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal Chem. 1995 Apr 15;67(8):1426-36.