Prediction of potential GPI-modification sites in proprotein sequences.
1999 Sep 24; 292(3): 741-58. PubMed: 10497036.
Glycosylphosphatidylinositol (GPI) lipid anchoring is a common posttranslational modification known mainly from extracellular eukaryotic proteins. Attachment of the GPI moiety to the carboxyl terminus (omega-site) of the polypeptide follows after proteolytic cleavage of a C-terminal propeptide. For the first time, a new prediction technique locating potential GPI-modification sites in precursor sequences has been applied for large-scale protein sequence database searches. The composite prediction function (with separate parametrisation for metazoan and protozoan proteins) consists of terms evaluating both amino acid type preferences at sequence positions near a supposed omega-site as well as the concordance with general physical properties encoded in multi-residue correlation within the motif sequence. The latter terms are especially successful in rejecting non-appropriate sequences from consideration. The algorithm has been validated with a self-consistency and two jack-knife tests for the learning set of fully annotated sequences from the SWISS-PROT database as well as with a newly created database "big-Pi" (more than 300 GPI-motif mutations extracted from original literature sources). The accuracy of predicting the effect of mutations in the GPI sequence motif was above 83%. Lists of potential precursor proteins which are non-annotated in SWISS-PROT and SPTrEMBL are presented on the WWW-page http://www.embl-heidelberg.de/beisenha/gpi/gpi_prediction.html. The algorithm has been implemented in the prototype software "big-Pi predictor" which may find application as a genome annotation and target selection tool.
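The abstract describes a composite score made of per-position residue preferences plus terms for general physical properties. A minimal sketch of that structure follows. Everything concrete in it is an assumption made for illustration, not the published parametrisation: the preference tables, the penalty values, the spacer length, the use of a hydrophobic C-terminal tail as the "physical property", and the function names profile_term, physical_term and best_site.

```python
# Toy log-odds tables for residues at the candidate omega-site (offset 0)
# and at omega+2. All numbers are invented, NOT the published parameters.
SITE_PREFS = {
    0: {"S": 1.0, "G": 1.0, "N": 1.0, "A": 0.5, "D": 0.5, "C": 0.5},
    2: {"A": 1.0, "G": 1.0, "S": 1.0},
}
UNLISTED = -2.0                  # assumed penalty for residues missing from a table
HYDROPHOBIC = set("AFILMVW")     # assumed hydrophobic alphabet


def profile_term(seq, omega):
    """Sum the per-position residue preferences around a candidate site."""
    return sum(table.get(seq[omega + off], UNLISTED)
               for off, table in SITE_PREFS.items())


def physical_term(seq, omega, spacer=8, min_tail=8):
    """Penalise candidates whose C-terminal tail is too short or not hydrophobic."""
    tail = seq[omega + spacer:]
    if len(tail) < min_tail:
        return -10.0
    frac = sum(r in HYDROPHOBIC for r in tail) / len(tail)
    return 4.0 * (frac - 0.5)


def best_site(seq, window=40):
    """Scan the last `window` residues; return (score, index) of the best candidate."""
    if len(seq) < 3:
        raise ValueError("sequence too short to hold a candidate site")
    start = max(0, len(seq) - window)
    # The last usable index is len(seq) - 3, because offset +2 must exist.
    return max((profile_term(seq, i) + physical_term(seq, i), i)
               for i in range(start, len(seq) - 2))
```

Calling best_site on a precursor sequence returns the highest-scoring candidate position and its score. In this toy version the physical term acts as a filter, mirroring the abstract's remark that such terms are especially useful for rejecting inappropriate sequences.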
PSIC: profile extraction from sequence alignments with position-specific counts of independent observations.
Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).
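The idea of position-specific counts can be illustrated with a small sketch. It groups the sequences that share a residue at each column and turns their overall similarity into an effective number of observations. The conversion used here (one count plus a share of the remaining sequences proportional to their mean pairwise dissimilarity) is a simple stand-in chosen for illustration. It is not the statistical estimate used by PSIC, and the function names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of columns, gapped in neither sequence, where the residues match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def effective_count(seqs):
    """Toy stand-in: n identical sequences count once, n unrelated ones count n times."""
    n = len(seqs)
    if n < 2:
        return float(n)
    ids = [identity(a, b) for a, b in combinations(seqs, 2)]
    mean_id = sum(ids) / len(ids)
    return 1.0 + (n - 1) * (1.0 - mean_id)


def position_specific_counts(alignment):
    """For each column, map residue -> effective count of the sequences sharing it."""
    result = []
    for col in range(len(alignment[0])):
        groups = {}
        for seq in alignment:
            if seq[col] != "-":
                groups.setdefault(seq[col], []).append(seq)
        result.append({aa: effective_count(g) for aa, g in groups.items()})
    return result
```

Because the grouping is redone at every column, the same sequence can contribute differently at different positions, which is the contrast with whole-sequence weighting drawn in the abstract.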