Position-specific annotation of protein function based on multiple homologs.
In this work I present an algorithm for deriving position-specific functional annotations of proteins. The input is the result of a sequence similarity search of the query sequence against a sequence database. Strings of words are extracted from the descriptions of the matching proteins, and the correlation between proteins sharing a descriptor and amino acid conservation is used to compute a score indicating which descriptor is likely to best describe the function of each residue. Analysis of the score curves and comparison of different functions allow easy detection of parts of the sequence associated with different functions. Different levels of functional specificity can be compared, making it possible to choose the one that best suits the function of the protein. Immediate applications of this algorithm are support for (automated) methods of protein functional annotation and database coherence checks.
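
To make the idea concrete, the following is a minimal Python sketch of a position-specific descriptor score. It is an illustration under stated assumptions, not the scoring scheme of this work, which the abstract does not specify. The sketch assumes that each database hit is supplied as a description plus a sequence already aligned to the query (same length, with '-' for gaps), that descriptors are runs of one or two consecutive words, and that the score at a position is the fraction of hits carrying the descriptor that conserve the query residue minus the same fraction among the remaining hits. The names `descriptors` and `position_scores` are hypothetical.

```python
from collections import defaultdict


def descriptors(description, max_words=2):
    """Return the set of word strings (1..max_words consecutive words)."""
    words = description.lower().split()
    found = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            found.add(" ".join(words[i:i + n]))
    return found


def position_scores(query, hits, max_words=2):
    """Score each descriptor at each query position.

    hits: list of (description, aligned_sequence) pairs; every aligned
    sequence is assumed to have the same length as the query.
    Returns {descriptor: [score per query position]}.
    Assumed score: conservation among hits with the descriptor minus
    conservation among hits without it (not the paper's formula).
    """
    # 1 where a hit conserves the query residue at that position, else 0
    matches = [[int(a == q) for a, q in zip(aligned, query)]
               for _, aligned in hits]

    # Map each descriptor to the indices of the hits that carry it
    tagged = defaultdict(list)
    for index, (description, _) in enumerate(hits):
        for d in descriptors(description, max_words):
            tagged[d].append(index)

    scores = {}
    for d, members in tagged.items():
        member_set = set(members)
        others = [i for i in range(len(hits)) if i not in member_set]
        if not others:
            continue  # descriptor shared by every hit: nothing to contrast
        curve = []
        for pos in range(len(query)):
            with_d = sum(matches[i][pos] for i in members) / len(members)
            without_d = sum(matches[i][pos] for i in others) / len(others)
            curve.append(with_d - without_d)
        scores[d] = curve
    return scores
```

For a toy query `"MKTAYW"` with two hits described as "serine kinase" that conserve only the first half of the sequence and two described as "zinc transporter" that conserve only the second half, the curve for "kinase" is positive over the first three positions and negative over the last two, while the curve for "transporter" shows the opposite pattern. Comparing such curves is one way to see which descriptor fits which part of the sequence.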