Prediction of nonsynonymous single nucleotide polymorphisms in human disease-associated genes.
Analysis of human genetic variation can shed light on the genetic basis of complex disorders. Nonsynonymous single nucleotide polymorphisms (SNPs), which affect the amino acid sequence of proteins, are believed to be the most frequent type of variation associated with disease phenotypes. Complete enumeration of nonsynonymous SNPs in candidate genes will enable further association studies on panels of affected and unaffected individuals. Experimental detection of SNPs requires expensive technologies and is still far from routine. Alternatively, SNPs can be identified by computational analysis of a publicly available expressed sequence tag (EST) database, followed by experimental verification. We performed in silico analysis of amino acid variation for 471 proteins with a documented history of experimental variation studies and with confirmed association with human diseases. This allowed us to evaluate how complete the current knowledge of nonsynonymous SNPs is in well-studied, medically relevant genes and to estimate the proportion of new variants that can be added by computer-aided mining of EST databases. Our results suggest that approximately 50% of frequent nonsynonymous variants are already stored in public databases. Computational methods based on scanning an EST database can add significantly to the current knowledge, but they are greatly limited by the size of EST databases and the nonuniform coverage of genes by ESTs. Nevertheless, a considerable number of new candidate nonsynonymous SNPs in genes of medical interest were found by the EST screening procedure.
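To illustrate the distinction between synonymous and nonsynonymous variants on which the analysis rests, the following is a minimal Python sketch, not the procedure used in this study. It assumes an EST segment that has already been aligned, in frame and without gaps, to a reference coding sequence, and it uses the standard genetic code. The function name `classify_codon_differences` and the example sequences are hypothetical. A real screen would additionally require support from several independent ESTs and filtering of sequencing errors before a difference is reported as a candidate SNP.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(reference_cds, est_segment):
    """Compare two in-frame, ungapped, equal-length coding sequences.

    Returns one tuple per differing codon:
    (codon_number, ref_codon, est_codon, ref_aa, est_aa, kind),
    where codon_number is 1-based and kind is "synonymous" or
    "nonsynonymous". Codons containing ambiguous bases (e.g. N) are skipped.
    """
    if len(reference_cds) != len(est_segment):
        raise ValueError("sequences must have equal length (assumed pre-aligned)")
    if len(reference_cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3 (assumed in frame)")

    differences = []
    for start in range(0, len(reference_cds), 3):
        ref_codon = reference_cds[start:start + 3].upper()
        est_codon = est_segment[start:start + 3].upper()
        if ref_codon == est_codon:
            continue
        if ref_codon not in CODON_TABLE or est_codon not in CODON_TABLE:
            continue  # ambiguous base call; cannot be translated reliably
        ref_aa = CODON_TABLE[ref_codon]
        est_aa = CODON_TABLE[est_codon]
        kind = "synonymous" if ref_aa == est_aa else "nonsynonymous"
        differences.append(
            (start // 3 + 1, ref_codon, est_codon, ref_aa, est_aa, kind)
        )
    return differences


if __name__ == "__main__":
    # Hypothetical sequences: codon 2 changes Glu to Asp, codon 3 stays Val.
    for row in classify_codon_differences("ATGGAAGTG", "ATGGACGTT"):
        print(row)
```

Differences are classified per codon, not per base, so that two substitutions falling in the same codon are translated together.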