Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types?
The parametric description of residue environments through solvent accessibility, backbone conformation, or pairwise residue-residue distances is the key to the comparison between amino acid types at protein sequence positions and residue locations in structural templates (condition of protein sequence-structure match). For the first time, the research results presented in this study clarify and allow to quantify, on a rigorous statistical basis, to what extent the amino acid type-specific distributions of commonly used environment parameters are discriminative with respect to the 20 amino acid types. Relying on the Bahadur theory, we estimate the probability of error in a single-sequence-structure alignment based on weak or absent discriminative power in a learning database of protein structure. We present the results for many residue environment variables and demonstrate that each fold description parameter is sensitive with respect to only a few amino acid types while indifferent to most of the other amino acid types. Even complex structural characteristics combining solvent-accessible surface area, backbone conformation, and pairwise distances distinguish only some amino acid types, whereas the others remain nondiscriminated. We find that the knowledge-based potentials currently in use treat especially Ala, Asp, Gln, His, Ser, Thr, and Tyr as essentially "average" amino acids. Thus, highly discriminative amino acid types define the alignment register in gapless sequence-structure alignments. The introduction of gaps leads to alignment ambiguities at sequence positions occupied by nondiscriminated amino acid types. Therefore, local sequence-structure alignments produced by techniques with gaps cannot be reliable. Conceptionally new and more sensitive environment parameters must be invented.