Identifying the tertiary fold of small proteins with different topologies from sequence and secondary structure using the genetic algorithm and extended criteria specific for strand regions.
Grid-free protein folding simulations based on sequence and secondary structure knowledge (mostly experimentally determined secondary structure information, but also analysing results from secondary structure predictions) were investigated using a genetic algorithm, a backbone representation, and standard dihedral angle conformations. Optimal structures are selected according to basic protein building principles. Having previously applied this approach to proteins with helical topology, we have now developed additional criteria and weights for beta-strand-containing proteins, validated them on four small beta-strand-rich proteins with different topologies, and tested the general performance of the method on many further examples from known protein structures with mixed secondary structure types and fewer than 100 amino acid residues. Topology predictions close to the observed experimental structures were obtained in the four test cases, together with fitness values that correlated with the similarity of the predicted topology to the observed structures. For these four proteins, each with a different topology, root-mean-square deviation values of Cα atoms between the superposed predicted and observed structures were between 4.5 and 5.5 Å (2.9 to 5.1 Å without loops). Including 15 further protein examples with unique folds, root-mean-square deviation values ranged between 1.8 and 6.9 Å with loop regions and averaged 5.3 Å and 4.3 Å, including and excluding loop regions, respectively.
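As a rough illustration of the kind of search described above, the following minimal sketch evolves a chain of discrete backbone dihedral states with a genetic algorithm. It is not the authors' implementation. The state table `STATES`, its angle values, the function names `toy_fitness` and `evolve`, and all parameter values are assumptions made for this example. In particular, `toy_fitness` is a placeholder that only rewards agreement with a given secondary structure string, whereas the work summarised here scores three-dimensional structures against protein building principles.

```python
import random

# Hypothetical discrete backbone states (phi, psi in degrees).
# The angle values are illustrative placeholders, not taken from the paper.
STATES = {
    "H": (-60.0, -45.0),   # approximately alpha-helical
    "E": (-120.0, 130.0),  # approximately beta-strand
    "C": (-70.0, 140.0),   # placeholder for coil/loop residues
}
STATE_KEYS = list(STATES)


def toy_fitness(chromosome, secondary_structure):
    """Placeholder fitness (an assumption): number of residues whose state
    matches the supplied secondary structure string."""
    return sum(1 for s, t in zip(chromosome, secondary_structure) if s == t)


def evolve(secondary_structure, pop_size=40, generations=60,
           mutation_rate=0.05, seed=0):
    """Evolve one state per residue; needs at least two residues."""
    rng = random.Random(seed)
    n = len(secondary_structure)
    population = [[rng.choice(STATE_KEYS) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: toy_fitness(c, secondary_structure),
                        reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Point mutation: occasionally redraw a residue's state.
            child = [rng.choice(STATE_KEYS) if rng.random() < mutation_rate
                     else s for s in child]
            children.append(child)
        population = parents + children        # parents survive unchanged
    best = max(population, key=lambda c: toy_fitness(c, secondary_structure))
    return best, [STATES[s] for s in best]


if __name__ == "__main__":
    ss = "CEEEEECCHHHHHHCC"  # made-up secondary structure string
    best, angles = evolve(ss)
    print("".join(best), toy_fitness(best, ss))
```

Because each chromosome stores one state per residue rather than lattice coordinates, the search is grid-free in the sense used above. A realistic fitness function would first build backbone coordinates from the dihedral angles and then evaluate the resulting fold.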