Applying experimental data to protein fold prediction with the genetic algorithm.
Specific residue interactions, as revealed by a few readily available experiments, can be quite important in shaping a protein's tertiary topology by complementing basic, general folding principles. Such experimental information is employed here in predicting structure (mainchain topology) from sequence knowledge with the genetic algorithm, which can optimize many parameters simultaneously. The examples investigated include the distribution of cysteinyl S-S bonds, protein side-chain ligands to iron-sulfur cages, cofactor ligands, crosslinks between side chains, and conserved hydrophobic and catalytic residues. Including such interactions improves the predicted topology (0.4-6.6 Å root-mean-square deviation of the backbone Cα positions from those observed) relative to simulations that rely only on basic protein folding principles. For several examples the resulting topology depended critically on knowledge of these few specific interactions: without them, the relationship between predicted and observed Cα positions was close to random. The combined methodology (experimental data and the genetic algorithm) should prove helpful in settings where experiment and theory can cooperate in successive steps to elucidate an unknown structure.
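The agreement between predicted and observed topologies is reported as a root-mean-square deviation of backbone Cα positions. A minimal sketch of such a measure follows. It assumes (the abstract does not state this) that the two Cα sets are compared after optimal rigid-body superposition, and it uses Horn's quaternion formulation rather than the more common Kabsch SVD so that only the Python standard library is needed: with centered coordinates, RMSD = sqrt((G_a + G_b - 2λ_max)/N), where G_a and G_b are the sums of squared coordinates and λ_max is the largest eigenvalue of a 4x4 "key" matrix built from the 3x3 correlation matrix. The function names `ca_rmsd` and `jacobi_eigenvalues` are hypothetical and are not taken from the original work.

```python
import math


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def _centered(points):
    """Translate a coordinate list so that its centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def ca_rmsd(predicted, observed):
    """Cα RMSD after optimal superposition (proper rotations only, no mirroring)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    n = len(predicted)
    xs, ys = _centered(predicted), _centered(observed)
    ga = sum(x * x + y * y + z * z for x, y, z in xs)
    gb = sum(x * x + y * y + z * z for x, y, z in ys)
    # Correlation matrix S[a][b] = sum_i xs[i][a] * ys[i][b].
    s = [[sum(p[a] * q[b] for p, q in zip(xs, ys)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 key matrix; its largest eigenvalue gives the best fit.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam_max = max(jacobi_eigenvalues(key))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam_max) / n))


if __name__ == "__main__":
    # Two collinear "Cα" pairs of different length: each atom ends up 1 Å off.
    print(ca_rmsd([(0, 0, 0), (2, 0, 0)], [(0, 0, 0), (4, 0, 0)]))  # prints 1.0
```

Because the quaternion method searches only proper rotations, a predicted fold that is the mirror image of the observed one is not scored as a perfect match, which is the behavior one would want when judging mainchain topology.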