Sequence-based factors influencing the expression of heterologous genes in the yeast Pichia pastoris: a comparative view on 79 human genes.
High-yield expression of heterologous proteins is usually a matter of "trial and error". In search of parameters with a major impact on expression, we applied a comparative analysis to 79 different human cDNAs expressed in Pichia pastoris. The cDNAs were cloned into a vector for intracellular expression, and recombinant protein expression was monitored in a standardized procedure and classified by expression level. Of all sequence-based parameters with a possible influence on the expression level, more than 10 were analysed. Three of these factors showed a statistically significant association with the expression level. A low abundance of AT-rich regions in the cDNA is associated with a high expression level. A comparatively high isoelectric point of the recombinant protein is associated with failure of expression. Finally, the occurrence of a protein homologue in yeast is associated with detectable protein expression. Interestingly, some frequently discussed factors, such as codon usage and GC content, did not show a significant impact on protein yield. These results could provide a basis for a knowledge-oriented optimisation of gene sequences, both to increase protein yields and to support target selection and the design of high-throughput expression approaches.
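As an illustration of what two of the sequence-based parameters mentioned above might look like in practice, the following is a minimal Python sketch that computes GC content and counts AT-rich windows in a cDNA sequence. The abstract does not state how AT-rich regions were defined, so the window length (`window`) and the A+T threshold (`min_at`) are hypothetical choices made here for illustration only, as are the function names.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def at_rich_windows(seq: str, window: int = 20, min_at: float = 0.8) -> int:
    """Count sliding windows whose A+T fraction is at least min_at.

    The defaults for window and min_at are illustrative assumptions,
    not values taken from the study.
    """
    seq = seq.upper()
    count = 0
    # Slide a fixed-size window one base at a time along the sequence.
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        if at / window >= min_at:
            count += 1
    return count


if __name__ == "__main__":
    # Toy sequence for demonstration only; not data from the study.
    cdna = "ATGGCCATTTTAAATATTTAAAGGCGCGCCTGA"
    print("GC content:", round(gc_content(cdna), 3))
    print("AT-rich windows:", at_rich_windows(cdna, window=10, min_at=0.8))
```

Per-gene values computed this way could then be compared across expression classes. The statistical tests used in the study are not described in the abstract and are not reproduced here.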