EUROFINS | Medigenomix GmbH

Gene Synthesis – Codon Usage Adaptation (background information)


The following article was published in the magazine 'mensch+umwelt spezial' (vol. 17, 2005, in German) of the GSF – National Research Center for Environment and Health, Neuherberg/Munich (www.gsf.de).


Synthetic genes for expression in plants

Dr. Matthias Pfeiffer (left) and Dr. Uwe Köhler work at Eurofins Medigenomix GmbH.

by Matthias Pfeiffer and Uwe Köhler

The chemical synthesis of nucleic acids makes it possible to produce complete genes de novo. In a single experimental approach, several short, partially overlapping nucleic acids (oligonucleotides) are chemically synthesized and then joined enzymatically by ligase or polymerase reactions. In principle, this method can produce DNA fragments of any desired sequence and length. After transfer into plant cells, these fragments can be expressed as the corresponding proteins.
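How a gene might be divided into partially overlapping oligonucleotides can be illustrated with a short Python sketch. It assumes one simple layout, not necessarily the one used in practice: oligos of fixed length that alternate between the sense and antisense strand, with each pair of neighbours sharing a fixed overlap. The function name `design_oligos` and the default lengths are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_oligos(gene: str, length: int = 40, overlap: int = 15):
    """Hypothetical helper: split a gene into overlapping oligos on alternating strands.

    Returns (start, strand, sequence) tuples. '+' oligos are sense-strand
    segments written 5'->3'; '-' oligos are the reverse complement of their
    segment (antisense strand, also written 5'->3'). Neighbours overlap by
    `overlap` bases; if length > 2 * overlap, oligos on the same strand are
    separated by single-stranded gaps.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and shorter than length")
    step = length - overlap
    oligos = []
    start = 0
    while True:
        segment = gene[start:start + length]
        strand = "+" if len(oligos) % 2 == 0 else "-"
        seq = segment if strand == "+" else reverse_complement(segment)
        oligos.append((start, strand, seq))
        if start + length >= len(gene):
            break
        start += step
    return oligos


if __name__ == "__main__":
    gene = "ACGTTGCA" * 12 + "ACGT"  # toy 100-bp sequence
    for start, strand, seq in design_oligos(gene):
        print(start, strand, seq)
```

With the defaults (40-nt oligos, 15-nt overlaps), oligos on the same strand are 10 nt apart, so the assembled duplex contains short single-stranded gaps of the kind that DNA polymerase fills in Fig. 2.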

Before synthesis, the gene sequence can be adapted in silico, i.e. on the computer, to the needs of the host organism. At this stage, all factors that would impair the efficiency of protein expression are taken into account, for example secondary structures that could cause protein synthesis to stop prematurely. Conversely, important positively acting elements, such as signal peptides or introns (non-coding sequences, which often have regulatory functions in plants), can be integrated into the sequence.
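A very crude way to flag candidate secondary structures is to look for inverted repeats: a stretch of sequence followed, after a short loop, by its own reverse complement can fold back into a stem-loop (hairpin). The sketch below finds exact inverted repeats only; `find_hairpin_stems` and its stem and loop lengths are hypothetical, and real design tools rely on thermodynamic folding predictions instead.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpin_stems(seq: str, stem: int = 8, min_loop: int = 3, max_loop: int = 30):
    """Hypothetical heuristic: return (i, j) pairs where seq[j:j+stem] is the
    reverse complement of seq[i:i+stem] and the two arms are separated by a
    loop of min_loop..max_loop bases (a candidate stem-loop structure)."""
    hits = []
    for i in range(len(seq) - stem + 1):
        arm = reverse_complement(seq[i:i + stem])
        first = i + stem + min_loop                      # shortest allowed loop
        last = min(i + stem + max_loop, len(seq) - stem)  # longest loop that fits
        for j in range(first, last + 1):
            if seq[j:j + stem] == arm:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequence: an 8-bp stem, a 4-base loop, then the complementary stem
    seq = "AAAA" + "GCGCATCC" + "TTTT" + "GGATGCGC" + "AAAA"
    print(find_hairpin_stems(seq))
```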

The most important factor, however, is the adaptation of the gene to the "codon preference" of the host organism. The genetic code is a triplet code (that is, three consecutive nucleotides in the DNA encode one amino acid) and, with few exceptions, is identical in all organisms. With four different nucleotides (G, A, T and C), a triplet code allows 4³ = 64 codons. Three of these codons do not encode amino acids but serve as stop signals for translation. The remaining 61 triplets encode the 20 standard amino acids. Consequently, most amino acids are encoded by two, three, four or even six different codons; only methionine and tryptophan have a single codon. A gene that carries the information for a particular protein can therefore vary considerably in its codon composition.
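The codon arithmetic above can be checked with a few lines of Python. The sketch builds the standard genetic code (NCBI translation table 1, written as one amino-acid letter per codon with bases enumerated in T, C, A, G order) and counts the stop codons and the synonymous codons per amino acid. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1): one-letter amino acid per
# codon, codons enumerated with bases in T, C, A, G order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
synonymous = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                           # 64
print(stop_codons)                                # ['TAA', 'TAG', 'TGA']
print(len(synonymous), sum(synonymous.values()))  # 20 61
# Number of amino acids that have 1, 2, 3, 4 or 6 codons
print(sorted(Counter(synonymous.values()).items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```

The last line shows that two amino acids (methionine and tryptophan) have a single codon, while three (leucine, serine and arginine) have six each. `CODON_TABLE["CAC"]` returns `"H"`, the histidine example from Fig. 1.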

Sequence comparisons show that prokaryotic, eukaryotic and viral species often prefer particular synonymous triplets. In general, many plants have GC-rich coding sequences, whereas animal genes tend to be AT-rich. In bacteria, codon preference varies from strain to strain. In E. coli K12, for example, the triplet AAA (rather than AAG) is the preferred codon for the amino acid lysine: 75 % of the lysine codons are AAA. In the monocotyledonous plant Zea mays (maize), by contrast, AAG is the preferred lysine codon, accounting for 72.5 % of lysine codons.
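Percentages such as those quoted above are relative codon frequencies: the number of occurrences of a codon divided by the number of all codons for the same amino acid. A minimal Python sketch, assuming an uppercase, in-frame coding sequence; the function name `relative_codon_usage` is hypothetical. Organism-wide values are obtained the same way by counting over all of an organism's coding sequences.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def relative_codon_usage(cds: str) -> dict:
    """Hypothetical helper: for each codon in an in-frame coding sequence,
    return its share among all codons that encode the same amino acid."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codon_counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TABLE[codon]] += n
    return {codon: n / aa_counts[CODON_TABLE[codon]] for codon, n in codon_counts.items()}


if __name__ == "__main__":
    # Toy fragment: three AAA codons and one AAG codon, all lysine
    print(relative_codon_usage("AAAAAGAAAAAA"))  # {'AAA': 0.75, 'AAG': 0.25}
```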

Fig. 1: The genetic code

Reading the three bases from the inside outward shows the amino acid encoded by the triplet. For example, the triplet CAC encodes the amino acid histidine (His).


Figure adapted from L. Merkel, N. Budisa: Veränderung des genetischen Codes [Changing the genetic code]. BIOspektrum 2006, 12, 41.

Codon preference plays an important role when bacterial or animal proteins are expressed in plants. If the foreign gene consists largely of codons that the plant uses rarely, it is poorly translated into protein. This is because plant cells contain only a limited pool of transfer RNAs (tRNAs), each of which carries a specific amino acid to the ribosome during protein synthesis. If the translation of the foreign gene repeatedly requires a codon that the plant rarely uses, the matching tRNA molecules become depleted. As a consequence, translation of the foreign gene slows down, or incorrect amino acids are incorporated into the nascent peptide chain.
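Whether a foreign gene is likely to run into this problem can be estimated roughly by flagging the codons that the host uses rarely. The sketch below is a simplification: the host table is hypothetical and contains only the maize lysine values quoted above (72.5 % AAG, hence 27.5 % AAA), the 30 % cut-off is arbitrary, and codons missing from the table are not flagged.

```python
def rare_codon_positions(cds: str, host_usage: dict, threshold: float = 0.3):
    """Hypothetical helper: return the 0-based codon indices whose relative
    usage in the host is below `threshold`. Codons absent from `host_usage`
    are treated as unproblematic (an assumption of this sketch)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [k for k, codon in enumerate(codons)
            if host_usage.get(codon, 1.0) < threshold]


# Hypothetical host table: only the maize lysine codons from the text
MAIZE_LYS = {"AAG": 0.725, "AAA": 0.275}

if __name__ == "__main__":
    foreign = "ATGAAAAAGAAAAAATAA"  # toy gene: Met-Lys-Lys-Lys-Lys-stop
    rare = rare_codon_positions(foreign, MAIZE_LYS)
    print(rare, len(rare) / (len(foreign) // 3))  # [1, 3, 4] 0.5
```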

It is therefore useful, and in certain cases even necessary, to adapt the sequence of a foreign gene to the conditions in the host organism: codons with low preference are replaced by codons with high preference without changing the amino acid sequence of the resulting protein. This approach has already been used in dicotyledonous plants, for example to express a bacterial endotoxin in potatoes. Monocotyledonous plants have even more GC-rich coding sequences than dicotyledonous plants, and for this reason the expression of bacterial or animal genes in rice was rarely successful. Codon-adapted synthetic genes have since enabled the successful expression of bacterial and animal genes in this important crop plant.
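The replacement rule can be sketched as a simple "most frequent codon" substitution: each codon is swapped for the synonymous codon with the highest relative usage in the host, and the protein sequence is checked to be unchanged. This is only the simplest of several possible strategies (practical design tools typically weigh further criteria such as GC content and repeated sequences). The function names and the host table in the example are hypothetical; the table contains only the maize lysine figures quoted earlier.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def adapt_codons(cds: str, host_usage: dict) -> str:
    """Hypothetical helper: replace every codon by the synonymous codon the
    host uses most often. Amino acids missing from host_usage keep their codon."""
    best = {}  # amino acid -> host's most frequent codon
    for codon, freq in host_usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > host_usage[best[aa]]:
            best[aa] = codon
    adapted = "".join(
        best.get(CODON_TABLE[cds[i:i + 3]], cds[i:i + 3])
        for i in range(0, len(cds), 3)
    )
    if translate(adapted) != translate(cds):
        raise RuntimeError("codon adaptation changed the protein sequence")
    return adapted


if __name__ == "__main__":
    original = "ATGAAAAAATAA"  # toy gene: Met-Lys-Lys-stop
    adapted = adapt_codons(original, {"AAA": 0.275, "AAG": 0.725})
    print(adapted, translate(adapted))  # ATGAAGAAGTAA MKK*
    print(gc_content(original), gc_content(adapted))
```

In this toy case, replacing AAA with AAG also raises the GC content, which illustrates why adapting a gene to a GC-rich monocot host tends to make it more GC-rich.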

Fig. 2: The generation of synthetic genes

a) Chemically synthesized oligonucleotides with complementary ends are mixed.

b) Under suitable conditions, the complementary ends of the oligonucleotides hybridize, forming short double-stranded regions.

c) and d) DNA polymerase fills the single-stranded gaps, using the opposite strand as a template.

e) The synthetic DNA is now double-stranded, but the backbones of the two strands still contain nicks; they are not yet covalently closed.

f) DNA ligase covalently seals the nicks in the backbones.

g) The synthetic gene is now complete and can be used for expression of the corresponding protein in a host organism.
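Steps a) to g) can be mirrored in silico to check, before synthesis, that a set of oligos actually reconstitutes the intended gene. The sketch below is a simplification: it converts every oligo to sense orientation, requires each neighbouring pair to share matching ends (standing in for the complementary ends that hybridize in step b), and merges them; fill-in and ligation are represented only by that merge. The function name `assemble`, its input format and the `min_overlap` default are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def assemble(oligos, min_overlap: int = 10) -> str:
    """Hypothetical helper: simulate the assembly and return the sense strand.

    `oligos` holds (strand, sequence) tuples in gene order, each sequence
    written 5'->3'; '+' = sense strand, '-' = antisense strand.
    """
    assembled = ""
    for strand, seq in oligos:
        sense = seq if strand == "+" else reverse_complement(seq)
        if not assembled:
            assembled = sense
            continue
        # Longest end of the assembly that matches the start of the next oligo
        for k in range(min(len(assembled), len(sense)), min_overlap - 1, -1):
            if assembled.endswith(sense[:k]):
                assembled += sense[k:]  # in effect: gap filling plus ligation
                break
        else:
            raise ValueError(f"no overlap of >= {min_overlap} bases; oligos would not anneal")
    return assembled


if __name__ == "__main__":
    gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATG"
    # Two sense oligos leave a 6-base gap that only the antisense oligo spans
    oligos = [("+", gene[0:30]), ("-", reverse_complement(gene[18:48])), ("+", gene[36:])]
    print(assemble(oligos) == gene)  # True
```

Because the merge takes the longest matching overlap, repetitive sequences can produce ambiguous joins, which loosely mirrors the risk of mispaired oligos in the real reaction.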



Eurofins Medigenomix GmbH
Anzinger Str. 7a, D-85560 Ebersberg, Germany
Phone: ++49 (0)8092 / 8289-200, Fax: -201
sales@medigenomix.de
Page last modified: 03.07.2008
