Genetic code

The genetic code is the way in which the nucleotide sequence of an RNA single strand is translated into the amino acid sequence of the polypeptide chain of a protein. In the cell, translation takes place after the genetic information stored in the base-pair sequence of the DNA double strand has been transcribed into the sequence of an RNA single strand (messenger ribonucleic acid, mRNA).

This genetic code is basically the same in all known species of living organisms. It assigns a specific proteinogenic amino acid to a triplet of three consecutive nucleobases of a nucleic acid, the so-called codon. Translation takes place at the ribosomes in the cytosol of a cell. The ribosomes assemble the amino acid sequence of a peptide according to the nucleotide sequence of an mRNA: each codon is assigned a specific amino acid via the anticodon of a transfer ribonucleic acid (tRNA), and this amino acid is linked to the previous one. In this way, predetermined information is converted into the form of a peptide chain, which then folds into the particular shape of a protein.

The more complex an organism is, however, the higher the proportion of its genetic information that appears not to be translated into proteins. A considerable amount of non-coding DNA is transcribed into RNA but not translated into a peptide chain. Besides the tRNAs and ribosomal RNAs (rRNAs) required for translation, these non-protein-coding RNA species of the transcriptome include a number of other, mostly small, RNA forms. They help regulate various cellular processes in many ways, such as transcription itself and, where applicable, translation, as well as DNA repair, special epigenetic markings of DNA segments, and various functions of the immune system, among others.

Transfer ribonucleic acids (tRNAs) carry a distinctive nucleotide triplet at a prominent position on one loop of their cloverleaf-like structure. Its three nucleotides are complementary to those of a particular codon and thus form the anticodon. Codon and anticodon pair with each other by base pairing, and the same specific amino acid is assigned to both. A tRNA is loaded with the amino acid encoded by the codon that matches its anticodon. Through this specific binding of an amino acid to a tRNA with a specific anticodon, the sign for a particular amino acid, the codon, is translated into the genetically encoded amino acid.
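The complementarity between codon and anticodon can be sketched in a few lines. This minimal Python example assumes strict Watson-Crick pairing (A-U, G-C) and deliberately ignores wobble pairing at the third codon position; the function name `anticodon` is hypothetical. Because the two strands pair antiparallel, the anticodon written 5'->3' is the reverse complement of the codon.

```python
# Watson-Crick partners in RNA (wobble pairs such as G-U are ignored in this sketch)
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3').

    Codon and anticodon pair antiparallel, so the anticodon is the
    reverse complement of the codon.
    """
    codon = codon.upper()
    if len(codon) != 3 or any(base not in PAIR for base in codon):
        raise ValueError(f"not an RNA codon: {codon!r}")
    return "".join(PAIR[base] for base in reversed(codon))

print(anticodon("GCC"))  # alanine codon -> GGC
```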

Strictly speaking, therefore, the genetic code is already contained in the structure of the various tRNA species: each tRNA molecule has an amino acid binding site structured so that only the amino acid that corresponds to its anticodon according to the genetic code is bound to it. After binding to its tRNA, an amino acid is available for protein biosynthesis on the ribosome, where it can be added as the next link of the polypeptide chain, provided the anticodon of the tRNA matches a codon in the nucleotide sequence of the mRNA.

As a prerequisite for this protein synthesis, the DNA segment of a gene must first be transcribed into a ribonucleic acid (RNA) (transcription). In eukaryotic cells, certain parts of this primary transcript, the hnRNA (heterogeneous nuclear RNA), can be specifically removed (splicing) or modified (RNA editing); this preliminary pre-mRNA is then further processed into the definitive mRNA, which is finally exported from the cell nucleus. This is necessary because only at the ribosomes, which may be free in the cytosol or bound to the endoplasmic reticulum, are the amino acids carried by the tRNAs matching the codons linked together into a polypeptide on the basis of the mRNA template.

The process by which the information of a gene is expressed in the form of a protein (gene expression) thus consists of a sequence of steps. The main processes are (1) transcription, in which a section of the genome's DNA is transcribed into RNA by RNA polymerase; (2) posttranscriptional modification, in which an RNA of the transcriptome is modified; and (3) translation, in which an mRNA is translated into a polypeptide at the ribosome. This can be followed by (4) posttranslational modification, in which a polypeptide of the proteome is altered. Within this sequence of processes leading to a functional protein, translation is the step in which the genetic information of the base-triplet sequence is converted into an amino acid sequence.

The actual application of the genetic code, namely the translation of a nucleotide sequence into an amino acid on the basis of the codon or anticodon, already takes place when an amino acid is bound to its tRNA by the respective aminoacyl-tRNA synthetase, i.e. during the preparation of the amino acids for their possible assembly into a protein. A few base triplets do not code for any amino acid. Because they carry no meaning in this sense, they are also called nonsense codons; they cause translation to stop, terminating protein synthesis, and are therefore also called stop codons.

All living things use the same basic genetic code. The most widely used version is probably the one given in the following tables. For this standard code, they show which amino acid is usually encoded by each of the 4³ = 64 possible codons, and which codons are translated into each of the 20 canonical amino acids. For example, the codon GAU stands for the amino acid aspartic acid (Asp), and cysteine (Cys) is encoded by the codons UGU and UGC. The bases given in the tables are adenine (A), guanine (G), cytosine (C) and uracil (U) of the ribonucleotides of mRNA; in the nucleotides of DNA, thymine (T) occurs in place of uracil. During transcription of a DNA segment, the codogenic strand serves RNA polymerase as the template for the transcript: the DNA base sequence is transcribed by base pairing into the complementary RNA base sequence as the RNA strand is built. In this way, the genetic information stored hereditarily in DNA is accessed and becomes available in mRNA for protein biosynthesis.
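The base-pairing step of transcription can be illustrated with a short sketch. It assumes both strands are written 5'->3', as is customary; since the polymerase reads the codogenic (template) strand antiparallel, the mRNA is then the reverse complement of that strand with U in place of T, which equals the coding strand with T replaced by U. The function name `transcribe` and the example sequence are hypothetical.

```python
# Template DNA base -> complementary RNA base (A pairs with U instead of T)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3: str) -> str:
    """Build the mRNA (5'->3') from a codogenic DNA strand written 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))

coding = "ATGGATTGT"     # non-template (coding) strand, 5'->3'
template = "ACAATCCAT"   # its complementary codogenic strand, 5'->3'
mrna = transcribe(template)
print(mrna)              # AUGGAUUGU: codons AUG, GAU (Asp), UGU (Cys)
assert mrna == coding.replace("T", "U")  # mRNA = coding strand with T -> U
```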

Representation of the transcription of genetic information from a DNA segment into an RNA transcript, where U then stands in place of T.

A representation of the genetic code (code sun): In the sequence from the inside to the outside, a base triplet of the mRNA (read from 5' to 3') is here assigned one of the twenty canonical amino acids or a stop codon is marked.

An example of the pairing of a codon on an mRNA with the complementary anticodon of a tRNA, here the alanine-loaded tRNA^Ala, whose anticodon matches the codon GCC.

History of the discovery

In the first half of the 1960s, biochemists competed to decipher the genetic code. On May 27, 1961, at 3 a.m., the German biochemist Heinrich Matthaei achieved a decisive breakthrough in Marshall Nirenberg's laboratory with the poly-U experiment: the assignment of the codon UUU to the amino acid phenylalanine. Some geneticists consider this experiment the most significant of the 20th century. By 1966, five years after the first codon was decoded, all 64 base triplets of the genetic code had been deciphered.

Codon

Genetic information for the assembly of proteins is contained in specific sections of the base sequence of nucleic acids. Once transcribed from DNA into RNA, it becomes available for protein biosynthesis. The base sequence in the open reading frame is read at the ribosome and translated according to the genetic code into the amino acid sequence of the synthesized peptide chain, the primary structure of a protein. In this process, the base sequence is read step by step in groups of three, and each triplet is matched with a tRNA loaded with a specific amino acid. Each amino acid is linked to the previous one by a peptide bond. In this way, the sequence segment codes for a protein.
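Reading "in groups of three" means that the result depends on where reading starts, the reading frame. The following minimal sketch splits a hypothetical mRNA into non-overlapping triplets for each of the three possible frames; the function name `codons` is an illustrative assumption, and an incomplete trailing triplet is simply dropped.

```python
def codons(mrna: str, frame: int = 0) -> list[str]:
    """Split an mRNA (5'->3') into consecutive, non-overlapping triplets,
    starting at offset `frame` (0, 1 or 2); an incomplete tail is dropped."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]

mrna = "AUGGCCUUUUAA"  # hypothetical example sequence
for frame in range(3):
    print(frame, codons(mrna, frame))
# 0 ['AUG', 'GCC', 'UUU', 'UAA']
# 1 ['UGG', 'CCU', 'UUU']
# 2 ['GGC', 'CUU', 'UUA']
```

Only one of the frames normally corresponds to the open reading frame; shifting by one or two bases produces an entirely different codon series.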

A codon is a sequence of three nucleobases of the mRNA, a base triplet that can code for an amino acid. There are 4³ = 64 possible codons, 61 of which code for the 20 canonical proteinogenic amino acids; the remaining three are stop codons that terminate translation. Under certain circumstances, these can also be used to encode two additional non-canonical amino acids. Consequently, almost all amino acids are encoded by several different codons, which are usually quite similar to each other. Coding by triplets is necessary because a doublet code would provide only 4² = 16 possible codons, too few for the twenty canonical (standard) amino acids.
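The counting argument can be checked by enumeration. This short sketch assumes, for illustration, that a code must distinguish the 20 canonical amino acids plus at least one stop signal.

```python
from itertools import product

BASES = "UCAG"
triplets = ["".join(p) for p in product(BASES, repeat=3)]  # all 4^3 codons
doublets = ["".join(p) for p in product(BASES, repeat=2)]  # all 4^2 doublets

needed = 20 + 1  # 20 canonical amino acids plus at least one stop signal
print(len(triplets), len(doublets))  # 64 16
print(len(doublets) >= needed)       # False: a doublet code is too small
print(len(triplets) >= needed)       # True
```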

Standard codon table for all 64 possible base triplets

            2nd base
1st base    U                        C                        A                        G

U           UUU Phenylalanine (Phe)  UCU Serine (Ser)         UAU Tyrosine (Tyr)       UGU Cysteine (Cys)
            UUC Phenylalanine (Phe)  UCC Serine (Ser)         UAC Tyrosine (Tyr)       UGC Cysteine (Cys)
            UUA Leucine (Leu)        UCA Serine (Ser)         UAA Stop                 UGA Stop*
            UUG Leucine (Leu)        UCG Serine (Ser)         UAG Stop                 UGG Tryptophan (Trp)

C           CUU Leucine (Leu)        CCU Proline (Pro)        CAU Histidine (His)      CGU Arginine (Arg)
            CUC Leucine (Leu)        CCC Proline (Pro)        CAC Histidine (His)      CGC Arginine (Arg)
            CUA Leucine (Leu)        CCA Proline (Pro)        CAA Glutamine (Gln)      CGA Arginine (Arg)
            CUG Leucine (Leu)        CCG Proline (Pro)        CAG Glutamine (Gln)      CGG Arginine (Arg)

A           AUU Isoleucine (Ile)     ACU Threonine (Thr)      AAU Asparagine (Asn)     AGU Serine (Ser)
            AUC Isoleucine (Ile)     ACC Threonine (Thr)      AAC Asparagine (Asn)     AGC Serine (Ser)
            AUA Isoleucine (Ile)     ACA Threonine (Thr)      AAA Lysine (Lys)         AGA Arginine (Arg)
            AUG Methionine (Met)*    ACG Threonine (Thr)      AAG Lysine (Lys)         AGG Arginine (Arg)

G           GUU Valine (Val)         GCU Alanine (Ala)        GAU Aspartic acid (Asp)  GGU Glycine (Gly)
            GUC Valine (Val)         GCC Alanine (Ala)        GAC Aspartic acid (Asp)  GGC Glycine (Gly)
            GUA Valine (Val)         GCA Alanine (Ala)        GAA Glutamic acid (Glu)  GGA Glycine (Gly)
            GUG Valine (Val)         GCG Alanine (Ala)        GAG Glutamic acid (Glu)  GGG Glycine (Gly)

Coloring of the amino acids

hydrophobic (non-polar)

hydrophilic neutral (polar)

hydrophilic and positively charged (basic)

hydrophilic and negatively charged (acidic)

* AUG, the codon for methionine, also serves as the start signal for translation. One of the first AUG triplets on the mRNA becomes the first codon to be decoded; the ribosome recognizes which AUG is to be used as the start codon for the initiator tRNA (tRNAi^Met) from signals in the neighboring mRNA sequence.
* Under certain conditions, the stop codon UGA also serves as a codon for the (21st proteinogenic) amino acid selenocysteine, for example in humans.

The codons indicated apply to the nucleotide sequence of an mRNA. It is read in the 5′→3′ direction on the ribosome and translated into the amino acid sequence of a polypeptide.
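For computational work, the standard table is often stored as a compact 64-character string in a fixed codon order. The sketch below assumes the order U, C, A, G for each position, with the first base varying slowest and the third fastest, and uses the one-letter amino acid codes ('*' for stop); the names `AAS` and `STANDARD_CODE` are illustrative. It checks the examples given in the text.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino acid codes for all 64 codons; the first base varies slowest and
# the third base fastest, each in the order U, C, A, G ('*' marks a stop codon).
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G

# product() yields UUU, UUC, UUA, UUG, UCU, ... in exactly that order
STANDARD_CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}

stops = sorted(c for c, aa in STANDARD_CODE.items() if aa == "*")
print(len(STANDARD_CODE), len(STANDARD_CODE) - len(stops), stops)  # 64 61 ['UAA', 'UAG', 'UGA']
print(STANDARD_CODE["GAU"], STANDARD_CODE["UGU"], STANDARD_CODE["UGC"])  # D C C
```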

Reverse codon table

Count  Amino acid  1-letter  Codons
1      Start       >         AUG
1      Met         M         AUG
1      Trp         W         UGG
1      Sec         U         (UGA)
1      Pyl         O         (UAG)
2      Tyr         Y         UAU UAC
2      Phe         F         UUU UUC
2      Cys         C         UGU UGC
2      Asn         N         AAU AAC
2      Asp         D         GAU GAC
2      Gln         Q         CAA CAG
2      Glu         E         GAA GAG
2      His         H         CAU CAC
2      Lys         K         AAA AAG
3      Ile         I         AUU AUC AUA
4      Gly         G         GGU GGC GGA GGG
4      Ala         A         GCU GCC GCA GCG
4      Val         V         GUU GUC GUA GUG
4      Thr         T         ACU ACC ACA ACG
4      Pro         P         CCU CCC CCA CCG
6      Leu         L         CUU CUC CUA CUG UUA UUG
6      Ser         S         UCU UCC UCA UCG AGU AGC
6      Arg         R         CGU CGC CGA CGG AGA AGG
3      Stop        <         UAA UAG UGA
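The reverse table can be derived mechanically by inverting the codon-to-amino-acid mapping. This sketch covers only the standard assignments; the context-dependent recoding of UGA (Sec) and UAG (Pyl) is not modeled. The names `STANDARD_CODE` and `REVERSE` are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Invert the mapping: amino acid (or '*' for stop) -> its synonymous codons
REVERSE = defaultdict(list)
for codon, aa in STANDARD_CODE.items():
    REVERSE[aa].append(codon)

# Fewest codons first, as in the table above
for aa, codon_list in sorted(REVERSE.items(), key=lambda item: (len(item[1]), item[0])):
    print(len(codon_list), aa, " ".join(codon_list))
```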

Translation begins at a start codon. However, certain initiation sequences and initiation factors are also needed for the mRNA to bind to a ribosome and for the process to start. These include a special initiator tRNA that carries the first amino acid. The most important start codon is AUG, which codes for methionine. ACG and CUG, as well as GUG and UUG in prokaryotic cells, can also serve as start codons, though with lower efficiency. Nevertheless, the first amino acid is usually methionine, which is N-formylated in bacteria and in mitochondria.

Translation ends at one of the three stop codons, also called termination codons. Early on, these codons were also given names: UAG is amber, UGA is opal, and UAA is ochre. The name amber is a pun on the surname of Harris Bernstein, in whose honor it was chosen (Bernstein is German for "amber").
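Start and stop can be combined into a simple translation routine. This is a sketch under strong simplifying assumptions: the first AUG on the mRNA is taken as the start codon, the initiation signals and alternative start codons described above are ignored, and only the standard code is used. The function name `translate_orf` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def translate_orf(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon.

    Simplification: real ribosomes also use context signals to choose the start
    codon and can initiate at other codons; none of that is modeled here.
    """
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = STANDARD_CODE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: termination
            break
        protein.append(aa)
    return "".join(protein)

print(translate_orf("GGAUGGAUUGCUGGUAAGC"))  # MDCW (AUG GAU UGC UGG, then UAA)
```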

While UGA is usually read as a stop codon, it can, rarely and only under certain conditions, stand for a 21st proteinogenic amino acid: selenocysteine (Sec). The biosynthesis of selenocysteine and the mechanism of its insertion into proteins differ considerably from those of all other amino acids: its insertion requires a special translation step in which a UGA is interpreted differently in a specific sequence context and together with specific cofactors. It also requires a structurally unique tRNA specific for selenocysteine (tRNA^Sec), which in vertebrates can be loaded with two chemically related amino acids, serine or phosphoserine, in addition to selenocysteine.

In addition, some archaea and bacteria can translate the canonical stop codon UAG as another (22nd) proteinogenic amino acid: pyrrolysine (Pyl). They have a special tRNA^Pyl as well as a specific enzyme to load it (pyrrolysyl-tRNA synthetase).

Some short DNA sequences occur only rarely or not at all in the genome of a species (nullomers). In bacteria, some of these prove to be toxic; the arginine codon AGA is also avoided in bacteria (CGA is used instead). Codon usage clearly differs between species. However, differences in codon usage do not necessarily mean differences in the frequency of the amino acids used, because most amino acids are encoded by more than one codon, as the table above shows.
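Codon usage is typically measured by counting, for each amino acid, how often each of its synonymous codons occurs in coding sequences. The sketch below does this for a single in-frame sequence using the standard code; the sequence is invented for illustration, and the function name `codon_usage` is hypothetical. Comparing such tables between species shows differences in codon preference even when the amino acid composition is the same.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def codon_usage(cds: str) -> dict[str, dict[str, float]]:
    """For each amino acid, the share of each synonymous codon in an in-frame
    coding sequence (only codons that actually occur are listed)."""
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    usage: dict[str, dict[str, float]] = {}
    for codon, n in counts.items():
        usage.setdefault(STANDARD_CODE[codon], {})[codon] = float(n)
    for per_codon in usage.values():
        total = sum(per_codon.values())
        for codon in per_codon:
            per_codon[codon] /= total
    return usage

# Hypothetical sequence: arginine is encoded three times by CGU and once by AGA
usage = codon_usage("AUGCGUCGUAGACGUUAA")
print(usage["R"])  # {'CGU': 0.75, 'AGA': 0.25}
```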

Degeneration and fault tolerance

To encode a given amino acid, there is often a choice among several codons with the same meaning. The genetic code is thus a code in which several expressions have the same meaning, i.e. the same semantic unit can be encoded by different syntactic symbols. Compared with a coding system in which each semantic unit corresponds to exactly one syntactic expression and vice versa, such a code is called degenerate.

This has the advantage that more than 60 codons are available for the approximately 20 amino acids to be inserted during translation. The codons consist of three nucleotides with four possible bases each, giving 64 combinations. They are assigned to amino acids in such a way that very similar codons code for the same amino acid. Because of this error tolerance of the genetic code, two nucleotides are often sufficient to specify an amino acid unambiguously.
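The statement that two nucleotides are often sufficient can be checked directly: group the 64 codons into 16 "boxes" that share their first two bases, and find the boxes in which the third base does not matter at all. This is a minimal sketch using the standard code; the names `boxes` and `FOURFOLD` are illustrative.

```python
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Group the 64 codons into 16 boxes sharing the first two bases
boxes: dict[str, set[str]] = {}
for codon, aa in STANDARD_CODE.items():
    boxes.setdefault(codon[:2], set()).add(aa)

# Boxes in which all four third-base variants encode the same amino acid
FOURFOLD = {prefix: next(iter(aas)) for prefix, aas in boxes.items() if len(aas) == 1}
print(len(FOURFOLD), FOURFOLD)
# 8 {'UC': 'S', 'CU': 'L', 'CC': 'P', 'CG': 'R', 'AC': 'T', 'GU': 'V', 'GC': 'A', 'GG': 'G'}
```

In half of the boxes, the first two bases alone determine the amino acid; in most of the others, the third base only has to be recognized as a purine or a pyrimidine.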

The base triplets coding for the same amino acid usually differ in only one of the three bases; they lie at the minimum distance from each other in code space (see Hamming distance or Levenshtein distance). Most often they differ in the third, "wobble" base, which is the most likely to be misread during translation (see wobble hypothesis). Amino acids frequently needed for protein assembly are represented by more codons than rarely used ones. A deeper analysis of the genetic code reveals further correlations, for example with the molar volume and the hydrophobic effect (see figure).

It is also noteworthy that the middle base of a triplet largely indicates the character of the assigned amino acid: for codons of the form _U_ the amino acids are hydrophobic, for _A_ hydrophilic. For _C_ they are nonpolar or polar without charge; amino acids with charged side chains occur with _G_ as well as with _A_, those with a negative charge only with _A_ (see table above). Radical substitutions, i.e. replacement by an amino acid of a different character, are therefore often the result of mutations at this second position. Mutations at the third ("wobble") position, on the other hand, often preserve the amino acid or at least its character (conservative substitution). Since transitions (conversion of one purine into the other or one pyrimidine into the other, for example C→T) occur more frequently than transversions (conversion of a purine into a pyrimidine or vice versa; this process usually requires depurination) for mechanistic reasons, this provides a further explanation for the conservative properties of the code.
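How strongly the three codon positions differ can be quantified by counting, for each position, the single-base substitutions that leave the meaning unchanged. This sketch weights all 64 codons equally and ignores codon frequencies as well as the transition/transversion bias mentioned above; a change from one stop codon to another is counted as unchanged. The function name `synonymous_counts` is illustrative.

```python
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def synonymous_counts(code: dict[str, str]) -> list[int]:
    """For codon positions 1, 2 and 3, count the single-base substitutions
    that leave the meaning unchanged (stop -> stop counts as unchanged)."""
    counts = [0, 0, 0]
    for codon, aa in code.items():
        for pos in range(3):
            for base in BASES:
                if base != codon[pos] and code[codon[:pos] + base + codon[pos + 1:]] == aa:
                    counts[pos] += 1
    return counts

COUNTS = synonymous_counts(STANDARD_CODE)
print(COUNTS, "of", 64 * 3, "substitutions per position")  # [8, 2, 128] of 192 substitutions per position
```

Two thirds of all third-position changes are synonymous, whereas at the second position practically every change alters the meaning.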

Contrary to earlier assumptions, the first codon position is often more important than the second position, presumably because changes in the first position alone can reverse the charge (from a positively charged to a negatively charged amino acid or vice versa). Charge reversal, however, can have dramatic consequences for protein function. This was overlooked in many previous studies.

The so-called degeneracy of the genetic code also makes the stored genetic information less sensitive to external influences. This applies in particular to point mutations, both synonymous mutations (leading to the same amino acid) and non-synonymous mutations that lead to amino acids with similar properties.

Early in evolutionary history, it was apparently advantageous to reduce the susceptibility of the code to incorrectly formed codons. The function of a protein is determined by its structure, which depends on its primary structure, the amino acid sequence: how many amino acids, which ones, and in which order are linked to form a peptide chain. The base sequence contains this information as genetic information. Increased error tolerance of the code makes correct decoding more likely, and if an error inserts an amino acid of similar character instead of the correct one, the protein's function changes less than if the substitute were of a completely different character.

Grouping of codons according to the molar volume of the respective encoded amino acid and the hydropathic index.

Origin of the genetic code

The use of the word "code" goes back to Erwin Schrödinger, who used the terms "hereditary code-script", "chromosome code" and "miniature code" in a series of lectures in 1943, which formed the basis of his 1944 book "What is Life?". The exact location or carrier of this code was still unclear at that time.

In the past, it was believed that the genetic code arose by chance; as late as 1968, Francis Crick called it a "frozen accident". However, it is the result of strict optimization with regard to error tolerance. Errors are particularly serious for the spatial structure of a protein if the hydrophobicity of an incorrectly incorporated amino acid differs greatly from that of the original one. In one statistical analysis, only 100 out of a million random codes turned out to be better than the actual code in this respect. If additional factors reflecting typical patterns of mutations and reading errors are taken into account when calculating error tolerance, this number drops to 1 in a million.
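The principle of such a comparison can be sketched as follows: define an error cost for a code, generate random alternative codes, and count how many score better than the standard code. This toy version makes several assumptions that are not taken from the analyses cited above: it uses the Kyte-Doolittle hydropathy index as the hydrophobicity scale, the mean squared hydropathy change over all single-base substitutions between sense codons as the cost, and random codes obtained by permuting which amino acid each set of synonymous codons encodes (stop codons fixed). Its numbers will therefore not necessarily reproduce the figures quoted in the text.

```python
import random
from itertools import product

BASES = "UCAG"
AAS = ("FFLLSSSSYY**CC*W"   # first base U
       "LLLLPPPPHHQQRRRR"   # first base C
       "IIIMTTTTNNKKSSRR"   # first base A
       "VVVVAAAADDEEGGGG")  # first base G
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Kyte-Doolittle hydropathy index: one of several possible hydrophobicity scales
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def error_cost(code: dict[str, str]) -> float:
    """Mean squared hydropathy change over all single-base substitutions
    between sense codons; changes to or from a stop codon are skipped."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[new_aa]) ** 2
                n += 1
    return total / n

def random_code(rng: random.Random) -> dict[str, str]:
    """Randomly permute which amino acid each set of synonymous codons encodes;
    the grouping of codons and the stop codons stay as in the standard code."""
    amino_acids = sorted(set(AAS) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    return {codon: relabel.get(aa, aa) for codon, aa in STANDARD_CODE.items()}

rng = random.Random(1)  # fixed seed so that runs are repeatable
standard = error_cost(STANDARD_CODE)
trials = 2_000
better = sum(error_cost(random_code(rng)) < standard for _ in range(trials))
print(f"standard code cost: {standard:.2f}")
print(f"{better} of {trials} random codes have a lower cost")
```

Published studies differ mainly in the choice of amino acid property, the weighting of the codon positions and the way random codes are generated, which is why their figures are not directly comparable with this sketch.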

Code universality

Basic principle

It is remarkable that the genetic code is in principle the same for all living beings, i.e. all living beings use the same "genetic language". Not only is genetic information present in all of them as a sequence of nucleic acids that is always read in triplets to build proteins; with a few exceptions, a given codon also stands for the same amino acid in each case, and the standard code reflects this common usage. This is why it is possible in genetic engineering, for example, to insert the gene for human insulin into bacteria so that they then produce the protein hormone insulin. This basic principle of coding shared by all organisms is called the "universality of the code". In evolutionary terms, it is explained by the genetic code having been established very early in the history of life and then passed on to all descendant species. This generalization does not exclude the possibility that the frequency with which individual codons are used differs between organisms (see Codon Usage).

Variants

In addition, there are various variants that deviate from the standard code, i.e. in which a few codons are translated into an amino acid other than the one given in the standard codon table. Some of these deviations can be assigned to particular taxonomic groups, so that special codes can be defined. More than thirty variant genetic codes have been distinguished in this way.

In eukaryotic cells, those organelles that have their own genetic system and presumably derive from symbiotic bacteria (endosymbiont theory) show their own variants of the genetic code. For the mitochondrial DNA (mtDNA; the mitogenome, also called chondriome), more than ten modified forms of the mitochondrial code are known. These differ from the nuclear code used for the genetic material in the cell nucleus, the nuclear genome (karyome). In addition, the plastids found in plant cells have their own code for their plastid DNA (cpDNA, plastome).

The ciliates (Ciliophora) also deviate from the standard code: UAG, and often also UAA, code for glutamine; this deviation is also found in some green algae. UGA sometimes stands for cysteine. Another variant is found in the yeast Candida, where CUG codes for serine.

Furthermore, some additional amino acids can be incorporated during translation by recoding, and not only in bacteria and archaea: as described above, UGA can encode selenocysteine and UAG can encode pyrrolysine, although both are stop codons in the standard code.

In addition, other deviations from the standard code are known, which often concern initiation (start) or termination (stop); especially in mitochondria, a codon (base triplet of the mRNA) is often not assigned the usual amino acid. Some examples are listed in the following table:

Deviations from the standard code

Codon           Standard    Deviation           Occurrence
UGA             Stop        Tryptophan          Mitochondria (in all organisms examined so far)
AUA             Isoleucine  Methionine = Start  Mammalian, Drosophila, S. cerevisiae and protozoan mitochondria
AG(A, G)        Arginine    Stop                Mammalian mitochondria
AGA             Arginine    Serine              Mitochondria of Drosophila
CU(U, C, A, G)  Leucine     Threonine           Mitochondria, e.g. in Saccharomyces cerevisiae
CGG             Arginine    Tryptophan          Mitochondria of higher plants
CUG             Leucine     Serine              Some species of the fungal genus Candida
CUG             Leucine     Start               Eukarya (rare)
ACG             Threonine   Start               Eukarya (rare)
GUG             Valine      Start               Eukarya (rare)
GUG             Valine      Start               Bacteria
UUG             Leucine     Start               Bacteria (rare)
UGA             Stop        Glycine             Bacteria (SR1 bacteria)

Genetic codes in DNA alphabet

DNA sequence databases such as GenBank also report mRNA sequences in a format that conforms to historical conventions, using the DNA alphabet, i.e., T instead of U. Examples:

  • Standard code (= id)
        AS = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG Starts = ---M------**--*----M---------------M----------------------------  Base1 = TTTTTTTTTTTTCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGG Base2 = TTTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGG Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG     id = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG
  • Vertebrate Mitochondrial Code
        AS = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRIIMMTTTNNKKSS**VVVAAAADDEEGGGG Starts = ----------**--------------------MMMM----------**---M------------  Base1 = TTTTTTTTTTTTCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGG Base2 = TTTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGG Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG     id = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG
  • Yeast Mitochondrial Code
        AS = FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRIIMMTTNNKKSSRRVVVAAAADDEEGGGG Starts = ----------**----------------------MM----------------------------  Base1 = TTTTTTTTTTTTCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGG Base2 = TTTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGG Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG     id = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG
  • Invertebrates Mitochondrial Code
        AS = FFLLSSSSYY**CCWWLLPPPPHHQQRRRIIMMTTTNNKKSSSSVVVAAAADDEEGGGG Starts = ---M------**--------------------MM---------------M------------  Base1 = TTTTTTTTTTTTCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGG Base2 = TTTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGG Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCTCAGTCAGTCAGTCAG     id = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG
  • Bacteria, archaea and plastids code
    AS = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGG Starts = ---M------**--*----M------------MMMM---------------M------------  Base1 = TTTTTTTTTTTTCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGG Base2 = TTTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGGGTTCCCCAAAAGGG Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG     id = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRIIIMTTTNNKKSSRRVVVAAAADDEEGGGG

Note: In the first line of each code, "AS", the amino acids are given in the one-letter code (see the reverse codon table); deviations from the standard code can be found by comparing this line with the "id" line of the standard code. In the second line, "Starts", M marks initiation and * termination; some variants differ solely in their (alternative) start or stop codons. Further codes can be taken from the freely accessible source.
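Because every code in this format is a 64-character string aligned with the Base1 to Base3 lines, deviations from the standard code can be found by a position-by-position comparison. The sketch below generates the three base lines instead of typing them and compares the vertebrate mitochondrial code with the standard code; the names `parse` and `deviations` are hypothetical helpers, not part of any database interface.

```python
# Codon columns in the database layout (DNA alphabet, T instead of U)
BASE1 = "T" * 16 + "C" * 16 + "A" * 16 + "G" * 16
BASE2 = ("T" * 4 + "C" * 4 + "A" * 4 + "G" * 4) * 4
BASE3 = "TCAG" * 16

STANDARD = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
VERT_MITO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
             "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")

def parse(aas: str) -> dict[str, str]:
    """Map each codon (DNA alphabet) to its one-letter amino acid; '*' = stop."""
    if len(aas) != 64:
        raise ValueError("expected 64 entries, one per codon")
    return {b1 + b2 + b3: aa for b1, b2, b3, aa in zip(BASE1, BASE2, BASE3, aas)}

def deviations(variant: str, standard: str = STANDARD) -> dict[str, tuple[str, str]]:
    """Codons whose meaning differs from the standard code: codon -> (standard, variant)."""
    std, var = parse(standard), parse(variant)
    return {codon: (std[codon], var[codon]) for codon in std if std[codon] != var[codon]}

for codon, (old, new) in sorted(deviations(VERT_MITO).items()):
    print(codon, old, "->", new)
# AGA R -> *
# AGG R -> *
# ATA I -> M
# TGA * -> W
```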

Engineering of the genetic code

Main article: Synthetic biology

The idea that the genetic code evolved from an original, ambiguous code into the well-defined ("frozen") code with its repertoire of 20 (+2) canonical amino acids is generally accepted. However, opinions differ on how these changes took place. Based on these ideas, some models even predict "entry points" through which synthetic amino acids could be introduced into the genetic code.

See also

  • Codogenic strand
  • Epigenetic code
  • Gene duplication
  • Xenobiology

AlegsaOnline.com - 2020 / 2023 - License CC3