Building blocks
Deoxyribonucleic acid is a long chain molecule (polymer) of many building blocks called deoxyribonucleotides or nucleotides for short. Each nucleotide has three components: Phosphoric acid or phosphate, the sugar deoxyribose, and a heterocyclic nucleobase or base for short. The deoxyribose and phosphoric acid subunits are the same for each nucleotide. They form the backbone of the molecule. Units of base and sugar (without phosphate) are called nucleosides.
The phosphate residues are hydrophilic due to their negative charge and give DNA in aqueous solution an overall negative charge. Since this negatively charged DNA dissolved in water cannot release any further protons, it is, strictly speaking, no longer an acid. The term deoxyribonucleic acid refers to an uncharged state in which protons are attached to the phosphate residues.
The base can be a purine, namely adenine (A) or guanine (G), or a pyrimidine, namely thymine (T) or cytosine (C). Since the four different nucleotides differ only by their base, the abbreviations A, G, T and C are also used for the corresponding nucleotides.
The five carbon atoms of a deoxyribose are numbered from 1' (read "one prime") to 5'. The base is attached to the 1' carbon of this sugar, and the phosphate residue to the 5' carbon. Strictly speaking, the deoxyribose is 2-deoxyribose; the name reflects the fact that, compared with a ribose molecule, the alcoholic hydroxy group (OH group) at the 2' position is missing (i.e. replaced by a hydrogen atom).
An OH group is present at the 3' position; it links the deoxyribose to the 5' carbon atom of the sugar of the next nucleotide via a phosphodiester bond (see figure). As a result, each single strand has two different ends: a 5' end and a 3' end. DNA polymerases, which synthesize DNA strands in living organisms, can add new nucleotides only to the OH group at the 3' end, not at the 5' end. Thus, the single strand always grows from 5' to 3' (see also DNA replication below). In this process, a nucleoside triphosphate (with three phosphate residues) is delivered as the new building block, and two of its phosphates are cleaved off in the form of pyrophosphate. The remaining phosphate residue of each newly added nucleotide is linked to the OH group at the 3' end of the last nucleotide in the strand. The sequence of bases in the strand encodes the genetic information.
The double helix
DNA normally occurs as a double helix in a conformation called B-DNA. Two of the single strands described above are attached to each other in opposite directions (antiparallel): at each end of the double helix, one strand has its 3' end and the other its 5' end. As a result of this juxtaposition, two particular bases always lie opposite each other in the middle of the double helix; they are "paired". The double helix is stabilized mainly by stacking interactions between successive bases of the same strand (and not, as often claimed, by hydrogen bonds between strands).
Adenine always pairs with thymine, forming two hydrogen bonds, and cytosine always pairs with guanine, forming three hydrogen bonds. The hydrogen bonds form between position 1 of the purine and position 3 of the pyrimidine and between the substituents at position 6 of the purine and position 4 of the pyrimidine; in guanine-cytosine pairs there is an additional bond between the substituents at position 2 of both bases. Since the same bases always pair, the sequence of bases in one strand can be used to deduce that of the other strand; the sequences are complementary (see also: base pair). The hydrogen bonds are almost exclusively responsible for the specificity of the pairing, but not for the stability of the double helix.
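The pairing rules above can be illustrated with a short Python sketch that derives the partner strand and counts the hydrogen bonds between the two strands. The function names and the example sequence are chosen here for illustration only and do not come from the article.

```python
# Pairing rules from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per pair from the text: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the base opposite each position, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds between a strand and its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    top = "ATGCGT"  # arbitrary example sequence, written 5'->3'
    print(complement(top))          # TACGCA (partner strand read 3'->5')
    print(reverse_complement(top))  # ACGCAT (partner strand read 5'->3')
    print(hydrogen_bonds(top))      # 15
```

Because the two strands run in opposite directions, the partner strand is usually written as the reverse complement so that both sequences read from 5' to 3'.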
Since a purine is always combined with a pyrimidine, the distance between the strands is the same everywhere, resulting in a regular structure. The whole helix has a diameter of about 2 nm and advances by 0.34 nm along its axis with each base pair.
Successive nucleotides are rotated by 36° relative to each other, so a complete turn (360°) is reached after 10 base pairs and 3.4 nm. DNA molecules can become very large. For example, the largest human chromosome contains 247 million base pairs.
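As a worked example of these figures, the following Python sketch estimates the stretched-out length of a B-DNA double strand from its number of base pairs. It assumes an idealized, perfectly regular B-helix; the function names are illustrative.

```python
RISE_PER_BP_NM = 0.34  # advance along the helix axis per base pair (from the text)
BP_PER_TURN = 10       # base pairs per complete turn (from the text)


def contour_length_nm(base_pairs: int) -> float:
    """Length of an idealized B-DNA helix in nanometres."""
    return base_pairs * RISE_PER_BP_NM


def helical_turns(base_pairs: int) -> float:
    """Number of complete turns of an idealized B-DNA helix."""
    return base_pairs / BP_PER_TURN


if __name__ == "__main__":
    n = 247_000_000  # largest human chromosome, as stated in the text
    length_cm = contour_length_nm(n) * 1e-7  # 1 nm = 1e-7 cm
    print(f"{length_cm:.1f} cm")  # 8.4 cm
    print(f"{helical_turns(n):,.0f} turns")
```

Under these assumptions, the DNA of the largest human chromosome would be roughly 8 cm long if fully stretched out.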
When the two single strands wind around each other, lateral gaps remain in which the bases lie directly at the surface. Two such grooves wind around the double helix (see figures and animation at the beginning of the article). The major groove is 2.2 nm wide, the minor groove only 1.2 nm.
Accordingly, the bases in the major groove are more accessible. Proteins that bind to DNA in a sequence-specific manner, such as transcription factors, therefore usually bind in the major groove.
Some DNA dyes, such as DAPI, also bind in a groove.
The cumulative binding energy between the two single strands holds them together. Covalent bonds are not present here, so the DNA double helix consists of two molecules rather than one. This allows the two strands to be temporarily separated in biological processes.
In addition to the B-DNA just described, there are also A-DNA and a left-handed form called Z-DNA, which was first studied in 1979 by Alexander Rich and his colleagues at MIT. Z-DNA occurs particularly in G-C-rich segments. It was not until 2005 that a crystal structure was reported showing Z-DNA directly at a junction with B-DNA, thus providing evidence for a biological activity of Z-DNA. The following table and the adjacent figure compare the three forms directly.
Structural information on the three DNA forms that could be biologically relevant (B-DNA is the most common form in living nature):
| Structural feature | A-DNA | B-DNA | Z-DNA |
| --- | --- | --- | --- |
| Built from | monomers | monomers | dimers |
| Helix handedness | right | right | left |
| Diameter (approx.) | 2.6 nm | 2.37 nm | 1.8 nm |
| Helical twist per base pair | 32.7° | 34.3° | 30° |
| Base pairs per helical turn | 11 | 10 | 12 |
| Rise per base pair | 0.29 nm | 0.34 nm | 0.37 nm |
| Rise per turn (pitch) | 3.4 nm | 3.4 nm | 4.4 nm |
| Inclination of the base pairs to the axis | 20° | 6° | 7° |
| Major groove | narrow and deep | wide and deep (depth: 0.85 nm) | flat |
| Minor groove | wide and flat | narrow and deep (depth: 0.75 nm) | narrow and deep |
| Pyrimidine bases (cytosine/thymine/uracil): sugar conformation, glycosidic bond | C3'-endo, anti | C2'-endo, anti | C2'-endo, anti |
| Purine bases (adenine/guanine): sugar conformation, glycosidic bond | C3'-endo, anti | C2'-endo, anti | C3'-endo, syn |
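The geometric rows of the table are related to each other: for an ideal regular helix, the number of base pairs per turn is 360° divided by the twist, and the pitch is that number multiplied by the rise per base pair. The Python sketch below applies these two relations to the tabulated twist and rise values; the dictionary and function names are illustrative.

```python
# Twist (degrees per base pair) and rise (nm per base pair) as tabulated above.
TWIST_DEG = {"A-DNA": 32.7, "B-DNA": 34.3, "Z-DNA": 30.0}
RISE_NM = {"A-DNA": 0.29, "B-DNA": 0.34, "Z-DNA": 0.37}


def base_pairs_per_turn(twist_deg: float) -> float:
    """Base pairs needed for one full 360-degree turn of an ideal helix."""
    return 360.0 / twist_deg


def pitch_nm(twist_deg: float, rise_nm: float) -> float:
    """Axial length of one full turn of an ideal helix."""
    return base_pairs_per_turn(twist_deg) * rise_nm


if __name__ == "__main__":
    for form in TWIST_DEG:
        bp = base_pairs_per_turn(TWIST_DEG[form])
        pitch = pitch_nm(TWIST_DEG[form], RISE_NM[form])
        print(f"{form}: {bp:.1f} bp per turn, pitch {pitch:.2f} nm")
```

The derived numbers agree only approximately with the tabulated base pairs per turn and pitch, presumably because the table entries are rounded averages rather than values for a single ideal helix.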
The stacked base pairs (base stacks) do not lie exactly parallel to each other like books, but form wedges that tilt the helix in one direction or the other. The largest wedge is formed by adenosines paired with thymidines of the other strand. Consequently, a series of AT pairs forms an arc in the helix. When such series follow each other at short intervals, the DNA molecule adopts a stable bent or curved structure. This is called sequence-induced bending, to distinguish it from bending caused by proteins (protein-induced bending). Sequence-induced bending is often found at important sites in the genome.
Chromatin and chromosomes
DNA in the eukaryotic cell is organized in the form of chromatin strands called chromosomes, which are located in the nucleus. A single chromosome contains a long, continuous DNA double strand (in a chromatid) from anaphase to the beginning of S phase. At the end of S phase, the chromosome consists of two identical DNA strands (in two chromatids).
Since such a DNA thread can be several centimeters long, but a cell nucleus has a diameter of only a few micrometers, the DNA must be additionally compressed or "packed". In eukaryotes, this is done with so-called chromatin proteins, of which the basic histones deserve special mention. They form the nucleosomes around which the DNA is wrapped at the lowest packaging level. During nuclear division (mitosis), each chromosome is condensed to its maximum compact form. This allows them to be identified particularly well in metaphase with the light microscope.
Bacterial and viral DNA
In prokaryotic cells, most of the double-stranded DNA in the cases documented so far is not present as linear strands, each with a beginning and an end, but as circular molecules: each strand closes into a circle, with its 3' end joined to its 5' end. Depending on the length of the sequence, these closed circular DNA molecules are called bacterial chromosomes or plasmids. In bacteria they are not located in a nucleus but lie freely in the cytoplasm. Prokaryotic DNA is wound up with the help of enzymes (for example, topoisomerases and gyrases) into simple "supercoils" that resemble a curled telephone cord. Twisting the helices around themselves reduces the space required for the genetic information. In bacteria, topoisomerases ensure that the twisted double strand is untwisted at a desired position by constantly cutting and rejoining the DNA (a prerequisite for transcription and replication).
Depending on their type, viruses contain either DNA or RNA as genetic information. In both DNA and RNA viruses, the nucleic acid is protected by a protein envelope.
Chemical and physical properties of the DNA double helix
DNA is a negatively charged molecule at neutral pH, with the negative charges sitting on the phosphates in the backbone of the strands. Although two of the three acidic OH groups of the phosphates are esterified with the respective adjacent deoxyriboses, the third is still present and releases a proton at neutral pH, causing the negative charge. This property is exploited in agarose gel electrophoresis to separate different DNA strands according to their length. Some physical properties such as the free energy and the melting point of the DNA are directly related to the GC content, i.e. they are sequence-dependent.
Stacking interactions
Two main factors are responsible for the stability of the double helix: base pairing between complementary bases and stacking interactions between consecutive bases.
Contrary to initial assumptions, the energy gain from hydrogen bonds is negligible, since the bases can form similarly good hydrogen bonds with the surrounding water. The hydrogen bonds of a GC base pair contribute only minimally to the stability of the double helix, while those of an AT base pair even have a destabilizing effect. Stacking interactions, on the other hand, act only in the double helix between consecutive base pairs: an energetically favorable dipole-induced dipole interaction occurs between the aromatic ring systems of the heterocyclic bases. Thus, the formation of the first base pair is quite unfavorable, because the energy gain is small and entropy is lost, but the elongation of the helix is energetically favorable because each additional stacked base pair releases energy.
However, the stacking interactions are sequence-dependent and energetically most favorable for stacked GC-GC, less favorable for stacked AT-AT. The differences in stacking interactions mainly explain why GC-rich DNA segments are thermodynamically more stable than AT-rich ones, while hydrogen bonding plays a minor role.
Melting point
The melting point of DNA is the temperature at which the binding forces between the two single strands are overcome and they separate from each other. This is also referred to as denaturation.
As long as DNA denatures in a cooperative transition (which occurs in a narrow temperature range), the melting point refers to the temperature at which half of the double strands have denatured into single strands. The correct terms "midpoint of transition temperature" or midpoint temperature Tm are derived from this definition.
The melting point depends on the respective base sequence in the helix. It increases when there are more GC base pairs in it, since these are entropically more favorable than AT base pairs. This is not so much due to the different number of hydrogen bonds formed by the two pairs, but much more to the different stacking interactions. The stacking energy of two base pairs is much smaller when one of the two pairs is an AT base pair. GC stacks, on the other hand, are more energetically favorable and stabilize the double helix more strongly. The ratio of GC base pairs to the total number of all base pairs is given by the GC content.
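The GC content defined above is simple to compute from a sequence. The following Python sketch is one way to do it; the function name is illustrative, and only the four canonical bases are accepted.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGCGT"))  # 0.5
    print(gc_content("GGCC"))    # 1.0
```

Since every G in one strand faces a C in the other, the GC content of a single strand equals that of the whole double strand.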
Since single-stranded DNA absorbs UV light about 40 percent more strongly than double-stranded DNA, the transition temperature can be easily determined in a photometer.
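How Tm might be read off such a photometric measurement can be sketched in Python as follows. The sketch assumes a single cooperative transition in which the absorbance rises in proportion to the fraction of denatured strands; the function name and the data points are invented for illustration and are not measured values.

```python
def melting_temperature(temps, absorbances):
    """Estimate Tm as the temperature at which half of the absorbance rise is reached.

    Assumes absorbance increases with temperature and is proportional
    to the fraction of strands that have denatured (two-state model).
    """
    low, high = min(absorbances), max(absorbances)
    if high == low:
        raise ValueError("absorbance does not change; no transition found")
    # Normalize to the denatured fraction: 0 = all double-stranded, 1 = all single-stranded.
    fraction = [(a - low) / (high - low) for a in absorbances]
    for i in range(1, len(temps)):
        f0, f1 = fraction[i - 1], fraction[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i - 1], temps[i]
            # Linear interpolation between the two neighbouring points.
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never crosses the half-denatured point")


if __name__ == "__main__":
    # Invented example curve: temperature in degrees Celsius, UV absorbance in arbitrary units.
    temps = [60, 70, 80, 90, 100]
    absorbances = [1.00, 1.02, 1.20, 1.38, 1.40]
    print(f"Tm is about {melting_temperature(temps, absorbances):.1f} degrees C")
```

In the invented curve the absorbance rises by 40 percent in total, matching the difference between double-stranded and single-stranded DNA stated above.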
When the temperature of the solution falls back below Tm, the individual strands can reattach to each other. This process is called renaturation or hybridization. The interplay of de- and renaturation is exploited in many biotechnological processes, for example in the polymerase chain reaction (PCR), Southern blots and in situ hybridization.
Cruciform DNA on palindromes
A palindrome is a sequence of nucleotides in which the two complementary strands can be read in the same way from the right as from the left.
Under natural conditions (with high torsional stress on the DNA) or artificially in the test tube, the linear helix at such a palindrome can convert into a cruciform by forming two branches that protrude from the linear double strand. Each branch is a helix in its own right, but at least three nucleotides remain unpaired at the end of a branch. During the transition between the cross form and the linear helix, base pairing is maintained because of the flexibility of the sugar-phosphate backbone.
The spontaneous assembly of complementary bases into so-called stem-loop structures is also frequently observed in single-stranded DNA or RNA.
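Whether a sequence is a palindrome in the sense used here can be checked by comparing it with its own reverse complement. The Python sketch below does this; the function names are illustrative, and GAATTC (the recognition site of the restriction enzyme EcoRI) serves as a familiar example.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


def is_palindromic(strand: str) -> bool:
    """True if both strands read the same in their 5'->3' direction."""
    seq = strand.upper()
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))   # True
    print(is_palindromic("GATTACA"))  # False
```

A palindromic sequence in this sense always has an even number of bases, because no base is its own complement.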
Non-standard bases
Occasionally, deviations from the above four canonical bases (standard bases) adenine (A), guanine (G), thymine (T), and cytosine (C) are observed in viruses and cellular organisms; further deviations can be artificially generated.
Natural non-standard bases
- Uracil (U) is not normally found in DNA; it occurs only as a degradation product of cytosine. However, in several bacteriophages (bacterial viruses), thymine is replaced by uracil:
- Bacillus subtilis bacteriophage PBS1 (ICTV: species Bacillus virus PBS1) and "PBS2" (proposed species 'Bacillus phage PBS2' aka "Bacteriophage PBS2") - both species are myophages, i.e. phages from the family Myoviridae (without assigned genus).
- Yersinia phage PhiR1-37 (ICTV: species Yersinia virus R1RT in genus Tg1virus, family Myoviridae).
- Staphylococcus phage S6 (aka Staphylococcus aureus bacteriophage 15, also from the family Myoviridae)
Uracil is also found in the DNA of eukaryotes such as Plasmodium falciparum (Apicomplexa). It is present there in relatively small amounts (7-10 uracil units per million bases).
- 5-Hydroxymethyldeoxyuridine (hm5dU) replaces thymidine in the genome of various Bacillus phages of the species Bacillus virus SPO1, genus Spo1virus (formerly spounalikevirus or SPO1-like viruses), also family Myoviridae. These are the phages SPO1, SP8, SP82, "Phi-E" alias "ϕe" and "2C".
- 5-Dihydroxypentyluracil (DHPU, with nucleotide 5-dihydroxypentyl-dUMP, DHPdUMP) was described as a replacement for thymidine in "Bacillus phage SP15" (also "SP-15", family Myoviridae).
- Beta-D-glucopyranosyloxymethyluracil (base J), also a modified form of uracil, was found in several organisms: the flagellates Diplonema and Euglena (both Excavata: Euglenozoa) and all genera of kinetoplastids. The biosynthesis of J occurs in two steps: in the first step, a specific thymidine in DNA is converted to hydroxymethyldeoxyuridine (HOMedU), and in the second step, HOMedU is glycosylated to J. Some proteins bind specifically to this base. These proteins appear to be distant relatives of the Tet1 oncogene, which is involved in the pathogenesis of acute myeloid leukemia. J appears to act as a termination signal for RNA polymerase II.
- 2,6-Diaminopurine (a.k.a. 2-aminoadenine, base D or X, DAP): In 1976, it was found that "Cyanophage S-2L" (a.k.a. "Cyanobacteria phage S-2L", genus "Cyanostylovirus", family Siphoviridae, possibly its own family "Cyanostyloviridae" or "Styloviridae"), whose hosts are species of the genus Synechocystis, has replaced all adenine bases in its genome with 2,6-diaminopurine. Three additional studies followed in 2021, and a summary can be found on sciencealert (May 2021). The same applies to "Acinetobacter phage SH-Ab 15497", also Siphoviridae, and to other representatives of this family as well as of Podoviridae.
- As found in 2016, 2'-deoxyarchaeosin (dG+) is present in the genome of several bacteria and in Escherichia phage 9g (ICTV: Escherichia virus 9g, genus Nonagvirus, family Siphoviridae).
- 6-methylisoxanthopterin
- 5-Hydroxyuracil
Natural modified bases (methylations etc.)
Modified bases also occur in natural DNA. In particular, methylations of the canonical bases are studied in the context of epigenetics:
- As early as 1925, 5-methylcytosine (m5C) was found in the genome of Mycobacterium tuberculosis. In the genome of Xanthomonas oryzae bacteriophage Xp12 (Xanthomonas phage XP-12, family Siphoviridae) and halovirus ΦH (Halobacterium virus phiH, genus Myohalovirus, Myoviridae), all of the cytosine is replaced by 5-methylcytosine.
- A complete replacement of cytosine by 5-glycosylhydroxymethylcytosine (syn. glycosyl-5-hydroxymethylcytosine) in phages T2, T4, and T6 of the species Escherichia virus T4 (genus Tquattrovirus, subfamily Tevenvirinae of the family Myoviridae) was observed in 1953.
- As discovered in 1955, N6-methyladenine (6mA, m6A) is present in the DNA of coliform bacteria.
- N6-carbamoylmethyladenine was described in 1975 in bacteriophage Mu (ICTV: species Escherichia virus Mu, formerly Enterobacteria phage Mu; genus Muvirus, obsolete Mulikevirus in the family Myoviridae) and Lambda-Mu.
- 7-Methylguanine (m7G) was described in 1976 in phage DDVI ('Enterobacteria phage DdVI' aka 'DdV1', genus T4virus) of Shigella dysenteriae.
- N4-methylcytosine (m4C) in DNA was described in 1983 (in Bacillus centrosporus).
- In 1985, 5-hydroxycytosine was found in the genome of Rhizobium phage RL38JI.
- α-Putrescinylthymine (alpha-putrescinylthymine, putT) and α-glutamylthymidine (alpha-glutamylthymidine) are present in the genome of both Delftia phage ΦW-14 (Phi W-14, species 'Delftia virus PhiW14', genus Ionavirus, family Myoviridae) and Bacillus phage SP10 (also family Myoviridae).
- 5-Dihydroxypentyluracil was found in Bacillus phage SP15 (also SP-15, family Myoviridae).
The function of these non-canonical bases in DNA is not fully understood. At least in part, they appear to act as a molecular immune system and help protect the bacteria from infection by viruses.
However, non-standard and modified bases are not confined to microbes:
- Four modifications of cytosine residues in human DNA have also been reported. They consist of the addition of one of the following groups: methyl (-CH3), hydroxymethyl (-CH2OH), formyl (-CHO), or carboxyl (-COOH). These modifications are assumed to have regulatory (epigenetic) functions.
- Uracil is found in the centromere regions of at least two human chromosomes (6 and 11).
Synthetic bases
In the laboratory, DNA (and also RNA) has been equipped with additional artificial bases. The aim is usually to create unnatural base pairs (UBP):
- In 2004, DNA was generated that contained an expanded alphabet with six nucleobases (A, T, G, C, dP, and dZ) instead of the four standard nucleobases (A, T, G, and C). Here, for these two new bases, dP represents 2-amino-8-(1′-β-D-2′-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one and dZ represents 6-amino-5-nitro-3-(1′-β-D-2′-deoxyribofuranosyl)-2(1H)-pyridone.
- In 2006, DNA with bases extended by a benzene group or a naphthyl group (called either xDNA or xxDNA or yDNA or yyDNA, depending on the position of the extension groups) was studied for the first time.
- At the turn of 2016/2017, Yorke Zhang et al. reported semisynthetic organisms whose DNA was extended by the bases X (a.k.a. NaM) and Y' (a.k.a. TPT3), i.e. the (deoxyribo)nucleotides dX (dNaM) and dY' (dTPT3), which pair with each other. This was preceded by experiments with pairings based on bases X and Y (a.k.a. 5SICS), i.e. nucleotides dX and dY (a.k.a. d5SICS). Other bases that can pair with 5SICS are FEMO and MMO2.
- In early 2019, DNA and RNA with eight bases each (four natural and four synthetic), each of which pairs with a specific partner, were reported (Hachimoji DNA).
Enantiomers
DNA occurs in living organisms as D-DNA; however, its enantiomer (mirror image), L-DNA, can be synthesized (the same applies analogously to RNA). L-DNA is degraded more slowly by enzymes than the natural form, which makes it interesting for pharmaceutical research.