13.1: DNA Structure - Biology

A. Early Clues and Ongoing Misconceptions

By 1878, nuclein, a substance derived from cell nuclei in the pus of wounded soldiers, was shown to be composed of 5 bases (the familiar ones of DNA and RNA). The four bases known to make up DNA (as part of nucleotides) were thought to be connected through the phosphate groups in short repeating chains of four nucleotides. By the 1940s, we knew that DNA was a long polymer. Nevertheless, it was still considered too simple to account for genes (see above). After the Hershey and Chase experiments, only a few holdouts would not accept DNA as the genetic material. So, the question remaining was how such a “simple” molecule could account for all the genes, even in so simple an organism as a bacterium. The answer to this question was to lie at least in part in an understanding of the physical structure of DNA, made possible by the advent of X-ray crystallography.

If a substance can be crystallized, the crystal will diffract X-rays at angles revealing regular (repeating) structures of the crystal. William Astbury demonstrated that high molecular weight DNA had just such a regular structure. His crystallographs suggested DNA to be a linear polymer of stacked bases (nucleotides), each nucleotide separated from the next by 0.34 nm. Astbury is also remembered for coining the term “molecular biology” to describe his studies. The term now covers all aspects of biomolecular structure, as well as molecular functions (e.g. replication, transcription, translation, gene regulation…).

In an irony of history, the Russian biologist Nikolai Koltsov had already intuited in 1927 that the basis of genetic transfer of traits would be a "giant hereditary molecule" made up of "two mirror strands that would replicate in a semi-conservative fashion using each strand as a template". A pretty fantastic inference if you think about it since it was proposed long before Watson and Crick and their colleagues worked out the structure of the DNA double-helix!

B. Wilkins, Franklin, Watson & Crick

Maurice Wilkins, a British biophysicist, was the first to isolate highly pure, high molecular weight DNA. Working in Wilkins’ laboratory, Rosalind Franklin was able to crystallize this DNA and produce very high-resolution X-ray diffraction images of the DNA crystals. Franklin’s most famous (and definitive) crystallograph was “Photo 51”, shown below.

This image confirmed Astbury’s 0.34 nm repeat dimension and revealed two more numbers, 3.4 nm and 2 nm, reflecting additional repeat structures in the DNA crystal. When James Watson and Francis Crick got hold of these numbers, they used them along with other data to build DNA models out of nuts, bolts and plumbing. Their models eventually revealed DNA to be a pair of antiparallel, complementary nucleic acid polymers (shades of Koltsov’s mirror-image macromolecules!). Each strand is a string of nucleotides linked by phosphodiester bonds, and the two strands are held together in a double helix by complementary H-bond interactions.

Let’s look at the evidence for these conclusions and as we do, refer to the two illustrations of the double helix below.

Recalling that Astbury’s 0.34 nm dimension was the distance between successive nucleotides in a DNA strand, Watson and Crick surmised that the 3.4 nm repeat was a structurally meaningful 10-fold multiple of Astbury’s number. When they began building their DNA models, they realized from the bond angles connecting the nucleotides that the strand was forming a helix, from which they concluded that the 3.4 nm repeat was the pitch of the helix, i.e., the distance of one complete turn of the helix. This meant that there were 10 bases per turn of the helix. They further reasoned that the 2.0 nm number might reflect the diameter of the helix. When their scale model of a single-stranded DNA helix predicted a helical diameter much less than 2.0 nm, they were able to model a double helix that more nearly met the 2.0 nm diameter requirement. In building their double helix, Watson and Crick realized that bases in opposing strands would come together to form H-bonds, holding the helix together. However, for their double helix to have a constant diameter of 2.0 nm, they also realized that the smaller pyrimidine bases, Thymine (T) and Cytosine (C), would have to H-bond to the larger purine bases, Adenine (A) and Guanine (G).
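The arithmetic behind the "10 bases per turn" conclusion can be sketched in a few lines of Python. The two dimensions are the ones quoted above; the function names and the derived rotation per base are illustrative additions, not part of the original analysis.

```python
# Dimensions read from the diffraction pattern, as quoted in the text (nanometres).
RISE_PER_BASE_NM = 0.34  # spacing between successive nucleotides
PITCH_NM = 3.4           # length of one complete turn of the helix


def bases_per_turn(pitch_nm: float, rise_nm: float) -> float:
    """Number of stacked bases that fit into one helical turn."""
    return pitch_nm / rise_nm


def rotation_per_base_deg(pitch_nm: float, rise_nm: float) -> float:
    """Angle the helix rotates between one base and the next (derived value)."""
    return 360.0 / bases_per_turn(pitch_nm, rise_nm)


turns = bases_per_turn(PITCH_NM, RISE_PER_BASE_NM)
twist = rotation_per_base_deg(PITCH_NM, RISE_PER_BASE_NM)
print(f"{turns:.1f} bases per turn, {twist:.1f} degrees of rotation per base")
```

The rotation per base is simply a full turn divided by the number of bases in that turn, so it follows from the same two measurements.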

Now to the question of how a “simple” DNA molecule could have the structural diversity needed to encode thousands of different polypeptides and proteins. In early studies, purified E. coli DNA was chemically hydrolyzed down to nucleotide monomers. The hydrolysis products contained nearly equal amounts of each base, reinforcing the notion that DNA was that simple molecule that could not encode genes. But Watson and Crick had private access to revealing data from Erwin Chargaff. Chargaff had determined the base composition of DNA isolated from different species, including E. coli. He found that the base composition of DNA from different species was not always equimolar, meaning that for some species, the DNA was not composed of equal amounts of each of the four bases (see some of this data below).

The mere fact that DNA from some species could have base compositions that deviated from equimolarity put to rest the argument that DNA had to be a very simple sequence. Finally, it was safe to accept the obvious, namely that DNA was indeed the “stuff of genes”.

Chargaff’s data also showed a unique pattern of base ratios. Although base compositions could vary between species, the A/T and G/C ratios were always one, for every species. Likewise, the (A+C)/(G+T) and (A+G)/(C+T) ratios were always one. From this information, Watson and Crick inferred that A (a purine) would H-bond with T (a pyrimidine), and G (a purine) would H-bond with C (a pyrimidine) in the double helix. When building their model with this new information, they also found H-bonding between the complementary bases would be maximal only if the two DNA strands were antiparallel, leading to the most stable structure of the double helix.
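As a rough illustration of why base pairing forces these ratios to one even when the overall composition is far from equimolar, the sketch below builds a duplex from one strand and counts bases over both strands. The strand sequence is a made-up example, and the helper names are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing rules inferred from Chargaff's ratios.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of a duplex built from one strand."""
    partner = "".join(COMPLEMENT[base] for base in strand)
    return Counter(strand + partner)


def chargaff_ratios(counts: Counter) -> dict:
    """Return the base ratios discussed in the text, plus (A+T)/(G+C)."""
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "A/T": a / t,
        "G/C": g / c,
        "(A+G)/(C+T)": (a + g) / (c + t),
        "(A+C)/(G+T)": (a + c) / (g + t),
        "(A+T)/(G+C)": (a + t) / (g + c),
    }


strand = "AAGAATACAAAG"  # hypothetical AT-rich strand, not real data
ratios = chargaff_ratios(duplex_base_counts(strand))
for name, value in ratios.items():
    print(f"{name} = {value:.2f}")
```

Only the last ratio, (A+T)/(G+C), is free to vary between species; the other four are fixed at one by the pairing rules themselves.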

Watson and Crick published their conclusions about the structure of DNA in 1953 in their seminal article, Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Their article is also famous for predicting a semi-conservative mechanism of replication, something that had been predicted by Koltsov 26 years earlier, albeit based on intuition… and much less evidence!

Watson, Crick and Wilkins shared a Nobel Prize in 1962 for their work on DNA structure. Unfortunately, Franklin died in 1958 and Nobel prizes were not awarded posthumously. There is still controversy about why Franklin did not get appropriate credit for her role in the work. But she has been getting well-deserved, long-delayed recognition, including a university in Chicago named in her honor!


Confirmation of Watson & Crick’s suggestion of semiconservative replication came from Meselson and Stahl’s very elegant experiment, which tested the three possible models of replication shown below.

In their experiment, E. coli cells were grown in medium containing 15N, a ‘heavy’ nitrogen isotope. After many generations, all of the DNA in the cells had become labeled with the heavy isotope. At that point, the 15N-tagged cells were placed back in medium containing the more common, ‘light’ 14N isotope and allowed to grow for exactly one generation.

Meselson and Stahl’s predictions and their experimental design are shown below.

Meselson and Stahl knew that 14N-labeled and 15N-labeled DNA would form separate bands after centrifugation on CsCl density gradients. They tested their predictions by purifying and centrifuging the DNA from the 15N-labeled cells grown in 14N medium for one generation. They found that this DNA formed a single band with a density between that of 15N-labeled DNA and 14N-labeled DNA. This result eliminated a conservative model of DNA replication (as Watson and Crick also predicted). That left two possibilities: replication was either semiconservative or dispersive. The dispersive model was eliminated when DNA isolated from cells grown for a 2nd generation on 14N was shown to contain two bands of DNA on the CsCl density gradients.
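The band predictions for the three models can be reproduced with a toy simulation. This is a simplified sketch, not Meselson and Stahl's own analysis: each strand is reduced to a single number (its fraction of 15N), a duplex's band position is taken as the mean of its two strands, and the dispersive model is assumed to spread parental material evenly over all four daughter strands.

```python
HEAVY, LIGHT = 1.0, 0.0  # fraction of 15N in a strand (modelling assumption)


def semiconservative(duplex):
    """Each parental strand pairs with one newly made light strand."""
    first, second = duplex
    return [(first, LIGHT), (second, LIGHT)]


def conservative(duplex):
    """The parental duplex stays intact; the copy is entirely new."""
    return [duplex, (LIGHT, LIGHT)]


def dispersive(duplex):
    """Parental material is spread evenly over all four daughter strands."""
    mixed = sum(duplex) / 4
    return [(mixed, mixed), (mixed, mixed)]


def bands_after(model, generations):
    """Distinct duplex densities after growth in light (14N) medium."""
    population = [(HEAVY, HEAVY)]  # fully 15N-labeled starting DNA
    for _ in range(generations):
        population = [d for duplex in population for d in model(duplex)]
    return sorted({sum(duplex) / 2 for duplex in population})


for model in (semiconservative, conservative, dispersive):
    print(model.__name__, bands_after(model, 1), bands_after(model, 2))
```

In this model only semiconservative replication gives one intermediate band after the first generation and two bands (intermediate and light) after the second, which matches the observed result described above.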



We understood from the start of the 20th century that chromosomes contained genes. Therefore, it becomes necessary to understand the relationship between chromosomes, chromatin, DNA and genes. As noted earlier, chromosomes are a specialized, condensed version of chromatin, with key structural features shown below.

We now know that the compact structure of a chromosome prevents damage to the DNA during cell division. This damage can occur when forces on centromeres generated by mitotic or meiotic spindle fibers pull chromatids apart. As the nucleus breaks down during mitosis or meiosis, late 19th century microscopists saw chromosomes condense from the dispersed cytoplasmic background. These chromosomes remained visible as they separated, moving to opposite poles of the cell during cell division. Such observations of chromosome behavior during cell division pointed to their role in heredity. A computer-colorized cell in mitosis is shown below.

It is possible to distinguish one chromosome from another by karyotyping. When cells in metaphase of mitosis are placed under pressure, they burst and the chromosomes spread apart. Such a chromosome spread is shown below.

By the early 1900s, the number, sizes and shapes of chromosomes were shown to be species-specific. What’s more, a close look at chromosome spreads revealed that chromosomes came in morphologically matched pairs. This was so reminiscent of Gregor Mendel’s paired hereditary factors that chromosomes were then widely accepted as the structural seat of genetic inheritance. Cutting apart micrographs like the one above and pairing the chromosomes by their morphology generates a karyotype. Paired human homologs are easily identified in the colorized micrograph below.

Captured in mitosis, all dividing human cells contain 23 pairs of homologous chromosomes. The karyotype is from a female; note the pair of homologous sex (“X”) chromosomes (lower right of the inset). X and Y chromosomes in males are not truly homologous. The chromosomes in the original spread and in the aligned karyotype were stained with fluorescent antibodies to chromosome-specific DNA sequences, which ‘light up’ the different chromosomes.


Identification of amino acids essential for DNA binding and dimerization in p67SRF: implications for a novel DNA-binding motif

The serum response factor (p67SRF) binds to a palindromic sequence in the c-fos serum response element (SRE). A second protein, p62TCF, binds in conjunction with p67SRF to form a ternary complex, and it is through this complex that growth factor-induced transcriptional activation of c-fos is thought to take place. A 90-amino-acid peptide, coreSRF, is capable of dimerizing, binding DNA, and recruiting p62TCF. By using extensive site-directed mutagenesis we have investigated the role of individual coreSRF amino acids in DNA binding. Mutant phenotypes were defined by gel retardation and cross-linking analyses. Our results have identified residues essential for either DNA binding or dimerization. Three essential basic amino acids whose conservative mutation severely reduced DNA binding were identified. Evidence which is consistent with these residues being on the face of a DNA binding alpha-helix is presented. A phenylalanine residue and a hexameric hydrophobic box are identified as essential for dimerization. The amino acid phasing is consistent with the dimerization interface being presented as a continuous region on a beta-strand. A putative second alpha-helix acts as a linker between these two regions. This study indicates that p67SRF is a member of a protein family which, in common with many DNA binding proteins, utilizes an alpha-helix for DNA binding. However, this alpha-helix is contained within a novel domain structure.

Length of a Human DNA Molecule

The chromosomes in the nucleus of a cell contain all the information a cell needs to carry on its life processes. They are made up of a complex chemical (a nucleic acid) called deoxyribonucleic acid, or DNA for short. Scientists' decoding of the chemical structure of DNA has led to a simple conceptual understanding of genetic processes. DNA is the hereditary material of all cells. It is a double-stranded helical macromolecule consisting of nucleotide monomers with deoxyribose sugar and the nitrogenous bases adenine (A), cytosine (C), guanine (G), and thymine (T). In the chromosomes of a cell, DNA occurs as fine, spirally coiled threads that in turn coil around one another, like a twisted ladder.

The DNA molecule is so fine a thread that it can only be seen under high-powered electron microscopes. To get a sense of how long an uncoiled DNA molecule is compared to a typical cell, imagine a cell magnified 1000 times. At this scale, the total length of all the DNA in the cell's nucleus would be 3 km, about the distance from the Lincoln Memorial to the Capitol in Washington, DC.

The human genome comprises the information contained in one set of human chromosomes, about 3 billion base pairs (bp) of DNA. A diploid human cell carries two sets, or about 6 billion bp in 46 chromosomes (22 autosome pairs + 2 sex chromosomes). The total length of DNA present in one adult human is calculated by multiplying

\( (\text{length of 1 bp}) \times (\text{number of bp per cell}) \times (\text{number of cells in the body}) \)
\( = (0.34 \times 10^{-9}\ \text{m})(6 \times 10^{9})(10^{13}) \)
\( \approx 2.0 \times 10^{13}\ \text{m} \)

That is the equivalent of nearly 70 trips from the earth to the sun and back.

\( 2.0 \times 10^{13}\ \text{m} \approx 133.7 \) astronomical units
\( 133.7 / 2 \approx 66.8 \) round trips to the sun

On average, a single human chromosome consists of a DNA molecule that is almost 5 centimeters long.
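The length estimates above can be checked with a short calculation. The base-pair spacing, base pairs per cell and cell count are the figures used in the text; the metres-per-astronomical-unit constant and the variable names are additions for this sketch.

```python
# Inputs taken from the text.
BP_LENGTH_M = 0.34e-9    # length of one base pair (0.34 nm)
BP_PER_CELL = 6e9        # base pairs in a diploid human cell
CELLS_PER_BODY = 1e13    # rough number of cells in an adult
CHROMOSOMES_PER_CELL = 46

# Added constant: one astronomical unit (mean earth-sun distance) in metres.
METERS_PER_AU = 1.495978707e11

dna_per_cell_m = BP_LENGTH_M * BP_PER_CELL
dna_per_body_m = dna_per_cell_m * CELLS_PER_BODY
dna_per_body_au = dna_per_body_m / METERS_PER_AU
round_trips_to_sun = dna_per_body_au / 2  # one round trip covers 2 AU
avg_chromosome_cm = dna_per_cell_m / CHROMOSOMES_PER_CELL * 100

print(f"DNA per cell:        {dna_per_cell_m:.2f} m")
print(f"DNA per body:        {dna_per_body_m:.2e} m")
print(f"In astronomical units: {dna_per_body_au:.1f} AU")
print(f"Round trips to sun:  {round_trips_to_sun:.1f}")
print(f"Average chromosome:  {avg_chromosome_cm:.1f} cm")
```

Because this version does not round the total to 2.0 × 10^13 m before converting, it gives slightly larger values than the text (about 136 AU and 68 round trips), still consistent with "nearly 70".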

How are DNA sequences used to make proteins?

DNA's instructions are used to make proteins in a two-step process. First, enzymes read the information in a DNA molecule and transcribe it into an intermediary molecule called messenger ribonucleic acid, or mRNA.

Next, the information contained in the mRNA molecule is translated into the "language" of amino acids, which are the building blocks of proteins. This language tells the cell's protein-making machinery the precise order in which to link the amino acids to produce a specific protein. This is a major task because there are 20 types of amino acids, which can be placed in many different orders to form a wide variety of proteins.
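The two steps can be illustrated with a deliberately small sketch. It assumes the input is the coding (sense) strand of a gene, so transcription reduces to swapping T for U, and it uses only a five-entry excerpt of the standard genetic code; the example sequence is made up.

```python
# Excerpt of the standard genetic code (assumption: enough for this example only).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "ACU": "Thr",
    "GGC": "Gly",
    "UUU": "Phe",
    "UAA": None,   # stop codon
}


def transcribe(coding_strand: str) -> str:
    """Step 1: write the coding-strand sequence as mRNA (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list:
    """Step 2: read the mRNA three bases at a time until a stop codon."""
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon reached
            break
        peptide.append(amino_acid)
    return peptide


dna = "ATGACTGGCTTTTAA"  # hypothetical coding-strand sequence
mrna = transcribe(dna)
peptide = translate(mrna)
print(mrna)
print("-".join(peptide))
```

A full implementation would carry all 64 codons; the partial table keeps the example short and raises an error rather than guessing when it meets a codon it does not know.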

Gene mutations

Mutations occur when the number or order of bases in a gene is disrupted. Nucleotides can be deleted, doubled, rearranged, or replaced, each alteration having a particular effect. Mutation generally has little or no effect, but, when it does alter an organism, the change may be lethal or cause disease. A beneficial mutation will rise in frequency within a population until it becomes the norm.

For more information on the influence of genetic mutations in humans and other organisms, see human genetic disease and evolution.

The Editors of Encyclopaedia Britannica. This article was most recently revised and updated by Adam Augustyn, Managing Editor, Reference Content.

New molecular tool precisely edits mitochondrial DNA

The genome in mitochondria -- the cell's energy-producing organelles -- is involved in disease and key biological functions, and the ability to precisely alter this DNA would allow scientists to learn more about the effects of these genes and mutations. But the precision editing technologies that have revolutionized DNA editing in the cell nucleus have been unable to reach the mitochondrial genome.

Now, a team at the Broad Institute of MIT and Harvard and the University of Washington School of Medicine has broken this barrier with a new type of molecular editor that can make precise C•G-to-T•A nucleotide changes in mitochondrial DNA. The editor, engineered from a bacterial toxin, enables modeling of disease-associated mitochondrial DNA mutations, opening the door to a better understanding of genetic changes associated with cancer, aging, and more.

The work is described in Nature, with co-first authors Beverly Mok, a graduate student from the Broad Institute and Harvard University, and Marcos de Moraes, a postdoctoral fellow at the University of Washington (UW).

The work was jointly supervised by Joseph Mougous, UW professor of microbiology and an investigator at the Howard Hughes Medical Institute (HHMI), and David Liu, the Richard Merkin Professor and director of the Merkin Institute of Transformative Technologies in Healthcare at the Broad Institute, professor of chemistry and chemical biology at Harvard University, and HHMI investigator.

"The team has developed a new way of manipulating DNA and used it to precisely edit the human mitochondrial genome for the first time, to our knowledge -- providing a solution to a long-standing challenge in molecular biology," said Liu. "The work is a testament to collaboration in basic and applied research, and may have further applications beyond mitochondrial biology."

Agent of bacterial warfare

Most current approaches to studying specific variations in mitochondrial DNA involve using patient-derived cells, or a small number of animal models, in which mutations have occurred by chance. "But these methods pose major limitations, and creating new, defined models has been impossible," said co-author Vamsi Mootha, institute member and co-director of the Metabolism Program at Broad. Mootha is also an HHMI investigator and professor of medicine at Massachusetts General Hospital.

While CRISPR-based technologies can rapidly and precisely edit DNA in the cell nucleus, greatly facilitating model creation for many diseases, these tools haven't been able to edit mitochondrial DNA because they rely on a guide RNA to target a location in the genome. The mitochondrial membrane allows proteins to enter the organelle, but is not known to have accessible pathways for transporting RNA.

One piece of a potential solution arose when the Mougous lab identified a toxic protein made by the pathogen Burkholderia cenocepacia. This protein can kill other bacteria by directly changing cytosine (C) to uracil (U) in double-stranded DNA.

"What is special about this protein, and what suggested to us that it might have unique editing applications, is its ability to target double-stranded DNA. All previously described deaminases that target DNA work only on the single-stranded form, which limits how they can be used as genome editors," said Mougous. His team determined the structure and biochemical characteristics of the toxin, called DddA.

"We realized that the properties of this 'bacterial warfare agent' could allow it to be paired with a non-CRISPR-based DNA-targeting system, raising the possibility of making base editors that do not rely on CRISPR or on guide RNAs," explained Liu. "It could enable us to finally perform precision genome editing in one of the last corners of biology that has remained untouchable by such technology -- mitochondrial DNA."

"Taming the beast"

The team's first major challenge was to eliminate the toxicity of the bacterial agent -- what Liu described to Mougous as "taming the beast" -- so that it could edit DNA without damaging the cell. The researchers divided the protein into two inactive halves that could edit DNA only when they combined.

The researchers tethered the two halves of the tamed bacterial toxin to TALE DNA-binding proteins, which can locate and bind a target DNA sequence in both the nucleus and mitochondria without the use of a guide RNA. When these pieces bind DNA next to each other, the complex reassembles into its active form, and converts C to U at that location -- ultimately resulting in a C•G-to-T•A base edit. The researchers called their tool a DddA-derived cytosine base editor (DdCBE).

The team tested DdCBE on five genes in the mitochondrial genome in human cells and found that DdCBE installed precise base edits in up to 50 percent of the mitochondrial DNA. They focused on the gene ND4, which encodes a subunit of the mitochondrial enzyme complex I, for further characterization. Mootha's lab analyzed the mitochondrial physiology and chemistry of the edited cells and showed that the changes affected mitochondria as intended.

"This is the first time in my career that we've been able to engineer a precise edit in mitochondrial DNA," said Mootha. "It's a quantum leap forward -- if we can make targeted mutations, we can develop models to study disease-associated variants, determine what role they actually play in disease, and screen the effects of drugs on the pathways involved."

Future developments

One goal for the field now will be to develop editors that can precisely make other types of genetic changes in mitochondrial DNA.

"A mitochondrial genome editor has the long-term potential to be developed into a therapeutic to treat mitochondrial-derived diseases, and it has more immediate value as a tool that scientists can use to better model mitochondrial diseases and explore fundamental questions pertaining to mitochondrial biology and genetics," Mougous said.

The team added that some features of DdCBE, such as its lack of RNA, may also be attractive for other gene-editing applications beyond the mitochondria.

This work was supported in part by the Merkin Institute of Transformative Technologies in Healthcare, NIH (R01AI080609, U01AI142756, RM1HG009490, R35GM122455, R35GM118062, and P30DK089507), Defense Threat Reduction Agency (1-13-1-0014), and University of Washington Cystic Fibrosis Foundation


This study clarifies the individual diversification history of the sections Cyclobalanopsis, Ilex, and Quercus, as manifest by their chloroplast diversity. The results highlight the importance of geological events and ecological adaptive capacity for the spatial genetic pattern of oak clades and provide detailed insights into the formation mechanism of their contemporary diversity. Further insights into the divergence history of this group will originate from a combination of whole-chloroplast sequencing and nuclear genetic data of deeper population sampling. Finally, association mapping can be used to investigate the relationship between genetic polymorphisms and environment, which will help to identify the relative effects of climatic and edaphic variation and of migration history on genetic variation in multiple clades.

Double helix structure

The 1953 discovery of the shape of DNA, known as a double helix, is mainly credited to Francis Crick, James Watson, Rosalind Franklin and Maurice Wilkins. It is rather like a spiral staircase or twisted ladder in which every rung is a bond between matching “bases” on its two strands. But it was the work of many researchers throughout the decades that followed that determined what DNA codes for, how it is read, and how it is copied and passed on to new cells and future generations.

The order of DNA’s chemical bases forms the genetic code. These come in four types: adenine (A), guanine (G), cytosine (C) and thymine (T). The bases always pair up with the same complementary compound on the other strand of DNA: A with T, and C with G.
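Because the pairing rules are fixed, the sequence of one strand determines the other. The sketch below derives the partner strand from a made-up sequence; it also reverses the result, since the two strands of a duplex run in opposite directions and sequences are conventionally written 5' to 3'. The function name is illustrative.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the complementary strand, written in its own 5'-to-3' direction."""
    complement = [PAIRS[base] for base in strand.upper()]
    return "".join(reversed(complement))


top = "ATGCCGTA"  # hypothetical sequence
bottom = partner_strand(top)
print(top)
print(bottom)
```

Applying the function twice returns the original sequence, which is another way of saying that each strand carries the full information needed to rebuild the other.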


Three bases in a row together code for a specific amino acid, the basic building blocks of proteins. ACT, for instance, tells cells to make an amino acid called threonine. In this way, each gene tells the cell’s machinery how to make a vast array of proteins.

There is a lot of DNA packed into every human cell. If you stretched it out, it would be almost two metres long. So your three billion bases, which are more than 99 per cent the same as everyone else’s, need to be packaged up neatly. The coiled strands of your DNA are thus organised into chromosomes. Humans usually have 46 of these in each cell, 23 from each parent. The number varies in other animals: fruit flies have only eight and the black mulberry plant has 308, for example. Mitochondrial DNA is entirely inherited from an organism’s mother.

What makes DNA so amazing is that it can copy itself, which allows all known organisms to function, grow and reproduce. Each strand of DNA in the double helix can serve as a template for duplicating its sequence of bases, enabling new cells to be exact copies of existing ones – although mutations often occur as a result of small errors in this process.


Amino acids are typically drawn either with no charges or with a plus and minus charge (see figure 13.1.1). When an amino acid contains both a plus and a minus charge in the "backbone", it is called a zwitterion and has an overall neutral charge. The zwitterion of an amino acid exists at a pH equal to the isoelectric point. Each amino acid has its own pI value based on the properties of the amino acid. At pH values above or below the isoelectric point, the molecule will have a net charge which depends on its pI value as well as the pH of the solution in which the amino acid is found.

pH < pI

When pH is less than pI, there is an excess amount of \(\ce{H+}\) in solution. The excess \(\ce{H+}\) is attracted to the negatively charged carboxylate ion, resulting in its protonation. The carboxylate ion is protonated, making it neutral, leaving only a positive charge on the amine group. Overall, the amino acid will have a charge of +1.

pH > pI

When pH is greater than pI, there is an excess amount of \(\ce{OH-}\) in solution. The excess \(\ce{OH-}\) is attracted to the positively charged amine group, resulting in the removal of an \(\ce{H+}\) ion to form \(\ce{H2O}\). The amine group has a neutral charge, leaving only a negative charge on the carboxylate group. Overall, the amino acid will have a charge of -1.

Figure: Amino acid side chains and pI values.

  1. Identify the amino acid pictured below.
  2. Find the pI value for the amino acid.
  3. Determine how the amino acid will exist at pH = 3.52
  4. Determine how the amino acid will exist at pH = 9.34
  5. Determine how the amino acid will exist at pH = 5.02

a. Look at the side chain to identify the amino acid. The side chain contains \(\ce{-CH2SH}\), which matches the structure of cysteine.

b. The pI values for amino acids are found in the table of amino acids. For cysteine, pI = 5.02.

c. At pH = 3.52, the \(\ce{H+}\) concentration is high (low pH = more acidic = more \(\ce{H+}\)). Therefore the \(\ce{H+}\) will add to the carboxylate ion and neutralize the negative charge. The amino acid will have a positive charge left on the amine group and will have an overall charge of +1.

d. At pH = 9.34, the \(\ce{OH-}\) concentration is high (high pH = more basic = less \(\ce{H+}\) = more \(\ce{OH-}\)). Therefore the \(\ce{OH-}\) will be attracted to the positively charged amine group and will "steal" an \(\ce{H+}\) from it. As a result, the only remaining charge will be on the carboxylate ion, so the amino acid will have a -1 charge.

e. At pH = 5.02, the pH = pI so the amino acid will exist as the zwitterion with both the positive and negative charges as shown above.
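The rule applied in parts c to e can be written as a small function. This is a sketch of the simplified picture used in this section (the sign of the net charge follows from comparing pH with pI); it ignores side-chain ionization and the fact that real solutions contain a mixture of charge states. The function name and tolerance are illustrative choices.

```python
import math


def predicted_net_charge(ph: float, pi_value: float, tol: float = 1e-9) -> int:
    """Net charge predicted by the simplified pH-versus-pI rule.

    Returns +1 below the isoelectric point (carboxylate protonated),
    -1 above it (amine deprotonated), and 0 at the pI (zwitterion).
    """
    if math.isclose(ph, pi_value, abs_tol=tol):
        return 0
    return 1 if ph < pi_value else -1


CYSTEINE_PI = 5.02  # pI value quoted in the worked example

charges = {ph: predicted_net_charge(ph, CYSTEINE_PI) for ph in (3.52, 9.34, 5.02)}
for ph, charge in charges.items():
    print(f"pH {ph}: net charge {charge:+d}")
```

The three results reproduce answers c, d and e for cysteine.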
