In our discussion of DNA and RNA synthesis up to this point, the role of the template strand has been reserved for DNA. However, some enzymes use an RNA template for nucleic acid synthesis. With the important exception of viruses with an RNA genome, these enzymes play only supporting roles in information pathways. RNA viruses are the source of most characterized RNA-dependent polymerases, although some eukaryotes also use these enzymes to amplify double-stranded RNAs used in RNA interference.
The existence of RNA replication requires an elaboration of the central dogma — the notion that genetic information flows only from DNA to RNA to proteins. RNA-dependent polymerases allow the genetic information stored in RNA to be replicated and reverse transcribed into DNA. The enzymes of the RNA replication process have profound implications for investigations into the nature of self-replicating molecules that may have existed in prebiotic times.
Certain RNA viruses that infect animal cells carry within the viral particle an RNA-dependent DNA polymerase called reverse transcriptase. On infection, the single-stranded RNA viral genome (∼10,000 nucleotides) and the enzyme enter the host cell. The reverse transcriptase first catalyzes the synthesis of a DNA strand complementary to the viral RNA (Fig. 26-29), then degrades the RNA strand of the viral RNA-DNA hybrid and replaces it with DNA. The resulting duplex DNA often becomes incorporated into the genome of the eukaryotic host cell. These integrated (and dormant) viral genes can be activated and transcribed, and the gene products — viral proteins and the viral RNA genome itself — are packaged as new viruses. The RNA viruses that contain reverse transcriptases are known as retroviruses (retro is the Latin prefix for “backward”).
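The order of events can be traced with a toy model. This is a minimal sketch rather than a model of the real enzyme. Sequences are strings written 5′→3′, and the tRNA primer and strand-transfer steps are ignored. The plus/minus strand labels are a common convention not used above, and the names `reverse_transcribe` and `complement_strand` are hypothetical.

```python
# Toy model of reverse transcription. Sequences are strings written 5'->3'.
# Hypothetical names; primer binding and strand transfers are not modeled.

# Base in the template -> base added to the new DNA strand
DNA_PAIR_FOR_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR_FOR_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template: str, pairing: dict[str, str]) -> str:
    """Return the new strand complementary to `template`, written 5'->3'.

    The strands are antiparallel, so the paired bases are read in reverse.
    """
    return "".join(pairing[base] for base in reversed(template))


def reverse_transcribe(viral_rna: str) -> tuple[str, str]:
    """Return (plus-strand DNA, minus-strand DNA) made from a viral RNA genome."""
    # 1. RNA-dependent DNA synthesis: copy the viral RNA into minus-strand DNA.
    minus_dna = complement_strand(viral_rna, DNA_PAIR_FOR_RNA)
    # 2. RNA degradation: the RNA strand of the hybrid is simply discarded here.
    # 3. DNA-dependent DNA synthesis: copy the minus strand into plus-strand DNA.
    plus_dna = complement_strand(minus_dna, DNA_PAIR_FOR_DNA)
    return plus_dna, minus_dna


plus, minus = reverse_transcribe("AUGGCUUAA")
print(plus, minus)  # ATGGCTTAA TTAAGCCAT
```

The plus-strand DNA reads like the viral RNA with T in place of U. This is why the integrated DNA can later be transcribed back into viral RNA.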
The existence of reverse transcriptases in RNA viruses was predicted by Howard Temin in 1962, and the enzymes were ultimately detected by Temin and, independently, by David Baltimore in 1970. Their discovery aroused much attention as dogma-shaking proof that genetic information can flow “backward” from RNA to DNA.
Retroviruses typically have three genes: gag (a name derived from the historical designation group associated antigen), pol, and env (Fig. 26-30). The transcript that contains gag and pol is translated into a long “polyprotein,” a single large polypeptide that is cleaved into six proteins with distinct functions. The proteins derived from the gag gene make up the interior core of the viral particle. The pol gene encodes the protease that cleaves the long polypeptide, an integrase that inserts the viral DNA into the host chromosomes, and reverse transcriptase. Many reverse transcriptases have two subunits, α and β. The pol gene specifies the β subunit, and the α subunit is simply a proteolytic fragment of the β subunit. The env gene encodes the proteins of the viral envelope. At each end of the linear RNA genome are long terminal repeat (LTR) sequences of a few hundred nucleotides. Once copied into the duplex DNA by reverse transcription, these sequences facilitate integration of the viral chromosome into the host DNA and contain promoters for viral gene expression.
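Getting from one polypeptide to several proteins is simple bookkeeping. In a single linear chain, n proteolytic cuts yield n + 1 fragments, so six proteins require five cleavage events. Here is a minimal sketch. The function name is hypothetical, the 30-residue sequence is made up, and the cut positions are arbitrary; they are not the real retroviral protease sites.

```python
def cleave_polyprotein(polyprotein: str, cut_sites: list[int]) -> list[str]:
    """Cut a polyprotein before each index in `cut_sites` (0-based positions)."""
    if any(not 0 < site < len(polyprotein) for site in cut_sites):
        raise ValueError("cut sites must fall inside the chain")
    bounds = [0, *sorted(cut_sites), len(polyprotein)]
    # Each pair of consecutive boundaries delimits one mature protein.
    return [polyprotein[start:end] for start, end in zip(bounds, bounds[1:])]


toy_polyprotein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"  # made-up 30-residue chain
toy_cuts = [5, 10, 15, 20, 25]                    # arbitrary, hypothetical sites
products = cleave_polyprotein(toy_polyprotein, toy_cuts)
print(len(products))  # 6
print(products[0])    # ACDEF
```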
Reverse transcriptases catalyze three different reactions: (1) RNA-dependent DNA synthesis, (2) RNA degradation, and (3) DNA-dependent DNA synthesis. Each transcriptase is most active with the RNA of its own virus, but each can be used experimentally to make DNA complementary to a variety of RNAs. The DNA synthesis activities and the RNA degradation activity use separate active sites on the protein. For DNA synthesis to begin, the reverse transcriptase requires a primer, a cellular tRNA obtained during an earlier infection and carried in the viral particle. This tRNA is base-paired at its 3′ end with a complementary sequence in the viral RNA. The new DNA strand is synthesized in the 5′→3′ direction, as in all RNA and DNA polymerase reactions. Reverse transcriptases, like RNA polymerases, do not have proofreading exonucleases. They generally have error rates of about 1 per 20,000 nucleotides added. An error rate this high is extremely unusual in DNA replication and seems to be a characteristic of most enzymes that replicate the genomes of RNA viruses. A consequence is a higher mutation rate and a faster rate of viral evolution, which is a factor in the frequent appearance of new strains of disease-causing retroviruses.
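Because errors are spread over a whole genome, a per-nucleotide error rate is easier to appreciate per genome copy. This sketch makes simple assumptions. Errors are independent and equally likely at every position. The genome is about 10,000 nucleotides long, with one error per 20,000 nucleotides (the approximate figures given in this section). The function names are hypothetical.

```python
def expected_errors(genome_nt: int, nt_per_error: int) -> float:
    """Mean number of misincorporations in one complete copy of the genome."""
    return genome_nt / nt_per_error


def prob_at_least_one_error(genome_nt: int, nt_per_error: int) -> float:
    """Chance that a copy carries one or more errors.

    Assumes each position is copied wrongly independently, with probability
    1 / nt_per_error (a binomial model).
    """
    per_site = 1 / nt_per_error
    return 1 - (1 - per_site) ** genome_nt


GENOME_NT = 10_000      # approximate retroviral genome length
NT_PER_ERROR = 20_000   # about 1 error per 20,000 nucleotides added
print(expected_errors(GENOME_NT, NT_PER_ERROR))                    # 0.5
print(round(prob_at_least_one_error(GENOME_NT, NT_PER_ERROR), 3))  # 0.393
```

Under these assumptions, a single round of reverse transcription introduces about half an error on average. Roughly two of every five new DNA copies therefore carry at least one mutation.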
Reverse transcriptases have become important reagents in the study of DNA-RNA relationships and in DNA cloning techniques. They make possible the synthesis of DNA complementary to an mRNA template, and synthetic DNA prepared in this manner, called complementary DNA (cDNA), can be used to clone cellular genes (see Fig. 9-13).
Retroviruses have featured prominently in the molecular understanding of cancer. Most retroviruses do not kill their host cells but remain integrated in the cellular DNA, replicating when the cell divides. Some retroviruses, classified as RNA tumor viruses, contain an oncogene that can cause the cell to grow abnormally. The first retrovirus of this type to be studied was the Rous sarcoma virus (also called avian sarcoma virus; Fig. 26-31), named for F. Peyton Rous, who studied chicken tumors now known to be caused by this virus. Since the initial discovery of oncogenes by Harold Varmus and Michael Bishop, many dozens of such genes have been found in retroviruses.
The human immunodeficiency virus (HIV), which causes acquired immune deficiency syndrome (AIDS), is a retrovirus. Identified in 1983, HIV has an RNA genome with standard retroviral genes along with several other unusual genes (Fig. 26-32). Unlike many other retroviruses, HIV kills many of the cells it infects (principally T lymphocytes) rather than causing tumor formation. This gradually leads to suppression of the immune system in the host organism. The reverse transcriptase of HIV is even more error-prone than other known reverse transcriptases — 10 times more so — resulting in high mutation rates in this virus. One or more errors are generally made every time the viral genome is replicated, so any two viral RNA molecules are likely to differ.
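The same kind of estimate, applied to HIV, shows why any two viral RNAs are likely to differ. This sketch assumes a genome of about 10,000 nucleotides, the approximate retroviral genome size given earlier. It reads “10 times more error-prone” as about one error per 2,000 nucleotides and assumes errors arise independently at each position. The names are hypothetical.

```python
GENOME_NT = 10_000                 # assumed HIV genome length (about 10,000 nt)
TYPICAL_NT_PER_ERROR = 20_000      # typical reverse transcriptase
HIV_NT_PER_ERROR = TYPICAL_NT_PER_ERROR // 10  # "10 times more" error-prone: 2,000


def error_free_probability(genome_nt: int, nt_per_error: int) -> float:
    """Probability that one genome copy has no errors (independent sites assumed)."""
    return (1 - 1 / nt_per_error) ** genome_nt


mean_errors = GENOME_NT / HIV_NT_PER_ERROR
p_clean = error_free_probability(GENOME_NT, HIV_NT_PER_ERROR)
# Two copies of the same parent match essentially only if neither has an error,
# ignoring the very small chance that both acquire exactly the same mutations.
p_two_identical = p_clean ** 2

print(mean_errors)               # 5.0
print(f"{p_clean:.4f}")          # 0.0067
print(f"{p_two_identical:.1e}")  # 4.5e-05
```

With about five errors expected per copy, fewer than 1 copy in 100 is error-free. The chance that two copies are identical falls to about 4.5 in 100,000.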
Many modern vaccines for viral infections consist of one or more coat proteins of the virus, produced by methods described in Chapter 9. These proteins are not infectious on their own but stimulate the immune system to recognize and resist subsequent viral invasions (Chapter 5). Because of the high error rate of the HIV reverse transcriptase, the env gene in this virus (along with the rest of the genome) undergoes very rapid mutation, complicating the development of an effective vaccine. However, repeated cycles of cell invasion and replication are needed to propagate an HIV infection, so inhibition of viral enzymes offers the most effective therapy currently available. The HIV protease is targeted by a class of drugs called protease inhibitors (see Fig. 6-29). Reverse transcriptase is the target of some additional drugs widely used to treat HIV-infected individuals (Box 26-3).
Some well-characterized eukaryotic transposons from sources as diverse as yeast and fruit flies have a structure very similar to that of retroviruses; these are sometimes called retrotransposons (Fig. 26-33). Retrotransposons encode an enzyme homologous to the retroviral reverse transcriptase, and their coding regions are flanked by LTR sequences. They transpose from one position to another in the cellular genome by means of an RNA intermediate. Reverse transcriptase makes a DNA copy of the RNA, and the DNA is then integrated at a new site. Most transposons in eukaryotes use this mechanism for transposition, distinguishing them from bacterial transposons, which move as DNA directly from one chromosomal location to another (see Fig. 25-41).
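The difference between the two modes of transposition can be seen in a toy model in which a genome is a list of named segments. This sketch simplifies considerably. Transcription, reverse transcription, and integration are collapsed into one copy step. The DNA-transposition function shows only the simple cut-and-paste mode, although bacteria also have replicative transposons. The function names and segment labels are hypothetical.

```python
def retrotranspose(genome: list[str], element: str, new_site: int) -> list[str]:
    """Copy-and-paste through an RNA intermediate; the donor copy stays in place."""
    if element not in genome:
        raise ValueError(f"{element!r} is not in the genome")
    # Transcription, reverse transcription, and integration collapsed into one step.
    dna_copy = element
    return genome[:new_site] + [dna_copy] + genome[new_site:]


def dna_transpose(genome: list[str], element: str, new_site: int) -> list[str]:
    """Simple cut-and-paste: the element itself leaves its old site.

    `new_site` is an index into the genome after the element has been excised.
    """
    remaining = list(genome)
    remaining.remove(element)  # raises ValueError if the element is absent
    return remaining[:new_site] + [element] + remaining[new_site:]


genome = ["geneA", "TE", "geneB", "geneC"]
print(retrotranspose(genome, "TE", 3))  # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
print(dna_transpose(genome, "TE", 2))   # ['geneA', 'geneB', 'TE', 'geneC']
```

The retrotransposon leaves its original copy in place, so each round of transposition can raise its copy number. Cut-and-paste transposition changes only the element's position.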
Figure 26-33. Two eukaryotic retrotransposons. Ty element (Saccharomyces): δ (LTR), TYA (gag), TYB (pol), δ (LTR). Copia element (Drosophila): LTR, gag, PR, INT, RT, RH, LTR.
Retrotransposons lack an env gene and so cannot form viral particles. They can be thought of as defective viruses, trapped in cells. Comparisons between retroviruses and eukaryotic transposons suggest that reverse transcriptase is an ancient enzyme that predates the evolution of multicellular organisms.
Many group I and group II introns are also mobile genetic elements. In addition to their self-splicing activities, they encode DNA endonucleases that promote their movement. During genetic exchanges between cells of the same species, or when DNA is introduced into a cell by parasites or by other means, these endonucleases promote insertion of the intron into an identical site in another DNA copy of a homologous gene that does not contain the intron, in a process termed homing (Fig. 26-34). Whereas group I intron homing is DNA-based, group II intron homing occurs through an RNA intermediate. The endonucleases of the group II introns have associated reverse transcriptase activity. The proteins can form complexes with the intron RNAs themselves, after the introns are spliced from the primary transcripts. Because the homing process involves insertion of the RNA intron into DNA and reverse transcription of the intron, the movement of these introns has been called retrohoming. Over time, every copy of a particular gene in a population may acquire the intron. Much more rarely, the intron may insert itself into a new location in an unrelated gene. If this event does not kill the host cell, it can lead to the evolution and distribution of an intron in a new location. The structures and mechanisms used by mobile introns support the idea that at least some introns originated as molecular parasites whose evolutionary past can be traced to retroviruses and transposons.
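Group I homing (Fig. 26-34b) can be sketched as string editing. The sketch assumes a hypothetical 10-bp recognition sequence and cut position. It writes the intron as an arbitrary lowercase stand-in so that it is easy to spot, and it collapses double-strand break repair into a single insertion step. `home_intron` is a hypothetical name. The inserted intron interrupts the recognition sequence, so an allele that already carries the intron is left uncut. The biological process behaves the same way.

```python
def home_intron(recipient: str, homing_site: str, intron: str, cut_offset: int) -> str:
    """Insert `intron` at the endonuclease cut within `homing_site`.

    Double-strand break repair, which copies the intron from the
    intron-containing allele, is collapsed into a single insertion.
    """
    start = recipient.find(homing_site)
    if start == -1:
        # No intact recognition sequence, so no cut and no homing.
        return recipient
    cut = start + cut_offset
    return recipient[:cut] + intron + recipient[cut:]


SITE = "TTAGCATGCA"     # hypothetical 10-bp recognition sequence
INTRON = "acgtacgt"     # arbitrary stand-in for the intron, lowercase for visibility
allele_b = "GGATCCTTAGCATGCAAGG"
allele_b_with_intron = home_intron(allele_b, SITE, INTRON, cut_offset=5)
print(allele_b_with_intron)  # GGATCCTTAGCacgtacgtATGCAAGG
# A second pass does nothing: the site is now interrupted by the intron.
print(home_intron(allele_b_with_intron, SITE, INTRON, 5) == allele_b_with_intron)  # True
```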
Figure 26-34. Mobile introns. (a) Production of homing endonuclease: transcription and splicing of gene X, allele a, which contains a group I intron, yield the gene X product and the spliced intron, which is translated into the homing endonuclease. (b) Homing: the endonuclease cleaves gene X, allele b (no intron). Double-strand break repair using allele a (with intron) then gives allele b with intron. (c) Retrohoming: the spliced group II intron from donor gene Y, allele a, encodes an endonuclease/reverse transcriptase. The intron RNA reverse-splices into one strand of recipient gene Y, allele b, and the endonuclease cuts the other strand. Reverse transcriptase then copies the intron, and the RNA is replaced by DNA and ligated, giving allele b with intron.
Telomeres, the structures at the ends of linear eukaryotic chromosomes (see Fig. 24-7), generally consist of many tandem copies of a short oligonucleotide sequence. This sequence usually has the form (TxGy)n in one strand and (CyAx)n in the complementary strand, where x and y are typically in the range of 1 to 4 (p. 890). Telomeres vary in length from a few dozen base pairs in some ciliated protozoans to tens of thousands of base pairs in mammals. The TG strand is longer than its complement, leaving a region of single-stranded DNA of up to a few hundred nucleotides at the 3′ end.
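The (TxGy)n/(CyAx)n relationship between the two strands follows from base pairing of antiparallel strands. The sketch below, with hypothetical function names, extracts x and y from a repeat unit and builds both strands of a short telomere, each written 5′→3′. It ignores the single-stranded overhang.

```python
import re

TG_UNIT = re.compile(r"(T+)(G+)")
PAIR = {"T": "A", "G": "C"}


def parse_tg_unit(unit: str) -> tuple[int, int]:
    """Return (x, y) for a repeat unit of the form TxGy."""
    match = TG_UNIT.fullmatch(unit)
    if match is None:
        raise ValueError(f"{unit!r} is not of the form TxGy")
    return len(match.group(1)), len(match.group(2))


def telomere_strands(x: int, y: int, n: int) -> tuple[str, str]:
    """Return the TG strand (TxGy)n and the CA strand (CyAx)n, each written 5'->3'."""
    tg_strand = ("T" * x + "G" * y) * n
    # Antiparallel complement: pair each base, reading the TG strand backward.
    ca_strand = "".join(PAIR[base] for base in reversed(tg_strand))
    return tg_strand, ca_strand


x, y = parse_tg_unit("TTGGGG")
print(x, y)                       # 2 4
print(telomere_strands(x, y, 2))  # ('TTGGGGTTGGGG', 'CCCCAACCCCAA')
```

A unit that does not fit the TxGy pattern, such as the vertebrate repeat TTAGGG, raises `ValueError` here. This is a reminder that the TxGy form is typical rather than universal.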
The ends of a linear chromosome are not readily replicated by cellular DNA polymerases. DNA replication requires a template and primer, and beyond the end of a linear DNA molecule no template is available for the pairing of an RNA primer. Without a special mechanism for replicating the ends, chromosomes would be shortened somewhat in each cell generation. The enzyme telomerase, discovered by Carol Greider and Elizabeth Blackburn, solves this problem by adding telomeres to chromosome ends.
The discovery and purification of this enzyme provided insight into a reaction mechanism that is remarkable and unprecedented. Telomerase, like some other enzymes described in this chapter, is an RNP that contains both RNA and protein components. In the ciliated protozoan Tetrahymena, the RNA component is about 150 nucleotides long and contains about 1.5 copies of the appropriate CyAx telomere repeat. This region of the RNA acts as a template for synthesis of the TxGy strand of the telomere. Telomerase thereby acts as a cellular reverse transcriptase that provides the active site for RNA-dependent DNA synthesis. Unlike retroviral reverse transcriptases, telomerase copies only a small segment of RNA that it carries within itself. Telomere synthesis requires the 3′ end of a chromosome as primer and proceeds in the usual 5′→3′ direction. Having synthesized one copy of the repeat, the enzyme repositions to resume extension of the telomere (Fig. 26-35a).
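The extension-translocation cycle in Fig. 26-35a can be followed with a toy model. In that figure the TG strand ends in ...TTG-3′ and the template region of the telomerase RNA reads 3′-AACCCCAAC-5′. The template's first three nucleotides (AAC) pair with the terminal TTG. The sketch assumes perfect realignment on every round and ignores processivity and dissociation. `telomerase_extend` and its parameters are hypothetical.

```python
# Template base (RNA) -> base added to the DNA
RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def telomerase_extend(tg_strand: str, template_3to5: str, align_len: int, rounds: int) -> str:
    """Extend the 3' end of a TG strand (written 5'->3') by repeated cycles.

    template_3to5: template region of the telomerase RNA written 3'->5', so it
        lines up base by base under DNA written 5'->3'.
    align_len: number of template nucleotides that pair with the DNA 3' end.
    """
    anchor = "".join(RNA_TO_DNA[b] for b in template_3to5[:align_len])
    addition = "".join(RNA_TO_DNA[b] for b in template_3to5[align_len:])
    for _ in range(rounds):
        # Alignment: the DNA 3' end must pair with the start of the template.
        if not tg_strand.endswith(anchor):
            raise ValueError("3' end does not pair with the template alignment region")
        # Extension: copy the rest of the template (dGTP and dTTP in, PPi out).
        tg_strand += addition
        # Translocation: the new 3' end matches `anchor` again for the next cycle.
    return tg_strand


print(telomerase_extend("TTGGGGTTG", "AACCCCAAC", align_len=3, rounds=2))
# TTGGGGTTGGGGTTGGGGTTG
```

Each cycle copies the six template nucleotides beyond the alignment region, adding GGGTTG (the length of one repeat). It also leaves a new TTG 3′ end that can realign with the start of the template.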
Figure 26-35. Telomere synthesis and structure. (a) Telomerase aligns the 3′ end of the TG strand (...TTG-3′) with the 3′-AAC end of its RNA template (3′-AACCCCAAC-5′). It adds dGTP and dTTP (releasing PPi) to copy the template, translocates, and repeats. After telomerase dissociates, primase synthesizes an RNA primer at the end of the new telomere strand. DNA polymerase fills in the intervening gap, and DNA ligase seals the final nick. The RNA primer is then removed by an RNase, and the single-stranded part of the telomere end is protected by telomere-binding proteins (telomere duplex DNA-binding proteins, TRF1, and TRF2). (b) T loop: the end of the TG strand folds back into the duplex. TRF1 and TRF2 sit at the junction, and telomere duplex DNA-binding proteins line the loop; the TG strand forms the inner strand and the CA strand the outer strand.
After extension of the TxGy strand by telomerase, the complementary CyAx strand is synthesized by cellular DNA polymerases, starting with an RNA primer (see Fig. 25-11). The single-stranded region is protected by specific binding proteins in many lower eukaryotes, especially those species with telomeres of less than a few hundred base pairs. In higher eukaryotes (including mammals) with telomeres many thousands of base pairs long, the single-stranded end is sequestered in a specialized structure called a T loop (Fig. 26-35b). The single-stranded end is folded back and paired with its complement in the double-stranded portion of the telomere. The formation of a T loop involves invasion of the 3′ end of the telomere’s single strand into the duplex DNA, perhaps by a mechanism similar to the initiation of homologous genetic recombination (see Fig. 25-34). In mammals, the looped DNA is bound by two proteins, TRF1 and TRF2, with the latter protein involved in formation of the T loop. T loops protect the ends of chromosomes, making them inaccessible to nucleases and the enzymes that repair double-strand breaks.
In protozoans (such as Tetrahymena), loss of telomerase activity results in a gradual shortening of telomeres with each cell division, ultimately leading to the death of the cell line. A similar link between telomere length and cell senescence (cessation of cell division) has been observed in humans. In germ-line cells, which contain telomerase activity, telomere lengths are maintained; in somatic cells, which lack telomerase, they are not. There is a linear, inverse relationship between the length of telomeres in cultured fibroblasts and the age of the individual from whom the fibroblasts were taken: telomeres in human somatic cells gradually shorten as an individual ages. If the telomerase reverse transcriptase is introduced into human somatic cells in vitro, telomerase activity is restored and the cellular life span increases markedly.
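The link between telomere loss and senescence can be expressed as a countdown. This sketch uses hypothetical numbers for the initial length, the loss per division, and a critical length. It assumes a constant loss per division and treats telomerase activity as fully compensating that loss. `divisions_until_senescence` is a hypothetical name.

```python
from typing import Optional


def divisions_until_senescence(initial_bp: int, loss_per_division_bp: int,
                               critical_bp: int,
                               telomerase_active: bool = False) -> Optional[int]:
    """Count divisions completed before telomeres would drop below `critical_bp`.

    Returns None when nothing shortens (telomerase active, or no loss), meaning
    this model predicts no telomere-driven senescence.
    """
    if telomerase_active or loss_per_division_bp <= 0:
        return None
    length, divisions = initial_bp, 0
    while length - loss_per_division_bp >= critical_bp:
        length -= loss_per_division_bp  # a fixed amount is lost per division
        divisions += 1
    return divisions


# Hypothetical values chosen only to illustrate the arithmetic.
print(divisions_until_senescence(10_000, 100, 5_000))                          # 50
print(divisions_until_senescence(10_000, 100, 5_000, telomerase_active=True))  # None
```

In this model, the number of divisions is the margin between initial and critical length divided by the loss per division. Both the starting telomere length and the rate of shortening therefore matter.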
Is the gradual shortening of telomeres a key to the aging process? Is our natural life span determined by the length of the telomeres we are born with? Further research in this area should yield some fascinating insights.
Apart from the retroviruses, the RNA viruses include some E. coli bacteriophages as well as eukaryotic viruses such as the influenza virus and coronaviruses that cause SARS or COVID-19. The single-stranded RNA chromosomes of these viruses also function as mRNAs for the synthesis of viral proteins. They are replicated in the host cell by an RNA-dependent RNA polymerase (RNA replicase). All RNA viruses — with the exception of retroviruses — must encode a protein with RNA-dependent RNA polymerase activity, either because the host cells lack such an enzyme or because the RNA genome structure of a virus imposes specialized enzymatic requirements.
The RNA replicase isolated from E. coli cells infected with the bacteriophage Qβ catalyzes the formation of an RNA complementary to the viral RNA, in a reaction equivalent to that catalyzed by DNA-dependent RNA polymerases. New RNA strand synthesis proceeds in the 5′→3′ direction by a chemical mechanism identical to that used in all other nucleic acid synthetic reactions that require a template. RNA replicase requires RNA as its template and will not function with DNA. It lacks a separate proofreading exonuclease activity and has an error rate similar to that of RNA polymerase. Unlike the DNA and RNA polymerases, RNA replicases are specific for the RNA of their own virus; the RNAs of the host cell are generally not replicated. This explains how RNA viruses are preferentially replicated in the host cell, which contains many other types of RNA.
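The template-directed logic of Qβ replication fits in a few lines. Copying the viral (plus) RNA gives a complementary (minus) strand, and copying that complement regenerates the original sequence. Plus and minus are conventional labels not used above. The function name is hypothetical, and the check for T is only a crude stand-in for the enzyme's requirement for an RNA template. The replicase's specificity for its own viral RNA is not modeled.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def replicase_copy(template: str) -> str:
    """Return the RNA complementary to `template`; both written 5'->3'."""
    if "T" in template:
        # Crude stand-in for the enzyme's refusal to use a DNA template.
        raise ValueError("RNA replicase requires an RNA template")
    return "".join(RNA_PAIR[base] for base in reversed(template))


plus_strand = "AUGCGAUU"                            # made-up viral RNA fragment
minus_strand = replicase_copy(plus_strand)
print(minus_strand)                                 # AAUCGCAU
print(replicase_copy(minus_strand) == plus_strand)  # True
```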
RNA-dependent RNA polymerases are not limited to viruses. Enzymes of this type are found in plants, protists, fungi, and some simpler animals, but not in insects or mammals. Those found in the genomes of eukaryotes generally play a role in the metabolism of another class of small RNAs, called small interfering RNAs (siRNAs), which participate in gene regulation (Chapter 28).
Even though viral RNA replication and reverse transcription, retrohoming, and telomere synthesis represent a diverse array of biological processes, the polymerases involved in each pathway bear remarkable similarities in structure (Fig. 26-36). In all cases, palm and finger domains are used to grip the duplexed template and primer nucleic acids within the active site. Amazingly, the group II intron retrohoming reverse transcriptase is structurally most closely related to a protein component of the spliceosome that helps scaffold its RNA active site. Group II introns and the spliceosome already share identical splicing chemistry and active-site features (see Fig. 26-17). This structural kinship provides further evidence that the spliceosome evolved from a group II intron–like ancestor.
Figure 26-36. Structures of proteins acting in (a) group II intron retrohoming, (b) nuclear pre-mRNA splicing, (c) telomere extension, (d) viral reverse transcription, and (e) viral replication, each with its finger and palm domains and N and C termini labeled.