9.1 Studying Genes and Their Products in Chapter 9 DNA-Based Information Technologies

Genes Can Be Isolated by DNA Cloning

A clone is an identical copy. This term originally applied to cells of a single type, isolated and allowed to reproduce to create a population of identical cells. When applied to DNA, a clone represents many identical copies of a particular gene segment. In brief, our researcher must separate the gene from the larger chromosome, attach it to a much smaller piece of carrier DNA, and allow microorganisms to make many copies of it. This is the process of DNA cloning. The result is selective amplification of a particular gene or DNA segment so that its genetic information may be studied and utilized. Classically, the cloning of DNA from any organism entails five general procedures:

1. Obtaining the DNA segment to be cloned. Enzymes called restriction endonucleases act as precise molecular scissors, recognizing specific sequences in DNA and cleaving genomic DNA into smaller fragments suitable for cloning. Alternatively, genomic DNA can be sheared randomly into fragments of a desired size. Since the sequence of targeted genomic regions is often known (available in databases), DNA segments to be cloned are most often amplified by the polymerase chain reaction (PCR) or are simply synthesized (both methods are described in Chapter 8).
2. Selecting a small molecule of DNA capable of autonomous replication. These small DNAs are called cloning vectors (a vector is a carrier or delivery agent). Most cloning vectors used in the laboratory are modified versions of naturally occurring small DNA molecules found in bacteria or eukaryotes. Viral DNAs may also play this role.
3. Joining two DNA fragments covalently. The enzyme DNA ligase links the cloning vector to the DNA fragment to be cloned. Composite DNA molecules of this type, comprising covalently linked segments from two or more sources, are called recombinant DNAs.
4. Moving recombinant DNA from the test tube to a host organism. The host organism provides the enzymatic machinery for DNA replication.
5. Selecting or identifying host cells that contain recombinant DNA. The cloning vector generally has features that allow the host cells to survive in an environment in which cells lacking the vector would die. Cells containing the vector are thus “selectable” in that environment.

The methods used to accomplish these and related tasks are collectively referred to as recombinant DNA technology or, more informally, genetic engineering.

Much of our initial discussion focuses on DNA cloning in the bacterium Escherichia coli, the first organism used for recombinant DNA work and still the most common host cell. E. coli has many advantages: its DNA metabolism (like many of its other biochemical processes) is well understood; many naturally occurring cloning vectors associated with E. coli, such as plasmids and bacteriophages (bacterial viruses; also called phages), are readily available; and techniques are available for moving DNA expeditiously from one bacterial cell to another. The principles discussed here are broadly applicable to DNA cloning in other organisms, a topic discussed more fully later in the section.

Restriction Endonucleases and DNA Ligases Yield Recombinant DNA

A set of enzymes (Table 9-1) made available through decades of research on nucleic acid metabolism is indispensable for generating and propagating a recombinant DNA molecule (Fig. 9-1). First, restriction endonucleases (also called restriction enzymes) recognize and cleave DNA at specific sequences (recognition sequences or restriction sites) to generate a set of smaller fragments. Second, the DNA fragment to be cloned is joined to a suitable cloning vector by using DNA ligases to link the DNA molecules together. The recombinant vector is then introduced into a host cell, which amplifies the fragment in the course of many generations of cell division.

TABLE 9-1 Some Enzymes Used in Recombinant DNA Technology
Enzyme(s)	Function
Type II restriction endonucleases	Cleave DNA molecules at specific base sequences
DNA ligase	Joins two DNA molecules or fragments
DNA polymerase I (E. coli)	Fills gaps in duplexes by stepwise addition of nucleotides to 3′ ends
Reverse transcriptase	Makes a DNA copy of an RNA molecule
Polynucleotide kinase	Adds a phosphate to the 5′-OH end of a polynucleotide to label it or to permit ligation
Terminal transferase	Adds homopolymer tails to the 3′-OH ends of a linear duplex
Exonuclease III	Removes nucleotide residues from the 3′ ends of a DNA strand
Bacteriophage λ exonuclease	Removes nucleotides from the 5′ ends of a duplex to expose single-stranded 3′ ends
Alkaline phosphatase	Removes terminal phosphates from the 5′ end or 3′ end (or both)

FIGURE 9-1 Schematic illustration of DNA cloning. A cloning vector and eukaryotic chromosomes are separately cleaved with the same restriction endonuclease. (A single chromosome is shown here for simplicity.) The fragments to be cloned are then ligated to the cloning vector. The resulting recombinant DNA (only one recombinant vector is shown here) is introduced into a host cell, where it can be propagated (cloned). Note that this drawing is not to scale: the size of the *E. coli* chromosome relative to that of a typical cloning vector (such as a plasmid) is much greater than depicted here.

A circular cloning vector (plasmid) is shown at the top next to a coiled strand labeled eukaryotic chromosome. The coiled strand has a small red segment. Step 1: Cloning vector is cleaved with restriction endonuclease. This produces an uneven cut in the circular plasmid so that one side of the opening has a small rectangular projection from the bottom and the other side of the opening has a small rectangular projection from the top. Step 2: DNA fragment of interest is obtained by cleaving chromosome with the same restriction endonuclease. Four blue fragments of eukaryotic chromosome are shown, all with a rectangular projection on top on one side and on the bottom of the other side. This gives them a shape that fits perfectly into the opening in the plasmid. A fifth fragment of eukaryotic chromosome is red. Step 3: Fragments are ligated to the prepared cloning vector. An arrow points down accompanied by blue highlighted text reading, DNA ligase. This produces a recombinant vector, which is a circular plasmid with a red piece of eukaryotic chromosome filling the area that had been cut open. It is now a complete circle again. Step 4: DNA is introduced into the host cell. An arrow points down to a rod-shaped bacterial cell that has the plasmid containing a red piece of eukaryotic DNA on the right and a mass of thready host DNA on the left. Step 5: Propagation (cloning) of transformed cell produces many copies of recombinant DNA. Five rod-shaped bacterial cells are shown. All have thready host DNA. Three have one copy of the plasmid with eukaryotic DNA, one has two copies, and one has three copies.

Restriction endonucleases are found in a wide range of bacterial species. As Werner Arber discovered in the early 1960s, the biological function of restriction endonucleases is to recognize and cleave foreign DNA (the DNA of an infecting virus, for example); such DNA is said to be restricted. In the host cell’s DNA, the sequence that would be recognized by one of its own restriction endonucleases is protected from digestion by methylation of the DNA, catalyzed by a specific DNA methylase. The restriction endonuclease and the corresponding methylase are sometimes referred to as a restriction-modification system.

There are three types of restriction endonucleases, designated I, II, and III. Types I and III are generally large, multisubunit complexes containing both the endonuclease and methylase activities. Type II restriction endonucleases, first isolated by Hamilton Smith in 1970, are simpler, require no ATP, and catalyze the hydrolytic cleavage of particular phosphodiester bonds in the DNA within the recognition sequence itself. The extraordinary utility of this group of restriction endonucleases was demonstrated by Daniel Nathans, who first used them to develop novel methods for mapping and analyzing genes and genomes.

Thousands of type II restriction endonucleases have been discovered in different bacterial species, and more than 100 different DNA sequences are recognized by one or more of these enzymes. The recognition sequences are usually 4 to 6 bp long and are palindromic (see Fig. 8-18). Table 9-2 lists sequences recognized by a few type II restriction endonucleases.

TABLE 9-2 Recognition Sequences for Some Type II Restriction Endonucleases
BamHI	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G G A T C under asterisk C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first G and the second G. The bottom strand has C C above asterisk T A G G. There is an arrow pointing up to the space between the first G and the second G.	HindIII	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses A A G C T T open parentheses 3 prime under asterisk close parentheses. There is an arrow pointing down to the space between the first A and the second A. The bottom strand has T T C G A A. There is an arrow pointing up to the space between the first A and the second A.
ClaI	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses A T C G A under asterisk T open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first T and the first C. The bottom strand has T A above asterisk G C T A. There is an arrow pointing up to the space between the first C and the second T.	NotI	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G C G G C C G C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first C and the second G. The bottom strand has C G C C G G C G. There is an arrow pointing up to the space between the third G and the fourth C.
EcoRI	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G A A under asterisk T T C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first G and the first A. The bottom strand has C T T above asterisk A A G. There is an arrow pointing up to the space between the first G and the second A.	PstI	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses C T G C under asterisk A G open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first A and the second G. The bottom strand has G A C above asterisk G T C. There is an arrow pointing up to the space between the first G and the first A.
EcoRV	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G A T A T C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first T and the second A. The bottom strand has C T A T A G. There is an arrow pointing up to the space between the first A and the second T.	PvuII	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses C A G C T G open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first G and the second C. The bottom strand has G T C G A C. There is an arrow pointing up to the space between the first C and the second G.
HaeIII	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G G C under asterisk C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the second G and the second C. The bottom strand has G G above asterisk C C. There is an arrow pointing up between the second G and the first C.	Tth111I	A double stranded piece of D N A is shown. The top strand has open parentheses 5 prime close parentheses G A C N N N G T asterisk just above and to the right of T C open parentheses 3 prime close parentheses. There is an arrow pointing down to the space between the first N and the second N. The bottom strand has C T G N N N C A G. There is an arrow pointing up between the second N and the third N.
Note: Arrows indicate the phosphodiester bonds cleaved by each restriction endonuclease. Asterisks indicate bases that are methylated by the corresponding methylase (where known). N denotes any base. Note that the name of each enzyme consists of a three-letter abbreviation of the bacterial species from which it is derived, sometimes followed by a strain designation and roman numerals to distinguish different restriction endonucleases isolated from the same bacterial species. Thus BamHI is the first (I) restriction endonuclease characterized from Bacillus amyloliquefaciens, strain H.
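
The palindromic character of these recognition sequences can be checked directly: a site is palindromic when it equals its own reverse complement. The Python sketch below is illustrative only. It encodes the top-strand sequences and cleavage positions listed in Table 9-2; the names `SITES`, `reverse_complement`, `is_palindromic`, and `digest` are hypothetical helpers introduced for this example, and `digest` is a simplification that handles only linear DNA and exact sites (no N positions).

```python
# Top-strand recognition sequences (5'->3') from Table 9-2, paired with the
# number of bases lying 5' of the cleaved bond on the top strand.
SITES = {
    "BamHI": ("GGATCC", 1), "ClaI": ("ATCGAT", 2), "EcoRI": ("GAATTC", 1),
    "EcoRV": ("GATATC", 3), "HaeIII": ("GGCC", 2), "HindIII": ("AAGCTT", 1),
    "NotI": ("GCGGCCGC", 2), "PstI": ("CTGCAG", 5), "PvuII": ("CAGCTG", 3),
    "Tth111I": ("GACNNNGTC", 4),
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(site):
    """A site is palindromic if both strands read the same 5'->3'."""
    return site == reverse_complement(site)


def digest(seq, site, top_cut):
    """Cut a linear top-strand sequence at every exact occurrence of site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + top_cut])
        start = pos + top_cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    for name, (site, _) in SITES.items():
        print(name, site, is_palindromic(site))
    # Arbitrary example sequence containing two EcoRI sites.
    print(digest("AAGAATTCTTGAATTCGG", *SITES["EcoRI"]))
```

Only the top strand is tracked here; the position of the cut on the bottom strand, which determines the kind of end produced, is not modeled.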

Some restriction endonucleases make staggered cuts on the two DNA strands, leaving two to four nucleotides of one strand unpaired at each resulting end. These unpaired strands are referred to as sticky ends (Fig. 9-2a) because they can base-pair with each other or with complementary sticky ends of other DNA fragments. Other restriction endonucleases cleave both strands of DNA straight across, at opposing phosphodiester bonds, leaving no unpaired bases on the ends, often called blunt ends (Fig. 9-2a).
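
Whether an enzyme leaves sticky or blunt ends follows from where it cuts within its palindromic site. As a minimal sketch (the function name `end_type` and its parameters are hypothetical, and the reasoning assumes a palindromic site so that the bottom-strand cut mirrors the top-strand cut), the geometry can be worked out as follows:

```python
def end_type(site_length, top_cut):
    """Classify the ends left by a palindromic type II enzyme.

    top_cut is the number of bases 5' of the cleaved bond on the top strand.
    Because the site is palindromic, the bottom strand is cut the same
    distance from its own 5' end, i.e. at site_length - top_cut when
    measured in top-strand coordinates.
    """
    bottom_cut = site_length - top_cut
    overhang = bottom_cut - top_cut
    if overhang == 0:
        return "blunt"
    if overhang > 0:
        return f"{overhang}-nt 5' overhang"
    return f"{-overhang}-nt 3' overhang"


if __name__ == "__main__":
    # Cut positions taken from Table 9-2.
    print("EcoRI", end_type(6, 1))   # G^AATTC
    print("PstI", end_type(6, 5))    # CTGCA^G
    print("EcoRV", end_type(6, 3))   # GAT^ATC
    print("HaeIII", end_type(4, 2))  # GG^CC
```

A cut at the center of the site gives blunt ends; a cut nearer the 5′ side of the top strand leaves a 5′ single-stranded extension, and a cut nearer the 3′ side leaves a 3′ extension.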

FIGURE 9-2 Use of restriction endonucleases in cloning. (a) Restriction endonucleases recognize and cleave only specific sequences, leaving either sticky ends (with protruding single strands) or blunt ends. Fragments can be ligated to other DNAs, such as the cleaved cloning vector (a plasmid) shown here. This reaction is facilitated by the annealing of complementary sticky ends. Ligation is less efficient for DNA fragments with blunt ends than for those with complementary sticky ends, and DNA fragments with different (noncomplementary) sticky ends generally are not ligated. (b) DNA that has been amplified by the polymerase chain reaction (see Fig. 8-33) can be cloned. The primers can include noncomplementary ends that have a site for cleavage by a restriction endonuclease. Although these parts of the primers do not anneal to the target DNA, the PCR process incorporates them into the DNA that is amplified. Cleavage of the amplified fragments at these sites creates sticky ends, used in ligation of the amplified DNA to a cloning vector. (c) A synthetic DNA fragment with recognition sequences for several restriction endonucleases can be inserted into a plasmid that has been cleaved by a restriction endonuclease. The insert is called a linker; an insert with multiple restriction sites is generally called a multiple cloning site (MCS).

Part a shows a horizontal strand of DNA to be cloned. The top strand has a yellow fragment with ---GGT, then an orange G, then a cleavage site, then red AATTC labeled recognition sequences, then blue AGC...TAG, then light purple CAG, then a cleavage site, then dark purple CTG, then yellow TAGC---. The bottom strand has a yellow fragment with ---CCA, then orange CTTAA, then an arrow indicating a cleavage site, then red G, then blue TCG...ATC, then light purple GTC, then a cleavage site, then dark purple GAC, then yellow ATCG---. The same piece of DNA is shown below separated at the cleavage sites. The left-hand cleavage sites were cut by EcoRI restriction endonuclease, and the right cleavage sites were cut by PvuII restriction endonuclease. There is a piece on the left with the top half containing yellow ---GGT and orange G and the bottom half containing yellow ---CCA and orange CTTAA. There is a middle piece with the top half containing red AATTC, blue AGC...TAG, and light purple CAG and the bottom half containing red G, blue TCG...ATC, and light purple GTC. The ends are uneven, with the bottom half longer on the left piece and the top half longer on the middle piece. These are labeled sticky ends. On the far right, there is a piece with a top half that has dark purple CTG and yellow TAGC--- above a bottom half that has dark purple GAC and yellow ATCG---. These ends and the right ends of the middle piece line up without an overhang and are called blunt ends. Arrows point down and meet with an arrow from a circular molecule that has been cut away on top labeled plasmid cloning vector cleaved with EcoRI and PvuII. The left side end has an orange overhang, and the right side end is blunt. This comes together with the pieces cut previously to form a complete circle. The red overhang from the middle piece of DNA fits against the orange left end of the plasmid and the light purple blunt end on the right side of the middle piece of DNA fits against the dark purple blunt end of the plasmid.
Part b shows two strands representing double-stranded DNA with a dark central region. An arrow points down accompanied by two steps. Step 1: Heat to separate strands. Step 2: Anneal primers containing noncomplementary regions with cleavage site for restriction endonuclease. The two strands are shown below but are farther apart. There is an orange box below the left side of the highlighted region of the top strand. To the left of the box, a piece of DNA extends. From the right side where it joins the box to the left, it is CTTAAG 5′. The bottom strand has a similar orange box above the right side of its highlighted region with the same strand of DNA but extending to the right. An arrow pointing down is labeled replication. Two sets of DNA strands are shown. The top strand is the original top strand. Beneath it, a strand begins with the orange box beneath the left side of the highlighted region and runs to the far right side. It has the same highlighted region. The original bottom strand has a similar strand above it running to the far left side. Three arrows point down labeled many PCR cycles. Beneath this, a double-stranded molecule is shown with just the highlighted regions of the original molecule. The top strand begins on the left with 3′ CTTAAG, then has a blue highlighted region, then has a small orange box, then has CTTAAG 5′. The bottom strand begins on the left with 5′ GAATTC, then has a small orange box, then has a blue highlighted region, then ends with GAATTC 3′. An arrow pointing down is labeled EcoRI endonuclease (highlighted in blue). The top strand now begins on the left with G, then has a blue highlighted region, then has an orange box and then CTTAA. The bottom strand has AATT and then C beneath G of the top strand, then an orange box, then a blue highlighted region, then G beneath C of the top strand. An arrow points down to text reading, clone by insertion at an EcoRI site in a cloning vector.
Part c shows a horizontal double-stranded piece of DNA. The left and right sides are labeled EcoRI sticky end. The top strand begins with 5′ AATT with space beneath it, then begins to pair with the bottom strand. It continues with C, then a bracketed sequence labeled PstI begins that includes CTGCAG, then a bracketed sequence labeled HindIII begins that includes AAGCTT, then it has CC, then a bracketed sequence labeled BamHI that includes GGATCC. The last C is also included in a sequence labeled SmaI that also contains CCGGG. The bottom strand begins with G beneath the first C of the top strand, then begins the part within the PstI bracket with GACGTC, then continues within the HindIII bracket with TTCGAA, then has GG, then continues within the BamHI bracket with CCTAGG. The last G is also included in a bracket for SmaI, which also includes GGCCC. The strand ends with TTAA, which is an overhanging end. An arrow points down accompanied by the highlighted term DNA ligase. A plasmid cloning vector cleaved with EcoRI is added. The cloning vector is mostly circular but open on top. The left side has a top strand that ends with G and a bottom strand that has C paired with that G, then an unpaired sequence of TTAA. The right side has G on the bottom strand paired with C on the top strand. The top strand continues on beyond C with unpaired TTAA. The final product is the plasmid with the original horizontal DNA curved to fit into it. The unpaired bases on the plasmid match the sticky ends from the original plasmid, and multiple additional cloning sites are present in the piece of DNA.

The gene or DNA segment to be cloned is most often generated by the polymerase chain reaction. Careful design of the primers used for PCR (see Fig. 8-33) can alter the amplified segment by the inclusion, at each end, of additional DNA not present in the chromosome that is being targeted. For example, including restriction endonuclease cleavage sites can facilitate the subsequent cloning of the amplified DNA (Fig. 9-2b).
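
The primer design described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a primer-design protocol: the function names, the 18-nucleotide annealing length, and the 40-nucleotide target sequence are all arbitrary choices made for the example, and practical concerns such as melting temperature are ignored. Each primer carries an EcoRI site at its 5′ end that does not anneal to the target but is incorporated into the amplified product.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tailed_primers(target, site, anneal_len=18):
    """Build forward and reverse primers (5'->3') with a 5' restriction site."""
    forward = site + target[:anneal_len]
    reverse = site + reverse_complement(target[-anneal_len:])
    return forward, reverse


def amplified_top_strand(target, forward, reverse, anneal_len=18):
    """Top strand of the product after many cycles: tails are now part of it."""
    reverse_tail = reverse[:len(reverse) - anneal_len]
    return forward + target[anneal_len:] + reverse_complement(reverse_tail)


if __name__ == "__main__":
    ecori = "GAATTC"
    # Arbitrary 40-nt example target; not taken from the text.
    target = "ATGGCTATCGACGAAAACAAACAGAAAGCGTTGGCGGCAG"
    fwd, rev = tailed_primers(target, ecori)
    product = amplified_top_strand(target, fwd, rev)
    print(fwd)
    print(rev)
    print(product)
```

Because the EcoRI site is palindromic, the product's top strand reads site, target, site, so digestion with EcoRI yields a fragment with sticky ends at both termini.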

After the target DNA fragment is prepared and digested with the appropriate restriction enzyme, DNA ligase can be used to join it to a vector digested by the same restriction endonuclease; a fragment generated by EcoRI, for example, generally will not link to a fragment generated by BamHI. As described in more detail in Chapter 25 (see Fig. 25-15), DNA ligase catalyzes the formation of new phosphodiester bonds in a reaction that uses ATP or a similar cofactor. The base pairing of complementary sticky ends greatly facilitates the ligation reaction (Fig. 9-2a). Blunt ends can also be ligated, albeit less efficiently. Researchers can create new DNA sequences for a wide range of purposes by inserting synthetic DNA fragments, called linkers, to bridge the ends that are being ligated. An inserted DNA fragment with multiple recognition sequences for restriction endonucleases (often useful later as points for inserting additional DNA by cleavage and ligation) is called a multiple cloning site (MCS) (Fig. 9-2c).
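
The rule that an EcoRI end generally will not join a BamHI end can be made concrete by comparing the single-stranded extensions the two enzymes leave. The following Python sketch is a simplified model, with hypothetical names (`ENZYMES`, `end_of`, `ends_compatible`): it assumes palindromic sites, uses the cut positions from Table 9-2, and treats two ends as joinable when both are blunt or when their overhangs have the same polarity and complementary sequences. It says nothing about ligation efficiency.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# (recognition sequence 5'->3', bases 5' of the cut on the top strand)
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "EcoRV": ("GATATC", 3),
    "PstI": ("CTGCAG", 5),
    "PvuII": ("CAGCTG", 3),
}


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def end_of(enzyme):
    """Return (polarity, overhang sequence 5'->3') for a palindromic enzyme."""
    site, top_cut = ENZYMES[enzyme]
    bottom_cut = len(site) - top_cut  # mirror-image cut on the bottom strand
    lo, hi = sorted((top_cut, bottom_cut))
    if lo == hi:
        polarity = "blunt"
    elif top_cut < bottom_cut:
        polarity = "5'"
    else:
        polarity = "3'"
    return polarity, site[lo:hi]


def ends_compatible(enzyme_a, enzyme_b):
    """True if ends from the two enzymes can base-pair (or are both blunt)."""
    polarity_a, overhang_a = end_of(enzyme_a)
    polarity_b, overhang_b = end_of(enzyme_b)
    return polarity_a == polarity_b and overhang_a == reverse_complement(overhang_b)


if __name__ == "__main__":
    for pair in [("EcoRI", "EcoRI"), ("EcoRI", "BamHI"), ("EcoRV", "PvuII")]:
        print(pair, ends_compatible(*pair))
```

In this model EcoRI ends (AATT) pair with other EcoRI ends but not with BamHI ends (GATC), while the blunt ends left by EcoRV and PvuII can be joined to each other.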

The effectiveness of sticky ends in selectively joining two DNA fragments was apparent in the earliest recombinant DNA experiments. Before restriction endonucleases were widely available, some investigators found they could generate sticky ends by the combined action of the bacteriophage λ exonuclease and terminal transferase (Table 9-1). The fragments to be joined were given complementary homopolymeric tails. Peter Lobban and Dale Kaiser used this method in 1971 in the first experiments to join naturally occurring DNA fragments. Similar methods were used soon after in Paul Berg’s laboratory to join DNA segments from simian virus 40 (SV40) to DNA derived from bacteriophage λ, thereby creating the first recombinant DNA molecule with DNA segments from different species.

Cloning Vectors Allow Amplification of Inserted DNA Segments

The factors that govern the delivery of recombinant DNA in clonable form to a host cell, and its subsequent amplification in the host, are well illustrated in three popular cloning vectors: plasmids and bacterial artificial chromosomes, used in experiments with E. coli, and a vector used to clone large DNA segments in yeast.

Plasmids

A plasmid is a circular DNA molecule that replicates separately from the host chromosome. Naturally occurring bacterial plasmids range in size from 5,000 to 400,000 bp. Many of the plasmids found in bacterial populations are little more than molecular parasites, similar to viruses but with a more limited capacity to transfer from one cell to another. To survive in the host cell, plasmids incorporate several specialized sequences that enable them to make use of the cell’s resources for their own replication and gene expression.

Naturally occurring plasmids usually have a symbiotic role in the cell. They may provide genes that confer resistance to antibiotics or that perform new functions for the cell. For example, the Ti plasmid of Agrobacterium tumefaciens allows the host bacterium to colonize the cells of a plant and make use of the plant’s resources. The same properties that enable plasmids to grow and survive in a bacterial or eukaryotic host are useful to molecular biologists who want to engineer a vector for cloning a specific DNA segment. Constructed in 1977, one of the first recombinant vectors — E. coli plasmid pBR322 — illustrates some key features that define a useful cloning vector (Fig. 9-3):

1. The plasmid pBR322 has an origin of replication, or ori, a sequence where replication is initiated by cellular enzymes (see Chapter 25). This sequence is required to propagate the plasmid. An associated regulatory system is present that limits replication to maintain pBR322 at a level of 10 to 20 copies per cell.
2. The plasmid contains genes that confer resistance to the antibiotics ampicillin (${Amp}^{R}$) and tetracycline (${Tet}^{R}$), allowing the selection of cells that contain the intact plasmid or a recombinant version of the plasmid (discussed below).
3. Several unique recognition sequences in pBR322 are targets for restriction endonucleases (PstI, EcoRI, BamHI, SalI, and PvuII), providing sites where the plasmid can be cut to insert foreign DNA.
4. The small size of the plasmid (4,361 bp) facilitates its entry into cells and the biochemical manipulation of the DNA. This small size was the result of trimming away many DNA segments from a larger parent plasmid (sequences that the biochemist does not need).

FIGURE 9-3 The constructed *E. coli* plasmid pBR322. Notice the location of some important restriction sites, for PstI, EcoRI, BamHI, SalI, and PvuII; genes for ampicillin and tetracycline resistance (${Amp}^{R}$ and ${Tet}^{R}$); and the replication origin (ori). Constructed in 1977, this was one of the early plasmids designed expressly for cloning in *E. coli*.

The figure shows a circular plasmid with pBR322 (4,361 bp) in the center. Clockwise from the top, the circle has a small yellow piece from the 11:30 position to the 12 o’clock position labeled EcoRI, then a large blue piece from the 12 o’clock position to the 3:30 position labeled tetracycline resistance (${Tet}^{R}$). Within this large piece, an arrow pointing to the 1 o’clock position is labeled BamHI, and an arrow pointing to the 2:30 position is labeled SalI. The next piece is yellow and extends from the 3:30 position to the 7 o’clock position and has an arrow labeled PvuII pointing to the 6 o’clock position. A small brown piece from the 7 o’clock position to the 7:30 position is labeled origin of replication (ori). Another yellow piece extends from the end of the origin of replication to the 9 o’clock position. Next, a green piece labeled ampicillin resistance (${Amp}^{R}$) extends from the 9 o’clock position to the 11:30 position and has an arrow labeled PstI pointing to the 10 o’clock position. All data are approximate.

The replication origins inserted in common plasmid vectors were originally derived from naturally occurring plasmids. As in pBR322, each of these origins is regulated to maintain a particular plasmid copy number. Depending on the origin used, the plasmid copy number can vary from one to hundreds or thousands per cell, providing many options for investigators. Two different plasmids cannot function in the same cell if they use the same origin of replication, because the regulation of one will interfere with the replication of the other. Such plasmids are said to be incompatible. When a researcher wants to introduce two or more different plasmids into a bacterial cell, each plasmid must have a different replication origin.
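
The compatibility rule amounts to a uniqueness check on replication origins. As a minimal sketch, assuming each plasmid is described only by a name and the identity of its origin (the plasmid names and origin labels below are placeholders, not real vectors), a planned combination can be screened like this:

```python
from collections import Counter


def shared_origins(plasmids):
    """Return origins used by more than one plasmid in the planned set.

    plasmids is a list of (plasmid_name, origin_label) pairs. An empty
    result means every plasmid has a different replication origin.
    """
    counts = Counter(origin for _, origin in plasmids)
    return sorted(origin for origin, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Placeholder names: two plasmids share "ori_1", so they are incompatible.
    planned = [("plasmid_A", "ori_1"), ("plasmid_B", "ori_2"), ("plasmid_C", "ori_1")]
    print(shared_origins(planned))
```

A nonempty result flags the origins that would force two plasmids to compete under the same copy-number control.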

In the laboratory, small plasmids can be introduced into bacterial cells by a process called transformation. The cells (often E. coli, but other bacterial species are also used) and plasmid DNA are incubated together at 0 °C in a calcium chloride solution, then are subjected to heat shock by rapidly shifting the temperature to between 37 °C and 43 °C. For reasons not well understood, some of the cells treated in this way take up the plasmid DNA. Some species of bacteria, such as Acinetobacter baylyi, are naturally competent for DNA uptake and do not require the calcium chloride–heat shock treatment. In an alternative method, called electroporation, cells incubated with the plasmid DNA are subjected to a high-voltage pulse, which transiently renders the bacterial membrane permeable to large molecules.

Regardless of the approach, relatively few cells take up the plasmid DNA, so a method is needed to identify those that do. The usual strategy is to utilize one of two types of genes in the plasmid, referred to as selectable and screenable markers. A selectable marker either permits the growth of a cell (positive selection) or kills the cell (negative selection) under a defined set of conditions. The plasmid pBR322 provides markers for both positive and negative selection (Fig. 9-4). A screenable marker is a gene encoding a protein that causes the cell to produce a colored or fluorescent molecule. Cells are not harmed when the gene is present, and the cells that carry the plasmid are easily identified by the colored or fluorescent colonies they produce.

FIGURE 9-4 Use of pBR322 to clone foreign DNA in *E. coli* and identify cells containing the DNA.

At the top, six pBR322 plasmids are shown. Each plasmid is a ring with a short green region labeled ${Amp}^{R}$ and a separate short blue region labeled ${Tet}^{R}$. Step 1: pBR322 is cleaved in the ${Amp}^{R}$ gene by PstI. An arrow pointing downward is highlighted and labeled PstI restriction endonuclease. The six plasmids are all shown cut in the middle of the green ${Amp}^{R}$ region, producing sticky ends on either side of an opening. Six pieces of red foreign DNA are shown that have overhanging ends to fit into the openings in the plasmids. Step 2: DNA fragments to be cloned are ligated into cleaved pBR322. Where ligation is successful, the ${Amp}^{R}$ gene is disrupted. The ${Tet}^{R}$ gene remains intact. Arrows from the plasmids and foreign DNA come together next to a highlighted label reading DNA ligase. This produces six plasmids. Three of the plasmids have red pieces of varying sizes within the green region of the ${Amp}^{R}$ gene. Two plasmids do not contain any red, and the green sticky ends have joined back together without incorporating any new DNA. Step 3: *E. coli* cells are transformed, then grown on agar plates containing tetracycline to select for those that have taken up the plasmid. An arrow pointing downward is accompanied by text reading transformation of *E. coli* cells. Six rod-shaped bacteria are shown, each of which contains a mass of thready DNA. Three do not have a plasmid. One has a plasmid with no red region. Two have incorporated plasmids that have red regions within the green ${Amp}^{R}$ gene. An arrow pointing downward to an agar plate is labeled selection of transformed cells. Many small circular colonies are visible on the plate. A label pointing to the agar reads agar containing tetracycline. Text next to the plate reads all colonies have plasmids. Step 4: Individual colonies are transferred to matching positions on additional plates. One plate contains tetracycline, the other tetracycline and ampicillin. An arrow labeled colonies transferred to testing splits to point to the two plates. The left-hand plate is labeled agar containing tetracycline (control). Many large yellowish colonies are visible. The right-hand plate is labeled agar containing ampicillin plus tetracycline. Many colonies are in the same places on both plates. The colonies that are on the left-hand plate but not on the right-hand plate are labeled colonies with recombinant plasmids. Step 5: Cells that grow on tetracycline but not on tetracycline plus ampicillin contain plasmids with disrupted ampicillin resistance, hence the foreign DNA. Cells with pBR322 without foreign DNA retain ampicillin resistance and grow on both plates.
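
The plate-by-plate reasoning in the figure description above can be written as a small decision function. This is only an illustrative sketch: the function name and its boolean inputs are hypothetical, and it assumes the pBR322 scheme described here, in which the insert lands in the ampicillin resistance gene and the tetracycline resistance gene stays intact.

```python
def classify_pbr322_colony(grows_on_tet: bool, grows_on_tet_amp: bool) -> str:
    """Interpret replica-plate results for the pBR322/PstI cloning scheme.

    Assumption (from the figure): foreign DNA disrupts the ampicillin
    resistance gene; the tetracycline resistance gene remains intact.
    """
    if not grows_on_tet:
        # No tetracycline resistance: the cell never took up a plasmid.
        return "no plasmid"
    if grows_on_tet_amp:
        # Both resistances work: the plasmid re-ligated without an insert.
        return "plasmid without insert"
    # Tetracycline-resistant but ampicillin-sensitive: insert is present.
    return "recombinant plasmid"


for tet, tet_amp in [(False, False), (True, True), (True, False)]:
    print(tet, tet_amp, "->", classify_pbr322_colony(tet, tet_amp))
```

Only the last case, growth on tetracycline alone, identifies a colony worth picking for further work.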

Transformation of typical bacterial cells with purified DNA (never a very efficient process) becomes less successful as plasmid size increases, and it is difficult to clone DNA segments longer than about 15,000 bp when plasmids are used as the vector.

To illustrate the use of a plasmid as a cloning vector, consider the bacterial gene encoding a recombinase called the RecA protein (see Chapter 25). In most bacteria, the gene encoding RecA is one of the thousands of genes on a chromosome millions of base pairs long. The recA gene is just over 1,000 bp long. A plasmid would be a good choice for cloning a gene of this size. As described later, the cloned gene can be altered in a variety of ways, and the gene variants can be expressed at high levels to enable purification of the encoded protein.

Bacterial Artificial Chromosomes

Researchers sometimes want to clone much longer DNA segments than can typically be incorporated into standard plasmid cloning vectors such as pBR322. To meet this need, plasmid vectors have been developed with special features that allow the cloning of very long segments (typically 100,000 to 300,000 bp) of DNA. Once such large segments of cloned DNA have been added, these vectors are large enough to be thought of as chromosomes and are known as bacterial artificial chromosomes, or BACs (Fig. 9-5).

FIGURE 9-5 Bacterial artificial chromosomes (BACs) as cloning vectors. The vector is a relatively simple plasmid, with a replication origin (ori) that directs replication. The *par* genes assist in the even distribution of plasmids to daughter cells at cell division. This increases the likelihood of each daughter cell carrying one copy of the plasmid, even when few copies are present. The low number of copies is useful in cloning large segments of DNA, because this limits the opportunities for unwanted recombination reactions that can unpredictably alter large cloned DNAs over time. The BAC includes selectable markers. A *lacZ* gene (required for the production of the enzyme β-galactosidase) is situated in the cloning region such that it is inactivated by cloned DNA inserts. Introduction of recombinant BACs into cells by electroporation is promoted by the use of cells with an altered (more porous) cell wall. Recombinant DNAs are screened for resistance to the antibiotic chloramphenicol (${Cam}^{R}$). Plates also contain X-gal, a substrate for β-galactosidase that yields a blue product. Colonies with active β-galactosidase, and hence no DNA insert in the BAC vector, turn blue; colonies without β-galactosidase activity, and thus with the desired DNA inserts, are white.

A circular plasmid labeled BAC vector has a small red piece at the 12 o’clock position labeled cloning sites (within *lacZ*), a light purple piece from the 1 o’clock position to the 3 o’clock position labeled ${Cam}^{R}$, a brown piece from the 4 o’clock position to the 5 o’clock position labeled ori, and a green piece labeled F plasmid *par* genes from the 7 o’clock position to about halfway between the 11 o’clock and 12 o’clock positions. The spaces between these pieces are yellow and unlabeled. All data are approximate. An arrow pointing down is accompanied by the highlighted text restriction endonuclease. The plasmid is shown again with the red region containing cloning sites open to produce sticky ends. To the right, a long strand of DNA has red sticky ends complementary to those in the plasmid. Text reads large DNA fragment with appropriate sticky ends. Arrows pointing down from the plasmid and the large DNA fragment join together accompanied by highlighted text reading DNA ligase. This produces a recombinant BAC, which is an irregular ring with a wavy blue half on the right corresponding to the DNA and a U-shaped piece on the left from the plasmid. Running counterclockwise from the bottom center, the piece from the plasmid has a piece of yellow, then red, then purple, then yellow, then brown, then yellow along the left side of the loop, then a large piece of green, then yellow, then red on the top strand above the red on the bottom strand. Text within this loop reads *lacZ* gene (red) disrupted; no β-galactosidase produced. An arrow labeled electroporation points down from this structure to a rod-shaped cell with a thready mass of DNA at the lower left and the recombinant BAC on the right side. An arrow labeled selection of ${Cam}^{R}$ cells points down to an agar plate. Text pointing to the agar reads agar containing chloramphenicol and X-gal. The plate contains three blue colonies and four white colonies. Text reads colonies with recombinant BACs are white.

A BAC vector (without any cloned DNA inserted) is a relatively simple plasmid, generally not much larger than other plasmid vectors. To accommodate very long segments of cloned DNA, BAC vectors have stable origins of replication that maintain the plasmid at one or two copies per cell. The low copy number is useful in cloning large segments of DNA, because it limits the opportunities for unwanted recombination reactions that can unpredictably alter large cloned DNAs over time. BACs also include par genes, derived from a type of plasmid called an F plasmid. The par genes encode proteins that direct the reliable distribution of the recombinant chromosomes to daughter cells at cell division, thereby increasing the likelihood of each daughter cell carrying one copy, even when few copies are present.

The BAC vector includes both selectable and screenable markers. The BAC vector shown in Figure 9-5 contains a gene that confers resistance to the antibiotic chloramphenicol (${Cam}^{R}$). Vector-containing cells can be selected by growing them on agar plates containing this antibiotic (a positive selection, as the cells with the vector survive). A lacZ gene, required for production of the enzyme β-galactosidase, is a screenable marker that can reveal which cells contain plasmids (now chromosomes) that incorporate the cloned DNA segments. The β-galactosidase catalyzes conversion of the colorless molecule 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (more simply, X-gal) to a blue product. If the gene is intact and expressed, the colony containing it is blue. If gene expression is disrupted by the introduction of a cloned DNA segment, the colony is white.
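
The combination of selection (chloramphenicol) and screening (X-gal) described above reduces to a two-question decision. The following is a minimal sketch under the assumptions of Figure 9-5; the function name and arguments are hypothetical and exist only for illustration.

```python
def bac_colony_phenotype(has_bac: bool, has_insert: bool) -> str:
    """Predict the outcome on agar containing chloramphenicol and X-gal.

    Assumes the vector carries a chloramphenicol resistance gene and
    that the cloning sites lie within lacZ, as in Figure 9-5.
    """
    if not has_bac:
        # Selection: without the resistance gene the cell cannot grow.
        return "no growth"
    if has_insert:
        # Screen: lacZ is disrupted, so X-gal is not converted.
        return "white colony"
    # Screen: intact lacZ yields beta-galactosidase and a blue product.
    return "blue colony"


print(bac_colony_phenotype(has_bac=True, has_insert=True))
```

Selection answers "did the cell take up a vector?", whereas the screen answers "does that vector carry an insert?".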

Yeast Artificial Chromosomes

As with E. coli, yeast genetics is a well-developed discipline. Research on large genomes and the associated need for high-capacity cloning vectors led to the development of yeast artificial chromosomes, or YACs (Fig. 9-6). As with BACs, YAC vectors can be used to clone very long segments of DNA. In addition, the DNA cloned in a YAC can be altered to study the function of specialized sequences in chromosome metabolism, mechanisms of gene regulation and expression, and many other aspects of eukaryotic molecular biology.

FIGURE 9-6 Construction of a yeast artificial chromosome (YAC). A YAC vector includes an origin of replication (ori), a centromere (CEN), two telomeres (TEL), and selectable markers (X and Y). Digestion with BamHI and EcoRI generates two separate DNA arms, each with a telomeric end and one selectable marker. A large segment of DNA (e.g., up to $2 \times 10^{6}$ bp from the human genome) is ligated to the two arms to create a yeast artificial chromosome. The YAC transforms yeast cells (prepared by removal of the cell wall to form spheroplasts), and the cells are selected for X and Y; the surviving cells propagate the DNA insert.

A ring-shaped structure is labeled YAC vector. A brown region from the 11:30 position to the 2 o’clock position has an arrow pointing to the 1 o’clock position labeled EcoRI (highlighted in red). A small green piece labeled selectable marker Y is at the 2 o’clock position, a brown region runs from the 2 o’clock position to the 5 o’clock position, and a small red piece with a triangular left side labeled TEL is at the 5 o’clock position with an arrow pointing from BamHI (highlighted in red) to its pointed end. A brown region runs from this point to the 7 o’clock position, where an arrow points from BamHI (highlighted in red) to the point of a triangular end of a similar red piece pointing to the right and labeled TEL. Another brown region runs from the 7 o’clock position to just past the 9 o’clock position, where there is a small blue piece labeled selectable marker X. This blue region runs to the 10 o’clock position, then there is a brown region, then there is a dark brown region labeled ori at the 11 o’clock position followed by a yellow region labeled CEN that ends at the 11:30 position. All data are approximate. An arrow points down next to text that reads BamHI digestion creates linear chromosomes with telomeric ends. The brown piece of the original circle between 5 o’clock and 7 o’clock is shown leaving. Below the arrow, a linear chromosome is shown with a red rectangle on the left side labeled TEL with a triangular portion pointing to the left. To its right, there is a brown region, then a small blue region labeled X, then a small brown region, then a small dark brown region labeled ori next to a small yellow region labeled CEN. Next, there is a long brown region with an arrow pointing to its midpoint labeled EcoRI (highlighted in red). The next piece is a small green segment labeled Y, followed by a small brown region, and then the other red rectangle labeled TEL with a triangular end pointing to the right. An arrow points down accompanied by text reading EcoRI digestion creates two arms, each with a selectable marker (X or Y). The same linear chromosome is shown broken in two at the place that had been indicated by the arrow from EcoRI. An arrow points down and is joined by an arrow from many small wavy bright brown lines. Text reads fragments of genomic DNA generated by incomplete digestion with EcoRI. Text next to the arrow reads ligate. A narrow linear chromosome is shown below with a bright brown region filling the space that had been open previously, joining the left and right halves from the previous step. This is labeled YAC. An arrow labeled transform points down to a cell labeled yeast spheroplast, which has a circular nucleus containing many small blue wavy lines. The spheroplast was created from a yeast cell shown to the left that is similar but has a dark outer boundary. An arrow pointing from the yeast cell to the spheroplast reads digest cell wall enzymatically. An arrow pointing from the spheroplast to a cell on the right reads select for X and Y. A circular cell is shown with a thick wall and a circular nucleus that contains many small blue wavy lines and one small red wavy line. It is labeled yeast with YAC clone.

The genome of Saccharomyces cerevisiae contains only $14 \times 10^{6}$ bp (less than four times the size of the E. coli chromosome), and its entire sequence is known. Yeast is also very easy to maintain and grow on a large scale in the laboratory. Plasmid vectors have been constructed for insertions into yeast cells, employing the same principles that govern the use of E. coli vectors. Convenient methods for moving DNA into and out of yeast cells permit the study of many aspects of eukaryotic cell biochemistry. Some recombinant plasmids incorporate multiple replication origins and other elements that allow them to be used in more than one species (e.g., in yeast and in E. coli). Plasmids that can be propagated in cells of two or more species are called shuttle vectors.

YAC vectors contain all the elements needed to maintain a eukaryotic chromosome in the yeast nucleus: a yeast origin of replication, two selectable markers, and specialized sequences (derived from the telomeres and centromere) needed for stability and proper segregation of the chromosomes at cell division (see Chapter 24). In preparation for its use in cloning, the vector is propagated as a circular bacterial plasmid and then isolated and purified. Cleavage with a restriction endonuclease (BamHI in Fig. 9-6) removes a length of DNA between two telomere sequences (TEL), leaving the telomeres at the ends of the linearized DNA. Cleavage at another internal site (by EcoRI in Fig. 9-6) divides the vector into two DNA segments, referred to as vector arms, each with a different selectable marker.

Genomic DNA to be cloned is prepared by partial digestion with restriction endonucleases to obtain a suitable fragment size. Genomic fragments are then separated by pulsed field gel electrophoresis, a variation of gel electrophoresis (see Fig. 3-18) that segregates very large DNA segments. DNA fragments of appropriate size (up to about $2 \times 10^{6}$ bp) are mixed with the prepared vector arms and ligated. The ligation mixture is then used to transform yeast cells (pretreated to partially degrade their cell walls) with these very large DNA molecules, which now have the structure and size to be considered yeast chromosomes. Culture on a medium that requires the presence of both selectable marker genes ensures the growth of only those yeast cells that contain an artificial chromosome with a large insert sandwiched between the two vector arms (Fig. 9-6). The stability of YAC clones increases with the length of the cloned DNA segment (up to a point). Those with inserts of more than 150,000 bp are nearly as stable as normal cellular chromosomes, whereas those with inserts of fewer than 100,000 bp are gradually lost during mitosis (so, generally, there are no yeast cell clones carrying only the two vector ends ligated together or vectors with only short inserts). YACs that lack a telomere at either end are rapidly degraded.
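
The approximate size figures quoted in this section for plasmids, BACs, and YACs can be collected into a small lookup. This is a rough sketch, not a laboratory rule: the names below are hypothetical, and the approximate upper sizes from the text are treated as hard cutoffs purely for illustration. Capacity is the only criterion considered.

```python
from typing import List

# Approximate upper insert sizes (bp) quoted in the text, treated here
# as hard limits for illustration only.
VECTOR_MAX_INSERT_BP = {
    "plasmid": 15_000,
    "BAC": 300_000,
    "YAC": 2_000_000,
}


def candidate_vectors(insert_bp: int) -> List[str]:
    """List vector types whose quoted capacity can hold the insert."""
    return [name for name, limit in VECTOR_MAX_INSERT_BP.items()
            if insert_bp <= limit]


def yac_stability(insert_bp: int) -> str:
    """Summarize the stability statements made for YAC clones."""
    if insert_bp > 150_000:
        return "nearly as stable as a normal chromosome"
    if insert_bp < 100_000:
        return "gradually lost during mitosis"
    return "intermediate (not specified in the text)"


print(candidate_vectors(1_100))      # a gene about the size of recA
print(candidate_vectors(1_000_000))  # a very large genomic fragment
print(yac_stability(50_000))
```

A short insert technically fits in every vector type, but `yac_stability` shows why a YAC would be a poor choice for it.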

Cloned Genes Can Be Expressed to Amplify Protein Production

Frequently, the product of a cloned gene, rather than the gene itself, is of primary interest — particularly when the protein has commercial, therapeutic, or research value. Proteins are encoded by genes in DNA; alter the DNA in a gene, and one can alter the protein product of that gene. Biochemists use purified proteins for many purposes, including to elucidate protein function, study reaction mechanisms, generate antibodies to the proteins, reconstitute complex cellular activities in the test tube with purified components, and examine protein binding partners. With an increased understanding of the fundamentals of DNA, RNA, and protein metabolism and their regulation in a host organism such as E. coli or yeast, investigators can manipulate cells to express cloned genes in order to study their protein products. The general goal is to alter the sequences around a cloned gene to trick the host organism into producing the protein product of the gene, often at very high levels. This overexpression of a protein can make its subsequent purification much easier.

We’ll use the expression of a eukaryotic protein in a bacterium as an example. Eukaryotic genes have surrounding sequences needed for their transcription and regulation in the cells they are derived from, but these sequences do not function in bacteria. Thus, eukaryotic genes lack the DNA sequence elements required for their controlled expression in bacterial cells: promoters (sequences that instruct RNA polymerase where to bind to initiate mRNA synthesis), ribosome-binding sites (sequences that allow translation of the mRNA to protein), and additional regulatory sequences. Appropriate bacterial regulatory sequences for transcription and translation must be inserted in the vector DNA at the correct positions relative to the eukaryotic gene.

Cloning vectors with the transcription and translation signals needed for the regulated expression of a cloned gene are called expression vectors. The rate of expression of the cloned gene is controlled by replacing the gene’s normal promoter and regulatory sequences with more efficient and convenient versions supplied by the vector. Generally, a well-characterized promoter and its regulatory elements are positioned near several unique restriction sites for cloning, so that genes inserted at the restriction sites will be expressed from the regulated promoter elements (Fig. 9-7). Some of these vectors incorporate other features, such as a bacterial ribosome-binding site to enhance translation of the mRNA derived from the gene (Chapter 27) or a transcription termination sequence (Chapter 26). In some cases, cloned genes are so efficiently expressed that their protein product represents 10% or more of the cellular protein. At these concentrations, some foreign proteins can kill the host cell (usually E. coli), so expression of the cloned gene must be limited to the few hours before the planned harvesting of the cells.

FIGURE 9-7 DNA sequences in a typical *E. coli* expression vector. The gene to be expressed is inserted into one of the restriction sites in the multiple cloning site (MCS), near the promoter (P), with the end of the gene encoding the amino terminus of the protein positioned closest to the promoter. The promoter allows efficient transcription of the inserted gene, and the transcription-termination sequence sometimes improves the amount and stability of the mRNA produced. The operator (O) permits regulation by a repressor that binds to it. The ribosome-binding site provides sequence signals for the efficient translation of the mRNA derived from the gene. The selectable marker allows the selection of cells containing the recombinant DNA.

A circular molecule is shown. There is a small purple piece labeled P just before the 12 o’clock position with an arrow pointing to the right above it. There is a small yellow piece, then an orange piece labeled O at the 12 o’clock position. The P and O pieces are labeled bacterial promoter (P) and operator (O) sequences. There is a yellow piece, and then a small white piece at the 1 o’clock position labeled ribosome binding site. There is a small yellow piece from the 1 o’clock position to the 2 o’clock position with four arrows pointing at it labeled multiple cloning site. There is a small red piece at the 2 o’clock position labeled transcription-termination sequence. A yellow piece runs from it to the 4 o’clock position, where a long blue piece begins that extends to the 6 o’clock position. The blue piece is labeled selectable genetic marker (e.g., antibiotic resistance). A yellow piece runs from the 6 o’clock position to halfway between the 7 and 8 o’clock positions, where there is a small brown piece labeled ori. A yellow piece runs from just past the 8 o’clock position to the 9 o’clock position. A green piece labeled gene encoding repressor that binds O and regulates P runs from the 9 o’clock position to about halfway between the 10 o’clock and the 11 o’clock positions. A yellow piece runs from the 11 o’clock position to the small purple piece labeled P.
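
The layout in Figure 9-7 implies an order of elements along the direction of transcription: promoter, operator, ribosome-binding site, the inserted gene (in the multiple cloning site), and the transcription-termination sequence. As a hedged sketch, a construct can be represented as a list of element names and checked for that order. The element names and the function are hypothetical conveniences, not part of any real cloning software.

```python
# Order of elements along the direction of transcription, as read from
# Figure 9-7. The string labels are illustrative assumptions.
EXPECTED_ORDER = [
    "promoter",
    "operator",
    "ribosome_binding_site",
    "gene",
    "terminator",
]


def cassette_is_well_formed(elements):
    """Return True if the required elements all occur in the expected order.

    Other elements (ori, selectable marker, repressor gene) may appear
    anywhere; only the relative order of the required ones is checked.
    """
    remaining = iter(elements)
    # `wanted in remaining` advances the iterator up to the first match,
    # so each later element must be found after the previous one.
    return all(wanted in remaining for wanted in EXPECTED_ORDER)


good = ["promoter", "operator", "ribosome_binding_site", "gene",
        "terminator", "selectable_marker", "ori", "repressor_gene"]
bad = ["gene", "promoter", "operator", "ribosome_binding_site", "terminator"]
print(cassette_is_well_formed(good), cassette_is_well_formed(bad))
```

The second construct fails because the gene sits upstream of the promoter, so the regulated promoter could not drive its transcription.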

Many Different Systems Are Used to Express Recombinant Proteins

Every living organism has the capacity to express genes in its genomic DNA; thus, in principle, any organism can serve as a host to express proteins from a different (heterologous) species. Almost every sort of organism has, indeed, been used for this purpose, and each host type has a particular set of advantages and disadvantages.

Bacteria

Bacteria, especially E. coli, remain the most common hosts for protein expression. The regulatory sequences that govern gene expression in E. coli and many other bacteria are well understood and can be harnessed to express cloned proteins at high levels. Bacteria are easy to store and grow in the laboratory, on inexpensive growth media. Efficient methods also exist to get DNA into bacteria and extract DNA from them. Bacteria can be grown in huge amounts in commercial fermenters, providing a rich source of the cloned protein.

Problems do exist, however. When expressed in bacteria, some heterologous proteins do not fold correctly, and many do not undergo the posttranslational modifications or proteolytic cleavage that may be necessary for their activity. Certain features of a gene sequence also can make a particular gene difficult to express in bacteria. For example, intrinsically disordered regions are more common in eukaryotic proteins. When expressed in bacteria, many eukaryotic proteins aggregate into insoluble cellular precipitates called inclusion bodies. For these and many other reasons, some eukaryotic proteins are inactive when purified from bacteria or cannot be expressed at all. To help address some of these problems, researchers are regularly developing new bacterial host strains that include enhancements such as the engineered presence of eukaryotic protein chaperones or enzymes that modify eukaryotic proteins.

There are many specialized systems for expressing proteins in bacteria. The well-characterized promoter and regulatory sequences associated with the lactose operon (see Chapter 28) are often fused to the gene of interest to direct transcription. The cloned gene will be transcribed when lactose is added to the growth medium. However, regulation in the lactose system is “leaky”: it is not turned off completely when lactose is absent — a potential problem if the product of the cloned gene is toxic to the host cells. Transcription from the Lac promoter is also not efficient enough for some applications.

An alternative system uses the promoter and RNA polymerase of a bacterial virus called bacteriophage T7. If the cloned gene is fused to a T7 promoter, it is transcribed, not by the E. coli RNA polymerase, but by the T7 RNA polymerase. The gene encoding this polymerase is separately cloned into the same cell in a construct that affords tight regulation (allowing controlled production of the T7 RNA polymerase). The polymerase is also very efficient and directs high levels of expression of most genes fused to the T7 promoter. This system has been used to express the RecA protein in bacterial cells (Fig. 9-8).

FIGURE 9-8 Regulated expression of RecA protein in a bacterial cell. The gene encoding the RecA protein, fused to a bacteriophage T7 promoter, is cloned into an expression vector. Under normal growth conditions (uninduced), no RecA protein appears. When the T7 RNA polymerase is induced in the cell, the *recA* gene is expressed, and large amounts of RecA protein are produced. The positions of standard molecular weight markers that were run on the same gel are indicated. The figure is an electrophoresis gel with lanes for uninduced and induced bacterial cells.

Yeast

Saccharomyces cerevisiae is probably the best understood eukaryotic organism. The principles underlying the expression of a protein in yeast are the same as those for bacteria. Cloned genes must be linked to promoters that can direct high-level expression in yeast cells. For example, the yeast GAL1 and GAL10 genes (encoding enzymes involved in galactose metabolism) are under cellular regulation such that they are expressed when yeast cells are grown in media with galactose but shut down when the cells are grown in glucose. Thus, if a heterologous gene is expressed using these same regulatory sequences, the expression of that gene can be controlled simply by choosing an appropriate medium for cell growth.

Some of the same problems that accompany protein expression in bacteria also occur with yeast. Heterologous proteins may not fold properly, yeast may lack the enzymes needed to modify the proteins to their active forms, or certain features of the gene sequence may hinder expression of a protein. However, because S. cerevisiae is a eukaryote, the expression of eukaryotic genes (especially yeast genes) is sometimes more efficient in this host than in bacteria. As yeast possess many of the same protein chaperones and modification systems of higher eukaryotes, protein products may also be folded and modified more accurately than are proteins expressed in bacteria.

Insects and Insect Viruses

Baculoviruses are insect viruses with double-stranded DNA genomes. When baculoviruses infect their insect larval hosts, they act as parasites, killing the larvae and turning them into factories for virus production. Late in the infection process, the viruses produce large amounts of two proteins (p10 and polyhedrin), neither of which is needed for production of viruses in cultured insect cells. The genes for both of these proteins can be replaced with the gene for a heterologous protein. When the resulting recombinant virus is used to infect insect cells or larvae, the heterologous protein is often produced at very high levels — up to 25% of the total protein present at the end of the infection cycle.

Autographa californica multicapsid nucleopolyhedrovirus (AcMNPV; A. californica is a moth species that it infects) is the baculovirus most often used for protein expression. It has a large genome (134,000 bp), too large for direct cloning. Virus purification is also cumbersome. These problems have been solved by the creation of bacmids, large circular DNAs that include the entire baculovirus genome along with sequences that allow replication of the bacmid in E. coli (Fig. 9-9). The gene of interest is cloned into a smaller plasmid and combined with the larger plasmid by site-specific recombination in vivo (see Fig. 25-37). The recombinant bacmid is then isolated and transfected into insect cells (the term transfection is used when the DNA used for transformation includes viral sequences and leads to viral replication), followed by recovery of the protein once the infection cycle is finished. A wide range of bacmid systems are available commercially. Baculovirus systems are not successful with all proteins. However, with these systems, insect cells sometimes successfully replicate the protein-modification patterns of higher eukaryotes and produce active, correctly modified eukaryotic proteins.

FIGURE 9-9 Cloning with baculoviruses. (a) Shown here is the construction of a typical vector used for protein expression in baculoviruses. The gene of interest is cloned into a small plasmid (top left) between two sites (*att*) recognized by a site-specific recombinase, then is introduced into the baculovirus vector by site-specific recombination. This generates a circular DNA product that is used to infect the cells of an insect larva. The gene of interest is expressed during the infection cycle, downstream of a promoter that normally expresses a baculovirus coat protein at very high levels. (b) The photographs show larvae of the cabbage looper moth. The larva on the left is uninfected; the larva on the right was infected with a recombinant baculovirus vector expressing a protein that produces a red color.

Part a has a circular molecule at the top labeled small plasmid. It has a small brown piece at the 12 o’clock position labeled cloning sites with symmetrical small blue pieces to the left and right labeled *att*. An arrow points down and shows that a small red piece labeled gene of interest is added. This gene is inserted in the middle of the brown piece at the 12 o’clock position. This plasmid is added to a large oval molecule labeled bacmid. On the bottom half of the oval is a long green segment labeled DNA sequences needed for plasmid maintenance in bacteria. On each side of this sequence is a small blue segment labeled *att*. The combination of the bacmid and the plasmid with the gene of interest produces a bacmid that has the green segment bent upward so that the blue *att* sequences on either side can interact with the blue *att* sequences on the small plasmid. There is an X shape between the blue sequences on the bacmid and the blue sequences on the small plasmid, with the red region and small pieces of brown on either side intact between the two bonded regions. Text reads recombinase reaction. An arrow points down to show the bacmid. The small plasmid leaves in the process and now has the green region from the bacmid in between the small blue *att* regions on either side. The bacmid now has a red gene of interest with a small brown piece and a blue *att* piece on either side. Three arrows point down to show three steps: transfect insect cells; harvest recombinant baculovirus; and transfect insect cells for protein production. Part b shows two insect larvae. The photo on the left is labeled uninfected larva and shows the top half of a segmented green larva with its legs and head at the top. The photo on the right is labeled larva infected with a baculovirus and is similar but is red instead of green and has no fully formed head or legs.

Mammalian Cells in Culture

The most convenient way to introduce cloned genes into a mammalian cell is with viruses. This method takes advantage of the natural capacity of a virus to insert its DNA or RNA into a cell, and sometimes into the cellular chromosome. A variety of engineered mammalian viruses are available as vectors, including human adenoviruses and retroviruses. The gene of interest is cloned so that its expression is controlled by a virus promoter. The virus uses its natural infection mechanisms to introduce the recombinant genome into cells, where the cloned protein is expressed. One advantage of these systems is that proteins can be expressed either transiently (if the viral DNA is maintained separately from the host cell genome and eventually degraded) or permanently (if the viral DNA is integrated into the host cell genome). With the correct choice of host cell, the proper posttranslational modification of the protein to its active form can be ensured. However, the growth of mammalian cells in tissue culture is very expensive, and this technology is generally used to test the function of a protein in vivo rather than to produce a protein in large amounts.

Alteration of Cloned Genes Produces Altered Proteins

Cloning techniques can be used not only to overproduce proteins but also to produce proteins that are altered, subtly or dramatically, from their native forms. Specific amino acids may be replaced individually by site-directed mutagenesis. This approach has greatly enhanced research on proteins by allowing investigators to make specific changes in the primary structure and examine the effects of these changes on the protein’s folding, three-dimensional structure, and activity. The amino acid sequence of the protein is changed by altering the DNA sequence of the cloned gene. If appropriate restriction sites flank the sequence to be altered, researchers can simply remove a DNA segment and replace it with a synthetic one, identical to the original except for the desired change (Fig. 9-10a).

A three-part figure, a, b, and c, shows two approaches to site-directed mutagenesis. — FIGURE 9-10 Two approaches to site-directed mutagenesis. (a) A synthetic DNA segment replaces a fragment removed by a restriction endonuclease. (b) A pair of synthetic and complementary oligonucleotides with a specific sequence change at one position are hybridized to a circular plasmid with a cloned copy of the gene to be altered. The mutated oligonucleotides act as primers for the synthesis of full-length double-stranded (ds) DNA copies of the plasmid that contain the specified sequence change. The blue parental strand was methylated while replicating in its host cell, prior to plasmid isolation. These plasmid copies are then used to transform cells. (c) Results from an automated sequencer (see Fig. 8-35), showing sequences from the wild-type *recA* gene (top) and an altered *recA* gene (bottom), with the triplet (codon) at position 72 changed from AAA to CGC, specifying an Arg (R) residue instead of a Lys (K) residue. [(c) Information from Elizabeth A. Wood, University of Wisconsin–Madison, Department of Biochemistry.]

Part a shows site-directed mutagenesis. A ring structure is shown with a green piece in its top half. It is labeled recombinant vector. An arrow pointing down is labeled cleave with restriction endonuclease. The same circle is shown with arrows pointing to sites toward the left and right ends of the green region, with stepped lines showing how it will break to form sticky ends. An arrow pointing down is labeled, insert synthetic D N A segment containing mutation. A small green piece is added that has a vertical red bar in the middle labeled mutation. The new piece adds between the sticky ends to produce a circle with a vertical red bar in the green region. Part b shows oligonucleotide-directed mutagenesis. A plasmid is shown as two concentric rings. On the left side of each ring, there is a green bar labeled gene. A point in the middle of the outer green bar is labeled target site for mutation. An arrow pointing down is labeled denature plasmid and anneal oligonucleotide primers with mutation. Two small curved pieces, each an X across its middle, are labeled primers and are shown being added. This produces two rings, one larger than the other. The larger ring has a primer just to the inside of the gene, and the smaller ring has a primer just to the outside of the gene. An arrow pointing down is labeled use D N A polymerase to extend and incorporate the mutagenic primers. The left-hand ring is shown with a new inner ring that begins at the bottom of the primer, runs counterclockwise along the inside of the outer ring, and ends as an arrowhead pointing to the top of the ring. The right-hand ring has a similar arrow that begins at the top of the new primer and runs clockwise around the outside of the ring to end as an arrowhead beneath the new primer. An arrow pointing down is labeled digest nonmutated parental D N A template with methylation-specific nuclease, and anneal newly synthesized strands. 
Two concentric rings are shown, one on the outside and one on the inside. Each has a new primer at the 9 o’clock position. The outer circle has a small space beneath the new plasmid and the inner circle has a small space above the new plasmid. Text pointing to the outer plasmid reads mutated plasmid with nicked strands. An arrow points down to text reading, transform lowercase d lowercase s D N A into cells. Cell repairs nicks in mutated plasmid. Part c shows two rectangular plots showing sequences. The top sequence is labeled wild-type italics lowercase r lowercase e lowercase c uppercase A end italics. It has a series of peaks of irregular height, most reaching at least half of the height of the vertical axis and many extending above that. The peaks are color-coded to match different nucleotides. The sequence shown across the top is T C T T C C G G T, shaded box with A A A, then unshaded region with A C C A C G C. The bottom rectangle is labeled mutant italicized lowercase r lowercase e lowercase c uppercase A end italics uppercase K 72 uppercase R. It is very similar to the top rectangle except that the bases in the shaded region are C G C and the peaks beneath are color-coded to match.

When suitably located restriction sites are not present, oligonucleotide-directed mutagenesis can create a specific DNA sequence change (Fig. 9-10b). The cloned gene is denatured, separating the strands. Two short, complementary synthetic DNA strands, each with the desired base change, are annealed to opposite strands of the cloned gene within a suitable circular DNA vector. The mismatch of a single base pair in 30 to 40 bp does not prevent annealing. The two annealed oligonucleotides serve to prime DNA synthesis in both directions around the plasmid vector, creating two complementary strands that contain the mutation. After several cycles of selective amplification by the polymerase chain reaction (PCR; see Fig. 8-33), the mutation-containing DNA predominates in the population and can be used to transform bacteria. Most of the transformed bacteria will have plasmids carrying the mutation.
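The primer-design step described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: the function names, the default of 15 flanking bases, and the 48-base template are all hypothetical choices made for the example. It builds two complementary oligonucleotides that carry the desired change and are otherwise matched to the template.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_mutagenic_primers(template: str, start: int, new_bases: str, flank: int = 15):
    """Build a complementary primer pair carrying a substitution.

    template  : top strand of the cloned gene, written 5'->3'
    start     : 0-based index of the first base to replace
    new_bases : replacement bases (same length as the bases they replace)
    flank     : unchanged bases kept on each side of the change (assumed value)
    """
    end = start + len(new_bases)
    if start - flank < 0 or end + flank > len(template):
        raise ValueError("not enough flanking sequence for the requested primer")
    mutated = template[:start] + new_bases + template[end:]
    forward = mutated[start - flank:end + flank]   # anneals to the bottom strand
    reverse = reverse_complement(forward)          # anneals to the top strand
    return forward, reverse


# Hypothetical 48-base template; this is not a real gene sequence.
template = "ACGT" * 12
forward, reverse = design_mutagenic_primers(template, start=20, new_bases="G")

# Compare the forward primer with the unmodified template over the same window.
window = template[20 - 15:21 + 15]
mismatches = sum(a != b for a, b in zip(forward, window))
print(forward)
print(reverse)
print("length:", len(forward), "mismatches:", mismatches)
```

With these settings each primer is 31 nucleotides long and differs from the template at one position, which falls within the range in which a single mismatch does not prevent annealing.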

As an example, we return to the bacterial *recA* gene. The product of this gene, the RecA protein, has several activities (see Section 25.3), including the hydrolysis of ATP. The Lys residue at position 72 in RecA (a 352-residue polypeptide) is involved in ATP hydrolysis. Changing ${Lys}^{72}$ to an Arg creates a variant of RecA protein that will bind, but not hydrolyze, ATP (Fig. 9-10c). The engineering and purification of this variant RecA protein has facilitated research into the roles of ATP hydrolysis in the functioning of this protein.
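The effect of the codon change on the encoded protein can be checked with a short Python sketch. The 19-nucleotide window is the one shown in the sequencer traces of Figure 9-10c; the assumption here is that the reading frame begins at the first base of that window, which places AAA in frame as codon 72. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def replace_codon(dna: str, codon_index: int, new_codon: str) -> str:
    """Swap one codon (0-based index within the window) for another."""
    start = 3 * codon_index
    return dna[:start] + new_codon.upper() + dna[start + 3:]


wild_type = "TCTTCCGGTAAAACCACGC"            # window around codon 72 (Fig. 9-10c)
mutant = replace_codon(wild_type, 3, "CGC")  # AAA (Lys) -> CGC (Arg)

print(translate(wild_type))  # SSGKTT
print(translate(mutant))     # SSGRTT
```

Three base changes in one codon alter a single residue, Lys to Arg, and leave the flanking residues untouched.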

Changes can be introduced into a gene that involve far more than one base pair. Large parts of a gene can be deleted by cutting out a segment with restriction endonucleases and ligating the remaining portions to form a smaller gene. For example, if a protein has two domains, the gene segment encoding one of the domains can be removed so that the gene now encodes a protein with only one of the original two domains. Parts of two different genes can be ligated to create new combinations; the product of such a fused gene is called a fusion protein. Researchers have ingenious methods to bring about virtually any genetic alteration in vitro. After reintroducing the altered DNA into the cell, they can investigate the consequences of the alteration.

Terminal Tags Provide Handles for Affinity Purification

Affinity chromatography is one of the most efficient methods for purifying proteins (see Fig. 3-17c). Unfortunately, many proteins do not bind a ligand that can be conveniently immobilized on a column matrix. However, the gene for almost any protein can be altered to express a fusion protein that can be purified by affinity chromatography. The gene encoding the target protein is fused to a gene encoding a peptide or protein that binds a simple, stable ligand with high affinity and specificity. The peptide or protein used for this purpose is referred to as a tag. Tag sequences can be added to genes such that the resulting proteins have tags at their amino terminus or carboxyl terminus. Table 9-3 lists some of the peptides or proteins commonly used as tags.

TABLE 9-3 Commonly Used Protein Tags
Tag protein/peptide	Molecular mass (kDa)	Immobilized ligand
Protein A	59	Fc portion of IgG
${(His)}_{6}$	0.8	${Ni}^{2 +}$
Glutathione-S-transferase (GST)	26	Glutathione
Maltose-binding protein	41	Maltose
β-Galactosidase	116	p-Aminophenyl-β-D-thiogalactoside (TPEG)
Chitin-binding domain	5.7	Chitin
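
The tag masses in Table 9-3 make it easy to estimate how large a fusion protein will be, which is useful when looking for it on a gel. The sketch below is a rough calculation only. It assumes an average residue mass of about 110 Da for the target protein and ignores any linker between target and tag; the function name is hypothetical.

```python
# Tag masses in kDa, copied from Table 9-3.
TAG_MASS_KDA = {
    "Protein A": 59.0,
    "(His)6": 0.8,
    "GST": 26.0,
    "Maltose-binding protein": 41.0,
    "beta-Galactosidase": 116.0,
    "Chitin-binding domain": 5.7,
}

# Assumption: a typical amino acid residue contributes about 110 Da.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_fusion_mass_kda(n_residues: int, tag: str) -> float:
    """Approximate mass of a tagged protein: target estimate plus tag mass."""
    target_kda = n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0
    return target_kda + TAG_MASS_KDA[tag]


# RecA is a 352-residue polypeptide.
for tag in ("(His)6", "GST", "Maltose-binding protein"):
    print(f"RecA + {tag}: about {estimate_fusion_mass_kda(352, tag):.1f} kDa")
```

The comparison shows why the choice of tag matters: a GST tag adds roughly two-thirds of the estimated mass of RecA, whereas a six-histidine tag adds about 2%.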

The general procedure can be illustrated by focusing on a system that uses the glutathione-S-transferase (GST) tag (Fig. 9-11). GST is a small enzyme ($M_{r}$ 26,000) that binds tightly and specifically to glutathione. When the GST gene sequence is fused to a target gene, the fusion protein acquires the capacity to bind glutathione. The fusion protein is expressed in a host organism such as a bacterium, and a crude extract is prepared. A column is filled with a porous matrix consisting of the ligand (glutathione) immobilized on microscopic beads of a stable polymer such as cross-linked agarose. As the crude extract percolates through this matrix, the fusion protein becomes immobilized by binding the glutathione. The other proteins in the extract are washed through the column and discarded. The interaction between GST and glutathione is tight but noncovalent, allowing the fusion protein to be gently eluted from the column with a solution containing either a higher concentration of salts or free glutathione to compete with the immobilized ligand for GST binding. The fusion protein is often obtained with good yield and high purity. In some commercially available systems, the tag can be entirely or largely removed from the purified fusion protein by a protease that cleaves a sequence near the junction between the target protein and its tag.

A two-part figure, a and b, shows how tagged proteins are used in protein purification. — FIGURE 9-11 Use of tagged proteins in protein purification. (a) Glutathione-S-transferase (GST) is a small enzyme that binds glutathione. (b) The GST tag is fused to the carboxyl terminus of the protein by genetic engineering. The tagged protein is expressed in the cell and is present in the crude extract when the cells are lysed. The extract is subjected to affinity chromatography through a matrix with immobilized glutathione.

Part a shows a large purple sphere labeled glutathione S transferase (G S T) with a green oval overlapping its right side. Text on the oval reads, Greek letter gamma dash G l u dash C y s dash G l y, and the oval is labeled glutathione (G S H). Part b shows a horizontal strand with a large red central segment labeled gene for target protein. This is combined with a similar strand with a large purple central segment labeled gene for G S T. An arrow points down to a long strand with a red segment on the left and a purple segment on the right. An arrow at the left of the red segment points up and then horizontally to the right and reads transcription. Text beneath the red and purple segments reads, gene for fusion protein. An arrow points down to a cell with a nucleus that contains a coiled thread, with a red and purple segment. An arrow from the red and purple thread points out of the nucleus to four shapes that each have a red sphere on the left joined with a purple sphere on the right that has a semicircular cutout on its right side. Text reads, express fusion protein in cell. An arrow pointing down to an Erlenmeyer flask reads prepare cell extract containing fusion protein as part of the cell protein mixture. The Erlenmeyer flask is wide at the base with a narrow top and is half full of liquid that contains red spheres joined with purple spheres, yellow spheres, and small gray spheres. Text reads, protein mixture is added to column. A tube leads vertically out of the flask and bends horizontally to run into a pump, from which it exits vertically and then bends down into the top of a column. The column has a small disk above a thick disk above a narrow cylindrical region above a wide disc above an even wider disc. Beneath this is a clear, vertical cylinder in which the various spheres from the flask can be seen. There are six red spheres bound to purple spheres in the top half. 
A closeup of these shows a large gray sphere with green ovals all around it that attach to purple spheres on the surface bound to red spheres further away. Text pointing to a purple sphere reads, G S T tag. Text pointing to a green oval reads, glutathione anchored to medium binds G S T tag. A small gray sphere is shown moving down to the left, and a yellow sphere is shown moving downward to the right. In the cylinder, yellow and small gray spheres are visible near the bottom, where there is a thick disc with a smaller disc connected below it with a tube extending out from which a drop of liquid is falling. There are nine numbered vertical test tubes lined up horizontally beneath the cylinder. Test tubes 1 and 2 are three-quarters full of clear liquid. Test tube 3 is three-quarters full of liquid containing three yellow and three small gray spheres. Test tube 4 is half full of liquid containing a yellow sphere and two small gray spheres. The remaining tubes to the right are empty. Text below reads, other proteins flow through column. An arrow points to a similar apparatus with a flask, column, and test tubes on the right. The flask contains only green ovals. Text reads solution of free glutathione is added to column. The top of the large cylinder is clear, and a close up shows a gray sphere with many green ovals attached to it. At the bottom of the cylinder are four structures, each with a red sphere attached to a purple sphere that is attached to a green oval. There are nine numbered vertical test tubes lined up horizontally beneath the column. Test tubes 1 and 2 are three-quarters full of clear liquid. Test tubes 3 and 4 are three-quarters full of liquid, and each contain three yellow and three small gray spheres. Test tubes 5 and 6 are three-quarters full of clear liquid. Test tube 7 is almost three-quarters full and contains a red sphere attached to a purple sphere attached to a green oval. Test tubes 8 and 9 are empty. 
Text reads, fusion protein is eluted by glutathione solution.

A shorter tag with widespread application consists of a simple sequence of six or more His residues. These histidine tags, or His tags, bind tightly and specifically to nickel ions. A chromatography matrix with immobilized ${Ni}^{2 +}$ can be used to quickly separate a His-tagged protein from other proteins in an extract. Some of the larger tags, such as maltose-binding protein, provide added stability and solubility, allowing the purification of cloned proteins that are otherwise inactive due to improper folding or insolubility.
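
At the DNA level, adding a His tag amounts to inserting a run of His codons in frame with the coding sequence. The following Python sketch places six His codons immediately before the stop codon, giving a tag at the carboxyl terminus. It is an illustration under simple assumptions: the function name and the toy coding sequence are hypothetical, and real constructs usually include a linker or protease site that is omitted here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_CODONS = {"CAT", "CAC"}  # both codons specify histidine


def add_c_terminal_his_tag(cds: str, n_his: int = 6, his_codon: str = "CAC") -> str:
    """Insert n_his histidine codons just before the stop codon of a coding sequence."""
    cds = cds.upper()
    his_codon = his_codon.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    if his_codon not in HIS_CODONS:
        raise ValueError("his_codon must be CAT or CAC")
    # Keep the reading frame: body, then the tag codons, then the original stop.
    return cds[:-3] + his_codon * n_his + cds[-3:]


# Toy coding sequence (Met-Ala-Lys-stop); not a real gene.
toy_cds = "ATGGCTAAATAA"
tagged = add_c_terminal_his_tag(toy_cds)
print(tagged)
print("added nucleotides:", len(tagged) - len(toy_cds))
```

The tag adds 18 nucleotides, and the coding sequence stays in frame because whole codons are inserted.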

Affinity chromatography using terminal tags is powerful and convenient. The tags have been successfully used in thousands of published studies; in many cases, the protein would be impossible to purify and study without the tag. However, even very small tags can affect the properties of the proteins they are attached to, thereby influencing the study results. For example, the tag may adversely affect protein folding. Even if the tag is removed by a protease, one or a few extra amino acid residues can remain behind on the target protein, which may or may not affect the protein’s activity. The types of experiments to be carried out, and the results obtained from them, should always be evaluated with the aid of well-designed controls to assess any effect of a tag on protein function.

The Polymerase Chain Reaction Offers Many Options for Cloning Experiments

Many adaptations of PCR have increased its utility in cloning. For example, sequences in RNA can be amplified if the first PCR cycle uses reverse transcriptase, an enzyme that works like DNA polymerase (see Fig. 8-33) but uses RNA as a template (Fig. 9-12a). After the DNA strand is made from the RNA template, the remaining cycles can be carried out with DNA polymerases, using standard PCR protocols. This reverse transcriptase PCR (RT-PCR) can be used, for example, to detect sequences derived from living cells (which are transcribing their DNA into RNA) as opposed to dead tissues.

A two-part figure, a and b, shows applications of P C R. — FIGURE 9-12 Some applications of PCR. (a) In reverse transcriptase PCR, or RT-PCR, RNA molecules are amplified by using reverse transcriptase in the first two cycles. (b) In quantitative PCR, or qPCR, careful monitoring of the progress of a PCR amplification allows one to determine when a DNA segment has been amplified to a specified threshold level. The amount of PCR product present is determined by measuring the level of a fluorescent probe attached to a reporter oligonucleotide complementary to the DNA segment that is being amplified. Probe fluorescence is not detectable initially, due to a fluorescence quencher attached to the same oligonucleotide. When the reporter oligonucleotide pairs with its complement in a copy of the amplified DNA segment, the fluorophore is separated from the quenching molecule and fluorescence results. As the PCR reaction proceeds, the amount of the targeted DNA segment increases exponentially, and the fluorescent signal also increases exponentially as the oligonucleotide probes anneal to the amplified segments. After many PCR cycles, the signal reaches a plateau as one or more reaction components become exhausted. When a segment is present in greater amounts in one sample than another, its amplification reaches a defined threshold level earlier. The “No template” line follows the slow increase in background signal observed in a control that does not include added sample DNA. CT is the cycle number at which the threshold is first surpassed.

A green strand is labeled with 5 prime on the left and 3 prime on the right. The entire piece is labeled region of target R N A to be amplified. Step 1: Add synthetic D N A oligonucleotide primer. The same strand is shown with a small brown box under the left end with its 5 prime end toward the right and an arrow pointing to the left. Step 2: Add reverse transcriptase to catalyze 5 prime to 3 prime D N A synthesis. This produces two strands of the same length, a green strand above a blue strand. Step 3: Heat to separate strands; reverse transcriptase inactivated; R N A no longer used as a template. Step 4: Add synthetic oligonucleotide primers. The blue strand is shown with its 3 prime end on the left and its 5 prime end on the right. A small brown box is shown under the left side with its 5 prime end toward the left and an arrow pointing toward the right. Step 5: Thermostable italicized Taq end italics D N A polymerase catalyzes 5 prime to 3 prime D N A synthesis. Two blue strands are shown with the small brown box present at the left end of the bottom strand. An arrow points down accompanied by text reading, Repeat steps 3 and 4. The two blue strands are shown farther apart. The top strand has a small brown box under its left side with an arrow pointing toward the right. The bottom strand still has a small brown segment on its left side and has a small brown box above its right side with an arrow pointing toward the left. An arrow points down accompanied by text reading, D N A synthesis (step 5) is catalyzed by the thermostable D N A polymerase (still present). This produces two pairs of strands. The top strand has its 3 prime end on the left, and its complementary bottom strand has a small brown box on its left side. The bottom pair of strands has a small brown box on the right side of the top strand and another small brown box on the left side of the bottom strand. An arrow points down accompanied by text reading, repeat steps 3 through 5. 
Four paired sets of D N A strands are shown. The top pair is identical to the top pair in the previous step, and the bottom three pairs are identical to the bottom pair in the previous step. An arrow points down to text reading, After 20 cycles, the target sequence has been amplified about 10 superscript 6 end superscript fold. Part b shows a vertical loop of D N A. The left side begins with a green sphere with lines extending out in all directions labeled fluorophore. A vertical strand runs up from it with three lines extending to the right to form complementary base pairs with the right-hand strand. It forms a bulge to the left with a vertical line extending out that does not come near the vertical line extending left from the right hand strand. It begins running along the right-hand strand again with two vertical lines that extend right to pair with the right-hand strand. At the top is a loop labeled probe with three vertical lines extending inward on each side of the loop. The right-hand strand runs downward with two horizontal lines paired with the left-hand strand, an unpaired vertical line, and three more vertical lines that pair with the left-hand strand. It ends at a blue sphere labeled quenching molecule. An arrow points down accompanied by text that reads, Probe binds preferentially to target D N A; fluorophore is separated from the quenching molecule, and the fluorescence signal increases. The strand is shown running horizontally with vertical lines extending downward with the green fluorophore on the left and with the right side bent up diagonally to end at the blue quenching molecule. Beneath it, a strand of target D N A runs horizontally with vertical lines extending upward to bond with the vertical lines extending down from the strand above. 
A graph is shown below with P C R cycle number shown on the horizontal axis, ranging from 0 to over 50 and labeled in increments of 20, and signal (arbitrary units) on the vertical axis, ranging from below 0 to 2 and labeled in increments of 1. A dotted horizontal line labeled threshold extends from the vertical axis at 0.2. A line labeled baseline begins at (0, 0) and increases very slowly to end slightly below the dotted horizontal line at (48, 0.07). The right side of the line is labeled no template. All of the curves begin by running along the baseline. The curve for sample 1 begins to rise above the curve for baseline at (6, 0) and crosses the dotted horizontal line at (18, 0.2), which is labeled C T. The curve rises rapidly upward, then begins to level off at (28, 1.6), then levels off even more in a region labeled plateau before ending at (48, 1.7). The curve for sample 2 begins to rise above the curve for baseline at (15, 0) and crosses the dotted line at (20, 0.2). It begins to rise more rapidly at (22, 0.3), then begins to level off at (30, 1.6) before ending at (48, 1.65). The curve for sample 3 begins to rise above the curve for baseline at (18, 0) and crosses the dotted line at (22, 0.2). It begins to rise more rapidly at (23, 0.3), then begins to level off at (37, 1.6) before ending at (48, 1.6). The region during which the curves all rise most rapidly is labeled exponential phase. All data are approximate.

PCR protocols can also be used to estimate the relative copy numbers of particular sequences in a sample, an approach called quantitative PCR (qPCR) or real-time PCR. If a DNA sequence is present in higher than usual amounts in a sample — for example, if certain genes are amplified in tumor cells — qPCR can reveal the increased representation of that sequence. In brief, the PCR is carried out in the presence of a probe that emits a fluorescent signal when the PCR product is present (Fig. 9-12b). If the sequence of interest is present at higher levels than other sequences in the sample, the PCR signal will reach a predetermined threshold faster. Reverse transcriptase PCR and qPCR can be combined to determine the relative concentrations of a particular mRNA molecule in a cell, and thereby monitor gene expression under different environmental conditions.
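
The arithmetic behind qPCR can be made concrete with a short Python sketch. It rests on an idealized assumption that the amount of product grows by a constant factor each cycle during the exponential phase, with perfect doubling when the efficiency is 1.0. Real reactions fall somewhat short of this. The function names are illustrative, and the threshold cycles used in the example are the approximate values read from Figure 9-12b.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in product after a number of cycles (efficiency 1.0 = doubling)."""
    return (1.0 + efficiency) ** cycles


def relative_abundance(ct_sample: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Starting template in the sample relative to the reference, from threshold cycles.

    A sample that crosses the threshold earlier (smaller C_T) began with more template.
    """
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


def cycles_to_threshold(initial_copies: float, threshold_copies: float, efficiency: float = 1.0) -> float:
    """Cycles needed for initial_copies to grow to threshold_copies."""
    return math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency)


print(fold_amplification(20))        # 1048576.0, about 10^6 after 20 cycles
print(relative_abundance(18, 20))    # sample 1 versus sample 2: 4.0
print(relative_abundance(18, 22))    # sample 1 versus sample 3: 16.0
```

Under these assumptions each cycle of difference in threshold cycle corresponds to a twofold difference in starting template, so the three samples in the figure would differ by factors of about 4 and 16.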

DNA Libraries Are Specialized Catalogs of Genetic Information

In some instances, it is useful to clone many genes or genomic segments rather than a particular one. A DNA library is a collection of DNA clones, usually gathered for purposes of gene discovery or the determination of gene or protein function. The library can take a variety of forms, depending on the source of the DNA and the ultimate purpose of the library.

An example is a library that includes only the genes that are expressed (that is, transcribed into RNA) in a given organism or even just in certain cells or tissues. Such a library lacks any genomic DNA that is not transcribed. The researcher first extracts mRNA from an organism, or from specific cells of an organism, and then prepares the complementary DNAs (cDNAs). Like RT-PCR, this multistep reaction (Fig. 9-13) relies on reverse transcriptase, which synthesizes DNA from a template RNA. The resulting double-stranded DNA fragments are inserted into a suitable vector and cloned, creating a population of clones called a cDNA library. If the library host is a bacterium like E. coli, each cell in the population will carry one particular cloned sequence. The library will encompass many millions of cells with millions of different cloned segments. The presence of a gene for a particular protein in such a library implies that this gene is expressed in the cells and under the conditions used to generate the library.
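
The strand relationships in cDNA synthesis can be followed with a small Python sketch. It models only base pairing: an oligo-dT primer anneals to the poly(A) tail, reverse transcriptase makes a complementary DNA strand, and a second strand is copied from the first. The adapter ligation used to prime the second strand is omitted for brevity, and the function names, the minimum tail length of 8, and the toy mRNA are assumptions made for the example.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, min_poly_a: int = 8) -> str:
    """Return the first cDNA strand (5'->3'), complementary to the mRNA.

    Raises ValueError if the mRNA lacks a poly(A) tail for oligo-dT priming.
    """
    mrna = mrna.upper()
    if not mrna.endswith("A" * min_poly_a):
        raise ValueError("no poly(A) tail for the oligo-dT primer to anneal to")
    # Read the template 3'->5' so the product is written 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


def second_strand(first_strand: str) -> str:
    """Return the DNA strand complementary to the first strand (5'->3')."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand.upper()))


# Toy mRNA: a short coding segment followed by an 8-residue poly(A) tail.
mrna = "AUGGCUUCG" + "A" * 8
first = first_strand_cdna(mrna)
second = second_strand(first)
print(first)   # TTTTTTTTCGAAGCCAT
print(second)  # ATGGCTTCGAAAAAAAA
```

The second strand has the same sequence as the mRNA with T in place of U, which is why a cDNA clone carries the coding information of the transcript it came from.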

A figure shows how a c D N A library is constructed from m R N A. — FIGURE 9-13 Building a cDNA library from mRNA. A cell’s total mRNA content includes transcripts from thousands of genes, and the cDNAs generated from this mRNA are correspondingly heterogeneous. Reverse transcriptase can synthesize DNA on an RNA or a DNA template. To prime the synthesis of a second DNA strand, oligonucleotides of known sequence are ligated to the $3^{'}$ end of the first strand, and the double-stranded cDNA so produced is cloned into a plasmid.

A molecule of m R N A is shown as a green strand with its 5 prime end on the left and a series of 8 A at the right side. An arrow points down accompanied by text reading, m R N A template is annealed to synthetic oligonucleotide (oligo-d T) primer. The same molecule is shown with a small blue stand beneath its right side with its 3 prime end on the left and a series of 8 T complementary to the 8 A of the top strand. An arrow points down accompanied by text reading, Reverse transcriptase and d N T Ps yield a complementary D N A strand. The original strand is shown with a blue strand of the same length beneath it. This is labeled m R N A-D N A hybrid. The new strand has its 3 prime end on the left. An arrow points down accompanied by text reading, m R N A is degraded with alkali. This leaves only the bottom blue strand. An arrow points down accompanied by text reading, To prime synthesis of a second strand, an oligonucleotide of known sequence is often ligated to the 3 prime end of the c D N A. The same strand is shown with a small brown box on its left side. The left side of the box is labeled 3 prime. A similar brown box is shown above with its 5 prime end to the left. An arrow points down accompanied by text reading, D N A polymerase Roman numeral 1 and d N T Ps extend the primer to yield double-stranded D N A. A double stranded molecule is shown labeled duplex D N A. The bottom strand is the same as in the previous step. The top strand has the brown box from the previous step but now has a blue strand that extends to end with a series of 8 A complementary to the 8 T at the end of the bottom strand.

Another type of library, called a combinatorial gene library or simply a gene library, focuses on sequence variants within one gene. For example, beginning with the cloned gene of enzyme X, a segment of the gene could be replaced with nearly identical fragments synthesized with a slight imprecision so that each clone had one or two random base pair changes relative to the original. Alternatively, the gene segment of interest could be amplified by PCR using an altered DNA polymerase that was slightly inaccurate. The library of clones would then consist of many cells, many of which harbored a different variant of the gene for enzyme X. Investigators could use the library to select for variants of enzyme X with enhanced catalytic properties or could simply determine which changes were functional and which were not. The possibilities are limited only by the imagination of the researcher.
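
A combinatorial library of this kind can be imitated in Python by introducing one or two random base changes into copies of a parent sequence. The sketch is a simulation for illustration, not a model of any particular polymerase: the function names, the 15-base parent segment, and the fixed random seed are all hypothetical choices.

```python
import random

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(seq: str, n_changes: int, rng: random.Random) -> str:
    """Return a copy of seq with n_changes distinct positions substituted."""
    seq_list = list(seq)
    for pos in rng.sample(range(len(seq_list)), n_changes):
        alternatives = [base for base in BASES if base != seq_list[pos]]
        seq_list[pos] = rng.choice(alternatives)
    return "".join(seq_list)


def build_variant_library(parent: str, n_clones: int, max_changes: int = 2, seed: int = 0):
    """Generate n_clones variants, each with 1 to max_changes random substitutions."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [mutate(parent, rng.randint(1, max_changes), rng) for _ in range(n_clones)]


# Hypothetical 15-base segment of the gene for "enzyme X".
parent = "ATGGCTAAAGGTCCT"
library = build_variant_library(parent, n_clones=50)
distinct = set(library)
print("clones:", len(library), "distinct variants:", len(distinct))
```

Each simulated clone differs from the parent at one or two positions. In an experiment the next step would be to screen or select the clones for the property of interest.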

SUMMARY 9.1 Studying Genes and Their Products

DNA cloning and genetic engineering involve the cleavage of DNA and assembly of DNA segments in new combinations — recombinant DNA. Cloning entails generating a DNA fragment of interest, inserting the fragment into a suitable cloning vector, transferring the vector with the DNA insert into a host cell for replication, and identifying and selecting cells that contain the DNA fragment.
Key enzymes in gene cloning include restriction endonucleases (especially the type II enzymes) and DNA ligase.
Cloning vectors include plasmids and, for the longest DNA inserts, bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).
Cloned genes can be expressed in a host cell by incorporating them into expression vectors that have the sequence signals needed for transcription and translation.
Proteins can be expressed in different types of cells using expression systems with various useful features and advantages.
Genetic engineering techniques can alter cloned genes as required by the investigator.
Proteins or peptides can be attached to a protein of interest by altering its cloned gene, creating a fusion protein. The additional peptide segments can be used to detect the protein or to purify it, using convenient affinity chromatography methods.
The polymerase chain reaction (PCR) permits the amplification of chosen segments of DNA or RNA for cloning and can be adapted to determine gene copy number or to monitor gene expression quantitatively.
DNA libraries consist of many clones, encompassing many genomic segments or many variants of a particular gene.