28.3 Regulation of Gene Expression in Eukaryotes

Initiation of transcription is a crucial regulation point for gene expression in all organisms. Although eukaryotes and bacteria use some of the same regulatory mechanisms, the regulation of transcription in the two systems is fundamentally different.

We can define a transcriptional ground state as the inherent activity of promoters and transcriptional machinery in vivo in the absence of regulatory sequences. In bacteria, RNA polymerase generally has access to every promoter and can bind and initiate transcription at some level of efficiency in the absence of activators or repressors. In eukaryotes, however, strong promoters are generally inactive in vivo in the absence of regulatory proteins. This fundamental difference gives rise to at least five important features that distinguish the regulation of gene expression at eukaryotic promoters from that observed in bacteria.

First, access to eukaryotic promoters is restricted by the structure of chromatin, and activation of transcription is associated with many changes in chromatin structure in the transcribed region. Second, although eukaryotic cells have both positive and negative regulatory mechanisms, positive mechanisms are more prominent. Almost every eukaryotic gene requires activation to be transcribed. Third, regulatory mechanisms involving lncRNAs are more common in eukaryotic transcriptional regulation. Fourth, eukaryotic cells have larger, more complex multimeric regulatory proteins than do bacteria. Finally, transcription in the eukaryotic nucleus is separated from translation in the cytoplasm in both space and time.

The complexity of regulatory circuits in eukaryotic cells is extraordinary, as is evident from the following discussion. The section ends with an illustrated description of one of the most elaborate circuits: the regulatory cascade that controls development in fruit flies.

Transcriptionally Active Chromatin Is Structurally Distinct from Inactive Chromatin

The effects of chromosome structure on gene regulation in eukaryotes have no clear parallel in bacteria. In the eukaryotic cell cycle, interphase chromosomes appear, at first viewing, to be dispersed and amorphous (see Fig. 24-22). Nevertheless, several forms of chromatin can be found along these chromosomes. About 10% of the chromatin in a typical eukaryotic cell is in a more condensed form than the rest of the chromatin. This form, heterochromatin, is transcriptionally inactive. Heterochromatin is generally associated with particular chromosome structures — the centromeres, for example. The remaining, less condensed chromatin is called euchromatin.

Transcription of a eukaryotic gene is strongly repressed when its DNA is condensed within heterochromatin. Some, but not all, of the euchromatin is transcriptionally active. Transcriptionally active chromosomal regions are distinguished from heterochromatin in at least three ways: the positioning of nucleosomes, the presence of histone variants, and the covalent modification of nucleosomes. These transcription-associated structural changes in chromatin are collectively called chromatin remodeling. The remodeling employs a set of enzymes that promote these changes (Table 28-2).

TABLE 28-2 Some Enzyme Complexes That Catalyze Chromatin Structural Changes Associated with Transcription
Enzyme complex^a	Oligomeric structure (number of polypeptides)	Source	Activities
Histone movement, replacement, or editing, requiring ATP
SWI/SNF family	8–17; $M_{r} > 10^{6}$	Eukaryotes	Nucleosome remodeling; transcriptional activation
ISWI family	2–4	Eukaryotes	Nucleosome remodeling; transcriptional repression; transcriptional activation in some cases
CHD family	1–10	Eukaryotes	Nucleosome remodeling; nucleosome ejection for transcriptional activation; some have repressive roles
INO80 family	$> 10$	Eukaryotes	Nucleosome remodeling; transcriptional activation; family member SWR1 engages in replacement of H2A-H2B with H2AZ-H2B
Histone modification
GCN5-ADA2-ADA3	3	Yeast	GCN5 has type A HAT activity
SAGA/PCAF	$> 20$	Eukaryotes	Includes GCN5-ADA2-ADA3; acetylates residues in H3, H2B, H2AZ
NuA4	$\geq 12$	Eukaryotes	Esa1 component has HAT activity; acetylates H4, H2A, and H2AZ
Histone chaperones not requiring ATP
HIRA	1	Eukaryotes	Deposition of H3.3 during transcription
^a The abbreviations for eukaryotic genes and proteins are often more confusing or obscure than those used for bacteria. SWI (switching) was discovered as a protein required for expression of certain genes involved in mating-type switching in yeast, and SNF (sucrose nonfermenting) as a factor for expression of the yeast gene for sucrase. Subsequent studies revealed multiple SWI and SNF proteins that act in a complex. The SWI/SNF complex has a role in expression of a wide range of genes and has been found in many eukaryotes, including humans. ISWI is imitation SWI. CHD is chromodomain, helicase, DNA binding; INO80 is inositol-requiring 80; and SWR1 is Swi2/Snf2-related ATPase 1. The complex of GCN5 (general control nonderepressible) and ADA (alteration/deficiency in activation) proteins was discovered during investigation of the regulation of nitrogen metabolism genes in yeast. These proteins can be part of the larger SAGA (SPT, ADA2,3, GCN5, acetyltransferase) complex in yeasts. The equivalent of SAGA in humans is PCAF (p300/CBP-associated factor). NuA4 is nucleosome acetyltransferase of H4; ESA1 is essential SAS2-related acetyltransferase; HIRA is histone regulator A.

Four known families of chromatin remodeling complexes, distinguished by their structural features, act directly to alter nucleosome composition in transcribed regions. They may unwrap, translocate, remove, or exchange nucleosomes on the DNA, hydrolyzing ATP in the process (Table 28-2; see the table footnote for an explanation of the abbreviated names of enzyme complexes described here). In some cases, the enzymes catalyze the exchange of pairs of histones within nucleosomes to alter nucleosome composition. The many different complexes are specialized to function at particular genes or chromosomal regions. There are two related complexes in the SWI/SNF family in all eukaryotic cells, both of which remodel chromatin so that nucleosomes are ejected from the DNA near transcription start sites. They appear to be involved in a dynamic cycle that allows nucleosomes to be replaced by transcription factors (Fig. 28-28). The two distinct complexes generally function at different sets of genes. Most of the ISWI family complexes optimize nucleosome spacing to allow chromatin assembly and transcriptional silencing. There are generally 9 or 10 different CHD family complexes in eukaryotic cells, separated into three subfamilies. The different family members have specialized roles, either ejecting nucleosomes to activate transcription or assembling chromatin to repress transcription. The INO80 family complexes have a variety of roles in remodeling chromatin for transcriptional activation and DNA repair. One family member, SWR1, promotes subunit exchange in nucleosomes to introduce histone variants such as H2AZ (see Box 24-1), found in transcriptionally active regions.

FIGURE 28-28 Nucleosome ejection by a SWI/SNF remodeler. The SWI/SNF enzyme engulfs the nucleosome, interacting with short CGCG sequences nearby. With the aid of ATP hydrolysis, the DNA is partially separated from the nucleosome, exposing a site for transcription factor (TF) binding. After the transcription factor is bound, the nucleosome is ejected. When transcription is no longer needed, the nucleosome can again replace the transcription factor or factors, completing the cycle. [Information from S. Brahma and S. Henikoff, *Trends Biochem. Sci.* 45:13, 2020.]

The covalent modification of histones is altered dramatically within transcriptionally active chromatin. The core histones of nucleosome particles (H2A, H2B, H3, H4; see Fig. 24-24) are modified by methylation of Lys or Arg residues, phosphorylation of Ser or Thr residues, acetylation (see below), ubiquitination (see Fig. 27-47), or SUMOylation (SUMOs are small ubiquitin-like modifiers). Each of the core histones has two distinct structural domains. A central domain is involved in histone-histone interaction and the wrapping of DNA around the nucleosome. A lysine-rich amino-terminal domain is generally positioned near the exterior of the assembled nucleosome particle; the covalent modifications occur at specific residues concentrated in this amino-terminal domain. The patterns of modification have led some researchers to propose the existence of a histone code, in which modification patterns are recognized by enzymes that alter the structure of chromatin. Indeed, some of the modifications are essential for interactions with proteins that play key roles in transcription.

The acetylation and methylation of histones figure prominently in the processes that activate chromatin for transcription. During transcription, histone H3 in nucleosomes is methylated (by specific histone methylases) at ${Lys}^{4}$ near the $5'$ end of the coding region and at ${Lys}^{36}$ within the coding region. These methylations enable the binding of histone acetyltransferases (HATs), enzymes that acetylate particular Lys residues. Cytosolic (type B) HATs acetylate newly synthesized histones before the histones are imported into the nucleus. The subsequent assembly of the histones into chromatin after replication is facilitated by histone chaperones: CAF1 for H3 and H4 (see Box 24-1), and NAP1 for H2A and H2B.

Where chromatin is being activated for transcription, the nucleosomal histones are further acetylated by nuclear (type A) HATs. The acetylation of multiple Lys residues in the amino-terminal domains of histones H3 and H4 can reduce the affinity of the entire nucleosome for DNA. Acetylation of particular Lys residues is critical for the interaction of nucleosomes with other proteins.

When transcription of a gene is no longer required, the extent of methylation and acetylation of nucleosomes in that vicinity is reduced as part of a general gene-silencing process that restores the chromatin to a transcriptionally inactive state. There are two known classes of demethylases. One class, called LSD (lysine-specific histone demethylases), first converts the $\mathrm{CH_3{-}N}$ linkage to an imine ($\mathrm{CH_2{=}N}$) linkage, which is then hydrolyzed to generate formaldehyde and the demethylated lysine. The other class contains JmjC (Jumonji-C) domains; these enzymes first hydroxylate the methyl group, which is then likewise released as formaldehyde. More than 20 JmjC domain–containing histone demethylases are encoded by mammalian genomes. They belong to the same α-ketoglutarate-dependent hydroxylase enzyme family that includes the enzyme that hydroxylates proline residues in collagen (see Box 4-2). These α-ketoglutarate-dependent enzymes are strongly inhibited by 2-hydroxyglutarate, an unusual metabolite produced in abundance by a mutated form of isocitrate dehydrogenase that is common in human cancers (see Fig. 16-20). Within the tumors, the high levels of 2-hydroxyglutarate produce global changes in gene expression.
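The two demethylation routes can be summarized as reaction schemes. This is a simplified sketch: R stands for the rest of the lysine residue, charges and protonation states are omitted, and the cofactors shown (FAD for the LSD class; α-ketoglutarate and $\mathrm{O_2}$ for the JmjC class) come from general biochemistry rather than from the passage above. In both routes the methyl carbon leaves as formaldehyde.

```latex
\begin{aligned}
&\text{LSD class:}  && \mathrm{R{-}NH{-}CH_3} + \mathrm{FAD} \longrightarrow \mathrm{R{-}N{=}CH_2} + \mathrm{FADH_2} \\
&                   && \mathrm{R{-}N{=}CH_2} + \mathrm{H_2O} \longrightarrow \mathrm{R{-}NH_2} + \mathrm{HCHO} \\
&\text{JmjC class:} && \mathrm{R{-}NH{-}CH_3} + \alpha\text{-ketoglutarate} + \mathrm{O_2} \longrightarrow \mathrm{R{-}NH{-}CH_2OH} + \text{succinate} + \mathrm{CO_2} \\
&                   && \mathrm{R{-}NH{-}CH_2OH} \longrightarrow \mathrm{R{-}NH_2} + \mathrm{HCHO}
\end{aligned}
```

Because the JmjC route consumes α-ketoglutarate as a cosubstrate, a structurally similar metabolite such as 2-hydroxyglutarate can compete with it, which is consistent with the inhibition described above.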

Histone acetylation is reduced by the action of histone deacetylases (HDACs). The deacetylases include SIRT1, SIRT2, SIRT6, and SIRT7, which are ${NAD}^{+}$-dependent enzymes in the sirtuin family (SIRT1–7 in mammals). These enzymes deacetylate specific Lys residues in histones and in other targets in the cytoplasm. In addition to the removal of certain acetyl groups, new covalent modification of histones marks chromatin as transcriptionally inactive. For example, ${Lys}^{9}$ of histone H3 is often methylated in heterochromatin.
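The idea that enzymes write and erase marks, and that other proteins read the resulting pattern, can be illustrated with a toy model. This is a hypothetical sketch, not a real histone-code decoder: the mark labels (such as `H3K4me`), the enzyme-to-mark table, and the rule that a repressive mark dominates are all simplifying assumptions built only from the marks named in this section.

```python
from __future__ import annotations

# Hypothetical mark labels simplified from the text: H3 Lys4/Lys36 methylation
# and H3/H4 acetylation accompany activation; H3 Lys9 methylation marks
# heterochromatin.
ACTIVE_MARKS = {"H3K4me", "H3K36me", "H3ac", "H4ac"}
REPRESSIVE_MARKS = {"H3K9me"}

# Hypothetical effect of each enzyme class on a region: (marks added, marks removed).
ENZYME_EFFECTS = {
    "H3 K4/K36 methylase": ({"H3K4me", "H3K36me"}, set()),
    "type A HAT": ({"H3ac", "H4ac"}, set()),
    "HDAC": (set(), {"H3ac", "H4ac"}),
    "demethylase": (set(), {"H3K4me", "H3K36me"}),
    "H3 K9 methylase": ({"H3K9me"}, set()),
}


def apply_enzyme(marks: set[str], enzyme: str) -> set[str]:
    """Return a new mark set after one enzyme class acts on the region."""
    added, removed = ENZYME_EFFECTS[enzyme]
    return (marks - removed) | added


def classify_region(marks: set[str]) -> str:
    """Toy reading rule: a repressive mark dominates; otherwise any active mark counts."""
    if marks & REPRESSIVE_MARKS:
        return "inactive"
    if marks & ACTIVE_MARKS:
        return "active"
    return "unmarked"


# Activation: methylation of Lys4/Lys36, then acetylation.
region: set[str] = set()
for enzyme in ("H3 K4/K36 methylase", "type A HAT"):
    region = apply_enzyme(region, enzyme)
print(classify_region(region))  # active

# Silencing: acetyl and methyl groups removed, then Lys9 methylated.
for enzyme in ("HDAC", "demethylase", "H3 K9 methylase"):
    region = apply_enzyme(region, enzyme)
print(classify_region(region), sorted(region))  # inactive ['H3K9me']
```

Real chromatin carries many more marks at once, and some regions carry both activating and repressive marks, so the single dominance rule here is only a teaching device.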

The net effect of chromatin remodeling in the context of transcription is to make a segment of the chromosome more accessible and to “label” (chemically modify) it so as to facilitate the binding and activity of transcription factors that regulate expression of the gene or genes in that region.

Most Eukaryotic Promoters Are Positively Regulated

As already noted, eukaryotic RNA polymerases have little or no intrinsic affinity for their promoters. The default state of eukaryotic genes is “off,” and initiation of transcription is almost always dependent on the action of multiple activator proteins. One important reason for the apparent predominance of positive regulation seems obvious: the storage of DNA within chromatin effectively renders most promoters inaccessible, so genes are silent in the absence of other regulation. The structure of chromatin affects access to some promoters more than others, but repressors that bind to DNA so as to preclude access of RNA polymerase (negative regulation) would often be simply redundant. Other factors must be at play in the use of positive regulation, and speculation generally centers around two: the large size of eukaryotic genomes and the greater efficiency of positive regulation.

First, nonspecific DNA binding of regulatory proteins becomes a more important problem in the much larger genomes of higher eukaryotes. Moreover, the chance that a single specific binding sequence will occur randomly at an inappropriate site also increases with genome size. Combinatorial control thus becomes important in a large genome (Fig. 28-29). Specificity for transcriptional activation can be improved if each of several positive regulatory proteins must bind specific DNA sequences to activate a gene. The average number of regulatory sites for a gene in a multicellular organism is six, and genes that are regulated by a dozen such sites are common. The requirement for binding of several positive regulatory proteins to specific DNA sequences vastly reduces the probability of the random occurrence of a functional juxtaposition of all the necessary binding sites. This requirement also reduces the number of regulatory proteins that must be encoded by a genome to regulate all of its genes (Fig. 28-29). Thus, a new regulator is not needed for every gene, although regulation is complex enough in higher eukaryotes that regulatory proteins may represent 5% to 10% of all protein-coding genes.
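To put rough numbers on this argument, the sketch below estimates how often a specific site would appear by chance. It assumes a random genome with equal base frequencies and independent positions, a hypothetical 6-bp recognition site, and a round genome size of $3 \times 10^{9}$ bp; none of these figures comes from the text, and the helper names are hypothetical. Under these assumptions, the expected number of chance matches on both strands is $2G/4^{k}$ for a $k$-bp site in a genome of $G$ bp.

```python
def expected_random_sites(genome_bp: float, site_bp: int) -> float:
    """Expected chance occurrences of one specific site in random DNA.

    Assumes equal base frequencies and independent positions, and counts
    both strands. Real genomes are not random, so this is only an estimate.
    """
    return 2 * genome_bp / 4 ** site_bp


def expected_composite_sites(genome_bp: float, site_bp: int, n_sites: int) -> float:
    """Expected chance occurrences of n specific sites in one fixed arrangement.

    Treating the arrangement as a single site n * site_bp long is a
    simplifying assumption; real enhancers tolerate variable spacing.
    """
    return expected_random_sites(genome_bp, site_bp * n_sites)


GENOME_BP = 3e9  # assumed round figure for a human-sized genome
for n in (1, 2, 3):
    print(n, round(expected_composite_sites(GENOME_BP, 6, n), 3))
```

With these assumptions a single 6-bp site is expected roughly 1.5 million times, two sites in a fixed arrangement a few hundred times, and three sites fewer than once. Allowing flexible spacing would raise these numbers, but the steep drop with each added site is the point of the argument.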

FIGURE 28-29 The advantages of combinatorial control. Combinatorial control allows specific regulation of many genes using a limited repertoire of regulatory proteins. Consider the possibilities inherent in regulation by two different families of leucine zipper proteins (red and green). If each regulatory gene family had three members (as shown here, in dark, medium, and light shades, each binding to a different DNA sequence) that could freely form either homo- or heterodimers, there would be six possible dimeric species in each family and each dimer would recognize a different bipartite regulatory DNA sequence. If a gene had a regulatory site for each protein family, 36 different regulatory combinations would be possible, using just the six proteins from these two families. With six or more sites used in the regulation of a typical eukaryotic gene, the number of possible variants is much greater than this example suggests.
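The counting in the Figure 28-29 caption can be checked with a short sketch. It assumes, as the caption does, that every member of a family can pair with every other member and with itself, and that each dimer recognizes a distinct sequence. The function names are hypothetical, and the six-site case at the end is an illustrative extension, not a figure from the text.

```python
from __future__ import annotations

from math import comb


def dimer_species(members: int) -> int:
    """Distinct dimers when family members pair freely, including with themselves.

    Homodimers contribute `members`; heterodimers contribute C(members, 2).
    """
    return members + comb(members, 2)


def regulatory_combinations(members_per_site: list[int]) -> int:
    """Combinations for a gene with one site per family (product over sites)."""
    total = 1
    for members in members_per_site:
        total *= dimer_species(members)
    return total


print(dimer_species(3))                  # 6 dimers per three-member family
print(regulatory_combinations([3, 3]))   # 36, as in the caption
print(regulatory_combinations([3] * 6))  # 46656 for six such sites (hypothetical)
```

The product rule is what makes combinatorial control economical: each added site multiplies, rather than adds to, the number of distinguishable regulatory patterns.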

In principle, a similar combinatorial strategy could be used by multiple negative regulatory elements, but this brings us to the second reason for the use of positive regulation: it is simply more efficient. If the ~20,000 genes in the human genome were negatively regulated, each cell would have to synthesize, at all times, all of the different repressors in concentrations sufficient to permit specific binding to each “unwanted” gene. In positive regulation, most of the genes are usually inactive (that is, RNA polymerases do not bind to the promoters) and the cell synthesizes only the activator proteins needed to promote transcription of the subset of genes required in the cell at that time.

These arguments notwithstanding, there are examples of negative regulation in eukaryotes, from yeasts to humans, as we shall see. Some of that negative regulation involves lncRNAs, which are more economical to synthesize than repressor proteins.

DNA-Binding Activators and Coactivators Facilitate Assembly of the Basal Transcription Factors

To continue our exploration of the regulation of gene expression in eukaryotes, we return to the interactions between promoters and RNA polymerase II (Pol II), the enzyme responsible for the synthesis of eukaryotic mRNAs. Although many (but not all) Pol II promoters include the TATA box and Inr (initiator) sequences, with their standard spacing (see Fig. 26-8), they vary greatly in both the number and the location of additional sequences required for the regulation of transcription.

The additional regulatory sequences, generally bound by transcription activators, are usually called enhancers in higher eukaryotes and upstream activator sequences (UASs) in yeast. A typical enhancer may be found hundreds or even thousands of base pairs upstream from the transcription start site, or may even be downstream, within the gene itself. When bound by the appropriate regulatory proteins, an enhancer increases transcription at nearby promoters regardless of its orientation in the DNA. The UASs of yeast function in a similar way, although generally they must be positioned upstream and within a few hundred base pairs of the transcription start site.

Successful binding of the active Pol II holoenzyme at one of its promoters usually requires the combined action of proteins of five types: (1) transcription activators, which bind to enhancers or UASs and facilitate transcription; (2) architectural regulators, which facilitate DNA looping; (3) chromatin modification and remodeling proteins, described above; (4) coactivators; and (5) basal transcription factors, also called general transcription factors (see Fig. 26-9, Table 26-2), required at most Pol II promoters (Fig. 28-30). The coactivators are required for essential communication between activators and the complex composed of Pol II and the basal transcription factors. Coactivators also play a direct role in assembly of the preinitiation complex (PIC). Furthermore, a variety of repressor proteins can interfere with communication between Pol II and the activators, resulting in repression of transcription (Fig. 28-30b). Here we focus on the protein complexes shown in Figure 28-30 and how they interact to activate transcription.

FIGURE 28-30 Eukaryotic promoters and regulatory proteins. RNA polymerase II and its associated basal (general) transcription factors form a preinitiation complex at the TATA box and Inr site of the cognate promoters, a process facilitated by transcription activators, acting through coactivators (Mediator, TFIID, or both). (a) A composite promoter with typical sequence elements and protein complexes found in both yeast and higher eukaryotes. The carboxyl-terminal domain (CTD) of Pol II (see Fig. 26-9) is an important point of interaction with Mediator and other protein complexes. Histone modification enzymes (not shown) catalyze methylation and acetylation; remodeling enzymes alter the content and placement of nucleosomes. The transcription activators have distinct DNA-binding domains and activation domains. In some cases, their function is affected by interaction with lncRNAs. Arrows indicate common modes of interaction often required for the activation of transcription. The HMG proteins are a common type of architectural regulator (see Fig. 28-5), allowing the looping of the DNA required to bring together system components bound at distant binding sites. (b) Eukaryotic transcriptional repressors function through a range of mechanisms. Some bind directly to DNA, displacing a protein complex required for activation (not shown); many others interact with various parts of the transcription or activation protein complexes to prevent activation. Possible points of interaction are indicated with arrows. (c) The structure of an HMG protein complex with DNA shows how HMG proteins facilitate DNA looping. The binding is relatively nonspecific, although DNA sequence preferences have been identified for many HMG proteins. Shown here is the HMG domain of the protein HMG-D of *Drosophila*, bound to DNA. [(c) Data from PDB ID 1QRV, F. V. Murphy IV et al., *EMBO J.* 18:6610, 1999.]

Transcription Activators

The requirements for activators vary greatly from one promoter to another. A few are known to activate transcription at hundreds of promoters, whereas others are specific for a few promoters. Many activators are sensitive to the binding of signal molecules, providing the capacity to activate or deactivate transcription in response to a changing cellular environment. Some enhancers bound by activators are quite distant from the promoter’s TATA box. Multiple enhancers (often six or more) are bound by a similar number of activators for a typical gene, providing combinatorial control and response to multiple signals.
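One simple way to picture a promoter that is off by default and needs several activators is as a logical AND gate. The sketch below is a hypothetical toy model, not a quantitative description: the activator names `A1` to `A6`, the strict all-or-nothing rule, and the assumption that any repressor blocks activation are illustrative simplifications. Real promoters respond in graded ways, and some activators act at hundreds of promoters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Promoter:
    """Hypothetical toy model of a eukaryotic promoter that is off by default."""

    required_activators: frozenset[str]
    repressors_present: set[str] = field(default_factory=set)

    def is_on(self, bound_activators: set[str]) -> bool:
        # Simplifying AND rule: every required activator must be bound,
        # and no repressor may be interfering with activation.
        all_bound = self.required_activators <= bound_activators
        return all_bound and not self.repressors_present


# Six hypothetical activators, one per enhancer.
ACTIVATORS = {f"A{i}" for i in range(1, 7)}
gene = Promoter(frozenset(ACTIVATORS))

print(gene.is_on({"A1", "A2", "A3"}))  # False: only half the activators are bound
print(gene.is_on(ACTIVATORS))          # True: all six bound, no repressor
gene.repressors_present.add("R1")
print(gene.is_on(ACTIVATORS))          # False: a repressor blocks activation
```

Because each activator can respond to its own signal, an AND rule of this kind lets one gene integrate several signals at once, which is the combinatorial response described above.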

Some transcription activators can bind to both DNA and RNA, and their function is affected by one or more lncRNAs. The protein NF-κB, for example (Fig. 28-14), activates transcription of many genes involved in the immune response and cytokine production. It can bind to a DNA enhancer site or, alternatively, to an lncRNA called lethe, named after the river of forgetfulness in Greek mythology. The lncRNA reduces transcription of genes controlled by NF-κB.

Architectural Regulators

How do activators function at a distance? The answer in most cases seems to be that, as indicated earlier, the intervening DNA is looped so that the various protein complexes can interact directly. The looping is promoted by architectural regulators that are abundant in chromatin and bind to DNA with limited specificity. Most prominently, the high mobility group (HMG) proteins (Fig. 28-30c; “high mobility” refers to their electrophoretic mobility in polyacrylamide gels) play an important structural role in chromatin remodeling and transcriptional activation.

Coactivator Protein Complexes

Most transcription requires the presence of additional protein complexes. Some major regulatory protein complexes that interact with Pol II have been defined both genetically and biochemically. These coactivator complexes act as intermediaries between the transcription activators and the Pol II complex.

Mediator, a complex consisting of 25 (yeast) to 30 (human) polypeptides, is a major eukaryotic coactivator (Fig. 28-30). Many of the 25 core polypeptides are highly conserved from fungi to humans. A subcomplex of four subunits has a kinase role, interacting transiently with the remainder of the Mediator complex, and may dissociate prior to transcription initiation. Mediator binds tightly to the carboxyl-terminal domain (CTD) of the largest subunit of Pol II. The Mediator complex is required for both basal and regulated transcription at many promoters used by Pol II, and it also stimulates phosphorylation of the CTD by TFIIH (a basal transcription factor). Transcription activators interact with one or more components of the Mediator complex, with the precise interaction sites differing from one activator to another. Coactivator complexes function at or near the promoter’s TATA box.

Additional coactivators, functioning with one or a few genes, have also been described. Some of these operate in conjunction with Mediator, and some may act in systems that do not employ Mediator.

TATA-Binding Protein and Basal Transcription Factors

The first component to bind in the assembly of a preinitiation complex (PIC) at the TATA box of a typical Pol II promoter is the TATA-binding protein (TBP). At promoters lacking a TATA box, TBP is usually delivered as part of a larger complex (13 to 14 subunits) called TFIID. The complete PIC also includes the basal transcription factors TFIIB, TFIIE, TFIIF, and TFIIH; Pol II; and perhaps TFIIA. This minimal PIC, however, is often insufficient for initiation of transcription and generally does not form at all if the promoter is obscured within chromatin. Positive regulation, leading to transcription, is imposed by the activators and coactivators. Mediator interacts directly with TFIIH and TFIIE, allowing their recruitment to the PIC.

Choreography of Transcriptional Activation

We can now begin to piece together the sequence of transcriptional activation events at a typical Pol II promoter (Fig. 28-31). The exact order of binding of some components may vary, but the model in Figure 28-31 illustrates the principles of activation as well as one common path. Many transcription activators have significant affinity for their binding sites even when the sites are within condensed chromatin. The binding of activators is often the event that triggers subsequent activation of the promoter. Binding of one activator may enable the binding of others, gradually displacing some nucleosomes.
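The sequence summarized in Figure 28-31 can be treated as a set of ordering constraints rather than a fixed script. In this sketch each step lists the steps assumed to precede it, and the Python standard-library `graphlib.TopologicalSorter` produces one order that satisfies them all. The step names, and the assumption that TBP/TFIIB binding depends on both Mediator and the remodeling enzymes, are illustrative choices made here; because the exact order of binding can vary, the sketch checks constraints instead of hard-coding one sequence.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the activation path summarized in Figure 28-31.
# Each step maps to the steps assumed to be complete before it can occur.
PREREQUISITES: dict[str, set[str]] = {
    "activators bind enhancers": set(),
    "remodeling/modification enzymes recruited": {"activators bind enhancers"},
    "Mediator recruited": {"activators bind enhancers"},
    "TBP (or TFIID) and TFIIB bind": {
        "Mediator recruited",
        "remodeling/modification enzymes recruited",
    },
    "other basal factors and Pol II bind": {"TBP (or TFIID) and TFIIB bind"},
    "CTD phosphorylated; initiation": {"other basal factors and Pol II bind"},
}


def assembly_order() -> list[str]:
    """Return one order consistent with every prerequisite (ties broken arbitrarily)."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


def is_valid_order(order: list[str]) -> bool:
    """Check that each step appears after all of the steps it depends on."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[before] < position[step]
        for step, befores in PREREQUISITES.items()
        for before in befores
    )


for step in assembly_order():
    print(step)
```

Steps with no constraint between them (here, Mediator recruitment and chromatin remodeling) may come out in either order, which mirrors the point that the path in Figure 28-31 is one common route rather than the only one.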

FIGURE 28-31 The components of transcriptional activation. Activators bind the DNA first. The activators recruit the histone modification/nucleosome remodeling complexes and a coactivator such as Mediator. Mediator facilitates the binding of TBP (or TFIID) and TFIIB, and the other basal transcription factors and Pol II then bind. Phosphorylation of the CTD of Pol II leads to transcription initiation (not shown). [Information from J. A. D’Alessio et al., *Mol. Cell* 36:924, 2009.]


Crucial remodeling of the chromatin then takes place in stages, facilitated by interactions between activators and HATs or enzyme complexes such as SWI/SNF, or both. In this way, a bound activator can draw in other components necessary for further chromatin remodeling to permit transcription of specific genes. The bound activators interact with the large Mediator complex. Mediator, in turn, provides an assembly surface for the binding of, first, TBP (or TFIID), then TFIIB, and then other components of the PIC, including Pol II. Mediator stabilizes the binding of Pol II and its associated transcription factors and greatly facilitates formation of the PIC. Complexity in these regulatory circuits is the rule rather than the exception, with multiple DNA-bound activators promoting transcription.

The script can change from one promoter to another. For example, many promoters have a different set of recognition sequences and may not have a TATA box, and in multicellular eukaryotes the subunit composition of factors such as TFIID can vary from one tissue to another. However, most promoters seem to require a precisely ordered assembly of components to initiate transcription. The assembly process is not always fast. For some genes it may take minutes; for certain genes of higher eukaryotes, the process can take days.
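The activation path described above can be sketched as an ordered checklist. This is a minimal illustration that assumes the hypothetical stage names and the single "canonical" order below, both simplified from Figure 28-31. Because the text notes that the order can differ between promoters, the function only tests whether a proposed sequence of events is consistent with this one common path.

```python
# Hypothetical stage names summarizing one common activation path (Fig. 28-31).
# Real promoters can deviate from this order; this is an illustration only.
CANONICAL_ORDER = [
    "activators bind",
    "remodeling complexes and Mediator recruited",
    "TBP or TFIID binds",
    "TFIIB binds",
    "other basal factors and Pol II bind",
    "Pol II CTD phosphorylated",
]

RANK = {stage: i for i, stage in enumerate(CANONICAL_ORDER)}


def consistent_with_canonical_order(observed: list[str]) -> bool:
    """True if the observed stages appear in strictly increasing canonical rank.

    Stages may be skipped, but not reordered or repeated. Unknown stage names
    raise KeyError so that typos are not silently accepted.
    """
    ranks = [RANK[stage] for stage in observed]
    return all(earlier < later for earlier, later in zip(ranks, ranks[1:]))


print(consistent_with_canonical_order(["activators bind", "TBP or TFIID binds", "TFIIB binds"]))
print(consistent_with_canonical_order(["TFIIB binds", "TBP or TFIID binds"]))
```

Skipping stages is allowed because many intermediate events (for example, individual remodeling steps) are often not observed directly.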

The Genes of Galactose Metabolism in Yeast Are Subject to Both Positive and Negative Regulation

Some of the general principles described above can be illustrated by one well-studied eukaryotic regulatory circuit (Fig. 28-32). The enzymes required for the importation and metabolism of galactose in yeast are encoded by genes scattered over several chromosomes (Table 28-3). Each of the GAL genes is transcribed separately, and yeast cells have no operons like those in bacteria. However, all the GAL genes have similar promoters and are regulated coordinately by a common set of proteins. The promoters for the GAL genes consist of the TATA box and Inr sequences, as well as an upstream activator sequence ($\mathrm{UAS_G}$) recognized by the transcription activator Gal4 protein (Gal4p). Regulation of gene expression by galactose entails an interplay between Gal4p and two other proteins, Gal80p and Gal3p. Gal80p forms a complex with Gal4p, preventing Gal4p from functioning as an activator of the GAL promoters. When galactose is present, it binds Gal3p, which then interacts with the Gal80p-Gal4p complex and allows Gal4p to function as an activator at the GAL promoters. As the various galactose genes are induced and their products build up, Gal3p may be replaced with Gal1p (a galactokinase needed for galactose metabolism that also acts as a regulator) for sustained activation of the regulatory circuit.

FIGURE 28-32 Regulation of transcription of *GAL* genes in yeast. Galactose imported into the yeast cell is converted to glucose 6-phosphate by a pathway involving five enzymes, whose genes are scattered over three chromosomes (see Table 28-3). Transcription of these genes is regulated by the combined actions of the proteins Gal4p, Gal80p, and Gal3p, with Gal4p playing the central role of transcription activator. The Gal4p-Gal80p complex is inactive. Binding of galactose to Gal3p leads to interaction of Gal3p with the Gal80p-Gal4p complex and activates Gal4p. The Gal4p subsequently recruits SAGA, Mediator, and TFIID to the galactose promoters, leading to recruitment of RNA polymerase II and initiation of transcription. Chromatin remodeling to allow transcription also requires a SWI/SNF complex.


TABLE 28-3 Genes of Galactose Metabolism in Yeast
Gene	Protein function	Chromosomal location	Protein size (number of residues)	Relative expression in glucose	Relative expression in glycerol	Relative expression in galactose
Regulated genes
GAL1	Galactokinase	II	528	−	−	+++
GAL2	Galactose permease	XII	574	−	−	+++
PGM2	Phosphoglucomutase	XIII	569	+	+	++
GAL7	Galactose 1-phosphate uridylyltransferase	II	365	−	−	+++
GAL10	UDP-glucose 4-epimerase	II	699	−	−	+++
MEL1	α-Galactosidase	II	453	−	+	++
Regulatory genes
GAL3	Inducer	IV	520	−	+	++
GAL4	Transcriptional activator	XVI	881	+/−	+	+
GAL80	Transcriptional inhibitor	XIII	435	+	+	++
Information from R. Reece and A. Platt, Bioessays 19:1001, 1997.
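Table 28-3 can be queried directly once it is encoded as a data structure. The sketch below is a hypothetical encoding: the mapping of the qualitative symbols (−, +/−, +, ++, +++) to numbers is an assumption made only so entries can be compared, and the values carry no quantitative meaning. The query picks out genes that are silent in glucose but strongly expressed in galactose.

```python
# Table 28-3 as a dictionary. The qualitative symbols are mapped to numbers
# only so they can be compared; the numbers carry no quantitative meaning.
LEVEL = {"-": 0, "+/-": 0.5, "+": 1, "++": 2, "+++": 3}

# gene: (protein function, chromosome, residues, glucose, glycerol, galactose)
TABLE_28_3 = {
    "GAL1": ("Galactokinase", "II", 528, "-", "-", "+++"),
    "GAL2": ("Galactose permease", "XII", 574, "-", "-", "+++"),
    "PGM2": ("Phosphoglucomutase", "XIII", 569, "+", "+", "++"),
    "GAL7": ("Galactose 1-phosphate uridylyltransferase", "II", 365, "-", "-", "+++"),
    "GAL10": ("UDP-glucose 4-epimerase", "II", 699, "-", "-", "+++"),
    "MEL1": ("α-Galactosidase", "II", 453, "-", "+", "++"),
    "GAL3": ("Inducer", "IV", 520, "-", "+", "++"),
    "GAL4": ("Transcriptional activator", "XVI", 881, "+/-", "+", "+"),
    "GAL80": ("Transcriptional inhibitor", "XIII", 435, "+", "+", "++"),
}


def strongly_induced(table: dict, threshold: int = 3) -> list[str]:
    """Genes not expressed in glucose but at or above `threshold` in galactose."""
    return sorted(
        gene
        for gene, (_, _, _, glucose, _, galactose) in table.items()
        if LEVEL[glucose] == 0 and LEVEL[galactose] >= threshold
    )


print(strongly_induced(TABLE_28_3))  # ['GAL1', 'GAL10', 'GAL2', 'GAL7']
```

Lowering `threshold` to 2 also returns MEL1 and GAL3, which are induced by galactose but less strongly.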

Other protein complexes also have a role in activating transcription of the GAL genes. These include the SAGA complex for histone acetylation and chromatin remodeling, the SWI/SNF complex for chromatin remodeling, and Mediator. The Gal4 protein is responsible for recruitment of these additional factors needed for transcriptional activation. SAGA may be the first and primary recruitment target for Gal4p.

Glucose is the preferred carbon source for yeast, as it is for bacteria. When glucose is present, most of the GAL genes are repressed — whether galactose is present or not. The GAL regulatory system described above is effectively overridden by a complex catabolite repression system that includes several proteins (not depicted in Fig. 28-32).
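The regulatory logic of the GAL system, including the glucose override, can be written as a small Boolean model. This is a deliberate simplification: it treats each protein interaction as all-or-none, and it represents catabolite repression as shutting off every GAL gene, whereas the text says only that most GAL genes are repressed. The function names are hypothetical.

```python
def gal4_can_activate(galactose: bool) -> bool:
    """Gal4p/Gal80p/Gal3p switch (Fig. 28-32).

    Without galactose, Gal80p holds Gal4p in an inactive complex. Galactose
    binds Gal3p, which then interacts with the Gal80p-Gal4p complex and
    allows Gal4p to act as an activator.
    """
    gal3_has_galactose = galactose
    gal80_blocks_gal4 = not gal3_has_galactose
    return not gal80_blocks_gal4


def gal_genes_on(galactose: bool, glucose: bool) -> bool:
    """Catabolite repression by glucose overrides the Gal4p switch
    (modeled as all-or-none for simplicity)."""
    if glucose:
        return False
    return gal4_can_activate(galactose)


for galactose in (False, True):
    for glucose in (False, True):
        print(f"galactose={galactose!s:5} glucose={glucose!s:5} -> {gal_genes_on(galactose, glucose)}")
```

Only one of the four input combinations (galactose present, glucose absent) switches the genes on, which captures why the GAL genes are induced only when galactose is the best available carbon source.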

Transcription Activators Have a Modular Structure

Transcription activators typically have a distinct structural domain for specific DNA binding and one or more additional domains for transcriptional activation or for interaction with other regulatory proteins. Interaction of two regulatory proteins is often mediated by domains containing leucine zippers (Fig. 28-15) or helix-loop-helix motifs (Fig. 28-16). We consider here three distinct types of structural domains used in activation by the transcription activators Gal4p, Sp1, and CTF1 (Fig. 28-33a).

FIGURE 28-33 Transcription activators. (a) Typical activators such as CTF1, Gal4p, and Sp1 have a DNA-binding domain and an activation domain. The nature of the activation domain is indicated by symbols: – – –, acidic; Q Q Q, glutamine-rich; P P P, proline-rich. These proteins generally activate transcription by interacting with coactivator complexes such as Mediator. Note that the binding sites illustrated here are not generally found together near a single gene. (b) A chimeric protein containing the DNA-binding domain of Sp1 and the activation domain of CTF1 activates transcription if a GC box is present.


Gal4p contains a zinc finger–like structure in its DNA-binding domain, near the amino terminus; this domain has six Cys residues that coordinate two $\mathrm{Zn}^{2+}$ ions. The protein functions as a homodimer (with dimerization mediated by interactions between two coiled coils) and binds to $\mathrm{UAS_G}$, a palindromic DNA sequence about 17 bp long. Gal4p has a separate activation domain with many acidic amino acid residues. Experiments that substitute a variety of different peptide sequences for the acidic activation domain of Gal4p suggest that the acidic nature of this domain is critical to its function, although its precise amino acid sequence can vary considerably.
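What "palindromic" means for a DNA site can be checked with a reverse complement: a palindromic element reads the same 5′→3′ on both strands. The sketch below assumes the commonly cited Gal4p consensus CGG-N11-CCG (17 bp); that consensus is not given in this section and should be treated as an assumption, and the 11-bp spacer in the example site is arbitrary. Only the outer triplets are symmetric, so the check is applied to them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def fits_uasg_pattern(site: str) -> bool:
    """Check a 17-bp site against the assumed CGG-N11-CCG consensus.

    The outer triplets are reverse complements of each other, so the two
    ends of the site read identically on the two strands.
    """
    return (
        len(site) == 17
        and site[:3] == "CGG"
        and site[-3:] == reverse_complement("CGG")
    )


site = "CGG" + "AGTACTGTCCT" + "CCG"  # arbitrary 11-bp spacer
print(fits_uasg_pattern(site))
print(fits_uasg_pattern(reverse_complement(site)))  # the other strand fits too
```

The second check passing is the point: a dimeric activator such as Gal4p can recognize the same half-site on each strand.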

Sp1 ($M_\mathrm{r}$ 80,000) is a transcription activator for many genes in higher eukaryotes. Its DNA-binding site, the GC box (consensus sequence GGGCGG), is usually quite near the TATA box. The DNA-binding domain of the Sp1 protein is near its carboxyl terminus and contains three zinc fingers. Two other domains in Sp1 function in activation and are notable in that 25% of their amino acid residues are Gln. A wide variety of other activator proteins also have these glutamine-rich domains.

CTF1 (CCAAT-binding transcription factor 1) belongs to a family of transcription activators that bind a sequence called the CCAAT site (its consensus sequence is $\mathrm{TGGN_6GCCAA}$, where N is any nucleotide). The DNA-binding domain of CTF1 contains many basic amino acid residues, and the binding region is probably arranged as an α helix. This protein has neither a helix-turn-helix motif nor a zinc finger motif; its DNA-binding mechanism is not yet clear. CTF1 has a proline-rich activation domain, with Pro accounting for more than 20% of the amino acid residues.
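The two consensus sequences quoted above translate directly into regular expressions, with N written as any of the four bases. The sketch below scans one strand for exact matches only; real binding sites often deviate from the consensus and can lie on either strand, so this is a screening illustration, not a binding predictor. The demo sequence is made up.

```python
import re

# Consensus sequences from the text; N (any nucleotide) is written as [ACGT].
GC_BOX = re.compile(r"GGGCGG")                 # Sp1 site
CCAAT_SITE = re.compile(r"TGG[ACGT]{6}GCCAA")  # CTF1 site, TGGN6GCCAA


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions of exact consensus matches on this strand only."""
    return {
        "GC box": [m.start() for m in GC_BOX.finditer(seq)],
        "CCAAT site": [m.start() for m in CCAAT_SITE.finditer(seq)],
    }


demo = "AA" + "TGG" + "ACGTAC" + "GCCAA" + "TT" + "GGGCGG" + "TTA"
print(find_sites(demo))  # {'GC box': [18], 'CCAAT site': [2]}
```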

The discrete activation and DNA-binding domains of regulatory proteins often act completely independently, as has been demonstrated in “domain-swapping” experiments. Genetic engineering techniques (Chapter 9) can join the proline-rich activation domain of CTF1 to the DNA-binding domain of Sp1 to create a protein that, like intact Sp1, binds to GC boxes on the DNA and activates transcription at a nearby promoter (as in Fig. 28-33b). The DNA-binding domain of Gal4p has similarly been replaced experimentally with the DNA-binding domain of the E. coli LexA repressor (of the SOS response; Fig. 28-21). This chimeric protein does not bind at $\mathrm{UAS_G}$, and it activates the yeast GAL genes (as intact Gal4p would) only if the $\mathrm{UAS_G}$ sequence in the DNA is replaced by the LexA recognition site.
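The logic of the domain-swapping experiments can be captured by treating an activator as two independent parts: a DNA-binding domain that determines where the protein binds, and an activation domain that works once the protein is bound. The sketch below encodes exactly that modularity assumption; the class and function names are hypothetical, and real activators are not always this cleanly separable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activator:
    name: str
    binds_site: str                   # site recognized by the DNA-binding domain
    activation_domain: Optional[str]  # "acidic", "Gln-rich", "Pro-rich"; None if absent


def swap_domains(dna_binding_donor: Activator, activation_donor: Activator) -> Activator:
    """Chimera with one protein's DNA-binding domain and another's activation domain."""
    return Activator(
        name=f"{dna_binding_donor.name} DBD + {activation_donor.name} AD",
        binds_site=dna_binding_donor.binds_site,
        activation_domain=activation_donor.activation_domain,
    )


def activates(protein: Activator, sites_near_promoter: set[str]) -> bool:
    """Modularity assumption: binding depends only on the DNA-binding domain,
    and any activation domain works once the protein is bound."""
    return protein.binds_site in sites_near_promoter and protein.activation_domain is not None


SP1 = Activator("Sp1", "GC box", "Gln-rich")
CTF1 = Activator("CTF1", "CCAAT site", "Pro-rich")
GAL4P = Activator("Gal4p", "UAS_G", "acidic")
LEXA = Activator("LexA", "LexA site", None)  # bacterial repressor: no activation domain

sp1_ctf1 = swap_domains(SP1, CTF1)
lexa_gal4 = swap_domains(LEXA, GAL4P)

print(activates(sp1_ctf1, {"GC box", "TATA box"}))      # True
print(activates(lexa_gal4, {"UAS_G", "TATA box"}))      # False
print(activates(lexa_gal4, {"LexA site", "TATA box"}))  # True
```

The three printed cases mirror the experiments in the text: the Sp1/CTF1 chimera behaves like Sp1, and the LexA/Gal4p chimera works only where a LexA site has been substituted for $\mathrm{UAS_G}$.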

Eukaryotic Gene Expression Can Be Regulated by Intercellular and Intracellular Signals

The effects of steroid hormones (and of thyroid and retinoid hormones, which have a similar mode of action) provide additional well-studied examples of the modulation of eukaryotic regulatory proteins by direct interaction with molecular signals (see Fig. 12-34). Unlike other types of hormones, steroid hormones do not have to bind to plasma membrane receptors. Instead, they can interact with intracellular receptors that are transcription activators. Steroid hormones too hydrophobic to dissolve readily in the blood (estrogen, progesterone, and cortisol, for example) travel on specific carrier proteins from their point of release to their target tissues. In the target tissue, the hormone passes through the plasma membrane by simple diffusion. Once inside the cell, the hormone interacts with one of two types of steroid-binding nuclear receptor (Fig. 28-34). In both cases, the hormone-receptor complex acts by binding to highly specific DNA sequences called hormone response elements (HREs), thereby altering gene expression. At these sites, the receptors function as transcription activators, recruiting coactivators and Pol II (plus its associated transcription factors) to trigger transcription of the gene.
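The two receptor classes of Figure 28-34 differ mainly in where the receptor sits before the hormone arrives and what holds it inactive. The sketch below lists the steps for each class as described in the figure caption; the function name and step wording are hypothetical simplifications, and the lncRNA-mediated repression shown in the figure is omitted.

```python
def receptor_steps(receptor_type: str, hormone_present: bool) -> list[str]:
    """Ordered steps for the two receptor classes of Fig. 28-34 (simplified)."""
    if receptor_type == "I":  # e.g. estrogen, progesterone, glucocorticoid receptors
        start = ["receptor in cytoplasm, complexed with Hsp70"]
        activation = [
            "hormone binds; Hsp70 dissociates",
            "receptor dimerizes; nuclear localization signal exposed",
            "dimer moves to nucleus and binds HRE",
        ]
    elif receptor_type == "II":  # e.g. thyroid hormone receptor (TR-RXR)
        start = ["TR-RXR heterodimer bound to HRE in nucleus, with corepressor"]
        activation = [
            "hormone enters nucleus and binds receptor",
            "conformation change releases corepressor",
        ]
    else:
        raise ValueError("receptor_type must be 'I' or 'II'")
    if not hormone_present:
        return start + ["target gene OFF"]
    return start + activation + ["coactivators recruited", "target gene ON"]


for kind in ("I", "II"):
    print(kind, receptor_steps(kind, hormone_present=True))
```

The type I path has more steps because the receptor must first be released from Hsp70 and relocated to the nucleus; the type II receptor is already on its HRE and needs only to exchange its corepressor for coactivators.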

FIGURE 28-34 Mechanisms of steroid hormone receptor function. There are two types of steroid-binding nuclear receptors. (a) Monomeric type I receptors (NR) are found in the cytoplasm, in a complex with the heat shock protein Hsp70. Receptors for estrogen, progesterone, androgens, and glucocorticoids are of this type. When the steroid hormone binds, the Hsp70 dissociates and the receptor dimerizes, exposing a nuclear localization signal. The dimeric receptor, with hormone bound, migrates to the nucleus, where it binds to a hormone response element (HRE) and acts as a transcription activator. The activity of the receptor can be repressed by binding to an lncRNA (such as GAS5), which competes directly with binding to the HRE. (b) Type II receptors, by contrast, are always in the nucleus, bound to an HRE in the DNA and to a corepressor that renders the receptor inactive. The thyroid hormone receptor (TR) is of this type. The hormone migrates through the cytoplasm and diffuses across the nuclear membrane. In the nucleus it binds to a heterodimer consisting of the thyroid hormone receptor and the retinoid X receptor (RXR). A conformation change leads to dissociation of the corepressor, and the receptor then functions as a transcription activator.


The DNA sequences (HREs) to which hormone-receptor complexes bind are similar in length and arrangement for the various steroid hormones, but they differ in sequence. Each receptor has a consensus HRE sequence (Table 28-4) to which the hormone-receptor complex binds well, with each consensus consisting of two six-nucleotide sequences, either contiguous or separated by three nucleotides, in tandem or in a palindromic arrangement. The hormone receptors have a highly conserved DNA-binding domain with two zinc fingers (Fig. 28-35). The hormone-receptor complex binds to the DNA as a dimer, with the zinc finger domains of each monomer recognizing one of the six-nucleotide sequences. The ability of a given hormone to act through the hormone-receptor complex to alter the expression of a specific gene depends on the exact sequence of the HRE, its position relative to the gene, and the number of HREs associated with the gene.

FIGURE 28-35 Typical steroid hormone receptors. These receptor proteins have a binding site for the hormone, a DNA-binding domain, and a region that activates transcription of the regulated gene. The highly conserved DNA-binding domain has two zinc fingers. The sequence shown here is that for the estrogen receptor, but the residues in bold type are common to all steroid hormone receptors.


TABLE 28-4 Hormone Response Elements (HREs) Bound by Steroid-Type Hormone Receptors
Receptor	HRE consensus sequence bound^a
Androgen	$\mathrm{GG(A/T)ACAN_2TGTTCT}$
Glucocorticoid	$\mathrm{GGTACAN_3TGTTCT}$
Retinoic acid (some)	$\mathrm{AGGTCAN_5AGGTCA}$
Vitamin D	$\mathrm{AGGTCAN_3AGGTCA}$
Thyroid hormone	$\mathrm{AGGTCAN_3AGGTCA}$
RX^b	$\mathrm{AGGTCANAGGTCANAGGTCANAGGTCA}$
^aN represents any nucleotide. ^bForms a dimer with the retinoic acid receptor or vitamin D receptor.
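The notation in Table 28-4 maps onto regular expressions: (A/T) is either base, N is any nucleotide, and a subscripted N is a run of that many arbitrary nucleotides. The sketch below converts each consensus (written in plain text, with the subscript as a trailing number) into a pattern and reports which receptors' consensus a candidate site matches exactly. It is a minimal illustration under that assumption: real HREs deviate from the consensus, and because the vitamin D and thyroid hormone consensus sequences are identical here, sequence alone cannot tell them apart in this sketch.

```python
import re

# Table 28-4 in plain notation: "(A/T)" is either base, "N" is any
# nucleotide, and "Nk" (k given as digits) is a run of k any-nucleotides.
HRE_CONSENSUS = {
    "androgen": "GG(A/T)ACAN2TGTTCT",
    "glucocorticoid": "GGTACAN3TGTTCT",
    "retinoic acid": "AGGTCAN5AGGTCA",
    "vitamin D": "AGGTCAN3AGGTCA",
    "thyroid hormone": "AGGTCAN3AGGTCA",
    "RX": "AGGTCANAGGTCANAGGTCANAGGTCA",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate the table notation into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":  # alternatives such as (A/T)
            close = consensus.index(")", i)
            parts.append("[" + consensus[i + 1 : close].replace("/", "") + "]")
            i = close + 1
        elif ch == "N":  # N or N followed by a count
            j = i + 1
            while j < len(consensus) and consensus[j].isdigit():
                j += 1
            count = int(consensus[i + 1 : j]) if j > i + 1 else 1
            parts.append(f"[ACGT]{{{count}}}")
            i = j
        else:  # a literal base
            parts.append(ch)
            i += 1
    return re.compile("".join(parts))


PATTERNS = {name: consensus_to_regex(c) for name, c in HRE_CONSENSUS.items()}


def receptors_for(site: str) -> list[str]:
    """Receptors whose consensus HRE matches the whole site exactly."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.fullmatch(site))


print(receptors_for("AGGTCA" + "GAT" + "AGGTCA"))    # ['thyroid hormone', 'vitamin D']
print(receptors_for("AGGTCA" + "GATCC" + "AGGTCA"))  # ['retinoic acid']
```

The two example sites share the same AGGTCA half-sites and differ only in spacer length, which is enough to switch which receptors' consensus they fit.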

The ligand-binding region of the receptor protein — always at the carboxyl terminus — is specific to the particular receptor. For example, in the ligand-binding region, the glucocorticoid receptor is only 30% similar to the estrogen receptor and 17% similar to the thyroid hormone receptor. The size of the ligand-binding region varies dramatically; in the vitamin D receptor it has only 25 amino acid residues, whereas in the mineralocorticoid receptor it has 603 residues. Mutations that change one amino acid residue in these regions can result in loss of responsiveness to a specific hormone. Some humans unable to respond to cortisol, testosterone, vitamin D, or thyroxine have mutations of this type.

The lncRNAs introduce another dimension to regulation by hormone receptors. An lncRNA called GAS5 (growth arrest specific 5) inhibits transcriptional activation by the glucocorticoid receptor by directly competing with DNA for receptor binding. GAS5 also inhibits activity of the closely related androgen, progesterone, and mineralocorticoid receptors. In addition, GAS5 interacts with and sequesters an miRNA called miR-21, which interacts with and inhibits the activity of some regulatory proteins that act as tumor suppressors. Expression of GAS5 is suppressed in a wide range of tumors, resulting in increased activity of steroid hormone receptors, higher levels of active miR-21, and faster tumor growth. Low GAS5 levels thus correlate with worsened outcomes for cancer patients, making this lncRNA a subject of intense ongoing investigation.
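Competition between GAS5 and HRE DNA for the same receptor site can be illustrated with a textbook competitive-binding equilibrium. Assuming 1:1, mutually exclusive binding and both ligands in large excess over receptor, the fraction of receptor on the HRE is $f = \frac{[\mathrm{HRE}]/K_\mathrm{HRE}}{1 + [\mathrm{HRE}]/K_\mathrm{HRE} + [\mathrm{GAS5}]/K_\mathrm{GAS5}}$, where each $K$ is a dissociation constant. The model and every number below are hypothetical and chosen only to show the trend; they are not measured values for GAS5 or the glucocorticoid receptor.

```python
def fraction_on_hre(hre: float, k_hre: float, gas5: float, k_gas5: float) -> float:
    """Fraction of receptor bound to HRE DNA when GAS5 competes for the same
    DNA-binding site.

    Assumes 1:1 binding at equilibrium, mutually exclusive binding, and HRE and
    GAS5 both in large excess over receptor (so free ~ total concentration).
    hre, gas5: concentrations; k_hre, k_gas5: dissociation constants (same units).
    """
    a = hre / k_hre
    b = gas5 / k_gas5
    return a / (1 + a + b)


# Hypothetical values in arbitrary units, chosen only to show the trend.
for gas5 in (0.0, 1.0, 4.0):
    print(gas5, round(fraction_on_hre(hre=1.0, k_hre=1.0, gas5=gas5, k_gas5=1.0), 3))
# 0.0 0.5
# 1.0 0.333
# 4.0 0.167
```

Read in reverse, the same model shows why loss of GAS5 in tumors raises receptor occupancy of HREs: as $[\mathrm{GAS5}]$ falls toward zero, $f$ rises to its competitor-free value.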

Some hormone receptors, including the human progesterone receptor, activate transcription with the aid of a different lncRNA, steroid receptor RNA activator (SRA; ~700 nucleotides), which acts as a coactivator. SRA is part of a ribonucleoprotein complex, but it is the RNA component that is required for transcription coactivation. The detailed set of interactions between SRA and other components of the regulatory systems for these genes remains to be worked out.

Regulation Can Result from Phosphorylation of Nuclear Transcription Factors

We noted in Chapter 12 that the effects of insulin on gene expression are mediated by a series of steps leading ultimately to the activation of a protein kinase in the nucleus that phosphorylates specific DNA-binding proteins, thereby altering their ability to act as transcription factors (see Fig. 12-22). This general mechanism mediates the effects of many nonsteroid hormones. For example, the β-adrenergic pathway that leads to elevated levels of cytosolic cAMP, which acts as a second messenger in both eukaryotes and bacteria (Fig. 28-18), also affects the transcription of a set of genes, each of which is located near a specific DNA sequence called a cAMP response element (CRE). The catalytic subunit of protein kinase A, released when cAMP levels rise (see Fig. 12-6), enters the nucleus and phosphorylates a nuclear protein, the CRE-binding protein (CREB). When phosphorylated, CREB binds to CREs near certain genes and acts as a transcription factor, turning on expression of these genes.
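The logic of the cAMP pathway for CRE-controlled genes can be summarized as a toy Boolean model: high cAMP frees the catalytic subunit of protein kinase A, which phosphorylates CREB, and phosphorylated CREB turns on genes that have a CRE nearby. The sketch below encodes only that logic, not any kinetics. The CRE consensus TGACGTCA it uses is the commonly cited one but is an assumption here, since the text does not give the sequence; because it is palindromic, an exact match on one strand implies a match on the other. Gene names and sequences are hypothetical.

```python
# Commonly cited consensus CRE (assumption: the text does not give the sequence).
# It is palindromic, so an exact match on one strand implies one on the other.
CRE_CONSENSUS = "TGACGTCA"


def creb_phosphorylated(camp_high: bool) -> bool:
    """Toy link: high cAMP releases the PKA catalytic subunit, which phosphorylates CREB."""
    return camp_high


def genes_turned_on(promoters: dict[str, str], camp_high: bool) -> set[str]:
    """Genes with an exact CRE nearby that phosphorylated CREB can activate."""
    if not creb_phosphorylated(camp_high):
        return set()
    return {gene for gene, seq in promoters.items() if CRE_CONSENSUS in seq.upper()}


# Hypothetical genes and regulatory sequences.
promoters = {"gene_a": "ccgaTGACGTCAgcta", "gene_b": "aattgcgcaatt"}
print(genes_turned_on(promoters, camp_high=True))   # {'gene_a'}
print(genes_turned_on(promoters, camp_high=False))  # set()
```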

Many Eukaryotic mRNAs Are Subject to Translational Repression

Regulation at the level of translation assumes a much more prominent role in eukaryotes than in bacteria and is observed in a range of cellular situations. In contrast to the tight coupling of transcription and translation in bacteria, the transcripts generated in a eukaryotic nucleus must be processed and transported to the cytoplasm before translation. This can impose a significant delay on the appearance of a protein. When a rapid increase in protein production is needed, a translationally repressed mRNA already in the cytoplasm can be activated for translation without delay. Translational regulation may play an especially important role in regulating certain very long eukaryotic genes (a few are measured in the millions of base pairs), for which transcription and mRNA processing can require many hours. Some genes are regulated at both the transcriptional and translational stages, with the latter playing a role in the fine-tuning of cellular protein levels. In some non-nucleated cells, such as reticulocytes (immature erythrocytes), transcriptional control is entirely unavailable and translational control of stored mRNAs becomes essential. As described below, translational controls can also have spatial significance during development, when the regulated translation of prepositioned mRNAs creates a local gradient of the protein product.

Eukaryotes have at least four main mechanisms of translational regulation:

Translation initiation factors are subject to phosphorylation by protein kinases. The phosphorylated forms are often less active and cause a general depression of translation in the cell.
Some proteins bind directly to mRNA and act as translational repressors, many of them binding at specific sites in the 3′ untranslated region (3′ UTR). So positioned, these proteins interact with translation initiation factors bound to the mRNA, or with the 40S ribosomal subunit, to prevent translation initiation (Fig. 28-36).
Binding proteins, present in eukaryotes from yeast to mammals, disrupt the interaction between eIF4E and eIF4G (see Fig. 27-27). The mammalian versions are known as 4E-BPs (eIF4E binding proteins). When cell growth is slow, these proteins limit translation by binding to the site on eIF4E that normally interacts with eIF4G. When cell growth resumes or increases in response to growth factors or other stimuli, the binding proteins are inactivated by protein kinase–dependent phosphorylation.
RNA-mediated regulation of gene expression often takes the form of translational repression, commonly through the binding of ncRNAs to mRNAs.
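The third mechanism in the list, sequestration of eIF4E by 4E-BPs, is a binding equilibrium, so its effect can be estimated by solving for the eIF4E left free to engage eIF4G. The sketch below assumes simple 1:1 binding, that 4E-BP-bound eIF4E cannot bind eIF4G, and that phosphorylation acts only by raising the dissociation constant $K_d$; the function name, concentrations, and $K_d$ values are hypothetical.

```python
import math


def free_eif4e(eif4e_total: float, bp_total: float, kd: float) -> float:
    """eIF4E left unbound by 4E-BP (and so available to eIF4G), 1:1 binding at equilibrium.

    Uses the exact quadratic solution for the complex concentration, which stays
    valid even when kd is much smaller than the protein concentrations.
    """
    s = eif4e_total + bp_total + kd
    bound = (s - math.sqrt(s * s - 4.0 * eif4e_total * bp_total)) / 2.0
    return eif4e_total - bound


# Hypothetical values in arbitrary concentration units.
for label, kd in [("4E-BP unphosphorylated", 0.01), ("4E-BP phosphorylated", 10.0)]:
    free = free_eif4e(eif4e_total=1.0, bp_total=1.0, kd=kd)
    print(f"{label}: {free:.2f} of eIF4E free")
```

With these made-up numbers, raising $K_d$ from 0.01 to 10 increases free eIF4E from roughly a tenth to roughly nine-tenths of the total, which is the kind of switch that growth-factor-triggered phosphorylation could produce in this model.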


FIGURE 28-36 Translational regulation of eukaryotic mRNA. One of the most important mechanisms for translational regulation in eukaryotes is the binding of translational repressors (RNA-binding proteins) to specific sites in the 3′ untranslated region (3′ UTR) of the mRNA. These proteins interact with eukaryotic initiation factors or with the ribosome to prevent or slow translation.

The illustration shows a 40S ribosomal subunit bound to eIF3 and to eIF4F, a complex of eIF4E, eIF4A, and eIF4G. The mRNA runs from its 5′ cap, bound by eIF4E, past the AUG start codon and loops around so that its 3′ untranslated region (UTR), carrying two bound translational repressors, and its poly(A) tail, coated with poly(A)-binding proteins (PAB), lie close to eIF4G and the ribosome.

The variety of translational regulation mechanisms provides flexibility, allowing focused repression of a few mRNAs or global regulation of all cellular translation.

Translational regulation has been particularly well studied in reticulocytes. The maturation of reticulocytes includes destruction of the cell nucleus, leaving behind a plasma membrane packed with hemoglobin. Messenger RNAs deposited in the cytoplasm before the loss of the nucleus allow for the replacement of hemoglobin. One mechanism of translational regulation in these cells involves eIF2, the initiation factor that binds to the initiator tRNA and conveys it to the ribosome; after Met-tRNA has bound to the P site, the factor eIF2B binds to eIF2 and recycles it for another round of initiation, with the aid of GTP binding and hydrolysis. When reticulocytes become deficient in iron or heme, a protein kinase called HCR (hemin-controlled repressor) is activated and catalyzes the phosphorylation of eIF2, repressing the translation of globin mRNAs. Phosphorylated eIF2 forms a stable complex with eIF2B that sequesters the eIF2, making it unavailable for participation in translation. In this way, the reticulocyte coordinates the synthesis of globin with the availability of heme.
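Why can eIF2 phosphorylation act so decisively? A widely cited refinement, treated here as an assumption because the text does not state it, is that eIF2B is less abundant than eIF2, so once enough eIF2 is phosphorylated to trap all of the eIF2B, even the unphosphorylated eIF2 can no longer be recycled. The stoichiometric sketch below assumes 1:1, essentially irreversible trapping; the function name and the abundances are hypothetical.

```python
def free_eif2b(eif2_total: float, eif2b_total: float, frac_phospho: float) -> float:
    """eIF2B still free to recycle eIF2 when each phosphorylated eIF2 traps one eIF2B.

    Assumes 1:1 trapping that is effectively irreversible on the time scale considered.
    """
    trapped = min(eif2b_total, eif2_total * frac_phospho)
    return eif2b_total - trapped


# Hypothetical abundances: eIF2B at one quarter the level of eIF2.
for frac in (0.0, 0.10, 0.25, 0.50):
    remaining = free_eif2b(eif2_total=1.0, eif2b_total=0.25, frac_phospho=frac)
    print(f"{frac:.0%} of eIF2 phosphorylated -> {remaining / 0.25:.0%} of eIF2B free")
```

In this toy model, phosphorylating a quarter of the eIF2 stops recycling entirely, an amplification that follows directly from the limiting-eIF2B assumption.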

Posttranscriptional Gene Silencing Is Mediated by RNA Interference

In higher eukaryotes, including nematodes, fruit flies, plants, and mammals, microRNAs (miRNAs) mediate the silencing of many genes. In a phenomenon first described and explained by Craig Mello and Andrew Fire, the RNAs function by interacting with mRNAs, often in the 3′ UTR, resulting in either degradation of the mRNA or inhibition of translation. In either case, the mRNA, and thus the gene that produces it, is silenced. This form of gene regulation controls developmental timing in at least some organisms. It is also used as a mechanism to protect against invading RNA viruses (particularly important in plants, which lack an adaptive immune system) and to control the activity of transposons. In addition, small RNA molecules may play a critical (as yet undefined) role in the formation of heterochromatin.

Many miRNAs are present only transiently during development, and these are sometimes referred to as small temporal RNAs (stRNAs). Thousands of different miRNAs have been identified in higher eukaryotes, and they may affect the regulation of a third of mammalian genes. They are transcribed as precursor RNAs ~70 nucleotides long, with internally complementary sequences that form hairpinlike structures. Details of the miRNA processing pathway are described in Fig. 26-26. The precursors are cleaved by endonucleases such as Drosha and Dicer to form short duplexes of 20 to 25 nucleotides. One strand of the processed miRNA is transferred to the target mRNA (or to a viral or transposon RNA), leading to inhibition of translation or degradation of the mRNA (Fig. 28-37a). Some miRNAs bind to a single mRNA and thus affect expression of only one gene. Others interact with multiple mRNAs and form the mechanistic core of regulons that coordinate the expression of multiple genes.
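How a miRNA picks out its targets is often approximated by the "seed" rule: perfect Watson-Crick pairing between nucleotides 2 to 8 of the miRNA and a site in the target mRNA, frequently in the 3′ UTR. The seed rule is a widely used heuristic from the miRNA literature rather than something stated above, and the sketch below ignores G·U pairs, pairing outside the seed, and site context. The function names and the miRNA and UTR sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str, first: int = 2, last: int = 8) -> list[int]:
    """0-based positions in `utr` that pair perfectly with miRNA nucleotides first..last (1-based)."""
    site = revcomp_rna(mirna[first - 1:last])  # the mRNA sequence that pairs antiparallel with the seed
    utr = utr.upper()
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Hypothetical miRNA (seed CAGGAUC) and a hypothetical 3' UTR with two seed-matched sites.
mirna = "UCAGGAUCCAGUAAGCUAGCAU"
print(seed_sites(mirna, "AAAGAUCCUGAAAGAUCCUGA"))  # [3, 13]
```

A miRNA with seed-matched sites in many different 3′ UTRs would, in this simplified picture, be positioned to coordinate a regulon of the kind described above.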

FIGURE 28-37 Gene silencing by RNA interference. (a) Small temporal RNAs (stRNAs, a class of miRNAs) are generated by Dicer-mediated cleavage of longer precursors that fold to create duplex regions. The stRNAs then bind to mRNAs, leading to degradation of mRNA or inhibition of translation. (b) Double-stranded RNAs designed to interact with a particular target and to function as Dicer substrates can be constructed and introduced into a cell. Dicer processes the duplex RNAs into small interfering RNAs (siRNAs), which interact with the target mRNA. Again, either the mRNA is degraded or translation is inhibited.

In part (a), a precursor RNA folded into a duplex with small internal loops and a large terminal loop is cleaved by Dicer to yield an stRNA duplex lacking the loop. One strand is removed, and the remaining strand binds a target mRNA bearing a poly(A) tail, AAA(A)$_n$, producing a silenced mRNA that is either degraded or translationally inhibited. In part (b), a duplex RNA is cleaved by Dicer into a shorter siRNA with overhanging single-stranded ends. One strand is removed, and the other binds the same target mRNA, again leading to degradation or translation inhibition.

This gene regulation mechanism has an interesting and very useful practical side. If an investigator introduces into an organism a duplex RNA molecule corresponding in sequence to virtually any mRNA, Dicer cleaves the duplex into short segments, called small interfering RNAs (siRNAs). These bind to the mRNA and silence it (Fig. 28-37b). The process is known as RNA interference (RNAi). In plants, almost any gene can be effectively shut down in this way. Nematodes can readily ingest entire functional RNAs, and simply introducing the duplex RNA into the worm’s diet produces very effective suppression of the target gene. The technique is an important tool in the ongoing efforts to study gene function, because it can disrupt gene function without creating a mutant organism. The procedure can be applied to humans as well. Laboratory-produced siRNAs have been used to block HIV and poliovirus infections in cultured human cells for a week or so at a time. The wider application of RNAi-based pharmaceuticals was initially stymied by the difficulty inherent in delivering RNAi molecules to their required target, given the many nucleases that degrade RNA in human tissues. With recent advances in delivery methods, there are now more than a dozen RNAi pharmaceuticals in advanced clinical trials to treat a range of conditions, from familial amyloidotic polyneuropathy to viral infections and cancer.
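When an investigator designs siRNAs directly rather than relying on Dicer, a common format (an assumption here; the text specifies only duplexes of 20 to 25 nucleotides) mimics a Dicer product: two 21-nucleotide strands forming a 19-bp duplex with 2-nucleotide 3′ overhangs. The sketch below derives such a pair from a target sequence, with a guide strand fully complementary to 21 nucleotides of the target. It takes the passenger overhang from the target itself, whereas many synthetic designs substitute other overhangs, and it ignores the sequence-selection rules used in practice. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an uppercase RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sirna_for_target(target: str, start: int) -> tuple[str, str]:
    """Return (passenger, guide): 21-nt strands, 19-bp duplex, 2-nt 3' overhangs.

    The guide pairs with target[start:start+21]; the passenger is target[start+2:start+23],
    so the two strands overlap (pair) over target[start+2:start+21].
    """
    t = target.upper().replace("T", "U")  # accept DNA or RNA input
    if start < 0 or start + 23 > len(t):
        raise ValueError("a 23-nt window starting at `start` must fit inside the target")
    guide = revcomp_rna(t[start:start + 21])
    passenger = t[start + 2:start + 23]
    return passenger, guide


# Hypothetical 30-nt target region.
passenger, guide = sirna_for_target("AUGGCUAGCUAGGCUUACGAUCGAUGCAUC", 0)
print(passenger)
print(guide)
```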

RNA-Mediated Regulation of Gene Expression Takes Many Forms in Eukaryotes

All RNAs (regardless of their length) that do not encode proteins, including rRNAs and tRNAs, come under the general designation of ncRNAs. Mammalian genomes encode more ncRNAs than coding mRNAs. The ncRNAs in eukaryotes include miRNAs, described above; snRNAs, involved in RNA splicing (see Fig. 26-16); snoRNAs, involved in rRNA modification (see Fig. 26-24); and lncRNAs, already encountered in this chapter. Not surprisingly, additional functional classes of ncRNAs are still being discovered. Here we describe a few more examples of ncRNAs that participate in gene regulation; those longer than 200 nucleotides are designated lncRNAs.

Heat shock factor 1 (HSF1) is an activator protein that, in nonstressed cells, exists as a monomer bound by the chaperone Hsp90. Under stress conditions, HSF1 is released from Hsp90 and trimerizes. The HSF1 trimer binds to DNA and activates transcription of genes encoding products required to deal with the stress. An lncRNA called HSR1 (heat shock RNA 1; ∼600 nucleotides) stimulates HSF1 trimerization and DNA binding. HSR1 does not act alone; it functions in a complex with the translation elongation factor eEF1A.

Additional RNAs affect transcription in a variety of ways. A 331-nucleotide lncRNA called 7SK, abundant in mammals, binds to the Pol II transcription elongation factor pTEFb (see Table 26-2) and represses transcript elongation. The ncRNA B2 (∼178 nucleotides) binds directly to Pol II during heat shock and represses transcription: B2-bound Pol II assembles into stable PICs, but transcription is blocked. The mechanism that allows HSF1-responsive genes to be expressed in the presence of B2 remains to be worked out.

The recognized roles of ncRNAs in gene expression and in many other cellular processes are rapidly expanding. At the same time, the study of the biochemistry of gene regulation is becoming much less protein-centric.

Development Is Controlled by Cascades of Regulatory Proteins

For sheer complexity and intricacy of coordination, the patterns of gene regulation that bring about development of a zygote into a multicellular animal or plant have no peer. Development requires transitions in morphology and protein composition that depend on tightly coordinated changes in expression of the genome. More genes are expressed during early development than in any other part of the life cycle. For example, in the sea urchin, an oocyte has about 18,500 different mRNAs, compared with about 6,000 different mRNAs in the cells of a typical differentiated tissue. The mRNAs in the oocyte give rise to a cascade of events that regulate the expression of many genes across both space and time.

Several organisms have emerged as important model systems for the study of development, because they are easy to maintain in a laboratory and have relatively short generation times. These include nematodes, fruit flies, zebrafish, mice, and the plant Arabidopsis. Here, we provide a brief discussion of the development of fruit flies. Our understanding of the molecular events during development of Drosophila melanogaster is particularly well advanced and can be used to illustrate patterns and principles of general significance.

The life cycle of the fruit fly includes complete metamorphosis during its progression from an embryo to an adult (Fig. 28-38). Among the most important characteristics of the embryo are its polarity (the anterior and posterior parts of the animal are readily distinguished, as are its dorsal and ventral surfaces) and its metamerism (the embryo body is made up of serially repeating segments, each with characteristic features). During development, these segments become organized into a head, thorax, and abdomen. Each segment of the adult thorax has a different set of appendages. Development of this complex pattern is under genetic control, and a variety of pattern-regulating genes have been discovered that greatly affect the organization of the body.

FIGURE 28-38 Life cycle of the fruit fly *Drosophila melanogaster*. *Drosophila* undergoes a complete metamorphosis, which means that the adult insect is radically different in form from its immature stages, a transformation that requires extensive alterations during development. By the late embryonic stage, segments have formed, each containing specialized structures from which the various appendages and other features of the adult fly will develop.

The illustration traces the life cycle: an oocyte is fertilized to form the zygote (day 0), which undergoes embryonic development to an early embryo with no segments and then to a segmented late embryo. Labels map the late embryo onto the adult fly: the anterior end becomes the head, segments T$_1$ to T$_3$ become the thorax, and segments A$_1$ to A$_7$ become the abdomen. The larva hatches at day 1 and passes through three larval stages separated by molts; pupation occurs at day 5, and after metamorphosis the adult fly (about 2 mm long, with red eyes, clear wings, and six legs) appears by day 9. All times are approximate.

The Drosophila egg, along with 15 nurse cells, is surrounded by a layer of follicle cells (Fig. 28-39). As the egg cell forms (before fertilization), mRNAs and proteins originating in the nurse and follicle cells are deposited in the egg cell, where some play a critical role in development. Once a fertilized egg is laid, its nucleus divides and the nuclear descendants continue to divide in synchrony every 6 to 10 min. Plasma membranes are not formed around the nuclei, which are distributed within the egg cytoplasm, forming a syncytium. Between the eighth and eleventh rounds of nuclear division, the nuclei migrate to the outer layer of the egg, forming a monolayer of nuclei surrounding the common yolk-rich cytoplasm; this is the syncytial blastoderm. After a few additional divisions, membrane invaginations surround the nuclei to create a layer of cells that form the cellular blastoderm. At this stage, the mitotic cycles in the various cells lose their synchrony. The developmental fate of the cells is determined by the mRNAs and proteins originally deposited in the egg by the nurse and follicle cells.
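The synchronous divisions make the numbers easy to estimate. Under idealized assumptions, that every nucleus divides in every round and that the cycle length stays fixed within the 6 to 10 min range given above, $n$ rounds of division yield $2^n$ nuclei after roughly $6n$ to $10n$ minutes. The sketch tabulates this for the eighth through eleventh rounds, when the nuclei migrate to the periphery. Real embryos depart from this (for example, the pole cells leave the synchronous cycle), so the figures are rough; the function names are hypothetical.

```python
def nuclei_after(divisions: int) -> int:
    """Nuclei present after `divisions` synchronous rounds, starting from one zygotic nucleus."""
    return 2 ** divisions


def elapsed_minutes(divisions: int, cycle_min: float) -> float:
    """Time for `divisions` rounds at a constant cycle length (idealized)."""
    return divisions * cycle_min


# Rounds 8 through 11, when the nuclei migrate to the periphery of the egg.
for n in range(8, 12):
    print(f"round {n}: {nuclei_after(n)} nuclei after ~{elapsed_minutes(n, 6):.0f} "
          f"to {elapsed_minutes(n, 10):.0f} min")
```

In this idealization, a few hundred to about two thousand nuclei reach the periphery roughly 50 to 110 min after fertilization.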

FIGURE 28-39 Early development in *Drosophila*. During development of the egg, maternal mRNAs and proteins are deposited in the developing oocyte (unfertilized egg cell) by nurse cells and follicle cells. After fertilization, the nuclei of the egg divide in synchrony within the common cytoplasm (syncytium), then migrate to the periphery. Membrane invaginations surround the nuclei to create a monolayer of cells at the periphery; this is the cellular blastoderm stage. During the early nuclear divisions, several nuclei at the far posterior become pole cells, which later become the germ-line cells.

The illustration begins with an egg chamber in which the oocyte and nurse cells are surrounded by follicle cells. Successive stages are drawn with the anterior end at left and the posterior end at right: the oocyte enlarges beside the nurse cells while follicle cells surround it; the mature oocyte is fertilized to form the zygote; nuclear divisions produce a syncytium containing many nuclei; nuclear migration produces the syncytial blastoderm, with pole cells at the posterior end; and membrane invagination turns the peripheral layer of nuclei into the cellular blastoderm.

Proteins that, through changes in local concentration or activity, cause the surrounding tissue to take up a particular shape or structure are sometimes referred to as morphogens; they are the products of pattern-regulating genes. As defined by Christiane Nüsslein-Volhard, Edward B. Lewis, and Eric F. Wieschaus, three major classes of pattern-regulating genes — maternal, segmentation, and homeotic genes — function in successive stages of development to specify the basic features of the Drosophila embryo body. Maternal genes are expressed in the unfertilized egg, and the resulting maternal mRNAs remain dormant until fertilization. These provide most of the proteins needed in very early development, until the cellular blastoderm is formed. Some of the proteins encoded by maternal mRNAs direct the spatial organization of the developing embryo at early stages, establishing its polarity. Segmentation genes, transcribed after fertilization, direct the formation of the proper number of body segments. At least three subclasses of segmentation genes act at successive stages: gap genes divide the developing embryo into several broad regions; pair-rule genes, together with segment polarity genes, define 14 stripes that become the 14 segments of a normal embryo. Homeotic genes are expressed still later; they specify which organs and appendages will develop in particular body segments.

If all cells divided to produce two identical daughter cells, multicellular organisms would never be more than a ball of identical cells. A key event in very early development is establishment of mRNA and protein gradients along the body axes, producing asymmetric cell divisions and different cell fates. Some maternal mRNAs have protein products that diffuse through the cytoplasm to create an asymmetric distribution in the egg. Different cells in the cellular blastoderm therefore inherit different amounts of these proteins, setting the cells on different developmental paths. An example is the bicoid gene. The bicoid gene product is a major anterior morphogen. The mRNA from the bicoid gene is synthesized by nurse cells and deposited in the unfertilized egg near its anterior pole. Translated soon after fertilization, the Bicoid protein diffuses through the cell to create, by the seventh nuclear division, a concentration gradient radiating out from the anterior pole (Fig. 28-40). The Bicoid protein contains a homeodomain (p. 1062), encoded by a gene sequence motif called a homeobox and found in many proteins involved in regulating development. Bicoid is multifunctional — a transcription factor that activates the expression of several segmentation genes and also a translational repressor that inactivates certain mRNAs. The amount of Bicoid protein in various parts of the embryo increases or decreases the expression of other genes in a threshold-dependent manner. As its concentration varies along its gradient, interactions of the bicoid gene product with proteins and RNAs encoded by the nanos, pumilio, caudal, hunchback, and other regulatory genes also vary to produce different effects along the axis of the developing organism. This results in different developmental fates of cells in the blastoderm, depending on their location.
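The threshold-dependent readout of the Bicoid gradient can be sketched numerically. The model below assumes, for illustration, an exponential gradient $c(x) = c_0 e^{-x/\lambda}$ (a common approximation for a protein made at one pole that diffuses and is degraded), with $x$ in percent of egg length, and a target gene that is on wherever $c(x) \ge T$ for a fixed threshold $T$. The expression boundary then lies at $x^* = \lambda \ln(c_0/T)$. The values of $c_0$, $\lambda$, and $T$ and the function names are hypothetical.

```python
import math


def bicoid_level(x: float, c0: float = 100.0, decay_length: float = 20.0) -> float:
    """Relative Bicoid concentration at x (% of egg length from the anterior pole)."""
    return c0 * math.exp(-x / decay_length)


def expression_boundary(threshold: float, c0: float = 100.0, decay_length: float = 20.0) -> float:
    """Position where Bicoid falls to `threshold`; a target needing that level is on anterior to it."""
    if threshold <= 0:
        return 100.0  # any Bicoid suffices: target on along the whole egg
    if threshold >= c0:
        return 0.0    # threshold never exceeded: target off everywhere
    return min(100.0, decay_length * math.log(c0 / threshold))


def target_on(x: float, threshold: float, c0: float = 100.0, decay_length: float = 20.0) -> bool:
    return bicoid_level(x, c0, decay_length) >= threshold


# Hypothetical high- and low-threshold target genes.
for t in (50.0, 10.0):
    print(f"threshold {t}: on from 0 to {expression_boundary(t):.1f}% of egg length")
```

One consequence of this model: halving the maternal Bicoid dose ($c_0$) moves every boundary anteriorly by the same distance, $\lambda \ln 2$, while targets with different thresholds still occupy nested anterior domains.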


FIGURE 28-40 Distribution of a maternal gene product in a *Drosophila* egg. (a) Micrograph of an immunologically stained egg (top), showing distribution of the *bicoid* (*bcd*) gene product. The graph shows stain intensity along the length of the egg. This distribution is essential for normal development of the anterior structures in the larva (bottom). (b) If the *bcd* gene is not expressed by the mother ($bcd^{-}/bcd^{-}$ mutant) and thus no *bicoid* mRNA is deposited in the egg, the resulting larva has two posteriors (and soon dies). [Republished with permission of Elsevier, from “The bicoid protein determines position in the *Drosophila* embryo in a concentration-dependent manner” by Wolfgang Driever and Christiane Nüsslein-Volhard, *Cell* 54:83–93, July 1, 1988; permission conveyed through Copyright Clearance Center, Inc.]

In part (a), the stained normal egg is darkest at the anterior end. A graph plots the relative concentration of Bicoid (Bcd) protein (0 to 100) against distance from the anterior end (percent of egg length, 0 to 100): the curve starts at 100, falls steeply, begins to level off near 25 at 50% of egg length, and approaches 0 at the posterior end. The normal larva has clearly distinguishable anterior and posterior ends. In part (b), the stained $bcd^{-}/bcd^{-}$ egg is evenly colored, the plotted Bicoid concentration is 0 along the entire egg, and the resulting double-posterior larva is rounded at both ends. All data are approximate.

Humans do not resemble fruit flies, but the genes and mechanisms involved in development are nevertheless highly conserved. This can be seen in the gene clusters encoding the homeotic or Hox genes, the latter term derived from homeobox. Drosophila has one such cluster, while humans have four (Fig. 28-41), with the genes within the clusters remarkably similar from nematodes to humans.

FIGURE 28-41 The *Hox* gene clusters and their effects on development. (a) Each *Hox* gene in the fruit fly is responsible for the development of structures in a defined part of the body and is expressed in defined regions of the embryo, as labeled. (b) *Drosophila* has one *Hox* gene cluster; the human genome has four. Many of these genes are highly conserved in multicellular animals. Evolutionary relationships, as indicated by sequence alignments, between genes in the fruit fly *Hox* gene cluster and those in the mammalian *Hox* gene clusters are shown by dashed lines. Similar relationships among the four sets of mammalian *Hox* genes are indicated by vertical alignment. [(a) Information from F. R. Turner, University of Indiana, Department of Biology.]

Part a shows a fruit fly embryo above an adult fly, with regions colored by the *Hox* gene expressed there. Head: *lab* (lower mouthparts), *pb* (lower part of the head), and *Dfd* (upper part of the head). Thorax: *Scr* (front of the thorax and front leg), *Antp* (central thorax and middle leg), and *Ubx* (rear of the thorax and hind leg). Abdomen: *abd-A* (most of the abdomen) and *Abd-B* (posterior end of the abdomen). Part b compares the *Drosophila* HOM-C cluster (*lab*, *pb*, *Dfd*, *Scr*, *Antp*, a break, then *Ubx*, *abd-A*, *Abd-B*) with the four human *Hox* clusters. In *Hox-A*, dashed lines link A1 to *lab*; A2 and A3 to *pb*; A4 to *Dfd*; A5 to *Scr*; A6 to *Antp*; A7 to *Ubx*; and A9, A10, A11, and A13 to *Abd-B*; no *Hox-A* gene is linked to *abd-A*. Genes with the same number in the other three clusters are aligned vertically: *Hox-B*, B1 to B9; *Hox-C*, C4 to C6 and C8 to C13; *Hox-D*, D1, D3, D4, and D8 to D13.

The many regulatory genes in these three classes direct the development of an adult fly, with a head, thorax, and abdomen, with the proper number of segments, and with the correct appendages on each segment. Although embryogenesis takes about a day to complete, all these genes are activated during the first four hours. Some mRNAs and proteins are present for only a few minutes at specific points during this period. Some of the genes code for transcription factors that affect the expression of other genes in a kind of developmental cascade. Regulation at the level of translation also occurs, and many of the regulatory genes encode translational repressors, most of which bind to the 3′ UTR of the mRNA (Fig. 28-36). Because many mRNAs are deposited in the egg long before their translation is required, translational repression provides an especially important avenue for regulation in developmental pathways.
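One simple way to picture how a graded maternal transcription factor can set up sharply bounded expression domains for downstream genes is the classic "French flag" threshold model: each target gene is switched on only where the activator concentration exceeds that gene's threshold. The sketch below is illustrative only. The exponential shape of the gradient, its length constant, and the two target genes and their thresholds (`gene_high`, `gene_low`) are hypothetical assumptions, not measured Bicoid parameters.

```python
import math

# Assumption: a Bicoid-like maternal activator decays exponentially from the
# anterior end. Both constants are illustrative, not measured values.
C0 = 100.0           # relative concentration at the anterior pole
LENGTH_CONST = 20.0  # percent of egg length over which concentration falls e-fold

# Hypothetical target genes, each activated above its own threshold.
THRESHOLDS = {"gene_high": 50.0, "gene_low": 10.0}


def concentration(x: float) -> float:
    """Relative activator concentration at x (percent of egg length)."""
    return C0 * math.exp(-x / LENGTH_CONST)


def expressed_genes(x: float) -> list:
    """Target genes switched on at position x under the threshold model."""
    c = concentration(x)
    return [gene for gene, t in THRESHOLDS.items() if c >= t]


def posterior_boundary(threshold: float) -> float:
    """Position where the gradient falls to `threshold` (solves C0*exp(-x/L) = threshold)."""
    return LENGTH_CONST * math.log(C0 / threshold)


for x in (0, 25, 50, 75):
    print(f"{x:>3}% egg length: {concentration(x):6.2f} -> {expressed_genes(x)}")
```

With these assumed numbers, the high-threshold gene is confined to a narrow anterior domain (boundary near 14 percent of egg length, 20 ln 2), while the low-threshold gene extends farther (boundary near 46 percent, 20 ln 10). A single gradient can thus specify several nested domains, each of which can in turn regulate the next tier of the cascade.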

Many of the principles of development outlined above apply to other eukaryotes, from nematodes to humans. Some of the regulatory proteins are conserved. For example, the homeodomains encoded by the homeobox-containing genes HOXA7 in mouse and antennapedia in fruit fly differ in only one amino acid residue. Of course, although the molecular regulatory mechanisms may be similar, many of the ultimate developmental events are not conserved (humans do not have wings or antennae). The different outcomes are brought about by differences in the downstream target genes controlled by the Hox genes. The discovery of structural determinants with identifiable molecular functions is the first step in understanding the molecular events underlying development. As more genes and their protein products are discovered, the biochemical side of this vast puzzle will be elucidated in increasingly rich detail.

Stem Cells Have Developmental Potential That Can Be Controlled

If we can understand development, and the mechanisms of gene regulation behind it, we can control it. An adult human has many different types of tissues. Many of the cells are terminally differentiated and no longer divide. If an organ malfunctions due to disease, or a limb is lost in an accident, the tissues are not readily replaced. Most cells, because of the regulatory processes in place, or even because of the loss of some or all of the genomic DNA, are not easily reprogrammed. Medical science has made organ transplants possible, but organ donors are a limited resource and organ rejection remains a major medical problem. If humans could regenerate their own organs or limbs or nervous tissue, rejection would no longer be an issue. Cures for kidney failure or neurodegenerative disorders could become reality.

The key to tissue regeneration lies in stem cells, cells that have retained the capacity to differentiate into various tissues. In humans, after an egg is fertilized, the first few cell divisions create a ball of totipotent cells, called the morula, each of which has the capacity to differentiate into any tissue or even into a complete organism (Fig. 28-42). Continued cell division produces a hollow ball, the blastocyst. The outer cells of the blastocyst eventually form the placenta. The inner cell mass forms the germ layers of the developing fetus: the ectoderm, mesoderm, and endoderm. These inner cells are pluripotent: they can give rise to cells of all three germ layers and can differentiate into many types of tissues. However, they cannot differentiate into a complete organism. Cells that are further committed may be unipotent, able to develop into only one type of cell or tissue. It is the pluripotent cells of the blastocyst, the embryonic stem cells, that are currently used in embryonic stem cell research.

A figure compares totipotent and pluripotent stem cells. — FIGURE 28-42 Totipotent and pluripotent stem cells. Cells at the morula stage are totipotent and have the capacity to differentiate into a complete organism. Pluripotent embryonic stem cells are derived from the inner cell mass, the cluster of cells within the cavity of the blastocyst. Pluripotent cells give rise to many tissue types but cannot form complete organisms.

An oocyte is fertilized by a sperm, giving rise to a morula of similar cells, labeled totipotent, and then to a blastocyst: an outer ring of cells with a clump of cells at the bottom center. Cells from this clump are labeled pluripotent. Arrows lead from them to three human figures, with the circulatory system, the nervous system, and the immune system (including the lymphatic vessels) highlighted, respectively.

Stem cells have two functions: to replenish themselves and, at the same time, provide cells that can differentiate. These tasks are accomplished in multiple ways (Fig. 28-43a). All or parts of the stem cell population can, in principle, be involved in replenishment, differentiation, or both.
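The bookkeeping behind this balance can be sketched as a simple expected-value model. Each division of a stem cell yields two stem cells (fraction p_ss), one stem cell and one differentiating cell (p_sd), or two differentiating cells (p_dd). The stem cell pool stays constant when each division yields, on average, exactly one stem cell, 2p_ss + p_sd = 1; combined with p_ss + p_sd + p_dd = 1, this reduces to p_ss = p_dd. The sketch assumes every stem cell divides once per round and ignores cell death; reading the three panels of Fig. 28-43a as fractions of three dividing cells is an illustrative assumption.

```python
from dataclasses import dataclass
import math


@dataclass
class DivisionPattern:
    """Fractions of stem cell divisions with each outcome (should sum to 1)."""
    p_ss: float  # both daughters remain stem cells (self-renewal)
    p_sd: float  # one stem cell daughter, one differentiating daughter
    p_dd: float  # both daughters differentiate

    def stem_daughters_per_division(self) -> float:
        # Expected number of stem cell daughters from one division.
        return 2 * self.p_ss + self.p_sd

    def is_balanced(self) -> bool:
        # Pool size is constant when each division yields one stem cell on
        # average, which is equivalent to p_ss == p_dd.
        return math.isclose(self.stem_daughters_per_division(), 1.0)


def simulate(pattern: DivisionPattern, stem0: float, rounds: int):
    """Expected (stem, cumulative differentiated) counts after each round."""
    stem, differentiated = float(stem0), 0.0
    history = []
    for _ in range(rounds):
        new_diff = stem * (2 * pattern.p_dd + pattern.p_sd)
        stem *= pattern.stem_daughters_per_division()
        differentiated += new_diff
        history.append((stem, differentiated))
    return history


# Panels of Fig. 28-43a, each read as fractions of three dividing cells.
equal = DivisionPattern(p_ss=0.0, p_sd=1.0, p_dd=0.0)
sectional = DivisionPattern(p_ss=1/3, p_sd=0.0, p_dd=2/3)
gradient = DivisionPattern(p_ss=1/3, p_sd=1/3, p_dd=1/3)

for name, pattern in [("equal", equal), ("sectional", sectional), ("gradient", gradient)]:
    print(name, pattern.is_balanced(), simulate(pattern, stem0=3, rounds=3))
```

Read this way, the "equal" and "gradient" panels maintain the stem cell pool while producing differentiated cells every round, whereas the "sectional" panel, taken in isolation, would shrink the pool by a third per round; in a real tissue, other regions or later rounds would have to compensate.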

A two-part figure shows different cell division patterns of stem cells in part a and how cells exchange molecular signals to help some cells maintain stem cell properties in part b. — FIGURE 28-43 Stem cell proliferation versus differentiation and development. Stem cells must strike a balance between self-renewal and differentiation. (a) Some possible cell division patterns that allow the replenishment of stem cells and production of some differentiated cells. Each cell may produce one stem cell and one differentiated cell, or two differentiated cells, or two stem cells in defined parts of the tissue or culture. Or a gradient of growth conditions can be established, with cell fates differing from one end of the gradient to the other. (b) Establishing a developmental niche through stem cell contact with a cell or group of cells. Molecular signals provided by the niche cells (in this case, a distal tip cell, as in the gonad of the nematode *Caenorhabditis elegans*) help orient the mitotic spindle for stem cell division and ensure that one daughter cell retains stem cell properties.

Part a: stem cells (peach, with purple nuclei) undergo self-renewal, producing more stem cells, or differentiation, producing blue differentiated cells. Three division patterns are shown for a row of three stem cells. Equal: each cell produces one stem cell and one differentiated cell. Sectional: the left cell produces two stem cells, and the other two each produce two differentiated cells. Gradient: from left to right, the cells produce two stem cells, one stem cell and one differentiated cell, and two differentiated cells. Part b: a purple distal tip cell sends signals (dashed arrows) to the adjacent half of a dividing stem cell. After division, the daughter next to the distal tip cell remains a stem cell, and the other daughter becomes a blue differentiated cell.

Other types of stem cells can potentially be used for medical benefit. Adult stem cells, the products of further differentiation, have a more limited potential for development than do embryonic stem cells. For example, the hematopoietic stem cells of bone marrow can give rise to many types of blood cells and also to cells with the capacity to regenerate bone. They are referred to as multipotent. However, these cells cannot differentiate into a liver or kidney or neuron. Adult stem cells are often said to have a niche, a microenvironment that promotes stem cell maintenance while allowing differentiation of some daughter cells as replacements for cells in the tissue they serve (Fig. 28-43b). Hematopoietic stem cells in the bone marrow occupy a niche in which signaling from neighboring cells and other cues maintain the stem cell lineage. At the same time, some daughter cells differentiate to provide needed blood cells. Understanding the niche in which stem cells operate, and the signals the niche provides, is essential in efforts to harness the potential of stem cells for tissue regeneration. The identification and culturing of pluripotent stem cells from human blastocysts was reported by James Thomson and colleagues in 1998. This advance led to the long-term availability of established cell lines for research.

All stem cells present problems for human medical applications. Adult stem cells have a limited capacity to regenerate tissues, are generally present in small numbers, and are hard to isolate from an adult human. Embryonic stem cells have much greater differentiation potential and can be cultured to generate large numbers of cells, but their use is accompanied by ethical concerns related to the necessary destruction of human embryos. Identifying a source of plentiful and medically useful stem cells that does not raise such concerns remains a major goal of medical research.

Our ability to culture stem cells (i.e., maintain them in an undifferentiated state), and to manipulate them to grow and differentiate into particular tissues, is very much a function of our understanding of developmental biology.

Thus far, mouse and human embryonic stem cells have been used for most research. Although both types of stem cells are pluripotent, they require very different culture conditions, optimized to allow cell division indefinitely without differentiation. Mouse embryonic stem cells are grown on a layer of gelatin and require the presence of leukemia inhibitory factor (LIF). Human embryonic stem cells are grown on a feeder layer of mouse embryonic fibroblasts and require basic fibroblast growth factor (bFGF, or FGF2). The use of a feeder cell layer implies that the mouse cells are providing a diffusible product or some surface signal, not yet known, that is needed by human stem cells to either promote cell division or prevent differentiation.

A significant advance, reported in 2007, centers on success in reversing differentiation. In effect, skin cells, first from mice and then from humans, have been reprogrammed to take on the characteristics of pluripotent stem cells. The reprogramming involves manipulations that get the cells to express a small set of regulatory proteins, for example the transcription factors Oct4, Sox2, and Nanog together with the RNA-binding protein Lin28, all of which are known to help maintain the stem cell–like state. Gradual improvements in this technology may make the harvesting of embryonic stem cells unnecessary and provide a source of stem cells that is genetically matched to a prospective patient.

Our discussion of developmental regulation and stem cells brings us full circle, back to a biochemical beginning. Evolution appropriately provides the first and last words of this book. If evolution is to generate the kind of changes in an organism that would render it a different species, it is the developmental program that must be affected. Developmental and evolutionary processes are closely allied, each informing the other (Box 28-1). The continuing study of biochemistry has everything to do with enriching the future of humanity and understanding our origins.

BOX 28-1

Of Fins, Wings, Beaks, and Things

South America has several species of seed-eating finches, commonly called grassquits. About 3 million years ago, a small group of grassquits, of a single species, took flight from the continent’s Pacific coast. Perhaps driven by a storm, they lost sight of land and traveled nearly 1,000 km. Small birds such as these might easily have perished on such a journey, but the smallest of chances brought this group to a newly formed volcanic island in an archipelago later to be known as the Galápagos. It was a virgin landscape with untapped plant and insect food sources, and the newly arrived finches survived. Over the years, new islands formed and were colonized by new plants and insects — and by the finches. The birds exploited the new resources on the islands, and groups of birds gradually specialized and diverged into new species. By the time Charles Darwin stepped onto the islands in 1835, many different finch species were to be found on the various islands of the archipelago, feeding on seeds, fruits, insects, pollen, or even blood.

The diversity of living creatures was a source of wonder for humans long before scientists sought to understand its origins. The extraordinary insight handed down to us by Darwin, inspired in part by his encounter with the Galápagos finches, provided a broad explanation for the existence of organisms with a vast array of appearances and characteristics. It also gave rise to many questions about the mechanisms underlying evolution. Answers to those questions have started to appear, first through the study of genomes and nucleic acid metabolism in the last half of the twentieth century, and more recently through an emerging field nicknamed evo-devo — a blend of evolutionary and developmental biology.

In its modern synthesis, the theory of evolution has two main elements: mutations in a population generate genetic diversity; natural selection then acts on this diversity to favor individuals with more useful genomic tools and to disfavor others. Mutations occur at significant rates in every individual’s genome, in every cell (see Section 8.3). Mutations in single-celled organisms or in the germ line of multicellular organisms can be inherited, and they are more likely to be inherited (that is, passed on to greater numbers of offspring) if they confer an advantage. It is a straightforward scheme. But many have wondered whether it is enough to explain, say, the many different beak shapes in the Galápagos finches or the diversity of size and shape among mammals. Until recent decades, there were several widely held assumptions about the evolutionary process: that many mutations and new genes would be needed to bring about a new physical structure, that more-complex organisms would have larger genomes, and that very different species would have few genes in common. All of these assumptions were wrong.
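The selection half of this scheme can be made concrete with the standard one-locus haploid selection model, a textbook population-genetics simplification that this box does not itself use. If a variant has relative fitness 1 + s against 1 for the alternative, mean fitness is 1 + sp and its frequency p changes each generation as p′ = p(1 + s)/(1 + sp); equivalently, its odds p/(1 − p) grow by a factor of (1 + s) per generation. The sketch assumes discrete, non-overlapping generations, no further mutation, and no random drift; the starting frequency and advantage in the example are arbitrary illustrations.

```python
import math


def next_frequency(p: float, s: float) -> float:
    """Frequency of a favored variant after one generation of haploid selection.

    p: current frequency of the variant (0..1)
    s: fitness advantage (variant fitness 1 + s, alternative fitness 1)
    """
    mean_fitness = p * (1 + s) + (1 - p) * 1.0
    return p * (1 + s) / mean_fitness


def generations_to_reach(p0: float, s: float, target: float) -> int:
    """Count generations until the variant's frequency reaches `target`."""
    if s <= 0:
        raise ValueError("the variant must be favored (s > 0) to spread")
    if not 0 < p0 < target < 1:
        raise ValueError("require 0 < p0 < target < 1")
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


# A variant starting at 0.1% of the population with a 1% fitness advantage:
print(generations_to_reach(0.001, 0.01, 0.5))
```

Under these assumptions, the count matches the closed-form odds calculation, ln 999 / ln 1.01 ≈ 694, rounded up to the next whole generation: even a 1 percent advantage carries a rare variant to majority in roughly 700 generations.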

Modern genomics has revealed that the human genome contains fewer genes than expected — not many more than the fruit fly genome and fewer than some amphibian genomes. The genomes of every mammal, from mouse to human, are surprisingly similar in the number, types, and chromosomal arrangement of genes. Meanwhile, evo-devo is telling us how complex and very different creatures can evolve within these genomic realities.

In the late nineteenth century, English biologist William Bateson studied animals with homeotic mutations — creatures with body parts growing in the wrong location. Bateson used his observations to challenge the Darwinian notion that evolutionary change would have to be gradual. Recent studies of the genes that control organismal development have put an exclamation point on Bateson’s ideas. Subtle changes in regulatory patterns during development, reflecting just one or a few mutations, can result in startling physical changes and fuel surprisingly rapid evolution.

The Galápagos finches provide a wonderful example of the link between evolution and development. There are at least 14 (some specialists list 15) species of Galápagos finches, distinguished in large measure by their beak structure. The ground finches, for example, have broad, heavy beaks adapted to crushing large, hard seeds. The cactus finches have longer, slender beaks ideal for probing cactus fruits and flowers (Fig. 1). Clifford Tabin and colleagues carefully surveyed a set of genes expressed during avian craniofacial development. They identified a single gene, Bmp4, whose expression level correlated with formation of the more robust beaks of the ground finches. More-robust beaks were also formed in chicken embryos when high levels of Bmp4 were artificially expressed in the appropriate tissues, confirming the importance of Bmp4. In a similar study, the formation of long, slender beaks was linked to the expression of calmodulin (see Fig. 12-17) in particular tissues at appropriate developmental stages. Thus, major changes in the shape and function of the beak can be brought about by subtle changes in the expression of just two genes involved in developmental regulation. Very few mutations are required, and the needed mutations affect regulation. New genes are not required.

FIGURE 1 Evolution of new beak structures to exploit new food sources. In the Galápagos finches, the different beak structures of the cactus finch and the large ground finch, which feed on different, specialized food sources, were produced to a large extent by a few mutations that altered the timing and level of expression of just two genes: those encoding calmodulin (CaM) and Bmp4. [Information from A. Abzhanov et al., Nature 442:563, 2006, Fig. 4.]

On the left, an upper beak is drawn along three axes: length (toward the tip), depth (bottom to top), and width (side to side). At the top, an ancestor with an intermediate beak: mixed diet of seeds and insects; low [CaM], short beak; low [Bmp4], low beak depth/width. Arrows lead to two descendants. Lower left, a cactus finch with a long beak: probing cactus flowers/fruit; high [CaM], elongated beak; low [Bmp4], low beak depth/width. Lower right, a large ground finch with a short, thick beak: crushing hard/large seeds; low [CaM], short beak; early/high [Bmp4], high beak depth/width.

The system of regulatory genes that guides development is remarkably conserved among all vertebrates. Elevated expression of Bmp4 in the right tissue at the right time leads to more-robust jaw parts in zebrafish. The same gene plays a key role in tooth development in mammals. The development of eyes is triggered by the expression of a single gene, Pax6, in fruit flies and in mammals. The mouse Pax6 gene will trigger the development of fruit fly eyes in the fruit fly, and the fruit fly Pax6 gene will trigger the development of mouse eyes in the mouse. In each organism, these genes are part of the much larger regulatory cascade that ultimately creates the correct structures in the correct locations in each organism. The cascade is ancient; for example, the Hox genes (described in the text) have been part of the developmental program of multicellular eukaryotes for more than 500 million years. Subtle changes in the cascade can have large effects on development, and thus on the ultimate appearance, of the organism. These same subtle changes can fuel remarkably rapid evolution. For example, the 400 to 500 described species of cichlids (spiny-finned fish) in Lake Malawi and Lake Victoria on the African continent are all derived from one or a few populations that colonized each lake in the past 100,000 to 200,000 years. The Galápagos finches simply followed a path of evolution and change that living creatures have been traveling for billions of years.

SUMMARY 28.3 Regulation of Gene Expression in Eukaryotes

In eukaryotes, large changes in chromatin structure accompany the expression of a gene. Transcriptionally inactive heterochromatin is opened up by chromatin remodeling proteins. These eject, replace, or modify nucleosomes to allow other proteins, mainly RNA polymerase components and regulators, to access sites required to initiate transcription.
In eukaryotes, positive regulation is more common than negative regulation.
Promoters for Pol II typically have a TATA box and Inr sequence, as well as multiple binding sites for transcription activators. The latter sites, sometimes located hundreds or thousands of base pairs away from the TATA box, are called upstream activator sequences in yeast and enhancers in higher eukaryotes. Regulating transcriptional activity generally requires large protein complexes. These include basal transcription factors, activators, coactivators, architectural regulators, and the enzymes that modify and remodel chromatin. The effects of transcription activators on Pol II are facilitated by coactivator protein complexes such as Mediator.
The well-studied yeast genes involved in galactose metabolism provide examples of both positive and negative regulation in a eukaryote.
The modular structures of the activators have distinct activation and DNA-binding domains.
Hormones affect the regulation of gene expression in one of two ways. Steroid hormones interact directly with intracellular receptors that are DNA-binding regulatory proteins; binding of the hormone has either positive or negative effects on the transcription of targeted genes.
Nonsteroid hormones bind to cell surface receptors, triggering a signaling pathway that can lead to phosphorylation of a regulatory protein, affecting its activity.
Translational regulation is particularly important in eukaryotes. Modulating the translation of an mRNA stored in the cytoplasm affords a more rapid response to cellular challenges than de novo assembly of transcription complexes and mRNA synthesis.
MicroRNAs (miRNAs) are involved in gene silencing during development and as an antiviral defense. The pathway for processing miRNAs from larger precursors has been harnessed by researchers to develop the gene-silencing technology called RNA interference, or RNAi.
Regulation mediated by ncRNAs plays an important role in eukaryotic gene expression, with known mechanisms including interactions with proteins, mRNA, and other ncRNAs.
Development of a multicellular organism presents the most complex regulatory challenge. The fate of cells in the early embryo is determined by establishment of anterior-posterior and dorsal-ventral gradients of proteins that act as transcription activators or translational repressors, regulating the genes required for development of structures appropriate to a particular part of the organism. Sets of regulatory genes operate in temporal and spatial succession, transforming given areas of an egg cell into predictable structures in the adult organism.
The differentiation of stem cells into functional tissues can be controlled by extracellular signals and conditions.