4.3 Protein Tertiary and Quaternary Structures

The overall three-dimensional arrangement of all atoms in a protein is referred to as the protein’s tertiary structure. Whereas the term “secondary structure” refers to the spatial arrangement of amino acid residues that are adjacent in a segment of a polypeptide, tertiary structure includes longer-range aspects of amino acid sequence. Amino acids that are far apart in the polypeptide sequence and are in different types of secondary structure may interact within the completely folded structure of a protein. Interacting segments of polypeptide chains are held in their characteristic tertiary positions by several kinds of weak interactions (and sometimes by covalent bonds such as disulfide cross-links) between the segments. Some proteins contain two or more separate polypeptide chains, or subunits, which may be identical or different. The arrangement of these protein subunits in three-dimensional complexes constitutes quaternary structure.

In considering these higher levels of structure, it is useful to designate the major groups into which many proteins can be classified: fibrous proteins, with polypeptide chains arranged in long strands or sheets; globular proteins, with polypeptide chains folded into a spherical or globular shape; membrane proteins, with polypeptide chains embedded in hydrophobic lipid membranes; and intrinsically disordered proteins, with polypeptide chains lacking stable tertiary structures. We focus here on fibrous, globular, and intrinsically disordered proteins; membrane proteins are discussed in Chapter 11. These three groups are structurally distinct. Fibrous proteins usually consist of a single type of secondary structure, and their tertiary structure is relatively simple. Globular proteins often contain several types of secondary structure. Intrinsically disordered proteins can lack secondary structure entirely. The groups also differ functionally: the structures that provide support, shape, and external protection to vertebrates are made of fibrous proteins. Most enzymes are globular proteins, whereas regulatory proteins can be globular, disordered, or contain both globular and disordered segments.

Fibrous Proteins Are Adapted for a Structural Function

α-Keratin, collagen, and silk fibroin nicely illustrate the relationship between protein structure and biological function (Table 4-2). Fibrous proteins share properties that give strength and/or flexibility to the structures in which they occur. In each case, the fundamental structural unit is a simple repeating element of secondary structure. All fibrous proteins are insoluble in water, a property conferred by a high concentration of hydrophobic amino acid residues both in the interior of the protein and on its surface. These hydrophobic surfaces are largely buried, as many similar polypeptide chains are packed together to form elaborate supramolecular complexes. The underlying structural simplicity of fibrous proteins makes them particularly useful for illustrating some of the fundamental principles of protein structure discussed previously.

TABLE 4-2 Secondary Structures and Properties of Some Fibrous Proteins
Structure	Characteristics	Examples of occurrence
α Helix, cross-linked by disulfide bonds	Tough, insoluble protective structures of varying hardness and flexibility	α-Keratin of hair, feathers, nails
β Conformation	Soft, flexible filaments	Silk fibroin
Collagen triple helix	High tensile strength, without stretch	Collagen of tendons, bone matrix

α-Keratin. The α-keratins have evolved for strength. Found only in mammals, these proteins constitute almost the entire dry weight of hair, wool, nails, claws, quills, horns, and hooves and much of the outer layer of skin. The α-keratins are part of a broader family of proteins called intermediate filament (IF) proteins. Other IF proteins are found in the cytoskeletons of animal cells. All IF proteins have a structural function and share the structural features exemplified by the α-keratins.

The α-keratin helix is a right-handed α helix, the same helix found in many other proteins. Francis Crick and Linus Pauling, in the early 1950s, independently suggested that the α helices of keratin were arranged as a coiled coil. Two strands of α-keratin, oriented in parallel (with their amino termini at the same end), are wrapped about each other to form a supertwisted coiled coil. The supertwisting amplifies the strength of the overall structure, just as strands are twisted to make a strong rope (Fig. 4-10). The twisting of the axis of an α helix to form a coiled coil explains the discrepancy between the 5.4 Å per turn predicted for an α helix by Pauling and Corey and the 5.15 to 5.2 Å repeating structure observed in the x-ray diffraction of hair (see end-of-chapter problem 2). The helical path of the supertwists is left-handed, opposite in sense to the α helix. The surfaces where the two α helices touch are made up of hydrophobic amino acid residues, their R groups meshed together in a regular interlocking pattern. This permits a close packing of the polypeptide chains within the left-handed supertwist. Not surprisingly, α-keratin is rich in the hydrophobic residues Ala, Val, Leu, Ile, Met, and Phe.
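
The pitch discrepancy described above can be read geometrically. The following is a minimal sketch, assuming (as a simplification not stated in the text) that the repeat observed in hair is the projection of the α-helix pitch onto the coiled-coil axis, so that observed = predicted × cos(tilt). The function name `helix_tilt_degrees` is hypothetical, and the result is illustrative rather than a measured structural parameter.

```python
import math


def helix_tilt_degrees(predicted_pitch: float, observed_repeat: float) -> float:
    """Tilt of the helix axis relative to the coiled-coil axis, in degrees.

    Assumes the observed repeat is the predicted pitch projected onto the
    coiled-coil axis: observed = predicted * cos(tilt).
    """
    return math.degrees(math.acos(observed_repeat / predicted_pitch))


PREDICTED_PITCH = 5.4  # Å per turn for an isolated α helix (Pauling and Corey)

# Range of repeats observed in x-ray diffraction of hair, from the text.
for observed in (5.15, 5.2):
    tilt = helix_tilt_degrees(PREDICTED_PITCH, observed)
    print(f"observed repeat {observed} Å -> tilt of about {tilt:.1f} degrees")
```

Under this assumption, a modest tilt of roughly 15 to 18 degrees is enough to shorten the apparent repeat from 5.4 Å to the observed range.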

A two-part figure, a and b, shows the structure of hair with a close-up of the alpha keratin helices in part a and a cross-section of a hair in part b.

FIGURE 4-10 Structure of hair. (a) Hair α-keratin is an elongated α helix with somewhat thicker elements near the amino and carboxyl termini. Pairs of these helices are interwound in a left-handed sense to form two-chain coiled coils. These then combine in higher-order structures called protofilaments and protofibrils. About four protofibrils (32 strands of α-keratin in all) combine to form an intermediate filament. The individual two-chain coiled coils in the various substructures also seem to be interwound, but the handedness of the interwinding and other structural details are unknown. (b) A hair is an array of many α-keratin filaments, made up of the substructures shown in (a). [(a) Information from PDB ID 3TNU, C. H. Lee et al., *Nature Struct. Mol. Biol.* 19:707, 2012.]

Part a shows two rough strands wrapped around each other with helices visible inside. Beneath this, a protofilament is shown as two horizontal strands of identical length, each made up of two strands twisted together. The top strand has a short piece on the left. There is a small space, then a much longer similar piece that has two intertwined strands that each end in an oblong structure on the left end. The bottom horizontal line is similar but inverted, with the longer strand on the left and the oblong structures at its right. The width of the two horizontal lines together is 20 to 30 angstroms. A protofibril is shown as four horizontal lines. The top line has three pieces each made up of two intertwined strands with each strand ending in two oblongs on the right. The second line has three pieces of the same length and structure, except that the oblongs are on the left. The pieces in the top two lines are lined up so that there are three sets of two strands, one above the other in opposite orientations. The third and fourth lines are similar but shifted so that there are two complete sets of strands with partial strands to their left and right sides. Part b shows a thick cylinder made up of many bundles inside. Lines are visible running along the length of its outer surface and a single thin line is wrapped along its width near the place that it is cut to show the interior. There are many tightly packed bundles inside that all contain many small circles labeled, cells. A close-up of an intermediate filament extending from the cross-sectional end shows that it contains multiple smaller bundles, each containing two circles. A close-up of one of these bundles, labeled protofibril, shows that the smaller circles inside are cut protofilaments. An extended piece of protofilament shows that it contains many linear structures and a close-up of one of these shows that it is two intertwined strands labeled, two-chain coiled coil. Within each, there is a ribbon structure helix labeled, alpha helix.

An individual polypeptide in the α-keratin coiled coil has a relatively simple tertiary structure, dominated by an α-helical secondary structure with its helical axis twisted in a left-handed superhelix. The intertwining of the two α-helical polypeptides is an example of quaternary structure. Coiled coils of this type are common structural elements in filamentous proteins and in the muscle protein myosin (see Fig. 5-26). The quaternary structure of α-keratin can be quite complex. Many coiled coils can be assembled into large supramolecular complexes, such as the arrangement of α-keratin that forms the intermediate filament of hair (Fig. 4-10b).

The strength of fibrous proteins is enhanced by covalent cross-links between polypeptide chains in the multihelical “ropes” and between adjacent chains in a supramolecular assembly. In α-keratins, the cross-links stabilizing quaternary structure are disulfide bonds. In the hardest and toughest α-keratins, such as those of rhinoceros horn, up to 18% of the residues are cysteines involved in disulfide bonds.

Collagen. Like the α-keratins, collagen has evolved to provide strength. It is found in connective tissue such as tendons, cartilage, the organic matrix of bone, and the cornea of the eye. In fact, collagen is the most abundant protein in mammals, usually comprising 25% to 35% of total protein content. The collagen helix is a unique secondary structure, quite distinct from the α helix. It is left-handed and has three amino acid residues per turn (Fig. 4-11 and Table 4-1). Collagen is also a coiled coil, but one with distinct tertiary and quaternary structures: three separate polypeptides, called α chains (not to be confused with α helices), are twisted about each other. The superhelical twisting is right-handed in collagen, opposite in sense to the left-handed helix of the α chains.

A two-part figure, a and b, shows the structure of collagen as an illustration of the helix with a close-up of its residues in part a and as a superhelix viewed from one end in part b.

FIGURE 4-11 Structure of collagen. (a) The α chain of collagen has a repeating secondary structure unique to this protein. The repeating tripeptide sequence Gly–X–Y, where X is often Pro and Y is often 4-Hyp, adopts a left-handed helical structure with three residues per turn. Three of these helices (shown here in white, blue, and purple) wrap around one another with a right-handed twist. (b) The three-stranded collagen superhelix shown from one end, in a ball-and-stick representation. Gly residues are shown in red. Glycine, because of its small size, is required at the tight junction where the three chains are in contact. The balls in this illustration do not represent the van der Waals radii of the individual atoms. The center of the three-stranded superhelix is not hollow, as it appears here, but very tightly packed. [Data from PDB ID 1CGD, J. Bella et al., *Structure* 3:893, 1995.]

There are many types of vertebrate collagen. Typically, they contain about 35% Gly, 11% Ala, and 21% Pro and 4-Hyp (4-hydroxyproline, an uncommon amino acid; see Fig. 3-8a). The food product gelatin is derived from collagen. It has little nutritional value as a protein, because collagen is extremely low in many amino acids that are essential in the human diet. The unusual amino acid content of collagen is related to structural constraints unique to the collagen helix. The amino acid sequence in collagen is generally a repeating tripeptide unit, Gly–X–Y, where X is often Pro and Y is often 4-Hyp. Only Gly residues can be accommodated at the very tight junctions between the individual α chains (Fig. 4-11b). The Pro and 4-Hyp residues permit the sharp twisting of the collagen helix. The amino acid sequence and the supertwisted quaternary structure of collagen allow a very close packing of its three polypeptides. 4-Hydroxyproline has a special role in the structure of collagen — and in human history (Box 4-2).

Box 4-2 MEDICINE

Why Sailors, Explorers, and College Students Should Eat Their Fresh Fruits and Vegetables

… from this misfortune, together with the unhealthiness of the country, where there never falls a drop of rain, we were stricken with the “camp-sickness,” which was such that the flesh of our limbs all shrivelled up, and the skin of our legs became all blotched with black, mouldy patches, like an old jack-boot, and proud flesh came upon the gums of those of us who had the sickness, and none escaped from this sickness save through the jaws of death. The signal was this: when the nose began to bleed, then death was at hand.

—The Memoirs of the Lord of Joinville, ca. 1300*

This excerpt describes the plight of Louis IX’s scurvy-weakened army before it was destroyed by the Egyptians toward the end of the Seventh Crusade (1248–1254). What was the nature of the malady afflicting these thirteenth-century soldiers?

Scurvy is caused by lack of vitamin C, or ascorbic acid (ascorbate). Vitamin C is required for, among other things, the hydroxylation of proline and lysine in collagen; scurvy is a deficiency disease characterized by general degeneration of connective tissue. Manifestations of advanced scurvy include numerous small hemorrhages caused by fragile blood vessels; tooth loss, poor wound healing, and the reopening of old wounds; bone pain and degeneration; and eventually heart failure. Milder cases of vitamin C deficiency are accompanied by fatigue, irritability, and an increased severity of respiratory tract infections. Most animals make large amounts of vitamin C, converting glucose to ascorbate in four enzymatic steps. But in the course of evolution, humans and some other animals—gorillas, guinea pigs, and fruit bats—have lost the last enzyme in this pathway and must obtain ascorbate in their diet. Vitamin C is available in a wide range of fruits and vegetables. Until 1800, however, it was often absent in the dried foods and other food supplies stored for winter or for extended travel.

Scurvy came to wide public notice during the European voyages of discovery from 1500 to 1800. In fact, during the first circumnavigation of the globe (1519–1522) by Ferdinand Magellan, more than 80% of his crew were lost to scurvy. Winter outbreaks of scurvy in Europe were gradually eliminated in the nineteenth century as the cultivation of the potato, introduced from South America, became widespread.

In 1747, James Lind, a Scottish surgeon in the Royal Navy, carried out the first controlled clinical study in recorded history. During an extended voyage on the 50-gun warship HMS Salisbury, Lind selected 12 sailors suffering from scurvy and separated them into groups of two. All 12 received the same diet, except that each group was given a different remedy for scurvy from among those recommended at the time. The sailors given lemons and oranges recovered and returned to duty. Lind’s Treatise on the Scurvy was published in 1753, but inaction persisted in the Royal Navy for another 40 years. In 1795, the British admiralty finally mandated a ration of concentrated lime or lemon juice for all British sailors (hence the name “limeys”). Scurvy continued to be a problem in some other parts of the world until 1932, when Hungarian scientist Albert Szent-Györgyi, and W. A. Waugh and C. G. King at the University of Pittsburgh, isolated and synthesized ascorbic acid.

So why is ascorbate so necessary to good health? Of particular interest to us here is its role in the formation of collagen. As noted in the text, collagen is constructed of the repeating tripeptide unit Gly–X–Y, where X and Y are generally Pro or 4-Hyp. The latter is the proline derivative L-hydroxyproline, which plays an essential role in the folding of collagen and in maintaining its structure. The proline ring is normally found as a mixture of two puckered conformations, called $C_\gamma$-endo and $C_\gamma$-exo (Fig. 1). The collagen helix structure requires the Pro/4-Hyp residue in the Y positions to be in the $C_\gamma$-exo conformation, and it is this conformation that is enforced by the hydroxyl substitution at C-4 in 4-Hyp. In the absence of vitamin C, cells cannot hydroxylate the Pro at the Y positions. This leads to collagen instability and the connective tissue problems seen in scurvy.

FIGURE 1 The $C_\gamma$-endo conformation of proline and the $C_\gamma$-exo conformation of 4-hydroxyproline.

C subscript gamma endo proline has N at the lower right corner of a bent ring. There is a wavy strand extending down from N. N is attached to the C at the end of a skeletal chain by a solid wedge bond. The chain shows a double bonded O and then becomes wavy. There is a thick bond from N to the left side corner of the ring in the front left, then a solid wedge bond up to the top vertex of the ring, then a line down to the vertex at the rear left and a line to the C that is also joined to the chain to the right and to N of the ring to the left. C subscript gamma exo 4-hydroxyproline has a similar structure, except that the ring is bent down instead of up and the far left vertex of the ring is bonded to O H below the right. The first bond from N to the C to its left is the same, but the next bond goes down instead of up to the C that is bonded to O H below. Therefore, the next bond goes up instead of down.

The hydroxylation of specific Pro residues in procollagen, the precursor of collagen, requires the action of the α-ketoglutarate-dependent enzyme prolyl 4-hydroxylase. In the normal prolyl 4-hydroxylase reaction, one molecule of α-ketoglutarate and one of $\mathrm{O_2}$ bind to the enzyme. The α-ketoglutarate is oxidatively decarboxylated to form $\mathrm{CO_2}$ and succinate. The remaining oxygen atom is then used to hydroxylate an appropriate Pro residue in procollagen. No ascorbate is needed in this reaction. However, prolyl 4-hydroxylase also catalyzes an oxidative decarboxylation of α-ketoglutarate that is not coupled to proline hydroxylation. During this reaction, the $\mathrm{Fe^{2+}}$ becomes oxidized, inactivating the enzyme and preventing the proline hydroxylation. Ascorbate is needed to reduce the iron and restore enzyme activity so that proline hydroxylation of procollagen can continue.

Scurvy remains a problem today, not only in remote regions where nutritious food is scarce but, surprisingly, also among young adults with poor eating habits in large cities. A 2009 study of more than 1,100 men and women between the ages of 20 and 29 in Toronto, Canada, found that 1 in 7 young adults had vitamin C deficiency due to unmet dietary needs. Moreover, lower vitamin C levels were associated with higher measures of obesity and blood pressure and fewer servings a day of healthy foods. Just like eighteenth-century sailors, twenty-first-century young adults need to eat their fruits and vegetables!

The tight wrapping of the α chains in the collagen triple helix provides tensile strength greater than that of a steel wire of equal cross section. Collagen fibrils (Fig. 4-12) are supramolecular assemblies consisting of triple-helical collagen molecules (sometimes referred to as tropocollagen molecules) associated in a variety of ways to provide different degrees of tensile strength. The α chains of collagen molecules and the collagen molecules of fibrils are cross-linked by unusual types of covalent bonds involving Lys, HyLys (5-hydroxylysine), or His residues that are present at a few of the X and Y positions. These links create uncommon amino acid residues such as dehydrohydroxylysinonorleucine. The increasingly rigid and brittle character of aging connective tissue results from accumulated covalent cross-links in collagen fibrils.

A figure uses a micrograph and illustration to show how collagen fibers interact and form crosslinks to produce the overall structure.

FIGURE 4-12 Structure of collagen fibrils. Collagen ($M_r$ 300,000) is a rod-shaped molecule, about 3,000 Å long and only 15 Å thick. Its three helically intertwined α chains may have different sequences; each chain has about 1,000 amino acid residues. Collagen fibrils are made up of collagen molecules aligned in a staggered fashion and cross-linked for strength. The specific alignment and degree of cross-linking vary with the tissue and produce characteristic cross-striations in an electron micrograph. In the example shown here, alignment of the head groups of every fourth molecule produces striations 640 Å (64 nm) apart.
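
The dimensions quoted in the legend of Figure 4-12 imply two useful per-residue figures. The sketch below is a back-of-the-envelope calculation, assuming that each of the three α chains spans the full length of the molecule and that mass is shared equally among residues; the function names are hypothetical.

```python
def mass_per_residue(molecule_mr: float, chains: int, residues_per_chain: int) -> float:
    """Average residue mass, assuming mass is spread evenly over all residues."""
    return molecule_mr / (chains * residues_per_chain)


def rise_per_residue(length_angstrom: float, residues_per_chain: int) -> float:
    """Axial advance per residue, assuming each chain spans the whole molecule."""
    return length_angstrom / residues_per_chain


# Values taken from the legend of Figure 4-12.
COLLAGEN_MR = 300_000
CHAINS = 3
RESIDUES_PER_CHAIN = 1_000
LENGTH_ANGSTROM = 3_000

print(mass_per_residue(COLLAGEN_MR, CHAINS, RESIDUES_PER_CHAIN))  # 100.0
print(rise_per_residue(LENGTH_ANGSTROM, RESIDUES_PER_CHAIN))      # 3.0 (Å per residue)
```

An axial advance of about 3 Å per residue is roughly twice the 1.5 Å per residue of an α helix (5.4 Å per turn divided by 3.6 residues per turn), which is consistent with the collagen α chain being a much more extended helix.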

A figure shows the structure of dehydrohydroxylysinonorleucine.

A polypeptide chain runs vertically along the left side. It extends from the top to N with H bonded to the left, that is bonded to C H that is bonded to a long chain to the right and to C below that is double bonded to O and further bonded below. A similar chain is on the right side but inverted and highlighted. Between them, from left to right, the chain that bonds to the central carbon has 3 C H 2 bonded to C H that has a highlighted double bond to N. This half is labeled, L y s residue minus epsilon amino group (norleucine). The right half of the molecule from this point is highlighted. N is bonded to C H 2 that is bonded to C H that is bonded to O H below and to C H 2 on the right that is further bonded to C H 2 that is bonded to the C H at the middle of the polypeptide chain. The right portion is labeled, H y L y s residue.

A typical mammal has more than 30 structural variants of collagen, particular to certain tissues and each somewhat different in sequence and function. Some human genetic defects in collagen structure illustrate the close relationship between amino acid sequence and three-dimensional structure in this protein. Osteogenesis imperfecta is characterized by abnormal bone formation in babies; at least eight variants of this condition, with different degrees of severity, occur in the human population. Ehlers-Danlos syndrome is characterized by loose joints, and at least six variants occur in humans. The composer Niccolò Paganini (1782–1840) was famed for his seemingly impossible dexterity in playing the violin. He suffered from a variant of Ehlers-Danlos syndrome that rendered him effectively double-jointed. In both disorders, some variants can be lethal, whereas others cause lifelong problems.

All of the variants of both conditions result from the substitution of an amino acid residue with a larger R group (such as Cys or Ser) for a single Gly residue in an α chain in one or another of the collagen proteins (a different Gly residue in each disorder). These single-residue substitutions have a catastrophic effect on collagen function because they disrupt the Gly–X–Y repeat that gives collagen its unique helical structure. Given its role in the collagen triple helix (Fig. 4-11), Gly cannot be replaced by another amino acid residue without substantial deleterious effects on collagen structure.
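
The Gly–X–Y constraint lends itself to a simple sequence check. The following is a minimal sketch, assuming one-letter residue codes with "O" standing for 4-Hyp (a common shorthand for collagen-like peptides, not a notation used in this text) and a repeat that starts with Gly at the first residue. The function name and both sequences are hypothetical illustrations, not real collagen sequences.

```python
def broken_gly_positions(sequence: str, frame: int = 0) -> list[int]:
    """Return 1-based positions where the Gly-X-Y repeat expects Gly but finds another residue.

    `frame` is the 0-based index of the first expected Gly.
    """
    return [
        index + 1
        for index in range(frame, len(sequence), 3)
        if sequence[index] != "G"
    ]


# Idealized repeat: five Gly-Pro-Hyp triplets (hypothetical example).
normal = "GPOGPOGPOGPOGPO"
# Same peptide with the Gly at position 7 replaced by Ser, mimicking a
# single-residue substitution of the kind described above.
mutant = "GPOGPOSPOGPOGPO"

print(broken_gly_positions(normal))  # []
print(broken_gly_positions(mutant))  # [7]
```

A check like this only flags where the repeat is interrupted; it says nothing about the severity of the structural consequence, which depends on the position and the collagen type.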

Fibroin. The protein of silk, fibroin, is produced by insects and spiders. Its polypeptide chains are predominantly in the β conformation. Fibroin is rich in Ala and Gly residues, permitting a close packing of β sheets and an interlocking arrangement of R groups (Fig. 4-13). The overall structure is stabilized by extensive hydrogen bonding between all peptide linkages in the polypeptides of each β sheet and by the optimization of van der Waals interactions between sheets. Silk does not stretch, because the β conformation is already highly extended (Fig. 4-5). However, the structure is flexible, because the sheets are held together by numerous weak interactions rather than by covalent bonds such as the disulfide bonds in α-keratins.

A two-part figure, a and b, shows the structure of silk with part a showing how side chains interact in fibroin and part b showing silk emerging from the spinnerets of a spider.

FIGURE 4-13 Structure of silk. The fibers in silk cloth and in a spider web are made up primarily of the protein fibroin. (a) Fibroin consists of layers of antiparallel β sheets rich in Ala and Gly residues. The small side chains interdigitate and allow close packing of the sheets, as shown in the ball-and-stick view. The segments shown here would be just a small part of the fibroin strand. (b) Strands of silk emerge from the spinnerets of a spider in this colorized scanning electron micrograph. [(a) Data from PDB ID 1SLK, S. A. Fossey et al., *Biopolymers* 31:1529, 1991. (b) Tina Weatherby Carvalho/MicroAngela.]

Part a shows three horizontal layers. Each contains five ribbon-shaped arrows that point toward the observer on the left side and the alternate pointing away, then toward, then away, and so on. A strand is shown at the base of the left-hand arrow and another strand is shown leading away from top of the right side arrow. Text reads, antiparallel beta sheet. A close-up shows the strands of amino acids in these chains. The top sheet is shown with A l a side chains that alternate extending down in the front versus extending up in the back. The middle sheet has A l a side chains extending up in the front where the top sheet has A l a side chains extending down in the back and extending down in the front when the top sheet has them extending up in the back. The front amino acid of the first chain of middle chain has an A l a side chain facing up and the opposite side of the chain has a G l y side chain extending down. This corresponds with G l y extending up from the bottom layer, a pattern that is repeated along the layer. Part b shows a micrograph of many tubular structures that end in delicate-looking conical structures from which small or large strands emerge.

Structural Diversity Reflects Functional Diversity in Globular Proteins

In a globular protein, different segments of the polypeptide chain (or multiple polypeptide chains) fold back on each other, generating a more compact shape than is seen in the fibrous proteins (Fig. 4-14). The folding also provides the structural diversity necessary for proteins to carry out a wide array of biological functions. Globular proteins include enzymes, transport proteins, motor proteins, regulatory proteins, immunoglobulins, and proteins with many other functions.

A figure shows a long line representing a beta conformation that is 2,000 times 5 angstroms long, a thicker and shorter line representing alpha helix that is 900 times 11 angstroms long, and a short but thick and lighter native globular form that is almost oval and 100 times 60 angstroms in length.

FIGURE 4-14 Globular protein structures are compact and varied. Human serum albumin ($M_r$ 64,500) has 585 residues in a single chain. Given here are the approximate dimensions its single polypeptide chain would have if it occurred entirely in extended β conformation or as an α helix. Also shown is the size of the protein in its native globular form, as determined by x-ray crystallography; the polypeptide chain must be very compactly folded to fit into these dimensions.
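
The hypothetical chain lengths in Figure 4-14 can be approximated from the residue count. This is a rough sketch, assuming an axial rise of about 1.5 Å per residue for an α helix (5.4 Å per turn divided by 3.6 residues per turn) and about 3.5 Å per residue for an extended β strand; the β value is an assumed round number, and the function name is hypothetical.

```python
ALPHA_RISE = 1.5  # Å per residue in an α helix (5.4 Å / 3.6 residues per turn)
BETA_RISE = 3.5   # Å per residue in extended β conformation (assumed approximate value)


def contour_length(n_residues: int, rise_per_residue: float) -> float:
    """Length in Å of a chain folded entirely into one regular conformation."""
    return n_residues * rise_per_residue


ALBUMIN_RESIDUES = 585  # human serum albumin, from the figure legend

print(contour_length(ALBUMIN_RESIDUES, BETA_RISE))   # 2047.5, close to the 2,000 Å shown
print(contour_length(ALBUMIN_RESIDUES, ALPHA_RISE))  # 877.5, close to the 900 Å shown
```

Either hypothetical length is many times the longest dimension of the native globular form (about 100 Å), which is the point the figure makes about compact folding.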

Our discussion of globular proteins begins with the principles gleaned from the first protein structures to be elucidated. This is followed by a detailed description of protein substructure and comparative categorization. Such discussions are possible only because of the vast amount of information available online from publicly accessible databases, particularly the Protein Data Bank, or PDB (Box 4-3).

Box 4-3

The Protein Data Bank

The number of known three-dimensional protein structures is now more than 100,000 and doubles every couple of years. This wealth of information is revolutionizing our understanding of protein structure, the relationship of structure to function, and the evolutionary paths by which proteins arrived at their present state, which can be seen in the family resemblances that come to light as protein databases are sifted and sorted. One of the most important resources available to biochemists is the Protein Data Bank (PDB; www.rcsb.org).

The PDB is an archive of experimentally determined three-dimensional structures of biological macromolecules, containing virtually all of the macromolecular structures (such as proteins, RNAs, and DNAs) elucidated to date. Each structure is assigned an identifying label (a four-character identifier called the PDB ID). Such labels are provided in the figure legends for every PDB-derived structure illustrated in this text so that students and instructors can explore the same structures on their own. The data files in the PDB describe the spatial coordinates of each atom for which the position has been determined (many of the cataloged structures are not complete). Additional data files provide information on how the structure was determined and its accuracy. The atomic coordinates can be converted into an image of the macromolecule by using structure visualization software. Students are encouraged to access the PDB and explore structures, using visualization software linked to the database. Macromolecular structure files can also be downloaded and explored on the desktop, using free software such as JSmol.
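
To make the idea of "spatial coordinates of each atom" concrete, the sketch below reads one coordinate record in the legacy fixed-column PDB text format. It is a minimal illustration, not a full parser: the names `Atom` and `parse_atom_line` are hypothetical, only a few fields are extracted, and the sample record (including its coordinates) is invented for the example rather than taken from any deposited structure.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Extract selected fields from one ATOM/HETATM record (fixed columns)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not a coordinate record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate in Å
        y=float(line[38:46]),          # columns 39-46: y coordinate in Å
        z=float(line[46:54]),          # columns 47-54: z coordinate in Å
    )


# Invented sample record, assembled field by field so the column layout is explicit.
SAMPLE = (
    "ATOM  "                                 # columns 1-6: record name
    + f"{2:>5}" + " "                        # serial number, then a blank column
    + " CA " + " "                           # atom name, alternate-location column
    + "VAL" + " " + "A"                      # residue name, blank, chain identifier
    + f"{1:>4}" + "    "                     # residue number, insertion code + blanks
    + f"{-2.9:8.3f}{17.6:8.3f}{15.5:8.3f}"   # x, y, z
    + f"{1.0:6.2f}{0.0:6.2f}"                # occupancy, temperature factor
    + " " * 10 + " C"                        # blanks, element symbol
)

atom = parse_atom_line(SAMPLE)
print(atom.name, atom.res_name, atom.chain, atom.res_seq, (atom.x, atom.y, atom.z))
```

For real work, an established structure-handling library is a better choice than hand-written column slicing; the sketch is only meant to show what a coordinate file contains.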

Myoglobin Provided Early Clues about the Complexity of Globular Protein Structure

The first breakthrough in understanding the three-dimensional structure of a globular protein came from x-ray diffraction studies of myoglobin carried out by John Kendrew and his colleagues in the 1950s. Myoglobin is a relatively small ($M_r$ 16,700), oxygen-binding protein of muscle cells. It functions both to store oxygen and to facilitate oxygen diffusion in rapidly contracting muscle tissue. Myoglobin contains a single polypeptide chain of 153 amino acid residues of known sequence and a single iron protoporphyrin, or heme, group. The same heme group that is found in myoglobin is found in hemoglobin, the oxygen-binding protein of erythrocytes, and is responsible for the deep red-brown color of both myoglobin and hemoglobin. Myoglobin is particularly abundant in the muscles of diving mammals such as whales, seals, and porpoises, so abundant that the muscles of these animals are brown. Storage and distribution of oxygen by muscle myoglobin permits diving mammals to remain submerged for long periods. The activities of myoglobin and other globin molecules are investigated in greater detail in Chapter 5.

Figure 4-15 shows several structural representations of myoglobin, illustrating how the polypeptide chain is folded in three dimensions — its tertiary structure. The red group surrounded by protein is heme. The backbone of the myoglobin molecule consists of eight relatively straight segments of α helix interrupted by bends, some of which are β turns. The longest α helix has 23 amino acid residues and the shortest has only 7; all helices are right-handed. More than 70% of the residues in myoglobin are in these α-helical regions. X-ray analysis has revealed the precise position of each of the R groups, which fill up nearly all the space within the folded chain that is not occupied by backbone atoms.
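The helix sizes quoted above can be translated into physical dimensions with a short calculation. The sketch below assumes the standard α-helix geometry (a rise of about 1.5 Å per residue and 3.6 residues per turn); the function names are illustrative, and real helices deviate somewhat from these ideal values.

```python
RISE_PER_RESIDUE_ANGSTROM = 1.5  # ideal alpha-helix rise along the axis
RESIDUES_PER_TURN = 3.6          # ideal alpha-helix periodicity


def helix_length_angstrom(n_residues: int) -> float:
    """Approximate length of an ideal alpha helix along its axis."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM


def helix_turns(n_residues: int) -> float:
    """Approximate number of turns in an ideal alpha helix."""
    return n_residues / RESIDUES_PER_TURN


# Shortest and longest helices of myoglobin, as stated in the text.
for n in (7, 23):
    print(f"{n} residues: about {helix_length_angstrom(n):.1f} angstroms, "
          f"{helix_turns(n):.1f} turns")
```

Under these assumptions the longest myoglobin helix spans roughly 35 Å, and the shortest is only about two turns long.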

A four-part figure, a, b, c, and d, shows four representations of the tertiary structure of sperm whale myoglobin with part a showing a ribbon representation of the backbone, part b showing a surface contour image, part c showing a ribbon representation that includes the side chains, and part d showing a space-filling model with all of the side chains. — FIGURE 4-15 Tertiary structure of sperm whale myoglobin. Orientation of the protein is similar in (a) through (d); the heme group is shown in red. In addition to illustrating the myoglobin structure, this figure provides examples of several different ways to display protein structure. (a) The polypeptide backbone in a ribbon representation of a type introduced by Jane Richardson, which highlights regions of secondary structure. The α-helical regions are evident. (b) Surface contour image; this is useful for visualizing pockets in the protein where other molecules might bind. (c) Ribbon representation including side chains (yellow) for the hydrophobic residues Leu, Ile, Val, and Phe. (d) Space-filling model with all amino acid side chains. Each atom is represented by a sphere encompassing its van der Waals radius. The hydrophobic residues are again shown in yellow; most are buried in the interior of the protein and thus are not visible. [Data from PDB ID 1MBO, S. E. Phillips, *J. Mol. Biol.* 142:531, 1980.]

Part a shows ribbon structures of helices connected by thin strands with two end strands extending on the right. The overall structure is roughly V shaped with a slight open area in the top, in which a strand from the left side joins to a ball-and-stick model in the center. Part b shows the same ball-and-stick model in a surface contour image that shows the overall shape to be roughly spherical with a bumpy surface but wider on top. This image also shows that the protein is present behind the ball-and-stick model. Part c is the same as part a but side chains are shown filling in the spaces around the helices and the ball-and-stick model has been replaced by a skeletal model of the same structure. Part d is a space-filling model that shows the same overall shape with the areas that had contained side chains a different color from the background color. This shows that they are largely present toward the inside.

Many important conclusions were drawn from the structure of myoglobin. The positioning of amino acid side chains reflects a structure that is largely stabilized by the hydrophobic effect. Most of the hydrophobic R groups are in the interior of the molecule, hidden from exposure to water. All but two of the polar R groups are located on the outer surface of the molecule, and all are hydrated. The myoglobin molecule is so compact that its interior has room for only four molecules of water. This dense hydrophobic core is typical of globular proteins. The fraction of space occupied by atoms in an organic liquid is 0.4 to 0.6. In a globular protein the fraction is about 0.75, comparable to that in a crystal (in a typical crystal the fraction is 0.70 to 0.78, near the theoretical maximum). In this packed environment, weak interactions strengthen and reinforce each other. For example, the nonpolar side chains in the core are so close together that short-range van der Waals interactions make a significant contribution to stabilizing interactions.
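For orientation, the "theoretical maximum" mentioned above can be computed for the idealized case of identical hard spheres, where the densest possible arrangement fills $\pi/(3\sqrt{2}) \approx 0.74$ of space. The sketch below checks this value from a face-centered cubic unit cell. Treating atoms as equal spheres is a simplifying assumption (real proteins and crystals contain atoms of different sizes), and the function name is illustrative.

```python
import math


def packing_fraction(occupied_volume: float, total_volume: float) -> float:
    """Fraction of a region's volume that is occupied by atoms."""
    return occupied_volume / total_volume


# Face-centered cubic cell built from spheres of radius r:
# the cell holds 4 spheres and has edge length 2 * sqrt(2) * r.
r = 1.0
cell_edge = 2 * math.sqrt(2) * r
spheres_volume = 4 * (4 / 3) * math.pi * r ** 3
fcc_fraction = packing_fraction(spheres_volume, cell_edge ** 3)

# Closed-form value for close packing of identical spheres.
close_packing_limit = math.pi / (3 * math.sqrt(2))

print(f"fcc packing fraction: {fcc_fraction:.4f}")
print(f"closed-form limit:    {close_packing_limit:.4f}")
```

The value of about 0.74 lies within the 0.70 to 0.78 range quoted for typical crystals, which is why a fraction of about 0.75 marks the protein interior as crystal-like rather than liquid-like.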

Deduction of the structure of myoglobin confirmed some expectations and introduced some new elements of secondary structure. As predicted by Pauling and Corey, all the peptide bonds are in the planar trans configuration. The α helices in myoglobin provided the first direct experimental evidence for the existence of this type of secondary structure. Three of the four Pro residues are found at bends. The fourth Pro residue occurs within an α helix, where it creates a kink necessary for tight helix packing.

The flat heme group rests in a crevice, or pocket, in the myoglobin molecule. Within this pocket, the accessibility of the heme group to solvent is highly restricted. This is important for function, because free heme groups in an oxygenated solution are rapidly oxidized from the ferrous ($\mathrm{Fe}^{2+}$) form, which is active in the reversible binding of $\mathrm{O}_2$, to the ferric ($\mathrm{Fe}^{3+}$) form, which does not bind $\mathrm{O}_2$. As myoglobin structures from many different species were resolved, investigators were able to observe the structural changes that accompany the binding of oxygen or other molecules and thus, for the first time, to understand the correlation between protein structure and function. Hundreds of proteins have now been subjected to similar analysis.

Globular Proteins Have a Variety of Tertiary Structures

Myoglobin illustrates just one of many ways in which a polypeptide chain can fold. Table 4-3 shows the proportions of α helix and β conformation (expressed as percentage of residues in each type) in several small, single-chain, globular proteins. Each of these proteins has a distinct structure, adapted for its particular biological function, but together they share several important properties with myoglobin. Each is folded compactly, and in each case the hydrophobic amino acid side chains are oriented toward the interior (away from water) and the hydrophilic side chains are on the surface. The structures are also stabilized by a multitude of hydrogen bonds and some ionic interactions.

TABLE 4-3 Approximate Proportion of α Helix and β Conformation in Some Single-Chain Proteins
Protein (total residues)	α Helix (% of residues)^a	β Conformation (% of residues)^a
Chymotrypsin (247)	14	45
Ribonuclease (124)	26	35
Carboxypeptidase (307)	38	17
Cytochrome c (104)	39	0
Lysozyme (129)	40	12
Myoglobin (153)	78	0
Source: Data from C. R. Cantor and P. R. Schimmel, Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, p. 100, W. H. Freeman and Company, 1980.
^aPortions of the polypeptide chains not accounted for by α helix or β conformation consist of bends and irregularly coiled or extended stretches. Segments of α helix and β conformation sometimes deviate slightly from their normal dimensions and geometry.
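The percentages in Table 4-3 can be converted into approximate residue counts, which makes the remaining "bends and irregular stretches" portion explicit. The sketch below uses only the values in the table; the dictionary layout and function name are illustrative, and because the tabulated percentages are themselves approximate, the counts are estimates.

```python
# (total residues, % alpha helix, % beta conformation), copied from Table 4-3.
TABLE_4_3 = {
    "Chymotrypsin": (247, 14, 45),
    "Ribonuclease": (124, 26, 35),
    "Carboxypeptidase": (307, 38, 17),
    "Cytochrome c": (104, 39, 0),
    "Lysozyme": (129, 40, 12),
    "Myoglobin": (153, 78, 0),
}


def residue_counts(total: int, pct_helix: float, pct_beta: float) -> tuple[int, int, int]:
    """Return approximate (helix, beta, other) residue counts."""
    helix = round(total * pct_helix / 100)
    beta = round(total * pct_beta / 100)
    other = total - helix - beta  # bends and irregular or extended stretches
    return helix, beta, other


for protein, (total, pct_helix, pct_beta) in TABLE_4_3.items():
    helix, beta, other = residue_counts(total, pct_helix, pct_beta)
    print(f"{protein:<17} helix={helix:>3}  beta={beta:>3}  other={other:>3}")
```

For myoglobin this gives about 119 helical residues out of 153, consistent with the statement that more than 70% of its residues lie in α-helical regions.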

To understand a complete three-dimensional structure, we need to analyze its folding patterns. We begin by defining two important terms that describe protein structural patterns or elements in a polypeptide chain; then we turn to the folding rules. The first term is motif, also called a fold. A motif or fold is a recognizable folding pattern involving two or more elements of secondary structure and the connection(s) between them. A motif can be very simple, such as two elements of secondary structure folded against each other, and may represent only a small part of a protein. An example is a β-α-β loop (Fig. 4-16a). A motif can also be a very elaborate structure involving scores of protein segments folded together, such as the β barrel (Fig. 4-16b). In some cases, a single large motif may comprise the entire protein. The terms “motif” and “fold” are often used interchangeably, although “fold” is applied more commonly to somewhat more complex folding patterns. The segment defined as a motif or a fold may or may not be independently stable. We have already encountered a well-studied motif, the coiled coil of α-keratin, which is also found in some other proteins. The distinctive arrangement of eight α helices in myoglobin is replicated in all globins and is called the globin fold. Note that a motif is not a hierarchical structural element falling between secondary and tertiary structure. It is simply a folding pattern.

A two-part figure, a and b, shows two motifs with part a showing a beta-alpha-beta loop and part b showing a beta barrel. — FIGURE 4-16 Motifs. (a) A simple motif, the β-α-β loop. (b) A more elaborate motif, the β barrel. This β barrel is a single domain of α-hemolysin (a toxin that kills a cell by creating a hole in its membrane) from the bacterium *Staphylococcus aureus*. [Data from (a) PDB ID 4TIM, M. E. Noble et al., *J. Med. Chem.* 34:2709, 1991; (b) PDB ID 7AHL, L. Song et al., *Science* 274:1859, 1996.]

The second term for describing structural patterns is domain. A domain, as defined by Jane Richardson in 1981, is a part of a polypeptide chain that is independently stable or could undergo movements as a single entity with respect to the entire protein. Polypeptides with more than a few hundred amino acid residues often fold into two or more domains, sometimes with different functions. In many cases, a domain from a large protein will retain its native three-dimensional structure even when separated (for example, by proteolytic cleavage) from the remainder of the polypeptide chain. In a protein with multiple domains, each domain may appear as a distinct globular lobe (Fig. 4-17); more commonly, extensive contacts between domains make individual domains hard to discern. Different domains often have distinct functions, such as the binding of small molecules or interaction with other proteins. Small proteins usually have only one domain (the domain is the protein).

A space-filling model shows the structural domains in the polypeptide troponin C. Two roughly spherical structures are connected by a narrow region. They have a rough surface showing the surface contours. Two circles near the top left and bottom left of the left side are labeled C a 2 plus. — FIGURE 4-17 Structural domains in the polypeptide troponin C. This calcium-binding protein, associated with muscle, has two separate calcium-binding domains, shown here in brown and blue. [Data from PDB ID 4TNC, K. A. Satyshur et al., *J. Biol. Chem.* 263:1628, 1988.]

Folding of polypeptides is subject to an array of physical and chemical constraints, and several rules have emerged from studies of common protein-folding patterns.

1. The hydrophobic effect makes a large contribution to the stability of protein structures. Burial of hydrophobic amino acid R groups so as to exclude water requires at least two layers of secondary structure. Simple motifs such as the β-α-β loop (Fig. 4-16a) create two such layers.
2. Where they occur together in a protein, α helices and β sheets generally are found in different structural layers. This is because the backbone of a polypeptide segment in the β conformation (Fig. 4-5) cannot readily hydrogen-bond to an α helix that is adjacent to it.
3. Segments adjacent to each other in the amino acid sequence are usually stacked adjacent to each other in the folded structure. Distant segments of a polypeptide may come together in the tertiary structure, but this is not the norm.
4. The β conformation is most stable when the individual segments are twisted slightly in a right-handed sense. This influences both the arrangement of β sheets derived from the twisted segments and the path of the polypeptide connections between them. Two parallel β strands, for example, must be connected by a crossover strand (Fig. 4-18a). In principle, this crossover could have a right-handed or left-handed conformation, but in proteins it is almost always right-handed. Right-handed connections tend to be shorter than left-handed connections and tend to bend through smaller angles, making them easier to form. The twisting of β sheets also leads to a characteristic twisting of the structure formed by many such segments together, as seen in the β barrel (Fig. 4-16b) and the twisted β sheet (Fig. 4-18c), which form the core of many larger structures.

A three-part figure, a, b, and c, shows stable protein folding patterns. Part a shows typical connections in an all-beta motif and a crossover connection that is not observed, part b shows a right-handed connection between beta strands and a left-handed connection between beta strands that is very rare, and part c shows a twisted beta sheet. — FIGURE 4-18 Stable folding patterns in proteins. (a) Connections between β strands in layered β sheets. The strands here are viewed from one end, with no twisting. The connections at a given end (e.g., near the viewer) rarely cross one another. An example of such a rare crossover is illustrated by the red strands in the structure on the right. (b) Because of the right-handed twist in β strands, connections between strands are generally right-handed. Left-handed connections must traverse sharper angles and are harder to form. (c) This twisted β sheet is from a domain of photolyase (a protein that repairs certain types of DNA damage) from *E. coli*. Connecting loops have been removed so as to focus on the folding of the β sheet. [Data from PDB ID 1DNP, H. W. Park et al., *Science* 268:1866, 1995.]

Part a shows typical connections in an all-beta motif on the left. This consists of 8 parallel vertical arrows connected end to end by strings. The first arrow points down and its string loops to arrow 2 pointing upward that connects to arrow 5 (pointing downward), which loops to arrow 6 (pointing upward), which loops to arrow 8 (down), which loops to arrow 7 (up), which loops to arrow 4 (down), which loops to arrow 2 (up). To the right, a crossover connection (not observed) is similar but arrow 1 (down) points to arrow 2 (up) which loops to arrow 5 (down), which loops to arrow 4 (up), which loops to arrow 7 (down), which loops to arrow 8 (up), which loops to arrow 6 (down), which loops to arrow 3 (up). The string leaving arrow 6 runs across the string from arrows 4 to 5 in a highlighted spot. Part b shows a right-handed connection between beta strands as two downward facing arrows with a loop from the point of the right arrow to the base at the top of the left arrow. A left-handed connection between beta strands (very rare) is shown as two similar downward facing arrows, except that the loop runs from the point of the left arrow to the base of the right arrow. Part c shows a twisted beta sheet as five roughly parallel arrows that point toward the right. The first two curve upward, the third one curves downward, the fourth one has a slight wave in the middle, and the fifth one twists slightly to the left.

Following these rules, complex motifs can be built up from simple ones. For example, a series of β-α-β loops arranged so that the β strands form a barrel creates a particularly stable and common motif, the α/β barrel (Fig. 4-19). In this structure, each parallel β segment is attached to its neighbor by an α-helical segment. All connections are right-handed. The α/β barrel is found in many enzymes, often with a binding site (for a cofactor or a substrate) in the form of a pocket near one end of the barrel. Note that domains with similar folding patterns are said to have the same motif, even though their constituent α helices and β sheets may differ in length.
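The way a large motif is assembled from repeats of a small one can be illustrated by writing the order of secondary structure elements along the chain as a string and counting the repeating unit. In the sketch below, `E` stands for a β strand and `H` for an α helix (a lettering convention assumed here for illustration), and the eight-strand barrel is an idealized example rather than data from a specific structure. The count considers only the order of elements along the chain; it says nothing about the handedness of the connections or the spatial arrangement.

```python
def count_beta_alpha_beta(elements: str) -> int:
    """Count overlapping strand-helix-strand ("EHE") units in an element string."""
    unit = "EHE"
    return sum(
        1
        for i in range(len(elements) - len(unit) + 1)
        if elements[i:i + len(unit)] == unit
    )


# Idealized alpha/beta barrel: eight strands, each followed by a helix.
idealized_barrel = "EH" * 8

print(idealized_barrel)
print(count_beta_alpha_beta(idealized_barrel))
```

Because each internal strand is shared by two consecutive loops, eight alternating strands and helices contain seven overlapping β-α-β units along the chain.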

A figure shows how larger motifs are constructed from smaller motifs by showing how a beta-alpha-beta loop is visible within an alpha/beta barrel. — FIGURE 4-19 Constructing large motifs from smaller ones. The α/β barrel is a commonly occurring motif constructed from repetitions of the β-α-β loop motif. This α/β barrel is a domain of pyruvate kinase (a glycolytic enzyme) from rabbit. [Data from PDB ID 1PKN, T. M. Larsen et al., *Biochemistry* 33:6301, 1994.]

Some Proteins or Protein Segments Are Intrinsically Disordered

Although many proteins contain well-folded and stable structures, this is not necessary for the biological function of all proteins. Many proteins or protein segments lack ordered structures in solution. The concept that some proteins function in the absence of a definable three-dimensional structure comes from reassessment of data from many different proteins. As many as a third of all human proteins may be unstructured or may have significant unstructured segments. All organisms have some proteins that fall into this category. Intrinsically disordered proteins have properties that are distinct from those of classical, structured proteins. They often lack a hydrophobic core and instead are characterized by high densities of charged amino acid residues such as Lys, Arg, and Glu. Pro residues are also prominent, as they tend to disrupt ordered structures.
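The compositional bias described above can be quantified in a simple way. The following sketch reports the fraction of charged residues, proline, and bulky hydrophobic residues in a sequence given in one-letter code. The residue groupings are assumptions made for illustration (Asp is counted with the charged residues named in the text, and the hydrophobic set is the Leu, Ile, Val, Phe group highlighted in Fig. 4-15), and both test sequences are invented. This is a crude descriptive tally, not a disorder predictor.

```python
CHARGED = set("KRDE")      # Lys, Arg, Asp, Glu (Asp added as an assumption)
HYDROPHOBIC = set("LIVF")  # Leu, Ile, Val, Phe, as highlighted in Fig. 4-15


def composition_fractions(sequence: str) -> dict[str, float]:
    """Fractions of charged, proline, and hydrophobic residues in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return {
        "charged": sum(aa in CHARGED for aa in seq) / n,
        "proline": seq.count("P") / n,
        "hydrophobic": sum(aa in HYDROPHOBIC for aa in seq) / n,
    }


# Invented sequences, chosen only to contrast the two compositional styles.
disordered_like = "KKEEPPSEKRPEDSKPEE"
globular_like = "MVLSAGIWLVAFLNVIAG"

for label, seq in (("disordered-like", disordered_like),
                   ("globular-like", globular_like)):
    fractions = composition_fractions(seq)
    print(label, {key: round(value, 2) for key, value in fractions.items()})
```

A sequence rich in charged residues and proline and poor in hydrophobic residues has little capacity to form a buried core, which is consistent with the properties listed in the text.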

Structural disorder and high charge density can facilitate the function of some proteins as spacers, insulators, or linkers in larger structures. Other disordered proteins are scavengers, binding up ions and small molecules in solution and serving as reservoirs or garbage dumps. However, many intrinsically disordered proteins are at the heart of important protein interaction networks. The lack of an ordered structure can facilitate a kind of functional promiscuity, allowing one protein to interact with multiple or even dozens of partners. Structural disorder allows some inhibitor proteins, such as the mammalian cell division protein p27, to interact with multiple targets in different ways. In solution, p27 lacks definable structure. However, it wraps around and inhibits the action of several enzymes called protein kinases (see Chapter 6) that facilitate cell division. The flexible structure of p27 allows it to accommodate itself to its different target proteins. Human tumor cells, which are cells that have lost the capacity to control cell division normally, generally have reduced levels of p27; the lower the levels of p27, the poorer the prognosis for the cancer patient.

Similarly, intrinsically disordered proteins are often present as hubs or scaffolds at the center of protein networks that constitute signaling pathways (see Fig. 12-30). These proteins, or parts of them, may interact with many different binding partners. They often take on an ordered structure when they interact with other proteins, but the structure they assume may vary with different binding partners. The mammalian protein p53 is also critical in the control of cell division. It contains both structured and unstructured segments, and the different segments interact with dozens of other proteins. An unstructured region of p53 at the carboxyl terminus interacts with at least four different binding partners and assumes a different structure in each of the complexes (Fig. 4-20).

A three-part figure, a, b, and c, shows the binding of p 53 protein to binding partners. Part a shows the p 53 protein, part b shows a plot of P O N D R score against amino acid residues, and part c shows how different structures can be formed depending on which interactions occur. — FIGURE 4-20 Binding of the intrinsically disordered carboxyl terminus of p53 protein to its binding partners. (a) The p53 protein is made up of several different segments. Only the central domain is well ordered. (b) The linear sequence of the p53 protein is depicted as a colored bar. The overlaid graph presents a plot of the PONDR (Predictor of Natural Disordered Regions) score versus the protein sequence. PONDR is one of the best available algorithms for predicting the likelihood that a given amino acid residue is in a region of intrinsic disorder, based on the surrounding amino acid sequence and amino acid composition. A score of 1.0 indicates a probability of 100% that a protein will be disordered. In the actual protein structure, the tan central domain is ordered. The amino-terminal (blue) and carboxyl-terminal (red) regions are disordered. (c) The very end of the carboxyl-terminal region has multiple binding partners and folds when it binds to each of them; however, the three-dimensional structure that is assumed when binding occurs is different for each of the interactions shown, and thus this carboxyl-terminal segment (11 to 20 residues) is shown in a different color in each complex. [Information from V. N. Uversky, *Intl. J. Biochem. Cell Biol.* 43:1090, 2011, Fig. 5. (a) Data from PDB ID 1TUP, Y. Cho et al., *Science* 265:346, 1994. (c) Data from Cyclin A: PDB ID 1H26, E. D. Lowe et al., *Biochemistry* 41:15,625, 2002; sirtuin: PDB ID 1MA3, J. L. Avalos et al., *Mol. Cell* 10:523, 2002; CBP bromodomain: PDB ID 1JSP, S. Mujtaba et al., *Mol. Cell* 13:251, 2004; s100B(ββ): PDB ID 1DT7, R. R. Rustandi et al., *Nature Struct. Biol.* 7:570, 2000.]

Part a shows a faint outer structure of an almost oval protein that is slightly wider at the lower right. It contains several ribbon structure arrows across the middle. The N-terminus is shown as a dotted line coming in from the right behind these arrows. A double helix is shown at the bottom, ending with a dotted line labeled C terminus. Part b has a graph that plots amino acid residues on the horizontal axis ranging from 0 to 400, labeled in increments of 100, against the P O N D R score on the vertical axis ranging from 0 to 1.0, labeled in increments of 0.5. There is a horizontal strip running from 0.5 on the vertical axis that is divided into three segments before it ends with a vertical strip at 380 on the horizontal axis. Segment 1 runs from 0 to 90 on the horizontal axis, segment 2 runs from 90 to 280 on the horizontal axis, and segment 3 runs from 280 to 380 on the horizontal axis. A line runs from the N terminus of the protein in part a to segment 1, a line runs from the double helix in the protein to segment 2, and a line runs from the C terminus to segment 3. The vertical strip has four segments. The first is wider than the others and runs from 0.25 to 0.4 on the vertical axis, the second runs from 0.4 to 0.5 on the vertical axis, the third runs from 0.5 to 0.6 on the vertical axis, and the fourth runs from 0.6 to 0.7 on the vertical axis. Lines run from each segment to short dotted strands that join to the surface contour views of four different proteins. 
Bottom vertical segment: a roughly comma-shaped protein, s 100 B (beta beta), with a colored portion extending in the smaller curve to the left to end in the short strand; second vertical segment: the strand joins a thin colored portion that joins with an almost spherical portion of a contour surface view of C B P bromo-domain; third segment: the string ends at a small, roughly rectangular colored segment inserted into the bottom of the upper left half of a roughly oval vertical protein shown in surface contour view, labeled sirtuin; top segment: the string runs to a small, roughly rectangular colored portion of an oval protein, cyclin A. All data on the graph are approximate. The curve on the graph begins at (0, 1.0) in segment 1 for the N terminus, drops rapidly to (50, 0.25), rises to (40, 0.8), drops to (50, 0.75), rises to (60, 0.95), drops slightly at (54, 0.90), then rises to (75, 1.0) to run straight along the top into the region of segment 2, where it drops rapidly to (101, 0.05), then drops to (105, 0), runs along the horizontal axis, then rises to (160, 0.80), drops to (180, 0.25), rises to (185, 0.5), drops to (205, 0.2), rises to (215, 0.5), drops to (235, 0.1), rises to (250, 0.4), drops to (270, 0.3), rises to (270, 0.85), drops slightly as it enters segment 3 for the C terminus to (305, 0.80), rises to (310, 1.0), drops to (330, 0.25), rises to (335, 0.6), drops to (340, 0.25), rises to (350, 0.75), drops to (360, 0.60), rises to (370, 0.95), drops to (385, 0.5) within the vertical strip, and rises to (380, 1.0).
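A per-residue disorder profile of the kind plotted in Figure 4-20b is usually read by marking the stretches in which the score stays above a cutoff. The sketch below shows that bookkeeping step only. The cutoff of 0.5, the function name, and the example scores are assumptions for illustration; the scores are invented and are not p53 data, and the code does not implement the PONDR predictor itself.

```python
def disordered_segments(scores: list[float], threshold: float = 0.5,
                        min_length: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) residue ranges, 1-based and inclusive, whose
    scores are all above the threshold."""
    segments = []
    start = None  # 0-based index where the current high-scoring run began
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start + 1, i))  # run ended at previous residue
            start = None
    if start is not None:  # run extends to the end of the chain
        segments.append((start + 1, len(scores)))
    return [(a, b) for a, b in segments if b - a + 1 >= min_length]


# Invented scores for a 12-residue toy chain.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.3, 0.6, 0.9, 0.95, 0.4, 0.8]

print(disordered_segments(toy_scores))
print(disordered_segments(toy_scores, min_length=3))
```

Requiring a minimum run length, as in the second call, is one simple way to ignore isolated residues that cross the cutoff.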

Protein Motifs Are the Basis for Protein Structural Classification

More than 150,000 structures are now archived in the Protein Data Bank (PDB; for a deeper explanation, see Box 4-3). An enormous amount of information about protein structural principles, protein function, and protein evolution is contained in these data. Other databases have organized this information and made it more readily accessible. In the Structural Classification of Proteins database, or SCOP2 (http://scop2.mrc-lmb.cam.ac.uk), all of the protein information in the PDB can be searched within four different categories: (1) protein relationships, (2) structural classes, (3) protein types, and (4) evolutionary events. Figure 4-21 presents examples of protein motifs taken from SCOP2 to illustrate the potential of searching within each category. The figure also introduces another way to represent elements of secondary structure and the relationships among segments of secondary structure in a protein — the topology diagram.

A three-part figure, a, b, and c, shows how proteins are organized based on motifs. Part a shows the structures of alcohol dehydrogenase enzymes from two different organisms, part b shows a topology diagram for an alcohol dehydrogenase enzyme, and part c shows examples of two types of protein folds, all alpha and all beta. — FIGURE 4-21 Organization of proteins based on motifs. A few of the hundreds of known stable motifs. (a) Structural diagrams of the enzyme alcohol dehydrogenase from two different organisms. Such comparisons illustrate evolutionary relationships that conserve structure as well as function. (b) A topology diagram for the alcohol dehydrogenase from *Acinetobacter calcoaceticus*. Topology diagrams provide a way to visualize elements of secondary structure and their interconnections in two dimensions; the diagrams can be very useful in comparing structural folds or motifs. (c) The Structural Classification of Proteins (SCOP2) database (http://scop2.mrc-lmb.cam.ac.uk) organizes protein folds into four classes: all α, all β, α/β, and α + β. Examples of all α folds and all β folds are shown with their structural classification data (PDB ID, fold name, protein name, and source organism) from the SCOP2 database. The PDB ID is the unique accession code given to each structure archived in the Protein Data Bank (www.rcsb.org). [Data from (a) PDB ID 2JHF, R. Meijers et al., *Biochemistry* 46:5446, 2007; (a, b) PDB ID 1F8F, J. C. Beauchamp et al. (c) PDB ID 1BCF, F. Frolow et al., *Nature Struct. Biol.* 1:453, 1994; PDB ID 1PEX, F. X. Gomis-Ruth et al., *J. Mol. Biol.* 264:556, 1996.]

Part a is labeled, structurally similar, different sequence and organism. Two proteins are shown. The left-hand protein is labeled, 2 J H F, alcohol dehydrogenase, italicized Equus italicized caballus, horse. The protein is roughly a vertical oval with the N terminus near the upper left side and the C terminus at the top. It has multiple ribbon arrows at the top, and near the N terminus as well as in a line near the bottom. It also has helices, mostly in the center with two extending below the row of arrows. The right-hand protein is labeled, 1 F 8 F, alcohol dehydrogenase, italicized Acinetobacter, italicized calcoaceticus, bacterium found in human intestinal microbiota. The structure is very similar to the left-hand structure. Part b shows a topology diagram that consists of interconnected cylinders and arrows. There are two antiparallel arrows on the upper left in a rectangle. The N terminus connects to the right-hand arrow, labeled 1, that points down. It connects by a strand to arrow 2, which points up. The string bends right across the top of both arrows and leaves the box, then enters a larger box to the right containing seven arrows, with 11 above 34 on the left both pointing up, then down arrow 3, up arrow 5, down arrow 6, up arrow 13, and up arrow 31. The string crosses above arrows 11 above 34, and enters downward arrow 3. The string then continues out of the box into cylinder 4 before looping back up to upward arrow 5, adjacent to arrow 3. The string leaves arrow 5 and curves down to adjacent down arrow 6, then continues out of the box and enters and leaves cylinders 7, 8, 9, and 10 in a vertical line going downward before looping up along the side of another rectangle to the lower left and back up to arrow 11, which is to the left of arrow 3. 
It leaves arrow 11 and enters a tiny cylinder labeled, 12 above the box, then crosses above arrows 3, 5, and 6, to loop back down into the box and enter the bottom of arrow 13, then up out of the box to go through small cylinders 14 and 15 and larger cylinder 16 along a vertical line before looping back down along the right side of the diagram and bending back to the left to travel through arrow 17, which points to the left. Arrow 17 is in a box with all of the arrows pointing left. From top to bottom, they are 21, 19, 17, 35, 25, and 27. The arrow continues through cylinder 18 outside of the box, loops back to the base of arrow 19 and then out to cylinder 20, loops back to the base of cylinder 21, then out through cylinder 22, then back down and through the center of the box to enter the back of arrow 23, then out through cylinder 24, then down across the bottom of the box to loop around into arrow 25, then out through cylinder 26, then back into the box to loop into the base of arrow 27, then out through small cylinder 28, then cylinder 29 (colored the same as the arrows unlike the other cylinders), then through small cylinder 30 before looping back to the left and running up to arrow 31 in the rectangle above, then out through small cylinder 32, larger cylinder 33, and looping back to the left above the box and then down to the base of arrow 34 on the left side of the box and then up out of the box to end at the C terminus. Part c shows two molecules. The left-hand molecule is labeled All alpha, 1 B C F, ferritin-like, bacterioferritin (cytochrome italicized b subscript 1), italicized Escherichia, italicized coli. It consists of four roughly vertical helices connected by strands, with one strand extending out of the top and one strand ending with a small horizontal helix that runs across the bottom and extends a short strand to the left. 
The right-hand molecule is labeled all beta, 1 P E X, four-bladed beta propeller, collagenase-3 (M M P-13), human (italicized Homo, italicized sapiens). The molecule has a roughly circular structure that begins with a small string at the lower left that joins to a short helix that runs to the tip of a right-pointing arrow before looping through three more arrows to a very short curl of a helix on the lower left, then up to seven short arrows, then over to a coil of a helix at the upper right and through an almost horizontal row of four arrows before ending on the right side.

The number of folding patterns is not infinite. Among the tens of thousands of distinct protein structures archived in the PDB, only about 1,400 different folds or motifs are classified by the SCOP2 database. Given the many years of progress in structural biology, new motifs are now discovered only rarely. Many examples of recurring domain or motif structures are available, and these reveal that protein tertiary structure is more reliably conserved than amino acid sequence. The comparison of protein structures can thus provide much information about evolution. Proteins with significant similarity in primary structure and/or with similar tertiary structure and function are said to be in the same protein family. The protein structures in the PDB belong to about 4,000 different protein families. A strong evolutionary relationship is usually evident within a protein family. For example, the globin family has many different proteins with both structural and sequence similarities to myoglobin (as seen in the proteins used as examples in Figures 4-30 and 4-31 and in Chapter 5). Two or more families that have little similarity in amino acid sequence but make use of the same major structural motif and have functional similarities are grouped into superfamilies. An evolutionary relationship among families in a superfamily is considered probable, even though time and functional distinctions — that is, different adaptive pressures — may have erased many of the telltale sequence relationships.

A protein family may be widespread in all three domains of cellular life (the Bacteria, Archaea, and Eukarya), suggesting an ancient origin. Many proteins involved in intermediary metabolism and the metabolism of nucleic acids and proteins fall into this category. Other families may be present in only a small group of organisms, indicating that the structure arose more recently. Following the natural history of structural motifs through the structural classifications in databases such as SCOP2 provides a powerful complement to sequence analysis for tracing evolutionary relationships. The SCOP2 database is curated manually, with the objective of placing proteins in the correct evolutionary framework based on conserved structural features.

Structural motifs become especially important in defining protein families and superfamilies. Improved protein classification and comparison systems lead inevitably to the elucidation of new functional relationships. Given the central role of proteins in living systems, these structural comparisons can help illuminate every aspect of biochemistry, from the evolution of individual proteins to the evolutionary history of complete metabolic pathways.

Protein Quaternary Structures Range from Simple Dimers to Large Complexes

Many proteins have multiple polypeptide subunits (from two to hundreds). The association of polypeptide chains can serve a variety of functions. Many multisubunit proteins have regulatory roles; the binding of small molecules may affect the interaction between subunits, causing large changes in the protein’s activity in response to small changes in the concentration of substrate or regulatory molecules (Chapter 6). In other cases, separate subunits take on separate but related functions, such as catalysis and regulation. Some associations, such as those seen in the fibrous proteins considered earlier in this chapter and the coat proteins of viruses, serve primarily structural roles. Some very large protein assemblies are the site of complex, multistep reactions. For example, each ribosome, the site of protein synthesis, incorporates dozens of protein subunits along with RNA molecules.

A multisubunit protein can also be referred to as an oligomer or multimer. If an oligomer has nonidentical subunits, the overall structure of the protein can be asymmetric and quite complicated. However, many oligomers have identical subunits or repeating groups of nonidentical subunits, usually in symmetric arrangements. As noted in Chapter 3, the repeating structural unit in such an oligomeric protein, whether a single subunit or a group of subunits, is called a protomer.
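The idea of a protomer as the repeating unit can be illustrated with subunit stoichiometry alone. The sketch below assumes that an oligomer is described only by how many copies of each subunit type it contains, and it takes the greatest common divisor of those counts as the number of repeats. The function name and the example composition are hypothetical. Stoichiometry by itself does not prove a symmetric arrangement; in practice the protomer is assigned from the three-dimensional structure.

```python
from functools import reduce
from math import gcd


def protomer(composition):
    """Return (protomer composition, number of protomer copies).

    `composition` maps a subunit label to its copy number in the oligomer.
    """
    if not composition:
        raise ValueError("composition must contain at least one subunit")
    # The largest number of identical groups the subunits can be split into.
    copies = reduce(gcd, composition.values())
    unit = {name: count // copies for name, count in composition.items()}
    return unit, copies


# Hypothetical oligomer with two copies each of two subunit types.
unit, copies = protomer({"alpha": 2, "beta": 2})
print(unit, copies)
```

For an oligomer whose subunit counts share no common factor, such as one copy of one subunit and two of another, the function returns the whole oligomer as a single protomer.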

The first oligomeric protein to have its three-dimensional structure determined was hemoglobin ($M_r$ 64,500), which contains four polypeptide chains and four heme prosthetic groups, in which the iron atoms are in the ferrous ($\mathrm{Fe}^{2+}$) state (as we shall see in Chapter 5). The protein portion, the globin, consists of two α chains (141 residues each) and two β chains (146 residues each). Note that in this case, α and β do not refer to secondary structures. In a practice that can be confusing to the beginning student, the Greek letters α and β (and γ, δ, and others) are often used to distinguish two different kinds of subunits within a multisubunit protein, regardless of what kinds of secondary structure may predominate in the subunits. Because hemoglobin is four times as large as myoglobin, much more time and effort were required to solve its three-dimensional structure by x-ray analysis, finally achieved by Max Perutz, John Kendrew, and their colleagues in 1959. The subunits of hemoglobin are arranged in symmetric pairs (Fig. 4-22), each pair having one α subunit and one β subunit. Hemoglobin can therefore be described either as a tetramer or as a dimer of αβ protomers. The role these distinct subunits play in hemoglobin function is discussed extensively in Chapter 5.
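The chain lengths given above allow a quick arithmetic check on the size of hemoglobin. The sketch below totals the residues in the four chains and converts that to an approximate mass. The average residue mass of about 110 is an assumption taken from a common rule of thumb, not from this section, and the names used are hypothetical.

```python
# Subunit composition of hemoglobin as given in the text:
# chain name -> (copies per tetramer, residues per chain)
HEMOGLOBIN_CHAINS = {"alpha": (2, 141), "beta": (2, 146)}

# Assumption (not from this section): an average residue mass of about 110,
# a common rule of thumb for proteins.
AVG_RESIDUE_MASS = 110


def total_residues(chains):
    """Total number of amino acid residues across all chains."""
    return sum(copies * length for copies, length in chains.values())


def estimated_protein_mass(chains, avg_residue_mass=AVG_RESIDUE_MASS):
    """Rough mass of the polypeptide portion only (no prosthetic groups)."""
    return total_residues(chains) * avg_residue_mass


residues = total_residues(HEMOGLOBIN_CHAINS)
mass = estimated_protein_mass(HEMOGLOBIN_CHAINS)
print(residues, mass)  # 574 63140
```

The estimate of 63,140 for the globin chains is within a few percent of the stated $M_r$ of 64,500. The calculation ignores the four heme groups and relies on an average residue mass, so it should be read as an order-of-magnitude check and not as a derivation of the reported value.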

A two-part figure, a and b, shows the quaternary structure of deoxyhemoglobin as a ribbon structure in part a and as a surface contour model in part b. — FIGURE 4-22 Quaternary structure of deoxyhemoglobin. X-ray diffraction analysis of deoxyhemoglobin (hemoglobin without oxygen molecules bound to the heme groups) shows how the four polypeptide subunits are packed together. (a) A ribbon representation reveals the secondary structural elements of the structure and the positioning of all the heme prosthetic groups. (b) A surface contour model shows the pockets in which the heme prosthetic groups are bound and helps to visualize subunit packing. The α subunits are shown in shades of gray, the β subunits in shades of blue. Note that the heme groups (red) are relatively far apart. [Data from PDB ID 2HHB, G. Fermi et al., *J. Mol. Biol.* 175:159, 1984.]

Part a shows a roughly circular structure with slight openings at the top center and bottom center, roughly divided into four quadrants. It is made up of many helices. The top half is wider on the left and narrower on the right and is a different color from the bottom half. The top right and bottom right quadrants are darker shades of the same color of the corresponding left-hand quadrants. Small ball and stick models are visible in the center of the upper left and lower right portions and farther back in the lower left and upper right portions. Part b shows the same structure as a surface contour model with a rough surface. The upper left and lower right appear to be in front of the other portions. The ball-and-stick models are still visible in the upper left and lower right, but not in the other portions.

A photo shows Max Perutz and John Kendrew. — Max Perutz, 1914–2002 (left), and John Kendrew, 1917–1997

SUMMARY 4.3 Protein Tertiary and Quaternary Structures

Tertiary structure is the complete three-dimensional structure of a polypeptide chain. Many proteins fall into one of four general classes based on tertiary structure: fibrous, globular, membrane, or disordered.
Insoluble fibrous proteins, such as those that make up keratin, collagen, and silk, have simple repeating elements of secondary structure. In some fibrous proteins, the individual polypeptide chains interact to form complex quaternary structures like coiled coils for strength and flexibility.
Globular proteins have more complicated tertiary structures, often containing several types of secondary structure in the same polypeptide chain, and fulfill many different functional roles in the cell.
The first globular protein structure to be determined, by x-ray diffraction methods, was that of the $\mathrm{O}_2$-binding protein myoglobin. The myoglobin structure revealed for the first time how protein structure and function are connected.
The complex structures of globular proteins can be analyzed by examination of folding patterns, called motifs or folds. The many thousands of known protein structures are generally assembled from a limited repertoire of motifs; the SCOP2 database classifies only about 1,400. Domains are regions of a polypeptide chain that can fold stably and independently.
Some proteins or protein segments are intrinsically disordered, lacking definable three-dimensional structure. These proteins often have distinctive amino acid compositions that allow a more flexible structure, which is critical for their biological function.
Based on structural similarities, proteins can be organized into families and superfamilies, which are informative about protein function and evolution.
Quaternary structure results from interactions between the subunits of multisubunit (multimeric) proteins or large supramolecular assemblies. Some multimeric proteins are composed of repeated subunits called protomers.