The overall three-dimensional arrangement of all atoms in a protein is referred to as the protein’s tertiary structure. Whereas the term “secondary structure” refers to the spatial arrangement of amino acid residues that are adjacent in a segment of a polypeptide, tertiary structure includes longer-range aspects of amino acid sequence. Amino acids that are far apart in the polypeptide sequence and are in different types of secondary structure may interact within the completely folded structure of a protein. Interacting segments of polypeptide chains are held in their characteristic tertiary positions by several kinds of weak interactions (and sometimes by covalent bonds such as disulfide cross-links) between the segments. Some proteins contain two or more separate polypeptide chains, or subunits, which may be identical or different. The arrangement of these protein subunits in three-dimensional complexes constitutes quaternary structure.
In considering these higher levels of structure, it is useful to designate the major groups into which many proteins can be classified: fibrous proteins, with polypeptide chains arranged in long strands or sheets; globular proteins, with polypeptide chains folded into a spherical or globular shape; membrane proteins, with polypeptide chains embedded in hydrophobic lipid membranes; and intrinsically disordered proteins, with polypeptide chains lacking stable tertiary structures. We focus here on fibrous, globular, and intrinsically disordered proteins; membrane proteins are discussed in Chapter 11. These three groups are structurally distinct. Fibrous proteins usually consist of a single type of secondary structure, and their tertiary structure is relatively simple. Globular proteins often contain several types of secondary structure. Intrinsically disordered proteins can lack secondary structure entirely. The groups also differ functionally: the structures that provide support, shape, and external protection to vertebrates are made of fibrous proteins. Most enzymes are globular proteins, whereas regulatory proteins can be globular, disordered, or contain both globular and disordered segments.
α-Keratin, collagen, and silk fibroin nicely illustrate the relationship between protein structure and biological function (Table 4-2). Fibrous proteins share properties that give strength and/or flexibility to the structures in which they occur. In each case, the fundamental structural unit is a simple repeating element of secondary structure. All fibrous proteins are insoluble in water, a property conferred by a high concentration of hydrophobic amino acid residues both in the interior of the protein and on its surface. These hydrophobic surfaces are largely buried, as many similar polypeptide chains are packed together to form elaborate supramolecular complexes. The underlying structural simplicity of fibrous proteins makes them particularly useful for illustrating some of the fundamental principles of protein structure discussed previously.
Structure | Characteristics | Examples of occurrence |
---|---|---|
α Helix, cross-linked by disulfide bonds | Tough, insoluble protective structures of varying hardness and flexibility | α-Keratin of hair, feathers, nails |
β Conformation | Soft, flexible filaments | Silk fibroin |
Collagen triple helix | High tensile strength, without stretch | Collagen of tendons, bone matrix |
α-Keratin. The α-keratins have evolved for strength. Found only in mammals, these proteins constitute almost the entire dry weight of hair, wool, nails, claws, quills, horns, and hooves and much of the outer layer of skin. The α-keratins are part of a broader family of proteins called intermediate filament (IF) proteins. Other IF proteins are found in the cytoskeletons of animal cells. All IF proteins have a structural function and share the structural features exemplified by the α-keratins.
The α-keratin helix is a right-handed α helix, the same helix found in many other proteins. Francis Crick and Linus Pauling, in the early 1950s, independently suggested that the α helices of keratin were arranged as a coiled coil. Two strands of α-keratin, oriented in parallel (with their amino termini at the same end), are wrapped about each other to form a supertwisted coiled coil. The supertwisting amplifies the strength of the overall structure, just as strands are twisted to make a strong rope (Fig. 4-10). The twisting of the axis of an α helix to form a coiled coil explains the discrepancy between the 5.4 Å per turn predicted for an α helix by Pauling and Corey and the 5.15 to 5.2 Å repeating structure observed in the x-ray diffraction of hair (see end-of-chapter problem 2). The helical path of the supertwists is left-handed, opposite in sense to the α helix. The surfaces where the two α helices touch are made up of hydrophobic amino acid residues, their R groups meshed together in a regular interlocking pattern. This permits a close packing of the polypeptide chains within the left-handed supertwist. Not surprisingly, α-keratin is rich in the hydrophobic residues Ala, Val, Leu, Ile, Met, and Phe.
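The size of the discrepancy between the predicted and observed repeats can be explored with a little arithmetic. The sketch below assumes, as a simplification that is not stated in the text, that each helix axis is tilted by a constant angle θ relative to the fiber axis, so that the repeat observed along the fiber equals the helix pitch multiplied by cos θ. The function name is illustrative only.

```python
import math

HELIX_PITCH = 5.4                # Å per turn, as predicted by Pauling and Corey
OBSERVED_REPEATS = (5.15, 5.2)   # Å, repeat observed in x-ray diffraction of hair


def tilt_angle_degrees(pitch: float, observed: float) -> float:
    """Tilt angle (degrees) at which `pitch` projects onto `observed`.

    Assumes observed = pitch * cos(theta). This projection model is an
    illustrative assumption, not a result quoted in the text.
    """
    return math.degrees(math.acos(observed / pitch))


for repeat in OBSERVED_REPEATS:
    angle = tilt_angle_degrees(HELIX_PITCH, repeat)
    print(f"repeat {repeat:.2f} Å -> tilt of about {angle:.1f} degrees")
```

Under this assumption, a modest tilt of the helix axis, somewhere between 15° and 18°, is enough to account for the shortened repeat.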
An individual polypeptide in the α-keratin coiled coil has a relatively simple tertiary structure, dominated by an α-helical secondary structure with its helical axis twisted in a left-handed superhelix. The intertwining of the two α-helical polypeptides is an example of quaternary structure. Coiled coils of this type are common structural elements in filamentous proteins and in the muscle protein myosin (see Fig. 5-26). The quaternary structure of α-keratin can be quite complex. Many coiled coils can be assembled into large supramolecular complexes, such as the arrangement of α-keratin that forms the intermediate filament of hair (Fig. 4-10b).
The strength of fibrous proteins is enhanced by covalent cross-links between polypeptide chains in the multihelical “ropes” and between adjacent chains in a supramolecular assembly. In α-keratins, the cross-links stabilizing quaternary structure are disulfide bonds. In the hardest and toughest α-keratins, such as those of rhinoceros horn, up to 18% of the residues are cysteines involved in disulfide bonds.
Collagen. Like the α-keratins, collagen has evolved to provide strength. It is found in connective tissue such as tendons, cartilage, the organic matrix of bone, and the cornea of the eye. In fact, collagen is the most abundant protein in mammals, usually comprising 25% to 35% of total protein content. The collagen helix is a unique secondary structure, quite distinct from the α helix. It is left-handed and has three amino acid residues per turn (Fig. 4-11 and Table 4-1). Collagen is also a coiled coil, but one with distinct tertiary and quaternary structures: three separate polypeptides, called α chains (not to be confused with α helices), are twisted about each other. The superhelical twisting is right-handed in collagen, opposite in sense to the left-handed helix of the α chains.
There are many types of vertebrate collagen. Typically, they contain about 35% Gly, 11% Ala, and 21% Pro and 4-Hyp (4-hydroxyproline, an uncommon amino acid; see Fig. 3-8a). The food product gelatin is derived from collagen. It has little nutritional value as a protein, because collagen is extremely low in many amino acids that are essential in the human diet. The unusual amino acid content of collagen is related to structural constraints unique to the collagen helix. The amino acid sequence in collagen is generally a repeating tripeptide unit, Gly–X–Y, where X is often Pro and Y is often 4-Hyp. Only Gly residues can be accommodated at the very tight junctions between the individual α chains (Fig. 4-11b). The Pro and 4-Hyp residues permit the sharp twisting of the collagen helix. The amino acid sequence and the supertwisted quaternary structure of collagen allow a very close packing of its three polypeptides. 4-Hydroxyproline has a special role in the structure of collagen — and in human history (Box 4-2).
The tight wrapping of the α chains in the collagen triple helix provides tensile strength greater than that of a steel wire of equal cross section. Collagen fibrils (Fig. 4-12) are supramolecular assemblies consisting of triple-helical collagen molecules (sometimes referred to as tropocollagen molecules) associated in a variety of ways to provide different degrees of tensile strength. The α chains of collagen molecules and the collagen molecules of fibrils are cross-linked by unusual types of covalent bonds involving Lys, HyLys (5-hydroxylysine), or His residues that are present at a few of the X and Y positions. These links create uncommon amino acid residues such as dehydrohydroxylysinonorleucine. The increasingly rigid and brittle character of aging connective tissue results from accumulated covalent cross-links in collagen fibrils.
The micrograph shows many strands running in varied directions. Each one appears to have many narrow segments, labeled cross-striations. There are about 4 striations in a 250-nanometer length. The illustration shows 6 horizontal rows connected by yellow highlighted bonds. Each horizontal row consists of multiple pieces lined up end to end. Each piece has three intertwined helices and rounded ends, two of which are labeled heads of collagen molecules. A close-up of one of these pieces, labeled section of collagen molecule, shows three different intertwined strands that each contain a ribbon-like helix.
A polypeptide chain runs vertically along the left side. It extends from the top to N, with H bonded to the left, which is bonded to CH, which is bonded to a long chain to the right and to C below, which is double bonded to O and further bonded below. A similar chain is on the right side but inverted and highlighted. Between them, from left to right, the chain that bonds to the central carbon has three CH₂ groups bonded to CH, which has a highlighted double bond to N. This half is labeled Lys residue minus ε-amino group (norleucine). The right half of the molecule from this point is highlighted. N is bonded to CH₂, which is bonded to CH, which is bonded to OH below and to CH₂ on the right, which is further bonded to CH₂, which is bonded to the CH at the middle of the polypeptide chain. The right portion is labeled HyLys residue.
A typical mammal has more than 30 structural variants of collagen, particular to certain tissues and each somewhat different in sequence and function. Some human genetic defects in collagen structure illustrate the close relationship between amino acid sequence and three-dimensional structure in this protein. Osteogenesis imperfecta is characterized by abnormal bone formation in babies; at least eight variants of this condition, with different degrees of severity, occur in the human population. Ehlers-Danlos syndrome is characterized by loose joints, and at least six variants occur in humans. The composer Niccolò Paganini (1782–1840) was famed for his seemingly impossible dexterity in playing the violin. He suffered from a variant of Ehlers-Danlos syndrome that rendered him effectively double-jointed. In both disorders, some variants can be lethal, whereas others cause lifelong problems.
All of the variants of both conditions result from the substitution of an amino acid residue with a larger R group (such as Cys or Ser) for a single Gly residue in an α chain in one or another of the collagen proteins (a different Gly residue in each disorder). These single-residue substitutions have a catastrophic effect on collagen function because they disrupt the Gly–X–Y repeat that gives collagen its unique helical structure. Given its role in the collagen triple helix (Fig. 4-11), Gly cannot be replaced by another amino acid residue without substantial deleterious effects on collagen structure.
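The dependence of the triple helix on a Gly residue at every third position can be made concrete with a short sketch. The example below assumes an α chain written as a list of three-letter residue codes that begins in frame with Gly; the function name and both sequences are hypothetical, chosen only to illustrate the Gly–X–Y rule described above.

```python
def gly_repeat_breaks(chain: list[str]) -> list[int]:
    """Return 1-based positions where the Gly-X-Y repeat expects Gly
    but another residue is present.

    Assumes the chain starts in frame, that is, position 1 should be Gly.
    """
    return [i + 1 for i in range(0, len(chain), 3) if chain[i] != "Gly"]


# Hypothetical 12-residue stretch with X = Pro and Y = 4-Hyp throughout.
normal = ["Gly", "Pro", "Hyp"] * 4

# Same stretch with a single Gly replaced by Cys, the kind of substitution
# described for osteogenesis imperfecta and Ehlers-Danlos syndrome.
variant = normal.copy()
variant[6] = "Cys"

print(gly_repeat_breaks(normal))   # []
print(gly_repeat_breaks(variant))  # [7]
```

A check of this kind finds only the position of a break in the repeat; it says nothing about how severe the structural consequences of a given substitution would be.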
Fibroin. The protein of silk, fibroin, is produced by insects and spiders. Its polypeptide chains are predominantly in the β conformation. Fibroin is rich in Ala and Gly residues, permitting a close packing of β sheets and an interlocking arrangement of R groups (Fig. 4-13). The overall structure is stabilized by extensive hydrogen bonding between all peptide linkages in the polypeptides of each β sheet and by the optimization of van der Waals interactions between sheets. Silk does not stretch, because the β conformation is already highly extended (Fig. 4-5). However, the structure is flexible, because the sheets are held together by numerous weak interactions rather than by covalent bonds such as the disulfide bonds in α-keratins.
Part a shows three horizontal layers. Each contains five ribbon-shaped arrows that alternate in direction: the leftmost points toward the observer, the next points away, the next toward, and so on. A strand is shown at the base of the left-hand arrow and another strand is shown leading away from the top of the right-hand arrow. Text reads: antiparallel β sheet. A close-up shows the strands of amino acids in these chains. The top sheet is shown with Ala side chains that alternate between extending down in the front and extending up in the back. The middle sheet has Ala side chains extending up in the front where the top sheet has Ala side chains extending down in the back, and extending down in the front where the top sheet has them extending up in the back. The front amino acid of the first chain of the middle sheet has an Ala side chain facing up, and the opposite side of the chain has a Gly side chain extending down. This corresponds with Gly extending up from the bottom layer, a pattern that is repeated along the layer. Part b shows a micrograph of many tubular structures that end in delicate-looking conical structures from which small or large strands emerge.
In a globular protein, different segments of the polypeptide chain (or multiple polypeptide chains) fold back on each other, generating a more compact shape than is seen in the fibrous proteins (Fig. 4-14). The folding also provides the structural diversity necessary for proteins to carry out a wide array of biological functions. Globular proteins include enzymes, transport proteins, motor proteins, regulatory proteins, immunoglobulins, and proteins with many other functions.
Our discussion of globular proteins begins with the principles gleaned from the first protein structures to be elucidated. This is followed by a detailed description of protein substructure and comparative categorization. Such discussions are possible only because of the vast amount of information available online from publicly accessible databases, particularly the Protein Data Bank, or PDB (Box 4-3).
The Protein Data Bank
The number of known three-dimensional protein structures is now more than 100,000 and doubles every couple of years. This wealth of information is revolutionizing our understanding of protein structure, the relationship of structure to function, and the evolutionary paths by which proteins arrived at their present state, which can be seen in the family resemblances that come to light as protein databases are sifted and sorted. One of the most important resources available to biochemists is the Protein Data Bank (PDB; www.rcsb.org).
The PDB is an archive of experimentally determined three-dimensional structures of biological macromolecules, containing virtually all of the macromolecular structures (such as proteins, RNAs, and DNAs) elucidated to date. Each structure is assigned an identifying label (a four-character identifier called the PDB ID). Such labels are provided in the figure legends for every PDB-derived structure illustrated in this text so that students and instructors can explore the same structures on their own. The data files in the PDB describe the spatial coordinates of each atom for which the position has been determined (many of the cataloged structures are not complete). Additional data files provide information on how the structure was determined and its accuracy. The atomic coordinates can be converted into an image of the macromolecule by using structure visualization software. Students are encouraged to access the PDB and explore structures, using visualization software linked to the database. Macromolecular structure files can also be downloaded and explored on the desktop, using free software such as JSmol.
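To give a sense of what "spatial coordinates of each atom" means in practice, the sketch below reads atom positions from text laid out like the ATOM records of the legacy fixed-column PDB file format, in which x, y, and z occupy columns 31 to 38, 39 to 46, and 47 to 54. The helper `make_atom_line`, the `Atom` type, and the two sample atoms are hypothetical and exist only for illustration; real entries downloaded from the PDB contain many more record types and fields.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def make_atom_line(serial, name, residue, chain, residue_number, x, y, z):
    """Build a minimal fixed-width ATOM record (illustrative helper only)."""
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{residue_number:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Extract atoms from ATOM records using the fixed column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skip header lines and other record types
        atoms.append(Atom(
            name=line[12:16].strip(),
            residue=line[17:20].strip(),
            chain=line[21],
            residue_number=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


# Made-up coordinates for two backbone atoms of one residue.
sample = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    make_atom_line(1, "N", "VAL", "A", 1, 0.0, 0.0, 0.0),
    make_atom_line(2, "CA", "VAL", "A", 1, 1.458, 0.0, 0.0),
])

atoms = parse_atoms(sample)
n_atom, ca_atom = atoms
distance = math.dist(n_atom[4:], ca_atom[4:])
print(f"{len(atoms)} atoms; N to CA distance = {distance:.3f} Å")
```

Visualization software performs essentially this step before drawing anything: it turns each coordinate record into a point in space and then infers bonds and secondary structure from the distances between points.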
The first breakthrough in understanding the three-dimensional structure of a globular protein came from x-ray diffraction studies of myoglobin carried out by John Kendrew and his colleagues in the 1950s. Myoglobin is a relatively small oxygen-binding protein of muscle cells. It functions both to store oxygen and to facilitate oxygen diffusion in rapidly contracting muscle tissue. Myoglobin contains a single polypeptide chain of 153 amino acid residues of known sequence and a single iron protoporphyrin, or heme, group. The same heme group that is found in myoglobin is found in hemoglobin, the oxygen-binding protein of erythrocytes, and is responsible for the deep red-brown color of both myoglobin and hemoglobin. Myoglobin is particularly abundant in the muscles of diving mammals such as whales, seals, and porpoises — so abundant that the muscles of these animals are brown. Storage and distribution of oxygen by muscle myoglobin permits diving mammals to remain submerged for long periods. The activities of myoglobin and other globin molecules are investigated in greater detail in Chapter 5.
Figure 4-15 shows several structural representations of myoglobin, illustrating how the polypeptide chain is folded in three dimensions — its tertiary structure. The red group surrounded by protein is heme. The backbone of the myoglobin molecule consists of eight relatively straight segments of α helix interrupted by bends, some of which are β turns. The longest α helix has 23 amino acid residues and the shortest has only 7; all helices are right-handed. More than 70% of the residues in myoglobin are in these α-helical regions. X-ray analysis has revealed the precise position of each of the R groups, which fill up nearly all the space within the folded chain that is not occupied by backbone atoms.
Part a shows ribbon structures of helices connected by thin strands with two end strands extending on the right. The overall structure is roughly V shaped with a slight open area in the top, in which a strand from the left side joins to a ball-and-stick model in the center. Part b shows the same ball-and-stick model in a surface contour image that shows the overall shape to be roughly spherical with a bumpy surface but wider on top. This image also shows that the protein is present behind the ball-and-stick model. Part c is the same as part a but side chains are shown filling in the spaces around the helices and the ball-and-stick model has been replaced by a skeletal model of the same structure. Part d is a space-filling model that shows the same overall shape with the areas that had contained side chains a different color from the background color. This shows that they are largely present toward the inside.
Many important conclusions were drawn from the structure of myoglobin. The positioning of amino acid side chains reflects a structure that is largely stabilized by the hydrophobic effect. Most of the hydrophobic R groups are in the interior of the molecule, hidden from exposure to water. All but two of the polar R groups are located on the outer surface of the molecule, and all are hydrated. The myoglobin molecule is so compact that its interior has room for only four molecules of water. This dense hydrophobic core is typical of globular proteins. The fraction of space occupied by atoms in an organic liquid is 0.4 to 0.6. In a globular protein the fraction is about 0.75, comparable to that in a crystal (in a typical crystal the fraction is 0.70 to 0.78, near the theoretical maximum). In this packed environment, weak interactions strengthen and reinforce each other. For example, the nonpolar side chains in the core are so close together that short-range van der Waals interactions make a significant contribution to stabilizing interactions.
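The packing fractions quoted above can be set against a simple geometric reference point. The densest possible arrangement of identical spheres fills π/(3√2) of space; atoms in a protein are neither identical nor rigid spheres, so the comparison in the sketch below is only a rough guide, and the variable names are illustrative.

```python
import math

# Densest possible packing of identical spheres (close packing).
close_packing = math.pi / (3 * math.sqrt(2))

# Values quoted in the text.
organic_liquid = (0.4, 0.6)
typical_crystal = (0.70, 0.78)
globular_protein = 0.75

print(f"close packing of identical spheres: {close_packing:.3f}")
print(f"globular protein interior:          {globular_protein:.2f}")
print(f"empty space in protein interior:    {1 - globular_protein:.2f}")
print(f"empty space in an organic liquid:   "
      f"{1 - organic_liquid[1]:.1f} to {1 - organic_liquid[0]:.1f}")
```

On this scale the interior of a globular protein is packed about as tightly as a crystal, leaving roughly a quarter of its volume unoccupied, whereas an organic liquid leaves 40% to 60% unoccupied.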
Deduction of the structure of myoglobin confirmed some expectations and introduced some new elements of secondary structure. As predicted by Pauling and Corey, all the peptide bonds are in the planar trans configuration. The α helices in myoglobin provided the first direct experimental evidence for the existence of this type of secondary structure. Three of the four Pro residues are found at bends. The fourth Pro residue occurs within an α helix, where it creates a kink necessary for tight helix packing.
The flat heme group rests in a crevice, or pocket, in the myoglobin molecule. Within this pocket, the accessibility of the heme group to solvent is highly restricted. This is important for function, because free heme groups in an oxygenated solution are rapidly oxidized from the ferrous form, which is active in the reversible binding of O₂, to the ferric form, which does not bind O₂. As myoglobin structures from many different species were resolved, investigators were able to observe the structural changes that accompany the binding of oxygen or other molecules and thus, for the first time, to understand the correlation between protein structure and function. Hundreds of proteins have now been subjected to similar analysis.
Myoglobin illustrates just one of many ways in which a polypeptide chain can fold. Table 4-3 shows the proportions of α helix and β conformation (expressed as percentage of residues in each type) in several small, single-chain, globular proteins. Each of these proteins has a distinct structure, adapted for its particular biological function, but together they share several important properties with myoglobin. Each is folded compactly, and in each case the hydrophobic amino acid side chains are oriented toward the interior (away from water) and the hydrophilic side chains are on the surface. The structures are also stabilized by a multitude of hydrogen bonds and some ionic interactions.
Protein (total residues) | α Helix (% of residues)ᵃ | β Conformation (% of residues)ᵃ |
---|---|---|
Chymotrypsin (247) | 14 | 45 |
Ribonuclease (124) | 26 | 35 |
Carboxypeptidase (307) | 38 | 17 |
Cytochrome c (104) | 39 | 0 |
Lysozyme (129) | 40 | 12 |
Myoglobin (153) | 78 | 0 |

Source: Data from C. R. Cantor and P. R. Schimmel, Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, p. 100, W. H. Freeman and Company, 1980.
ᵃ Portions of the polypeptide chains not accounted for by α helix or β conformation consist of bends and irregularly coiled or extended stretches. Segments of α helix and β conformation sometimes deviate slightly from their normal dimensions and geometry.
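The percentages in Table 4-3 are easier to compare when converted into approximate residue counts. The sketch below does that arithmetic with the values from the table; the function and dictionary names are illustrative, and the rounded counts are estimates rather than assignments of specific residues.

```python
# (total residues, % α helix, % β conformation), taken from Table 4-3.
TABLE_4_3 = {
    "Chymotrypsin": (247, 14, 45),
    "Ribonuclease": (124, 26, 35),
    "Carboxypeptidase": (307, 38, 17),
    "Cytochrome c": (104, 39, 0),
    "Lysozyme": (129, 40, 12),
    "Myoglobin": (153, 78, 0),
}


def summarize(total: int, helix_pct: int, beta_pct: int) -> dict[str, int]:
    """Convert percentages into approximate residue counts and report the
    percentage left over for bends and irregular stretches."""
    return {
        "helix_residues": round(total * helix_pct / 100),
        "beta_residues": round(total * beta_pct / 100),
        "other_pct": 100 - helix_pct - beta_pct,
    }


for protein, row in TABLE_4_3.items():
    s = summarize(*row)
    print(f"{protein:16s} helix ~{s['helix_residues']:3d}  "
          f"beta ~{s['beta_residues']:3d}  other {s['other_pct']}%")
```

For myoglobin this gives about 119 of 153 residues in α helix and 22% of the chain in bends and irregular stretches.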
To understand a complete three-dimensional structure, we need to analyze its folding patterns. We begin by defining two important terms that describe protein structural patterns or elements in a polypeptide chain; then we turn to the folding rules. The first term is motif, also called a fold. A motif or fold is a recognizable folding pattern involving two or more elements of secondary structure and the connection(s) between them. A motif can be very simple, such as two elements of secondary structure folded against each other, and may represent only a small part of a protein. An example is a β-α-β loop (Fig. 4-16a). A motif can also be a very elaborate structure involving scores of protein segments folded together, such as the β barrel (Fig. 4-16b). In some cases, a single large motif may comprise the entire protein. The terms “motif” and “fold” are often used interchangeably, although “fold” is applied more commonly to somewhat more complex folding patterns. The segment defined as a motif or a fold may or may not be independently stable. We have already encountered a well-studied motif, the coiled coil of α-keratin, which is also found in some other proteins. The distinctive arrangement of eight α helices in myoglobin is replicated in all globins and is called the globin fold. Note that a motif is not a hierarchical structural element falling between secondary and tertiary structure. It is simply a folding pattern.
Part a shows a beta alpha beta loop as a string coming in from the left, where an arrow is superimposed, running upward to a helix that runs vertically across the top, and running downward to fold back along the first piece with a second arrow above the first pointing to the right just before the end of the string. Part b shows a beta barrel as a structure consisting of diagonal arrows that alternate running to the right and running to the left, connected by strings above, to form a circular structure that is viewed from one side.
The second term for describing structural patterns is domain. A domain, as defined by Jane Richardson in 1981, is a part of a polypeptide chain that is independently stable or could undergo movements as a single entity with respect to the entire protein. Polypeptides with more than a few hundred amino acid residues often fold into two or more domains, sometimes with different functions. In many cases, a domain from a large protein will retain its native three-dimensional structure even when separated (for example, by proteolytic cleavage) from the remainder of the polypeptide chain. In a protein with multiple domains, each domain may appear as a distinct globular lobe (Fig. 4-17); more commonly, extensive contacts between domains make individual domains hard to discern. Different domains often have distinct functions, such as the binding of small molecules or interaction with other proteins. Small proteins usually have only one domain (the domain is the protein).
Folding of polypeptides is subject to an array of physical and chemical constraints, and several rules have emerged from studies of common protein-folding patterns.
Part a shows typical connections in an all-beta motif on the left. This consists of 8 parallel vertical arrows connected end to end by strings. Arrow 1 points down and its string loops to arrow 2 (up), which loops to arrow 5 (down), which loops to arrow 6 (up), which loops to arrow 8 (down), which loops to arrow 7 (up), which loops to arrow 4 (down), which loops to arrow 3 (up). To the right, a crossover connection (not observed) is similar, but arrow 1 (down) loops to arrow 2 (up), which loops to arrow 5 (down), which loops to arrow 4 (up), which loops to arrow 7 (down), which loops to arrow 8 (up), which loops to arrow 6 (down), which loops to arrow 3 (up). The string leaving arrow 6 runs across the string from arrows 4 to 5 in a highlighted spot. Part b shows a right-handed connection between beta strands as two downward-facing arrows with a loop from the point of the right arrow to the base at the top of the left arrow. A left-handed connection between beta strands (very rare) is shown as two similar downward-facing arrows, except that the loop runs from the point of the left arrow to the base of the right arrow. Part c shows a twisted beta sheet as five roughly parallel arrows that point toward the right. The first two curve upward, the third one curves downward, the fourth one has a slight wave in the middle, and the fifth one twists slightly to the left.
Following these rules, complex motifs can be built up from simple ones. For example, a series of β-α-β loops arranged so that the β strands form a barrel creates a particularly stable and common motif, the α/β barrel (Fig. 4-19). In this structure, each parallel β segment is attached to its neighbor by an α-helical segment. All connections are right-handed. The α/β barrel is found in many enzymes, often with a binding site (for a cofactor or a substrate) in the form of a pocket near one end of the barrel. Note that domains with similar folding patterns are said to have the same motif, even though their constituent α helices and β sheets may differ in length.
There is a beta alpha beta loop on the left. This has two parallel arrows pointing upward and to the right. The left top arrow is shorter and has a string from its tip that coils down to a helix that runs beneath the two arrows and ends with a string that runs to the base of the bottom arrow, which is longer. Dotted lines from the top and bottom of the structure show how it fits into the bottom center of an alpha/beta barrel that has many similar structures arranged around it, all with helices toward the outside and arrows toward the inside, forming an overall roughly circular shape.
Although many proteins contain well-folded and stable structures, this is not necessary for the biological function of all proteins. Many proteins or protein segments lack ordered structures in solution. The concept that some proteins function in the absence of a definable three-dimensional structure comes from reassessment of data from many different proteins. As many as a third of all human proteins may be unstructured or may have significant unstructured segments. All organisms have some proteins that fall into this category. Intrinsically disordered proteins have properties that are distinct from those of classical, structured proteins. They often lack a hydrophobic core and instead are characterized by high densities of charged amino acid residues such as Lys, Arg, and Glu. Pro residues are also prominent, as they tend to disrupt ordered structures.
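The compositional bias described above can be illustrated with a simple count. The sketch below reports the fraction of a sequence made up of the charged residues named in the text (Lys, Arg, and Glu) and the fraction made up of Pro. It is a crude illustration only: it is not the algorithm used by disorder-prediction tools, and the function name and example sequence are hypothetical.

```python
CHARGED = "KRE"   # one-letter codes for Lys, Arg, Glu (the residues named in the text)
PROLINE = "P"


def composition_profile(sequence: str) -> dict[str, float]:
    """Fraction of residues that are Lys/Arg/Glu and fraction that are Pro."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    charged = sum(seq.count(residue) for residue in CHARGED)
    return {
        "charged": charged / len(seq),
        "proline": seq.count(PROLINE) / len(seq),
    }


# Hypothetical eight-residue segment, rich in charged residues and Pro.
profile = composition_profile("KEKEPRPE")
print(profile)  # {'charged': 0.75, 'proline': 0.25}
```

A high charged fraction by itself does not establish that a segment is disordered; it is one of several features, together with the absence of a hydrophobic core, that distinguish these proteins from classical structured ones.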
Structural disorder and high charge density can facilitate the function of some proteins as spacers, insulators, or linkers in larger structures. Other disordered proteins are scavengers, binding up ions and small molecules in solution and serving as reservoirs or garbage dumps. However, many intrinsically disordered proteins are at the heart of important protein interaction networks. The lack of an ordered structure can facilitate a kind of functional promiscuity, allowing one protein to interact with multiple or even dozens of partners. Structural disorder allows some inhibitor proteins, such as the mammalian cell division protein p27, to interact with multiple targets in different ways. In solution, p27 lacks definable structure. However, it wraps around and inhibits the action of several enzymes called protein kinases (see Chapter 6) that facilitate cell division. The flexible structure of p27 allows it to accommodate itself to its different target proteins. Human tumor cells, which are cells that have lost the capacity to control cell division normally, generally have reduced levels of p27; the lower the levels of p27, the poorer the prognosis for the cancer patient.
Similarly, intrinsically disordered proteins are often present as hubs or scaffolds at the center of protein networks that constitute signaling pathways (see Fig. 12-30). These proteins, or parts of them, may interact with many different binding partners. They often take on an ordered structure when they interact with other proteins, but the structure they assume may vary with different binding partners. The mammalian protein p53 is also critical in the control of cell division. It contains both structured and unstructured segments, and the different segments interact with dozens of other proteins. An unstructured region of p53 at the carboxyl terminus interacts with at least four different binding partners and assumes a different structure in each of the complexes (Fig. 4-20).
Part a shows p53 as a ribbon structure inside a faint, almost oval outline. Several ribbon arrows cross the middle, the N terminus and the C terminus are drawn as dotted lines, and a double helix is shown at the bottom. Part b plots PONDR score (vertical axis, 0 to 1.0) against amino acid residue (horizontal axis, 0 to 400). A horizontal strip at a score of 0.5 is divided into three segments: segment 1 (residues 0 to 90) is linked to the N terminus in part a, segment 2 (residues 90 to 280) to the double helix, and segment 3 (residues 280 to 380) to the C terminus. All data on the graph are approximate; the curve lies mostly above 0.5 in segments 1 and 3 and mostly below 0.5 in segment 2. A vertical strip at about residue 380 is linked to short dotted strands bound to four different proteins, each shown in surface contour view: S100B(ββ), the CBP bromodomain, sirtuin, and cyclin A.
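A PONDR-style plot such as the one in Fig. 4-20b can be read as a list of per-residue scores between 0 and 1. The sketch below picks out contiguous stretches of residues that score at or above a cutoff. It assumes the 0.5 line is the dividing point between predicted order and disorder; the function name, the `min_length` parameter, and the toy scores are hypothetical and are not taken from p53 or from the PONDR software.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) residue ranges scoring >= threshold.

    scores      per-residue disorder scores (assumed to lie between 0 and 1)
    threshold   assumed cutoff between predicted order and disorder
    min_length  shortest run of residues to report (hypothetical parameter)
    """
    regions = []
    start = None  # first residue of the run currently being tracked
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at residue i - 1.
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy scores for a 10-residue peptide (illustrative values only).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.3, 0.6, 0.9, 1.0]
print(disordered_regions(toy_scores))  # [(1, 3), (8, 10)]
```

Applied to a real score profile, the same scan would report the terminal segments of a protein like p53 as candidate disordered regions, although a prediction of this kind is only a guide and does not replace structural data.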
More than 150,000 structures are now archived in the Protein Data Bank (PDB; for a deeper explanation, see Box 4-3). An enormous amount of information about protein structural principles, protein function, and protein evolution is contained in these data. Other databases have organized this information and made it more readily accessible. In the Structural Classification of Proteins database, or SCOP2 (http://scop2.mrc-lmb.cam.ac.uk), all of the protein information in the PDB can be searched within four different categories: (1) protein relationships, (2) structural classes, (3) protein types, and (4) evolutionary events. Figure 4-21 presents examples of protein motifs taken from SCOP2 to illustrate the potential of searching within each category. The figure also introduces another way to represent elements of secondary structure and the relationships among segments of secondary structure in a protein — the topology diagram.
Part a is labeled “structurally similar, different sequence and organism” and shows two proteins with very similar structures. The left-hand protein is 2JHF, alcohol dehydrogenase from Equus caballus (horse); the right-hand protein is 1F8F, alcohol dehydrogenase from Acinetobacter calcoaceticus, a bacterium found in human intestinal microbiota. Each is roughly a vertical oval containing both ribbon arrows and helices. Part b shows a topology diagram made of interconnected arrows and cylinders, numbered in order along a connecting line that runs from the N terminus to the C terminus; the arrows are grouped into boxes, and the cylinders lie outside the boxes. Part c shows two molecules. The left-hand molecule is labeled all α: 1BCF, ferritin-like, bacterioferritin (cytochrome b1), Escherichia coli. It consists of four roughly vertical helices connected by strands, with a small horizontal helix running across the bottom. The right-hand molecule is labeled all β: 1PEX, four-bladed β propeller, collagenase-3 (MMP-13), human (Homo sapiens). It is a roughly circular arrangement of arrows with a few short helical turns.
The number of folding patterns is not infinite. Among the tens of thousands of distinct protein structures archived in the PDB, only about 1,400 different folds or motifs are classified by the SCOP2 database. Given the many years of progress in structural biology, new motifs are now discovered only rarely. Many examples of recurring domain or motif structures are available, and these reveal that protein tertiary structure is more reliably conserved than amino acid sequence. The comparison of protein structures can thus provide much information about evolution. Proteins with significant similarity in primary structure and/or with similar tertiary structure and function are said to be in the same protein family. The protein structures in the PDB belong to about 4,000 different protein families. A strong evolutionary relationship is usually evident within a protein family. For example, the globin family has many different proteins with both structural and sequence similarities to myoglobin (as seen in the proteins used as examples in Figures 4-30 and 4-31 and in Chapter 5). Two or more families that have little similarity in amino acid sequence but make use of the same major structural motif and have functional similarities are grouped into superfamilies. An evolutionary relationship among families in a superfamily is considered probable, even though time and functional distinctions — that is, different adaptive pressures — may have erased many of the telltale sequence relationships.
A protein family may be widespread in all three domains of cellular life (the Bacteria, Archaea, and Eukarya), suggesting an ancient origin. Many proteins involved in intermediary metabolism and the metabolism of nucleic acids and proteins fall into this category. Other families may be present in only a small group of organisms, indicating that the structure arose more recently. Following the natural history of structural motifs through structural classifications in databases such as SCOP2 provides a powerful complement to sequence analyses in tracing evolutionary relationships. The SCOP2 database is curated manually, with the objective of placing proteins in the correct evolutionary framework based on conserved structural features.
Structural motifs become especially important in defining protein families and superfamilies. Improved protein classification and comparison systems lead inevitably to the elucidation of new functional relationships. Given the central role of proteins in living systems, these structural comparisons can help illuminate every aspect of biochemistry, from the evolution of individual proteins to the evolutionary history of complete metabolic pathways.
Many proteins have multiple polypeptide subunits (from two to hundreds). The association of polypeptide chains can serve a variety of functions. Many multisubunit proteins have regulatory roles; the binding of small molecules may affect the interaction between subunits, causing large changes in the protein’s activity in response to small changes in the concentration of substrate or regulatory molecules (Chapter 6). In other cases, separate subunits take on separate but related functions, such as catalysis and regulation. Some associations, such as those seen in the fibrous proteins considered earlier in this chapter and the coat proteins of viruses, serve primarily structural roles. Some very large protein assemblies are the site of complex, multistep reactions. For example, each ribosome, the site of protein synthesis, incorporates dozens of protein subunits along with RNA molecules.
A multisubunit protein can also be referred to as an oligomer or multimer. If an oligomer has nonidentical subunits, the overall structure of the protein can be asymmetric and quite complicated. However, many oligomers have identical subunits or repeating groups of nonidentical subunits, usually in symmetric arrangements. As noted in Chapter 3, the repeating structural unit in such an oligomeric protein, whether a single subunit or a group of subunits, is called a protomer.
The first oligomeric protein to have its three-dimensional structure determined was hemoglobin, which contains four polypeptide chains and four heme prosthetic groups, in which the iron atoms are in the ferrous state (as we shall see in Chapter 5). The protein portion, the globin, consists of two α chains (141 residues each) and two β chains (146 residues each). Note that in this case, α and β do not refer to secondary structures. In a practice that can be confusing to the beginning student, the Greek letters α and β (and γ, δ, and others) are often used to distinguish two different kinds of subunits within a multisubunit protein, regardless of what kinds of secondary structure may predominate in the subunits. Because hemoglobin is four times as large as myoglobin, much more time and effort were required to solve its three-dimensional structure by x-ray analysis, which was finally achieved by Max Perutz, John Kendrew, and their colleagues in 1959. The subunits of hemoglobin are arranged in symmetric pairs (Fig. 4-22), each pair having one α subunit and one β subunit. Hemoglobin can therefore be described either as a tetramer or as a dimer of αβ protomers. The role these distinct subunits play in hemoglobin function is discussed extensively in Chapter 5.
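The two descriptions of hemoglobin, as a tetramer or as a dimer of αβ protomers, can be checked with a little arithmetic. The sketch below takes the subunit composition of an oligomer and reduces it to the smallest group of subunits that repeats a whole number of times, using the chain lengths given above. The function name and dictionary layout are hypothetical. Composition alone does not show that the repeating groups are arranged symmetrically; that still has to come from the structure itself.

```python
from functools import reduce
from math import gcd


def protomer(subunit_counts):
    """Return (copies, composition) of the smallest repeating subunit group.

    subunit_counts maps a subunit label to the number of chains of that
    type in the oligomer, for example {"alpha": 2, "beta": 2}.
    """
    # The number of repeats is the greatest common divisor of the chain counts.
    copies = reduce(gcd, subunit_counts.values())
    composition = {name: n // copies for name, n in subunit_counts.items()}
    return copies, composition


# Hemoglobin: two alpha chains (141 residues) and two beta chains (146 residues).
hemoglobin = {"alpha": 2, "beta": 2}
chain_length = {"alpha": 141, "beta": 146}

copies, composition = protomer(hemoglobin)
residues_per_protomer = sum(chain_length[s] * n for s, n in composition.items())
total_residues = copies * residues_per_protomer

print(copies, composition)      # 2 {'alpha': 1, 'beta': 1}
print(residues_per_protomer)    # 287
print(total_residues)           # 574
```

The result, two copies of an αβ unit with 287 residues each, matches the description of hemoglobin as a dimer of αβ protomers with 574 residues in all.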
Part a shows a roughly circular structure with slight openings at the top center and bottom center, roughly divided into four quadrants. It is made up of many helices. The top half is wider on the left and narrower on the right and is a different color from the bottom half. The top right and bottom right quadrants are darker shades of the same color of the corresponding left-hand quadrants. Small ball and stick models are visible in the center of the upper left and lower right portions and farther back in the lower left and upper right portions. Part b shows the same structure as a surface contour model with a rough surface. The upper left and lower right appear to be in front of the other portions. The ball-and-stick models are still visible in the upper left and lower right, but not in the other portions.