In this chapter we have presented many types of protein structures. How were these structures determined? Structural biology is the study of the three-dimensional structures of biomolecules, including proteins, nucleic acids, lipid membranes, and oligosaccharides. Structural biologists combine biochemical approaches with physical tools and computational methods to obtain these structures. Structural biology is extraordinarily powerful for elucidating the relationships between the structure and function of proteins, the molecular basis for enzymatic catalysis and ligand binding, and evolutionary relationships between proteins. Here we focus primarily on three commonly used methods in structural biology: x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM). Which method a structural biologist uses depends on the system being studied and what information is to be learned. Often structural biologists combine multiple methods to provide a more complete view of function.
Increasingly, computational methods such as molecular dynamics simulations and in silico protein folding are proving to be essential for structural biologists, and these are discussed in Box 4-5.
The spacing of atoms in a crystal lattice can be determined by measuring the locations and intensities of spots produced on a detector by a beam of x-rays of given wavelength, after the beam has been diffracted by the electrons of the atoms. For example, x-ray analysis of sodium chloride crystals shows that Na⁺ and Cl⁻ ions are arranged in a simple cubic lattice. The spacing of the different kinds of atoms in complex organic molecules, even very large ones such as proteins, can also be analyzed by x-ray diffraction methods. However, the technique for analyzing crystals of complex molecules is far more laborious than the technique for analyzing simple salt crystals. When the repeating unit of the crystal is a molecule as large as, say, a protein, the numerous atoms in the molecule yield thousands of diffraction spots that must be analyzed by computer.
Consider how images are generated in a light microscope. Light from a point source is focused on an object. The object scatters the light waves, and these scattered waves are recombined by a series of lenses to generate an enlarged image of the object. The smallest object whose structure can be determined by such a system — that is, the resolving power of the microscope — is determined by the wavelength of the light, in this case visible light, with wavelengths in the range of 400 to 700 nm. Objects smaller than half the wavelength of the incident light cannot be resolved. To resolve objects as small as proteins we must use x-rays, with wavelengths in the range of 0.7 to 1.5 Å (0.07 to 0.15 nm). However, there are no lenses that can recombine x-rays to form an image; instead, the pattern of diffracted x-rays is collected directly and an image is reconstructed by mathematical techniques.
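The half-wavelength rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `resolution_limit` is our own, and it simply applies the rule of thumb stated above to the two wavelength ranges quoted in the text.

```python
def resolution_limit(wavelength):
    """Smallest resolvable feature, taken here as half the incident wavelength."""
    return wavelength / 2.0

# Wavelength ranges quoted in the text, both expressed in nanometers.
visible_nm = (400.0, 700.0)
xray_nm = (0.07, 0.15)  # 0.7 to 1.5 angstroms

for name, (shortest, longest) in (("visible light", visible_nm), ("x-rays", xray_nm)):
    print(f"{name}: {resolution_limit(shortest):g} to {resolution_limit(longest):g} nm")
# visible light: 200 to 350 nm
# x-rays: 0.035 to 0.075 nm
```

By this estimate, visible light cannot resolve anything much smaller than a few hundred nanometers, whereas x-rays reach well below 1 Å (0.1 nm), the scale needed to separate individual atoms.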
The amount of information obtained from x-ray crystallography depends on the degree of structural order in the sample. Some important structural parameters were obtained from early studies of the diffraction patterns of the fibrous proteins arranged in regular arrays in hair and wool. However, the orderly bundles formed by fibrous proteins are not crystals — the molecules are aligned side by side, but not all are oriented in the same direction. More-detailed three-dimensional structural information about proteins requires a highly ordered protein crystal. The structures of many proteins are not yet known, simply because they have proved difficult to crystallize. Practitioners have compared making protein crystals to holding together a stack of bowling balls with cellophane tape.
Operationally, there are several steps in x-ray structural analysis (Fig. 4-30). A crystal is placed in an x-ray beam between the x-ray source and a detector, and a regular array of spots, called reflections, is generated. The spots are created by the diffracted x-ray beam, and each atom in a molecule makes a contribution to each spot. An electron-density map of the protein is reconstructed from the overall diffraction pattern of spots by a mathematical technique called a Fourier transform. In effect, the computer acts as a “computational lens.” A model for the structure is then built that is consistent with the electron-density map.
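The role of the Fourier transform can be illustrated with a deliberately simplified, one-dimensional toy. In the sketch below, the eight-point "unit cell" and the function names are hypothetical, and the calculation assumes that each reflection is known as a full complex number (amplitude and phase). A detector records only the intensity of each spot, so in a real experiment the phases must be obtained by other means.

```python
import cmath

def structure_factors(density):
    """Forward transform: every grid point contributes to every reflection F(h)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]

def electron_density(factors):
    """Inverse transform: rebuild the density at each grid point from all reflections."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * cmath.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]

# Hypothetical one-dimensional unit cell: two "atoms" with different electron counts.
toy_density = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 6.0, 0.0]

factors = structure_factors(toy_density)
recovered = electron_density(factors)          # matches toy_density to rounding error
intensities = [abs(f) ** 2 for f in factors]   # the quantity related to spot brightness
```

Note that each term of the forward sum runs over the whole density, which is the numerical counterpart of the statement that each atom contributes to each spot.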
Figure 4-30 (description): (a) a diffraction pattern of many small bright spots against a dark background; (b) a portion of the computed electron-density map, drawn as a mesh of lines; (c) the same map with a chemical structure of four linked hexagonal rings fitted into it; (d) the resulting model, shown as helices with a ball-and-stick ring structure that resembles the structure in part (c).
John Kendrew found that the x-ray diffraction pattern of crystalline myoglobin (isolated from muscles of the sperm whale) is highly complex, with nearly 25,000 reflections. Computer analysis of these reflections took place in stages. The resolution improved at each stage until, in 1959, the positions of virtually all the nonhydrogen atoms in the protein had been determined. The amino acid sequence of the protein, obtained by chemical analysis, was consistent with the molecular model. Over 100,000 protein structures, many of them much more complex than myoglobin, have since been determined to a similar level of resolution by x-ray crystallography.
The physical environment in a crystal, of course, is not identical to that in solution or in a living cell. A crystal imposes a space and time average on the structure deduced from its analysis, and x-ray diffraction studies provide little information about molecular motion within the protein. The conformation of proteins in a crystal can also be affected by nonphysiological factors such as incidental protein-protein contacts within the crystal. However, when structures derived from the analysis of crystals are compared with structural information obtained by other means (such as NMR, as described below), the crystal-derived structure almost always represents a functional conformation of the protein.
An advantage of nuclear magnetic resonance (NMR) studies is that they are carried out on macromolecules in solution, whereas x-ray crystallography is limited to molecules that can be crystallized. NMR can also illuminate the dynamic side of protein structure, including conformational changes, protein folding, and interactions with other molecules.
NMR is a manifestation of nuclear spin angular momentum, a quantum mechanical property of atomic nuclei. Only certain atoms, including ¹H, ¹³C, ¹⁵N, ¹⁹F, and ³¹P, have the kind of nuclear spin that gives rise to an NMR signal. Nuclear spin generates a magnetic dipole. When a strong, static magnetic field is applied to a solution containing a single type of macromolecule, the magnetic dipoles are aligned in the field in one of two orientations: parallel (low energy) or antiparallel (high energy). A short (∼10 μs) pulse of electromagnetic energy of suitable frequency (the resonant frequency, which is in the radio frequency range) is applied at right angles to the nuclei aligned in the magnetic field. Some energy is absorbed as nuclei switch to the high-energy state, and the absorption spectrum that results contains information about the identity of the nuclei and their immediate chemical environment. The data from many such experiments on a sample are averaged, increasing the signal-to-noise ratio, and an NMR spectrum such as that in Figure 4-31 is generated.
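Why averaging many acquisitions improves the signal-to-noise ratio can be seen in a small simulation. The sketch below is a toy, not a model of a real spectrometer: the flat baseline, the single peak, the noise level, and the function names are all hypothetical, and the noise is assumed to be independent and Gaussian from scan to scan. Under that assumption, averaging N scans reduces the random noise by a factor of about √N while leaving the true signal unchanged.

```python
import random
import statistics

def noisy_scan(true_signal, noise_sd, rng):
    """One acquisition: the true signal plus independent Gaussian noise at each point."""
    return [s + rng.gauss(0.0, noise_sd) for s in true_signal]

def average_scans(scans):
    """Point-by-point mean across repeated acquisitions."""
    return [sum(points) / len(points) for points in zip(*scans)]

def residual_noise(spectrum, true_signal):
    """Standard deviation of what remains after subtracting the true signal."""
    return statistics.pstdev([a - b for a, b in zip(spectrum, true_signal)])

rng = random.Random(0)          # fixed seed so the run is repeatable
true_signal = [0.0] * 2000      # hypothetical flat baseline
true_signal[1000] = 5.0         # one hypothetical resonance peak

single = noisy_scan(true_signal, 1.0, rng)
averaged = average_scans([noisy_scan(true_signal, 1.0, rng) for _ in range(100)])

print(residual_noise(single, true_signal))     # close to 1.0
print(residual_noise(averaged, true_signal))   # close to 0.1, about 1/sqrt(100)
```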
Figure 4-31 (description): (a) a one-dimensional ¹H NMR spectrum, with ¹H chemical shift on the horizontal axis running from about 10 to −2 and many overlapping peaks; (b) a two-dimensional spectrum with ¹H chemical shift (ppm) on both axes, a diagonal line, and many off-diagonal dots, two of which are boxed, labeled 1 and 2, and connected to the diagonal by dotted lines; (c) ball-and-stick models of a planar multiring structure and a neighboring group, with helices in the background; (d) a structure outlined by many lines, with the multiring structure from part (c) highlighted.
¹H is particularly important in NMR experiments because of its high sensitivity and natural abundance. For macromolecules, ¹H NMR spectra can become quite complicated. Even a small protein has hundreds of ¹H atoms, typically resulting in a one-dimensional NMR spectrum too complex for analysis. Structural analysis of proteins became possible with the advent of two-dimensional NMR techniques (Fig. 4-31b, c, d). These methods allow measurement of distance-dependent coupling of nuclear spins in nearby atoms through space (the nuclear Overhauser effect, or NOE, in a method dubbed NOESY) or the coupling of nuclear spins in atoms connected by covalent bonds (total correlation spectroscopy, or TOCSY).
Translating a two-dimensional NMR spectrum into a complete three-dimensional structure can be a laborious process. The NOE signals provide some information about the distances between individual atoms, but for these distance constraints to be useful, the atoms giving rise to each signal must be identified. Complementary TOCSY experiments can help identify which NOE signals reflect atoms that are linked by covalent bonds. Certain patterns of NOE signals have been associated with secondary structures such as α helices. Genetic engineering (Chapter 9) can be used to prepare proteins that contain the rare isotopes ¹³C or ¹⁵N. The new NMR signals produced by these atoms, and the coupling with ¹H signals resulting from these substitutions, help in the assignment of individual ¹H NOE signals. The process is also aided by a knowledge of the amino acid sequence of the polypeptide.
To generate a three-dimensional structure, researchers feed the distance constraints into a computer along with known geometric constraints such as chirality, van der Waals radii, and bond lengths and angles. The computer generates a family of closely related structures that represent the range of conformations consistent with the NOE distance constraints (Fig. 4-31d). The uncertainty in structures generated by NMR is in part a reflection of the molecular vibrations (known as breathing) within a protein structure in solution, discussed in more detail in Chapter 5. Normal experimental uncertainty can also play a role.
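The idea of keeping only those conformations that are consistent with the distance constraints can be sketched in a few lines. Everything specific in the example below is hypothetical: the four-atom "molecule", its coordinates, the upper-bound distances, and the function names. Real structure calculations also apply the geometric constraints mentioned above and search a vast number of conformations rather than testing two by hand.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in angstroms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def violations(coords, constraints):
    """List the constraints whose atom pair lies farther apart than its upper bound."""
    bad = []
    for i, j, upper in constraints:
        d = distance(coords[i], coords[j])
        if d > upper:
            bad.append((i, j, round(d, 2)))
    return bad

# Hypothetical NOE-derived upper bounds: (atom index, atom index, maximum distance in angstroms).
noe_constraints = [(0, 2, 5.0), (1, 3, 4.0)]

# Two hypothetical candidate conformations of the same four atoms.
candidates = {
    "compact": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)],
    "extended": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
}

accepted = [name for name, xyz in candidates.items() if not violations(xyz, noe_constraints)]
print(accepted)                                            # ['compact']
print(violations(candidates["extended"], noe_constraints)) # [(0, 2, 6.0), (1, 3, 6.0)]
```

In this toy, more than one compact arrangement would pass the same test, which mirrors why the calculation yields a family of closely related structures rather than a single answer.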
Our understanding of highly complex processes such as gene expression, mitochondrial respiration, or viral infection is aided immensely by knowing the detailed molecular structures of the proteins that participate in these processes. However, it is often difficult to determine the molecular structure of large, dynamic, macromolecular complexes that contain dozens of individual protein subunits. Moreover, integral membrane proteins often resist crystallization once they are removed from their lipid environment, making their structures difficult to solve by x-ray diffraction, and many are too large for NMR. In principle, discrete objects in the diameter range 100 to 300 Å can be visualized by electron microscopy (EM). In practice, the high intensity of the EM beam often damages the specimen before a high-resolution image can be obtained. In cryo-electron microscopy (cryo-EM), a sample containing many individual copies of the structure of interest is quick-frozen in vitreous (or noncrystalline) ice and kept frozen while being observed in two dimensions with the electron microscope, greatly reducing damage to the specimen by the electron beam.
Particles such as purified, multisubunit enzymes, arranged randomly on the microscope grid, are visualized with the cryo-electron microscope. When cryo-EM is combined with powerful algorithms for transforming the two-dimensional structures of tens of thousands of individual, randomly oriented complexes into a three-dimensional composite, it is sometimes possible to determine molecular structures at a level comparable to that obtained by x-ray crystallography (Fig. 4-32). In favorable cases, the repetitive aspects—choice of objects to be included in the analysis, imaging of each object individually, and calculations to produce a three-dimensional structure from the huge number of two-dimensional images—can be automated. The EMDataResource (www.emdataresource.org) is a unified resource for accessing structure maps deposited into data banks and assigned EMDataBank (EMDB) accession codes.
Figure 4-32 (description): (a) a grainy micrograph in which two kinds of particles are circled, one a ring of repeating subunits and the other a cylinder of four stacked rings; (b) the molecular structures derived from each, a roughly square structure with some open spaces from the cylinder and a round structure with a circular central opening from the ring.
Many novel structures have now been obtained by cryo-EM without models based on prior x-ray or NMR structures. Since cryo-EM relies on imaging of single molecules of a complex, this technique can also be used to computationally sort the imaged particles and simultaneously determine structures of multiple conformational states. Cryo-EM has now been used to solve the structures of some of the most dynamic and largest molecular complexes in the cell, such as the human telomerase enzyme (Fig. 4-33). Telomerase is an essential enzyme for maintaining chromosome integrity in humans (see Chapter 26) and is the target of significant medical research due to its roles in aging and cancer. Cryo-EM was critical for the laboratories of Eva Nogales and Kathleen Collins in determining the architecture of telomerase due to the heterogeneity of the complex and because only minute quantities could be purified from human cells — far too little for crystallization, but enough to observe single molecules by cryo-EM.
Figure 4-33 (description): the telomerase structure, drawn with ribbons and pleated sheets, in which several protein subunits of different sizes are grouped at the left and a long, thin RNA subunit at the right bends to the left at its bottom.