What is the Richardson Lab Excited about Now? · Seeing the Invisible: 50 Years of Macromolecular Visualization

All-atom contact visualization and the MolProbity model-validation website

The MolProbity web service has been used by the worldwide structural biology community since 2002 to measure overall model quality and to identify and correct local errors in protein and nucleic acid three-dimensional structures (Williams 2018). Central to that validation is the Richardson lab's all-atom contact method (Word 1999), which adds and optimizes all of the hydrogen (H) atoms in the structure and then calculates and visualizes (with color-coded contact dots and spikes) both good and bad contacts between atoms.
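The idea of separating good contacts from bad ones can be illustrated with a minimal sketch. It is not MolProbity's actual implementation: the van der Waals radii, the 0.4 Å overlap cutoff, the 0.5 Å contact gap, and the function names are all assumptions chosen for illustration, and hydrogen bonds (which legitimately allow closer approach) are ignored.

```python
import math

# Illustrative van der Waals radii in Angstroms (assumed values for this sketch).
VDW_RADIUS = {"H": 1.2, "C": 1.7, "N": 1.55, "O": 1.4}

CLASH_OVERLAP = 0.4  # Angstroms of surface overlap treated as a bad contact (assumed)
CONTACT_GAP = 0.5    # largest surface gap still counted as a contact (assumed)


def surface_gap(elem_a, xyz_a, elem_b, xyz_b):
    """Distance between two van der Waals surfaces; negative means overlap."""
    center_distance = math.dist(xyz_a, xyz_b)
    return center_distance - (VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b])


def classify_contact(gap):
    """Label a surface gap as 'clash', 'contact', or 'none'."""
    if gap <= -CLASH_OVERLAP:
        return "clash"
    if gap <= CONTACT_GAP:
        return "contact"
    return "none"


if __name__ == "__main__":
    # Two carbon atoms placed 2.5 Angstroms apart overlap far too much.
    gap = surface_gap("C", (0.0, 0.0, 0.0), "C", (2.5, 0.0, 0.0))
    print(classify_contact(gap))
```

A real implementation must first place every hydrogen atom, since most of the informative contacts involve H atoms that are absent from the deposited coordinates.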

Figure 1 - a) Key to MolProbity visualization of good and bad all-atom contacts; b) In very high-resolution structures, contacts are typically near-ideal (PDB file 1gci); c) MolProbity includes other model validation criteria to complement the all-atom contacts; d) The combined visual validation-outlier markup guides correction of local modeling errors (PDB file 1j58).

Resolution: the most crucial variable

Resolution quantifies how close together two features can be and still be seen as separate. For molecules, the natural unit of resolution is the Ångström (1 Å = 10^-10 m), because 1-2 Å is the distance between most covalently bonded atoms. There are 10 billion Å in a meter.

Figure 2 – The appearance of electron density for parts of the T4 phage lysozyme structure at different resolutions: a) At 4Å, backbone connectivity is pretty unambiguous, and the shape of protein α-helices and DNA or RNA double-helices is fairly clear, but very few sidechains can be identified; b) At 3Å, beta-sheets, loops, and most sidechains are seen, but specific backbone and sidechain conformations are unclear; c) At 2Å (or at 2.5Å), the carbonyl oxygens along the backbone are finally visible, defining the orientation of the peptides, and sidechain shapes clearly identify the amino-acid sequence; d) At 1Å (rarely achieved), almost all atoms can be seen separately. Source: 4gbr 3.99Å, 5zbh 3.0Å, 1lyi 2.0Å, 5jdt 1.0Å.

At 2Å resolution, the most common for crystal structures, the electron density is sufficiently detailed for building a good atomic model. If all of MolProbity’s criteria score well and the fit of model to map is good, then that model is (almost surely) essentially correct. That assurance, however, vanishes by 3Å resolution, at which point sidechain shapes, and crucially the carbonyl oxygens, disappear into blobs or tubes and thus the articulation between backbone peptide units is unclear and sidechains are often mispositioned.

The recent cryoEM revolution, a marvel and a danger

Over the last few years, revolutionary advances in both hardware and software have enabled single-particle electron microscopy at cryogenic temperatures (cryoEM) to jump from imaging large blobs to achieving 4Å resolution or better. This approach is especially valuable for very large, dynamic molecular machines and for membrane proteins, both of which are extremely important to biology and medicine yet are very difficult to crystallize.

Figure 3 - CryoEM structure at 4Å resolution of a membrane receptor that responds both to cold and to menthol (PDB file 6nr2; Yin 2019). (Above) Side and top views of the three-dimensional density map reconstructed from thousands of two-dimensional EM images. (Below) Ribbon diagram of the atomic model. Seok-Yong Lee’s group collected this data at the Duke cryoEM facility.

The challenge is that, as seen in Figure 2, density at 3 to 4Å is very broad and is equally compatible with many rather different atomic models. Discriminating correctly among those model possibilities requires a great deal of additional information. CryoEM (or X-ray) models at these resolutions are often provided with extra information by refinement against traditional validation criteria. The resulting model is somewhat better, on average, but that refinement makes those criteria meaningless for validation. It can even make local conformations worse by hiding outliers without fixing the underlying problems (Richardson 2018). With all outliers refined away, MolProbity has been misleadingly assigning much better scores to these structures than they actually deserve.

This situation is dangerous, because adding up local errors makes larger-scale errors more likely, such as wrong chain connectivity or stretches of sequence out of register from the correct placement. Those errors, in turn, can invalidate biological conclusions and lead to paper retractions, as happened in crystallography around 1990 and motivated the initial development of validation methods.

New validation methods to the rescue

To lessen this danger, cryoEM needs new validation methods that cover wider regions than one repeating unit and that are not already used as refinement targets. The Richardson lab feels partly responsible for the situation and is well placed to help, so this issue is their primary current focus. They have already developed and tested the first such new validation method, called CaBLAM (Williams 2018; Prisant 2020). It uses the pattern formed by five successive Cα atoms (the most reliably modeled atoms at these resolutions) to determine what the local conformation should be, and then tests whether the peptide orientations (measured by the relative position of carbonyl oxygens along the backbone) are compatible with that local Cα conformation. In the 2019 Model Metric Challenge, the percentage of CaBLAM outliers was the metric best correlated with model accuracy. Importantly, many CaBLAM outliers can be fixed.
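One plausible way to turn a pattern of five successive Cα atoms into numbers is as two overlapping virtual dihedral angles, one over atoms 1-4 and one over atoms 2-5. The sketch below shows only that geometric step. Whether it matches CaBLAM's published parameterization exactly is an assumption, the function names are hypothetical, and the comparison against carbonyl oxygen positions is not implemented.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of the first three points
    n2 = _cross(b2, b3)  # normal to the plane of the last three points
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def ca_virtual_dihedrals(ca5):
    """Return two overlapping virtual dihedrals for five successive C-alpha positions."""
    if len(ca5) != 5:
        raise ValueError("expected exactly five C-alpha coordinates")
    return dihedral(*ca5[0:4]), dihedral(*ca5[1:5])


if __name__ == "__main__":
    # A flat zigzag: both virtual dihedrals have magnitude 180 degrees.
    zigzag = [(0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0), (2, -1, 0)]
    print(ca_virtual_dihedrals(zigzag))
```

Because only Cα positions enter the calculation, a descriptor of this kind stays usable at 3-4Å, where the Cα trace is modeled more reliably than the peptide orientations.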

Figure 4 – (Left) Two successive CaBLAM outliers (magenta) at the end of an α-helix (PDB file 4hel). (Right) Fixing the modeling error by reorienting the central carbonyl oxygen (red ball) is shown to be correct by a higher-resolution structure (PDB file 3osx).

Several additional independent validation metrics for 3-4Å are needed to make serious errors less likely. The Richardson lab and other groups are thus hard at work on such developments.

References

Prisant MG, Williams CJ, Chen VB, Richardson JS, Richardson DC (2020) Protein Sci 29: 315-329
Richardson JS, Williams CJ, Videau LL, Chen VB, Richardson DC (2018) J Struct Biol 204: 301-312
Williams CJ, Hintze BJ, Headd JJ, Moriarty NW, Chen VB, et al. (2018) Protein Sci 27: 293-315
Word JM, Lovell SC, LaBean TH, Zalis ME, Presley BK, et al. (1999) J Mol Biol 285: 1711-1733
Yin Y, Le SC, Hsu AL, Borgnia MJ, Yang H, Lee S-Y (2019) Science 363: eaav9334
2019 Model Metric Challenge