This validation report was created based on the guidelines and recommendations from IHM TaskForce (Berman et al. 2019). The first version of the PDB-Dev validation report consists of four categories as follows:
1.1. Model composition : This section outlines model details and includes information on ensembles deposited, chains and residues of domains, model representation, software, protocol, and methods used. All deposited structures have this section.
1.2. Data quality assessment : Data quality assessments are only available for Small Angle Scattering (SAS) datasets. This section was developed in collaboration with the SASBDB community. For details on the metrics, guidelines, and recommendations used, refer to the 2017 community article (Trewhella et al. 2017). All experimental datasets used to build the model are listed; however, validation criteria for other experimental datasets are currently under development.
1.3. Model quality assessment : Model quality for models at atomic resolution is assessed using MolProbity (Williams et al. 2018), consistent with PDB. Model quality for coarse-grained or multi-resolution structures is assessed by computing excluded volume satisfaction based on the reported distances and sizes of beads in the structures.
1.4. Fit to data used to build the model : Fit to data used to build the model is only available for SAS datasets. This section was developed in collaboration with the SASBDB (Valentini et al. 2015). For details on the metrics, guidelines, and recommendations used, refer to the 2017 community article (Trewhella et al. 2017). All experimental datasets used to build the model are listed; however, validation criteria for other experimental datasets are currently under development.
A fifth category, fit to data used to validate the model, is under development.
2.1. Overall Quality Assessment : This is a set of plots that represent a snapshot view of the validation results. There are four tabs, one for each validation criterion: (i) model quality, (ii) data quality, (iii) fit to data used for modeling, and (iv) fit to data used for validation.
2.1.1. Model quality : For atomic structures, MolProbity is used for evaluation. We evaluate bond outliers, side chain outliers, clash score, rotamer satisfaction, and Ramachandran dihedral satisfaction (Williams et al. 2018). Details on MolProbity evaluation and tables can be found here. For coarse-grained structures of beads, we evaluate excluded volume satisfaction. An excluded volume violation or overlap between two beads occurs if the distance between the two beads is less than the sum of their radii (S. J. Kim et al. 2018). Excluded volume satisfaction is the percentage of pair distances in a structure that are not violated (higher values are better).
2.1.2. Data quality : Data quality assessments are only available for SAS datasets. The current plot displays radius of gyration (Rg) for each dataset used to build the model. Rg is obtained from both a P(r) analysis (see more here), and a Guinier analysis (see more here).
4.1. SAS: Scattering Profiles : Data from solutions of biological macromolecules are presented as both log I(q) vs q and log I(q) vs log (q) based on SAS validation task force (SASvtf) recommendations (Trewhella et al. 2017). I(q) is the intensity (in arbitrary units) and q is the modulus of the scattering vector.
4.3. SAS: Flexibility Analysis : Flexibility of chains is assessed using Porod-Debye and Kratky plots. In a Porod-Debye plot, a clear plateau is observed for globular (partially or fully folded) domains, whereas fully unfolded domains are devoid of any discernible plateau. For details, refer to Figure 5 in Rambo and Tainer (2011). In a Kratky plot, a parabolic shape is observed for globular (partially or fully folded) domains and a hyperbolic shape is observed for fully unfolded domains.
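As a minimal sketch of how the two plots are built from a scattering profile, the snippet below applies the conventional axis transforms: q^2 I(q) against q for a Kratky plot, and q^4 I(q) against q^4 for a Porod-Debye plot. The function names and the toy profile are illustrative assumptions, not the code used to generate this report.

```python
def kratky(q, intensity):
    """Return (q, q^2 * I(q)) points for a Kratky plot."""
    return [(qi, qi ** 2 * ii) for qi, ii in zip(q, intensity)]


def porod_debye(q, intensity):
    """Return (q^4, q^4 * I(q)) points for a Porod-Debye plot."""
    return [(qi ** 4, qi ** 4 * ii) for qi, ii in zip(q, intensity)]


# Toy profile (an assumption for illustration only): I(q) = 1 / (1 + q^4),
# which decays as q^-4 at high q and therefore plateaus in a Porod-Debye plot.
q_values = [0.5 * i for i in range(1, 21)]
toy_intensity = [1.0 / (1.0 + qi ** 4) for qi in q_values]

kratky_points = kratky(q_values, toy_intensity)
porod_debye_points = porod_debye(q_values, toy_intensity)
```

For this toy profile the Porod-Debye ordinate levels off toward a constant at high q, which is the plateau behaviour described above for globular domains.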
4.4. SAS: P(r) Analysis : P(r) represents the distribution of distances between all pairs of atoms within the particle, weighted by the respective electron densities (Moore 1980). P(r) is the Fourier transform of I(q) (and vice versa). Rg can be estimated by integrating the P(r) function. Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Rg is a measure of the overall size of a macromolecule; e.g. a protein with a smaller Rg is more compact than a protein with a larger Rg, provided both have the same molecular weight (Mw). The point where P(r) decays to zero is called Dmax and represents the maximum size of the particle. The value of P(r) should be zero beyond r=Dmax.
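The estimate of Rg from P(r) can be sketched with the standard relation Rg^2 = ∫ r^2 P(r) dr / (2 ∫ P(r) dr), with both integrals taken from 0 to Dmax. The snippet below is an illustrative sketch using trapezoidal integration; the function names are hypothetical, and the test particle is a solid sphere whose P(r) is known analytically, not data from any deposited entry.

```python
import math


def trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def rg_from_pr(r, p):
    """Estimate Rg from a sampled P(r) using Rg^2 = int(r^2 P) / (2 int(P))."""
    numerator = trapezoid(r, [ri ** 2 * p_i for ri, p_i in zip(r, p)])
    denominator = 2.0 * trapezoid(r, p)
    return math.sqrt(numerator / denominator)


# Analytical P(r) of a solid sphere of radius R (Dmax = 2R), up to a constant:
# P(r) = r^2 * (1 - 1.5 * (r / Dmax) + 0.5 * (r / Dmax)^3)
sphere_radius = 10.0
d_max = 2.0 * sphere_radius
n_points = 2000
r_grid = [d_max * i / n_points for i in range(n_points + 1)]
p_sphere = [ri ** 2 * (1 - 1.5 * (ri / d_max) + 0.5 * (ri / d_max) ** 3)
            for ri in r_grid]

rg_sphere = rg_from_pr(r_grid, p_sphere)
```

For the sphere, the result should be close to the analytical value sqrt(3/5) R, which gives a simple check that the integration is behaving as expected.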
4.5. SAS: Guinier Analysis : Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. The linearity of the Guinier plot is a sensitive indicator of the quality of the experimental SAS data; a linear Guinier plot is a necessary but not sufficient demonstration that a solution contains monodisperse particles of the same size. Deviations from linearity usually point to strong interference effects, polydispersity of the samples, or improper background subtraction (Feigin and Svergun 1987). The residual value plot and the coefficient of determination (R²) are used to assess the linear fit to the data. A perfect fit has an R² value of 1. Residual values should be equally and randomly spaced around the horizontal axis.
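The Guinier fit can be sketched as an ordinary least-squares line through ln I(q) against q^2, using the Guinier approximation ln I(q) ≈ ln I(0) - (Rg^2 / 3) q^2. The snippet below is a minimal illustration with noise-free synthetic data; the function name, the chosen q-range, and the parameter values are assumptions, and the report's own fitting procedure may differ.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I(q) = ln I(0) - (Rg^2 / 3) * q^2 by least squares.

    Returns (Rg, I(0), R^2, residuals).
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    rg = math.sqrt(-3.0 * slope)  # valid only when the fitted slope is negative
    return rg, math.exp(intercept), r_squared, residuals


# Synthetic, noise-free profile (assumed values): Rg = 20, I(0) = 100
true_rg, true_i0 = 20.0, 100.0
q_values = [0.005 * (i + 1) for i in range(12)]  # q * Rg stays at or below 1.2
intensity = [true_i0 * math.exp(-(true_rg ** 2) * qi ** 2 / 3.0)
             for qi in q_values]

rg, i0, r_squared, residuals = guinier_fit(q_values, intensity)
```

With real data, the residuals returned here are what the residual value plot displays, and an R² well below 1 or a systematic trend in the residuals would suggest the issues listed above.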
Excluded volume assessments are performed for coarse-grained structures and MolProbity analysis is performed for atomic structures.
5.1a. Excluded Volume Analysis : Excluded volume violation is defined as the percentage of overlaps between coarse-grained beads in a structure. This percentage is obtained by dividing the number of overlaps (violations) by the total number of pair distances in a structure. An overlap or violation between two beads occurs if the distance between the two beads is less than the sum of their radii (S. J. Kim et al. 2018).
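The definition above can be sketched as follows. This is an illustrative implementation only: the function name and the bead representation (x, y, z, radius) are assumptions, and it counts every bead pair, without any special handling of beads that are adjacent in sequence.

```python
import math
from itertools import combinations


def excluded_volume(beads):
    """Return (violation %, satisfaction %) for a list of (x, y, z, radius) beads."""
    pairs = list(combinations(beads, 2))
    if not pairs:
        return 0.0, 100.0
    # A pair is violated when the centre-to-centre distance is less than
    # the sum of the two bead radii.
    violations = sum(
        1 for a, b in pairs
        if math.dist(a[:3], b[:3]) < a[3] + b[3]
    )
    violation_pct = 100.0 * violations / len(pairs)
    return violation_pct, 100.0 - violation_pct


# Hypothetical four-bead structure: only the first two beads overlap.
example_beads = [
    (0.0, 0.0, 0.0, 1.0),
    (1.5, 0.0, 0.0, 1.0),
    (10.0, 0.0, 0.0, 1.0),
    (20.0, 0.0, 0.0, 1.0),
]
violation_pct, satisfaction_pct = excluded_volume(example_beads)
```

In this example one of the six bead pairs overlaps, so the violation percentage is 1/6 of the pairs and the satisfaction is the remaining 5/6. The all-pairs loop scales quadratically with the number of beads, which is acceptable for a sketch but may need a spatial index for large structures.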
5.1b. MolProbity Analysis : The MolProbity analysis reported for atomic structures is consistent with PDB standards for X-ray structures (Williams et al. 2018). Summarized information is available in both the HTML and PDF reports. Detailed information is available for download as CSV files from both the HTML and the PDF reports. Please refer to the PDB user guide for details.
6.2. SAS: Cormap Analysis : ATSAS datcmp (Manalastas-Cantos et al. 2021) was used for hypothesis testing, using the null hypothesis that all data sets (i.e. the fit and the data collected) are similar. The reported p-value is a measure of evidence against the null hypothesis; the smaller the value, the stronger the evidence that the null hypothesis should be rejected.
8.3.1. Atomic structural coverage : Percentage of modeled structure or residues for which atomic structures are available. These structures can include X-ray, NMR, EM, and other comparative models.
8.3.2. Rigid bodies : A rigid body consists of multiple coarse-grained (CG) beads or atomic residues. In a rigid body, the beads (or residues) have their relative distances constrained during conformational sampling.
8.3.3. Flexible units : Flexible units consist of strings of beads that are restrained by the sequence connectivity.
8.3.4. Interface units : An automatic definition based on identified interface for each model. Applicable to models built with HADDOCK.
8.3.5. Resolution : An automatic definition based on identified interface for each model. Applicable to models built with HADDOCK.
8.5.1. Sampling validation : Validation metrics used to assess sampling convergence for stochastic sampling. Sampling precision is defined as the largest allowed root-mean-square deviation (RMSD) between the cluster centroid and a model within any cluster in the finest clustering for which each sample contributes structures proportionally to its size (considering both the significance and magnitude of the difference) and for which a sufficient proportion of all structures occur in sufficiently large clusters (Viswanath et al. 2017).
8.5.10. Excluded volume satisfaction : Assessment of excluded volume satisfaction of coarse-grained beads in the modeled structure. Excluded volume between two beads not connected in sequence is satisfied if the distance between them is greater than the sum of their radii. See section 5 for more details.