As of version 9.0, it is possible to perform PCA of NMR data sets directly from within the Mnova user interface without resorting to third-party applications. The basic PCA functionality has been covered previously in this blog (see Chemometrics under Mnova 9 – PCA), and in this entry we discuss some more practical aspects in greater detail, particularly the different binning, filtering and scaling options.
What follows has been kindly written by Silvia Mari (project leader of the PCA module) and Isaac Iglesias, who programmed this module in Mnova.
Introduction
Matrix generation from an array of NMR spectra is the core step in chemometric analysis. This procedure involves several options that the user must choose. In this entry we focus on the practical aspects of matrix preparation from NMR data. Broadly speaking, we can consider three main issues:
Choice of binning method: Sum vs Peak
Filtering or not filtering?
Choice of Scaling strategy
Choice of binning method: Sum vs Peak
When dealing with high-resolution NMR spectra it is generally impractical to work with all the data points of a spectrum, typically on the order of 32K or more. The most common strategy for reducing the number of variables is to divide each spectrum into a defined number of regions, the so-called bins. Several binning strategies are available today, from regular binning, where bins have a fixed width, to more sophisticated strategies such as Gaussian or dynamic adaptive binning [2]. Even with these methods, in particularly crowded spectra small shifts of peaks close to bin boundaries can cause dramatic quantitative changes in adjacent bins. Peak deconvolution strategies can help solve this problem. Generally speaking, a deconvolved peak is a mathematical entity characterized by a chemical shift (frequency), an intensity and a half-height line width. The integral of a peak follows automatically from the intensity and line width once a peak shape (e.g. Lorentzian) is assumed. For this reason, binning a spectrum of deconvolved peaks virtually eliminates the problem of bin boundaries, as illustrated in figure 1.
Figure 1 – Binning real peaks versus binning deconvolved peaks
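The boundary effect described above can be sketched numerically. The following is a minimal illustration, not the Mnova implementation: it assumes a Lorentzian line shape, uses the analytic Lorentzian area (π/2 × height × full width at half height), and assumes that Peak binning assigns the whole area of a peak to the bin containing its chemical shift. All function names are hypothetical.

```python
import numpy as np

def lorentzian(ppm, centre, height, fwhm):
    """Lorentzian line shape with the given height and full width at half height."""
    hw = fwhm / 2.0
    return height * hw**2 / ((ppm - centre) ** 2 + hw**2)

def lorentzian_area(height, fwhm):
    """Analytic integral of a Lorentzian line: (pi / 2) * height * FWHM."""
    return 0.5 * np.pi * height * fwhm

def sum_bins(ppm, intensity, edges):
    """Sum binning: integrate the sampled spectrum inside each bin (rectangle rule)."""
    totals, _ = np.histogram(ppm, bins=edges, weights=intensity)
    return totals * (ppm[1] - ppm[0])

def peak_bins(centres, heights, fwhms, edges):
    """Peak binning (assumed rule): the whole peak area goes to the bin holding its centre."""
    areas = lorentzian_area(np.asarray(heights, dtype=float),
                            np.asarray(fwhms, dtype=float))
    totals, _ = np.histogram(centres, bins=edges, weights=areas)
    return totals

edges = np.array([1.00, 1.03, 1.06])   # two 0.03 ppm bins
ppm = np.linspace(1.00, 1.06, 6001)    # sampled chemical-shift axis

# The same peak before and after a 0.002 ppm shift near the 1.03 ppm boundary
sum_before = sum_bins(ppm, lorentzian(ppm, 1.028, 100.0, 0.004), edges)
sum_after = sum_bins(ppm, lorentzian(ppm, 1.026, 100.0, 0.004), edges)
peak_before = peak_bins([1.028], [100.0], [0.004], edges)
peak_after = peak_bins([1.026], [100.0], [0.004], edges)
```

In this toy case the Sum bins change when the peak moves, because a different share of the Lorentzian tail crosses the boundary, whereas the Peak bins stay the same as long as the peak centre remains in the same bin.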
When dealing with an array of NMR spectra, regular binning with b bins over a stack of s spectra generates a b × s matrix (see figure 2). It is not possible to generate a similar matrix directly from deconvolved peaks (a peak list), since the number and position of peaks vary from spectrum to spectrum.
Figure 2 – Matrix generation from regular binning or peak list.
To overcome this problem there are two main strategies: (1) provide algorithms for peak alignment over the spectra series, as well as strategies for dealing with missing peaks, in order to end up with the same number of peaks and the same peak positions for all the spectra; (2) perform binning over the peak table.
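Strategy (2) can be sketched as follows. This is a hedged illustration under stated assumptions rather than Mnova's code: each spectrum is represented by a hypothetical pair of arrays (peak chemical shifts, peak areas), all spectra share the same bin edges, and the matrix is stored with spectra as rows and bins as columns.

```python
import numpy as np

def peak_table_matrix(peak_lists, edges):
    """Bin every peak list onto shared edges, giving a (spectra x bins) matrix."""
    rows = []
    for shifts, areas in peak_lists:
        row, _ = np.histogram(shifts, bins=edges, weights=areas)
        rows.append(row)
    return np.vstack(rows)

# Four bins of 0.03 ppm; the two peak lists differ in peak count and position
edges = np.linspace(0.0, 0.12, 5)
peak_lists = [
    ([0.010, 0.050, 0.100], [2.0, 1.0, 3.0]),
    ([0.012, 0.101], [2.5, 2.0]),
]
X = peak_table_matrix(peak_lists, edges)
```

Because the bin edges are shared, every spectrum yields a row of the same length even though the peak lists differ; bins that contain no peak in a given spectrum are simply zero.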
In the PCA module available in Mnova we adopt the second solution. The user can choose between regular binning (Sum) and binning over deconvolved peaks (Peak) in the binning options. An example of the improved classification is shown qualitatively in figure 3, which compares score plots obtained by binning with the Sum method (panel A) and with the Peak method (panel B).
Figure 3 – Score plots obtained using the same bin width of 0.03 ppm; in both cases data were normalized by the sum and Pareto scaled. In panel A bins were obtained directly by integration of the real spectra; in panel B bins were obtained by binning the corresponding peak list obtained after global spectral deconvolution.
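For readers who want to reproduce score plots of this kind outside Mnova, the sketch below shows one common way to normalize by the sum and compute PCA scores. It is an assumption-laden illustration, not the module's algorithm: spectra are rows and bins are columns, the scores come from a singular value decomposition of the column-centred matrix, and the function names are hypothetical. Any scaling step would be applied to the matrix before the decomposition.

```python
import numpy as np

def normalize_by_sum(X):
    """Divide each spectrum (row) by its total binned intensity."""
    return X / X.sum(axis=1, keepdims=True)

def pca_scores(X, n_components=2):
    """PCA scores of the column-centred matrix via singular value decomposition."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)   # fraction of variance per component
    return scores, explained[:n_components]
```

Plotting the first column of `scores` against the second gives the usual PC1 versus PC2 score plot.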
Filtering or not filtering?
When the bin width is reduced to approach the spectral resolution, and the number of variables therefore increases, it is generally necessary to introduce filtering strategies in order to filter out those variables that do not show significant changes. There are established filtering strategies that are commonly applied to genomics data and that can also be used successfully for NMR-based data [1]. In the PCA module we have implemented five filtering options, namely:
Standard Deviation
Median Absolute Deviation
Interquartile Range
Mean Value
Median Value
In the first three cases a fixed fraction (default 10%) of the bins is discarded (e.g. if the matrix is composed of 100 bins, 10 bins are discarded), and the selection is based on the chosen filter method. In the case of Mean Value or Median Value, the user is asked to enter a cut-off for the mean or the median, and only bins whose value is lower than this cut-off are discarded. The following figure illustrates the difference in clustering capability with and without filtering. Finally, it is worth noting that NMR data very often contain regions that should be discarded; these are included in the so-called blind regions, which are not taken into account in the principal component calculation.
Figure 4 – Score plots obtained using the same bin width of 0.01 ppm; in both cases data were normalized by the sum and Pareto scaled. In panel A no filter was applied; in panel B a filtering strategy based on Mean Value was applied. A cut-off value of 100 was used.
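The five filters can be sketched as below. This is a minimal illustration with hypothetical function names, assuming spectra as rows and bins as columns. The text does not state which bins the dispersion filters discard; the sketch assumes it is the fraction with the lowest dispersion, in line with the goal of removing variables that barely change.

```python
import numpy as np

def dispersion_filter(X, method="sd", fraction=0.10):
    """Drop the given fraction of bins (columns) with the lowest dispersion."""
    if method == "sd":        # standard deviation
        score = X.std(axis=0, ddof=1)
    elif method == "mad":     # median absolute deviation
        med = np.median(X, axis=0)
        score = np.median(np.abs(X - med), axis=0)
    elif method == "iqr":     # interquartile range
        q75, q25 = np.percentile(X, [75, 25], axis=0)
        score = q75 - q25
    else:
        raise ValueError(f"unknown filter method: {method}")
    n_drop = int(round(fraction * X.shape[1]))
    keep = np.sort(np.argsort(score, kind="stable")[n_drop:])
    return X[:, keep], keep

def level_filter(X, cutoff, statistic="mean"):
    """Drop bins whose mean (or median) across spectra is below the cut-off."""
    level = X.mean(axis=0) if statistic == "mean" else np.median(X, axis=0)
    keep = np.flatnonzero(level >= cutoff)
    return X[:, keep], keep

# Toy matrix: 5 spectra x 10 bins, with spread and mean growing from left to right
X = 100.0 + np.arange(5.0)[:, None] * np.arange(1, 11)
X_sd, kept_sd = dispersion_filter(X, "sd")             # drops 1 of 10 bins
X_mean, kept_mean = level_filter(X, cutoff=105.0)      # drops bins with mean < 105
```

Both functions also return the indices of the retained bins, so that the surviving variables can be mapped back to chemical shift regions.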
Choice of Scaling strategy
Scaling is an operation performed on the variables (columns) of the matrix. The scaling strategy depends on the one hand on the biological information we wish to extract and on the other hand on the data analysis method chosen (in our case PCA). As a first step, so-called Centering is generally applied in every analysis. With Centering all bin values fluctuate around zero instead of around the mean of each bin; Centering therefore adjusts for differences in the offset between high- and low-abundance compounds. Several scaling methods are available in the literature [3], and centering is generally applied in combination with them. Scaling strategies can be divided into two subclasses: methods that use a measure of data dispersion (such as the standard deviation) as the scaling factor, and methods that use a size measure (such as the mean). For the first group Mnova includes the Auto, Pareto and Vast scaling strategies; for the second group, Range and Level scaling are available. Generally speaking, the first group is normally preferred for PCA. Figure 5 compares the score plot of a PCA that used Pareto scaling (panel A) with that of a PCA that used Level scaling (panel B).
Figure 5 – Score plots obtained using the same bin width of 0.05 ppm and normalization by the sum. In panel A Pareto scaling was applied; in panel B Level scaling was applied.
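The scaling methods named above can be written compactly using the definitions given in reference [3]. The sketch below is illustrative only: it assumes spectra as rows and bins as columns, uses the sample standard deviation, and the function name is hypothetical. Whether Mnova applies exactly these formulae is an assumption.

```python
import numpy as np

def scale(X, method="pareto"):
    """Centre each bin (column), then divide by the chosen scaling factor."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    centred = X - mean
    if method == "center":
        return centred
    if method == "auto":      # unit variance for every bin
        return centred / sd
    if method == "pareto":    # square root of the standard deviation
        return centred / np.sqrt(sd)
    if method == "vast":      # autoscaling weighted by mean / sd
        return (centred / sd) * (mean / sd)
    if method == "range":     # difference between maximum and minimum
        return centred / (X.max(axis=0) - X.min(axis=0))
    if method == "level":     # mean of the bin
        return centred / mean
    raise ValueError(f"unknown scaling method: {method}")
```

Note that bins with zero standard deviation, zero range or zero mean would cause a division by zero here, so in practice such bins need to be filtered out or handled before scaling.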
References
[1] Amber J Hackstadt, Filtering for increased power for microarray data analysis. BMC Bioinformatics 2009, 10:11
[2] Paul E. Anderson, Metabolomics, Volume 7, Issue 2, pp 179-190 (2010)
[3] Robert A van den Berg, Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006, 7:142