Freeing chemists' hands! Automated spectrum-analysis software arrives and processes a data set in 60 seconds. From now on, leave NMR interpretation to artificial intelligence!

In the synthesis of organic molecules and natural products, determining the structure is a very challenging task. Closely related structural isomers and diastereomers differ only subtly in their 1D NMR spectra, so distinguishing them takes a great deal of time and effort. Computer-assisted NMR structure identification has helped many researchers. The principle is to use density functional theory (DFT) to calculate the NMR shifts of all candidate diastereomers of the uncertain structure, and then to compare the predictions with published spectral data using parameters such as the correlation coefficient, the mean absolute error (MAE) and the corrected mean absolute error (CMAE). Among these approaches, DP4 analysis is a particularly powerful tool: it not only predicts the stereochemistry of a molecule but also gives the probability that each candidate structure is correct, and it has been applied successfully in natural product and drug synthesis. Since its release, the DP4 calculation has been greatly simplified and requires less and less user input. However, the most labor-intensive step for the user remains the assignment of the NMR spectra, which is both time-consuming and error-prone. A few commercial packages, such as Mestrelab Mnova, provide algorithms for assigning 1H NMR spectra, but they cannot automatically process and assign raw NMR data.
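The two error measures mentioned above can be illustrated with a short sketch. It assumes the common definition in which the CMAE is the MAE after the calculated shifts have been linearly rescaled against the experimental ones; the function names and the example shifts are hypothetical and not taken from any DP4 code.

```python
def mean_absolute_error(calc, exp):
    """MAE between calculated and experimental shifts (ppm)."""
    return sum(abs(c - e) for c, e in zip(calc, exp)) / len(calc)


def linear_fit(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def corrected_mae(calc, exp):
    """CMAE: rescale calculated shifts onto the experimental scale, then take the MAE.

    Assumption: scaled = (calc - intercept) / slope, with slope and intercept
    from a fit of calculated against experimental shifts.
    """
    slope, intercept = linear_fit(exp, calc)
    scaled = [(c - intercept) / slope for c in calc]
    return mean_absolute_error(scaled, exp)


# Hypothetical shifts with a purely systematic error (calc = 2 * exp + 1).
exp_shifts = [10.0, 20.0, 30.0]
calc_shifts = [21.0, 41.0, 61.0]
print(mean_absolute_error(calc_shifts, exp_shifts))  # large: systematic error
print(corrected_mae(calc_shifts, exp_shifts))        # near zero after scaling
```

The rescaling removes systematic errors of the calculation, so the CMAE reflects only the scatter that remains between a candidate structure and the experiment.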

Summary of results

Based on the above analysis, the group of Professor Jonathan M. Goodman at Cambridge proposed DP4-AI, a method that automatically processes and assigns raw 1H and 13C NMR data and can automatically predict the stereochemistry of organic molecules and resolve structural ambiguity. NMR-AI can process raw NMR data in about one minute, whereas the same task previously took about 8 hours, a 480-fold speed-up. The number of molecules that can be processed per day increases 60-fold, which makes high-throughput NMR analysis possible and paves the way for discovering new molecular structures through machine learning.

Structure and calculation workflow of DP4-AI

Figure 1. (a) Structure of DP4-AI; (b) examples whose stereochemistry can be predicted automatically by DP4-AI as integrated in PyDP4.

DP4-AI consists of two parts, NMR-AI and PyDP4. NMR-AI processes the raw NMR data supplied by the user and assigns the chemical shifts; PyDP4 calculates the probability that this assignment is correct and thereby elucidates the stereochemistry of the molecule automatically.

Figure 2. Overall architecture of DP4-AI. The raw NMR data pass through a series of processing steps that first yield the experimental shift values and integrals. The program then uses DFT to calculate the chemical shift of every atom in the molecule and assigns these to the experimental shifts. Finally, the DP4 program calculates the probability of each diastereomer under this assignment.

DP4-AI processes NMR data as follows. When the user supplies raw NMR data, the program first applies phase and baseline correction, then extracts the chemical shift of each peak and calculates the integrals. The chemical shift of each atom is calculated by DFT and assigned to the experimental peaks. Finally, DP4 analysis gives the probability of each assignment and reports the stereochemical structure of the compound.
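The final step, turning assignment errors into a probability for each candidate diastereomer, can be sketched as a simple Bayesian calculation. This is a simplified illustration and not the published DP4 error model: it assumes independent, zero-mean Gaussian errors with a hypothetical standard deviation `sigma` and equal prior probabilities for all candidates. The candidate names and error values are invented for the example.

```python
import math


def log_gaussian_pdf(error, sigma):
    """Log density of a zero-mean Gaussian (assumed error model)."""
    return -0.5 * (error / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def candidate_probabilities(errors_by_candidate, sigma=2.0):
    """Bayes' rule with equal priors over candidate diastereomers.

    errors_by_candidate maps a candidate name to its list of
    (calculated - experimental) shift errors in ppm.
    """
    log_likelihoods = {
        name: sum(log_gaussian_pdf(e, sigma) for e in errors)
        for name, errors in errors_by_candidate.items()
    }
    # Subtract the largest log-likelihood before exponentiating to avoid underflow.
    top = max(log_likelihoods.values())
    weights = {name: math.exp(ll - top) for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical errors for two candidate diastereomers.
probabilities = candidate_probabilities(
    {"isomer_A": [0.1, -0.2, 0.3], "isomer_B": [2.0, -3.0, 2.5]},
    sigma=1.0,
)
print(probabilities)
```

Working in log space keeps the calculation stable when a molecule has many atoms and the product of per-atom densities becomes very small.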

NMR peak picking in DP4-AI

Figure 3. The peak-picking process. A peak is picked if its second derivative is below the second-derivative threshold (orange) and its intensity is above the intensity threshold (blue). The peaks finally selected are shown in green.

1H NMR peaks are picked using the first and second derivatives of the raw data. A point is extracted as a peak if the first derivative is zero, the second derivative is at a minimum and lies below the second-derivative threshold, and the peak amplitude lies above the intensity threshold. Picking peaks in this way allows both thresholds to be set very low, so that as little signal as possible is lost while as much noise as possible is still filtered out.
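The picking rule can be sketched with finite differences on a list of intensities. This is a minimal illustration, not the NMR-AI implementation: it treats a sign change of the first difference as the zero of the first derivative, and it checks only that the second difference at that point is below the threshold rather than locating its exact minimum. The function name, thresholds and data are hypothetical.

```python
def pick_peaks(intensities, intensity_threshold, curvature_threshold):
    """Return indices of points that satisfy all picking criteria.

    curvature_threshold is negative: a sharp peak has a strongly
    negative second derivative at its maximum.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        rise = intensities[i] - intensities[i - 1]
        fall = intensities[i + 1] - intensities[i]
        # Discrete second derivative at point i.
        curvature = intensities[i + 1] - 2 * intensities[i] + intensities[i - 1]
        is_local_max = rise > 0 and fall <= 0  # first derivative crosses zero
        if (
            is_local_max
            and curvature < curvature_threshold
            and intensities[i] > intensity_threshold
        ):
            peaks.append(i)
    return peaks


# Hypothetical spectrum: one real peak at index 5 and two small noise bumps.
spectrum = [0, 0.1, 0, 0, 1, 5, 1, 0, 0, 0.2, 0.1, 0]
print(pick_peaks(spectrum, intensity_threshold=0.5, curvature_threshold=-1.0))
```

In this toy spectrum the small bumps at indices 1 and 9 are local maxima too, but they fail both the intensity and the curvature test, so only the peak at index 5 is returned.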

Figure 4. An extracted multiplet (blue) and an example deconvolution model (orange). Signal peaks are highlighted in blue; peaks identified as noise are highlighted in red.

To avoid mistaking noise for signal peaks, the researchers developed a model-selection algorithm to remove noise. Picked peaks separated by less than 18 Hz are grouped together into a signal region. For each region, a model is built as a linear combination of generalized Lorentzian functions, and the model parameters are varied iteratively until the integral of the model converges to within 1% of the integral of the corresponding spectral region. If the Bayesian information criterion of the model is below a threshold, the parameters are considered to describe noise, and the corresponding peak is deleted.
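The grouping step can be sketched as follows; the Lorentzian fitting and the information-criterion test are not reproduced here. The function name and the peak positions are hypothetical, and positions are assumed to be given in Hz.

```python
def group_peaks(positions_hz, max_gap_hz=18.0):
    """Group picked peaks into signal regions.

    Neighbouring peaks separated by less than max_gap_hz end up in the
    same region; a larger gap starts a new region.
    """
    regions = []
    for position in sorted(positions_hz):
        if regions and position - regions[-1][-1] < max_gap_hz:
            regions[-1].append(position)
        else:
            regions.append([position])
    return regions


# Hypothetical peak positions: a triplet, a doublet and a singlet.
print(group_peaks([114.0, 100.0, 107.0, 400.0, 410.0, 900.0]))
# [[100.0, 107.0, 114.0], [400.0, 410.0], [900.0]]
```

Because each peak is compared with its nearest neighbour rather than with the start of the region, a wide multiplet stays in one region as long as no internal gap reaches 18 Hz.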

NMR peak assignment in DP4-AI

Figure 5. The assignment probability matrix M is used to assign calculated shifts to experimental peaks. (a) Peaks of the simulated (calculated) spectrum (blue) are assigned to peaks of the experimental spectrum (orange); (b) the matrix M is computed and the best assignment is found (cyan); (c) the assignment finally found in this example.

The researchers consider the assignment algorithm to have been the most challenging part of developing DP4-AI. This algorithm assigns each atom of each candidate diastereomer to the extracted spectral peaks, using chemical shifts calculated by the GIAO method. The core of the assignment algorithm is the probability matrix M, whose element Mij is the probability that calculated chemical shift i corresponds to experimental peak j. The most probable assignment is found from M by linear sum minimization using the Hungarian method.
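A small sketch of the assignment problem follows. It makes two assumptions that are not from the source: Mij is modelled as an unnormalized Gaussian in the shift difference with a hypothetical width `sigma`, and the optimum is found by exhaustive search over permutations, which stands in for the Hungarian method and is only practical for a handful of peaks. For realistic sizes, SciPy's `scipy.optimize.linear_sum_assignment` solves the same linear sum assignment problem efficiently.

```python
import itertools
import math


def probability_matrix(calc_shifts, exp_peaks, sigma=2.0):
    """M[i][j]: assumed Gaussian score that calculated shift i matches peak j."""
    return [
        [math.exp(-0.5 * ((c - e) / sigma) ** 2) for e in exp_peaks]
        for c in calc_shifts
    ]


def best_assignment(matrix):
    """Return a tuple a, where calculated shift i is assigned to peak a[i].

    Minimizes the sum of -log M[i][a[i]], with each peak used at most once.
    Requires at least as many experimental peaks as calculated shifts.
    """
    n_calc, n_exp = len(matrix), len(matrix[0])
    best, best_cost = None, math.inf
    for candidate in itertools.permutations(range(n_exp), n_calc):
        # Clamp tiny probabilities so the logarithm stays finite.
        cost = sum(
            -math.log(max(matrix[i][j], 1e-300)) for i, j in enumerate(candidate)
        )
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


# Hypothetical 13C shifts (ppm); the experimental list has one extra peak.
calculated = [128.1, 77.4, 21.0]
experimental = [20.5, 78.0, 127.5, 55.0]
print(best_assignment(probability_matrix(calculated, experimental)))
```

Minimizing the sum of negative logarithms is equivalent to maximizing the product of the selected Mij, which is why a linear sum assignment solver finds the most probable one-to-one assignment.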

Figure 6. Peaks (left) are grouped by amplitude (each group falls between the dashed lines) at the minima of the second derivative of the amplitude probability density function (right). In this simulated example the structure contains 9 carbon atoms. For each group, the cumulative number of peaks above its boundary is counted; the weight assigned to the group is the number of carbon atoms in the structure divided by this count, with the maximum weight fixed at 1.

The 13C NMR algorithm also takes the amplitudes of the experimental peaks into account. Each element Mij of M is multiplied by a weighting factor Aj derived from the amplitude of experimental peak j. The peaks in a 13C NMR spectrum typically fall into three groups that can be distinguished by amplitude: noise, signals from single carbon atoms, and signals from several equivalent carbon atoms. To capture this variation, the researchers estimate the probability density function of the peak amplitudes in the spectrum. Peaks whose amplitudes lie between successive minima of the second derivative of this function are grouped together, and a weight is calculated for each group from the number of peaks and the number of carbon atoms expected in the structure.
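One reading of this weighting scheme is sketched below. It assumes the group boundaries have already been determined from the amplitude density and are simply passed in, and it interprets the weight of a group as the number of carbon atoms divided by the number of peaks at or above the group's lower boundary, capped at 1. The function names, boundaries and amplitudes are hypothetical.

```python
def amplitude_weights(amplitudes, boundaries, n_carbons):
    """Weight A_j for each experimental peak j, based on its amplitude group.

    boundaries are the amplitude values separating the groups (assumed given).
    """
    weights = []
    for amplitude in amplitudes:
        # Lower boundary of the group this peak falls into (0 for the lowest group).
        lower = max((b for b in boundaries if b <= amplitude), default=0.0)
        # Cumulative number of peaks at or above that boundary.
        n_above = sum(1 for a in amplitudes if a >= lower)
        weights.append(min(1.0, n_carbons / n_above))
    return weights


def apply_weights(matrix, weights):
    """Multiply every element M[i][j] by the weight A_j of peak j."""
    return [[m * w for m, w in zip(row, weights)] for row in matrix]


# Hypothetical spectrum: four noise peaks, three ordinary peaks, one tall peak.
amplitudes = [0.05, 0.04, 0.06, 0.05, 1.0, 1.1, 0.9, 2.0]
weights = amplitude_weights(amplitudes, boundaries=[0.5, 1.5], n_carbons=2)
print(weights)  # [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 1.0]
```

Under this reading, low-amplitude peaks receive small weights when there are far more of them than carbon atoms in the structure, which makes the assignment less likely to pick noise.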

Evaluation of the performance of DP4-AI

Figure 7. The 47 molecular structures used to evaluate the performance of DP4-AI. Molecules AT3, TS3A, TS4 and NL1A have 1H NMR data only; all other molecules have both 1H and 13C NMR data. The spectra of molecules JB7, JB11, JB5 and JB8 were recorded in methanol, benzene, DMSO and methanol, respectively, while all others were recorded in CDCl3.

To evaluate the performance of NMR-AI, the researchers constructed a test set of 47 molecules (with an average of 3.49 stereocenters per molecule) covering a variety of carbon skeletons. The test set contains natural product fragments, natural products and synthetic intermediates, so as to include as many types of organic structure as possible. To describe the errors of the NMR predictions in DP4-AI, the researchers tested four different statistical models and found that a single-region, three-Gaussian model gave the best predictions.

Figure 8. Correct prediction rate for the compounds in Figure 7: DP4-AI (orange) and the pairwise assignment algorithm (blue).

At the highest level of theory tested, DP4-AI was about as reliable as the time-consuming pairwise assignment approach, which requires a trained chemist. For the test set, the effective probability of arriving at the correct stereochemical assignments by chance is about 3 × 10⁻⁸, so DP4-AI showed very reliable performance. Most impressively, DP4-AI assigned the correct stereochemistry of NP1 and NP2 from among 32 and 64 diastereomers, respectively.

Figure 9. Comparison of NMR data processing rates with NMR-AI.

NMR-AI can process NMR data in about 1 minute, whereas the same task previously took about 8 hours (480 minutes). Correspondingly, the number of molecules that can be processed per day increases 60-fold.


To process raw NMR data quickly and efficiently, the group of Professor Jonathan M. Goodman at Cambridge proposed DP4-AI, a method for automatic spectrum processing and assignment that consists of two parts, NMR-AI and PyDP4. The user only needs to supply the raw NMR data; the program automatically picks the peaks, assigns them, and directly reports the most likely molecular structure together with the probability of the assignment. The researchers tested the program on a constructed set of 47 molecules and found that the effective probability of arriving at the correct stereochemistry by chance is about 3 × 10⁻⁸; the program also assigned the correct stereochemistry of molecules NP1 and NP2. NMR-AI needs only about 1 minute to process NMR data, a 480-fold speed-up over previous methods, and the number of molecules processed per day can increase 60-fold. Original link: https: //