Freeing chemists' hands: automatic spectral analysis software that processes a data set in 60 seconds. Leave NMR interpretation to artificial intelligence!
In the synthesis of organic molecules and natural products, determining the structure is a very challenging task. Closely related structures and diastereomers differ only subtly in their 1D NMR spectra, so telling them apart takes a great deal of time and effort. Computer-assisted interpretation of NMR spectra has helped a large number of researchers. The principle is to use density functional theory (DFT) to calculate the NMR shifts of every candidate diastereomer of the uncertain structure, and then to compare the predictions with published spectral data using parameters such as the correlation coefficient, the mean absolute error (MAE) and the corrected mean absolute error (CMAE). Among these approaches, DP4 analysis is a particularly powerful tool: it not only predicts the stereochemistry of the molecule but also gives the probability that each candidate structure is correct, and it has been applied successfully in natural product and drug synthesis. Since its release, the DP4 calculation has been greatly simplified and requires less and less user input. However, the most labor-intensive step for the user remains the assignment of the NMR spectra, which is both time-consuming and error-prone. A few commercial packages, such as Mestrelab Mnova, provide algorithms for assigning 1H NMR spectra, but they cannot automatically process and assign raw NMR data.
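As a rough illustration of the two error measures named above, the following Python sketch computes an MAE and a CMAE for a list of calculated and experimental shifts. The article does not define the correction, so the version here is an assumption: the calculated shifts are linearly rescaled onto the experimental ones before the error is taken. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def mae(calc, expt):
    """Mean absolute error between calculated and experimental shifts (ppm)."""
    return mean(abs(c - e) for c, e in zip(calc, expt))


def linear_fit(x, y):
    """Least-squares slope and intercept of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cmae(calc, expt):
    """MAE after linear rescaling of the calculated shifts.

    Assumption: "corrected" means removing the systematic (linear) error
    of the calculation; the article itself does not spell this out.
    """
    slope, intercept = linear_fit(expt, calc)  # calc ~ slope * expt + intercept
    scaled = [(c - intercept) / slope for c in calc]
    return mae(scaled, expt)


# Made-up shifts with a purely systematic error (5 % scale, 2 ppm offset).
expt = [20.0, 40.0, 60.0, 80.0]
calc = [1.05 * e + 2.0 for e in expt]
print(mae(calc, expt), cmae(calc, expt))
```

In this toy case the whole error is systematic, so the rescaling removes it and the CMAE drops to essentially zero while the MAE stays large.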
Overview of the results
Building on the analysis above, the group of Professor Jonathan M. Goodman at the University of Cambridge proposed a method for automatically processing and assigning 1H and 13C NMR spectra, which can automatically resolve stereochemical and structural ambiguity in organic molecules. They found that NMR-AI can process the raw NMR data in about one minute, whereas the same task previously took about 8 hours, a 480-fold speed-up. The number of molecules that can be processed per day increases 60-fold, which makes high-throughput NMR analysis possible and paves the way for discovering new molecular structures through machine learning.
DP4-AI and the structure calculation workflow
DP4-AI consists of two parts, NMR-AI and PyDP4. NMR-AI processes the raw NMR data supplied by the user and assigns the chemical shifts. PyDP4 calculates the probability that this assignment is correct, so that the stereochemistry of the molecule is elucidated automatically.
The DP4-AI workflow for NMR data is as follows. When the user supplies raw NMR data, the program first performs phase and baseline correction, then extracts the chemical shift of each peak and calculates the integrals. Chemical shifts are calculated for each atom by DFT and assigned to the peaks. Finally, DP4 analysis gives the probability of each assignment, and the program reports the chemical structure of the compound.
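The final step, turning prediction errors into a probability for each candidate structure, can be sketched as follows. This is a minimal illustration and not the PyDP4 implementation: it assumes a single zero-mean Gaussian error model with a hypothetical width sigma, equal prior probabilities for all candidates, and shifts that are already assigned. The candidate names and error values are made up.

```python
import math


def log_gaussian_pdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def candidate_probabilities(errors_by_candidate, sigma):
    """Bayes' rule with equal priors over the candidate structures.

    errors_by_candidate maps a candidate name to its list of
    (calculated - experimental) shift errors in ppm.
    """
    log_like = {
        name: sum(log_gaussian_pdf(e, sigma) for e in errors)
        for name, errors in errors_by_candidate.items()
    }
    top = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(v - top) for name, v in log_like.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical 13C errors (ppm) for two candidate diastereomers.
errors = {
    "isomer_A": [0.5, -1.0, 0.8],
    "isomer_B": [3.0, -2.5, 4.0],
}
probs = candidate_probabilities(errors, sigma=2.0)  # sigma is an assumed value
print(probs)
```

The candidate whose calculated shifts lie closer to experiment receives most of the probability, and the values sum to one across all candidates.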
NMR peak picking in DP4-AI
Peaks in the 1H NMR spectrum are picked using the first and second derivatives of the raw data. A point is extracted as a peak if the first derivative is zero, the second derivative is at a minimum, the value of the second derivative lies below one threshold, and the peak amplitude lies above a second threshold. Because peaks are picked in this way, both thresholds can be set very low, so that as little signal as possible is lost while noise is still filtered out as far as possible.
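The derivative criteria can be sketched on a sampled spectrum with finite differences. This is a simplified illustration under assumptions, not the NMR-AI code: the "second derivative at a minimum" condition is approximated by requiring strongly negative curvature at a local maximum, and the threshold values and the synthetic two-peak spectrum are hypothetical.

```python
import math


def pick_peaks(y, amp_threshold, curv_threshold):
    """Return indices of local maxima with strong curvature and amplitude.

    y              -- spectrum intensities on a uniform grid
    amp_threshold  -- minimum intensity for a peak
    curv_threshold -- (negative) upper bound on the second difference
    """
    peaks = []
    for i in range(1, len(y) - 1):
        d1_left = y[i] - y[i - 1]               # first difference before point i
        d1_right = y[i + 1] - y[i]              # first difference after point i
        d2 = y[i + 1] - 2.0 * y[i] + y[i - 1]   # second difference at point i
        is_maximum = d1_left > 0.0 and d1_right <= 0.0  # slope changes sign
        if is_maximum and d2 <= curv_threshold and y[i] >= amp_threshold:
            peaks.append(i)
    return peaks


# Synthetic spectrum: two Lorentzian lines plus a small deterministic ripple.
spectrum = [
    1.0 / (1.0 + ((i - 50) / 3.0) ** 2)
    + 0.5 / (1.0 + ((i - 120) / 3.0) ** 2)
    + 0.001 * math.sin(i)
    for i in range(200)
]
peaks = pick_peaks(spectrum, amp_threshold=0.1, curv_threshold=-0.01)
print(peaks)
```

The ripple produces many tiny local maxima in the baseline, but they fail both thresholds, so only the two line centres are returned.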
To avoid mistaking noise for signal peaks, the researchers developed an objective model selection algorithm that eliminates noise. Picked peaks separated by less than 18 Hz are grouped together into a signal region. For each region, a lineshape model is built from several generalized Lorentzian functions, and the model parameters are iterated until the model integral converges to within 1% of the integral of the corresponding spectral region. If the Bayesian information criterion (BIC) of the model is below the threshold, the parameters are considered to describe noise and the corresponding peak is deleted.
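Two pieces of this step lend themselves to a short sketch: grouping picked peaks into regions, and scoring a fitted model with a BIC. Both functions below are illustrative assumptions. The grouping chains neighbouring peaks whose gap is under 18 Hz, and the BIC uses the common least-squares form n ln(RSS/n) + k ln(n); the article does not state how NMR-AI computes its criterion or which threshold it applies.

```python
import math


def group_peaks(freqs_hz, max_gap_hz=18.0):
    """Chain peaks into regions: neighbours closer than max_gap_hz share a region."""
    groups = []
    for f in sorted(freqs_hz):
        if groups and f - groups[-1][-1] < max_gap_hz:
            groups[-1].append(f)
        else:
            groups.append([f])
    return groups


def bic(residuals, n_params):
    """BIC for a least-squares fit, assuming Gaussian residuals (up to a constant)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical peak positions in Hz.
regions = group_peaks([300.0, 100.0, 110.0, 125.0, 310.0, 500.0])
print(regions)

# Same fit quality with more parameters gives a larger (worse) BIC.
residuals = [0.1, -0.1, 0.2, -0.2]
print(bic(residuals, n_params=2), bic(residuals, n_params=4))
```

The parameter-count penalty is what lets a criterion of this kind decide whether an extra lineshape is justified by the data or is only fitting noise.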
NMR peak assignment in DP4-AI
The researchers consider the assignment algorithm to have been the most challenging part of developing DP4-AI. This algorithm assigns each atom of each candidate diastereomer to one of the extracted spectral peaks, matching the peaks against the GIAO-calculated chemical shifts. The core of the assignment algorithm is to compute a probability matrix M, whose element Mij is the probability that calculated chemical shift i corresponds to experimental peak j. The most probable assignment is then found from M by linear sum minimization with the Hungarian method.
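A toy version of this matching step is sketched below. It is not the DP4-AI algorithm: Mij is assumed to be an unnormalized Gaussian in the shift difference with a hypothetical width sigma, each atom is assumed to take a distinct peak, and the optimum is found by brute force over permutations rather than by the Hungarian method. For realistic sizes, a Hungarian-type solver such as SciPy's `linear_sum_assignment` would be the practical choice on the same cost matrix.

```python
import math
from itertools import permutations


def probability_matrix(calc_shifts, peak_shifts, sigma):
    """M[i][j]: unnormalized Gaussian likelihood that calculated shift i matches peak j."""
    return [
        [math.exp(-0.5 * ((c - p) / sigma) ** 2) for p in peak_shifts]
        for c in calc_shifts
    ]


def best_assignment(M):
    """Most probable one-to-one assignment, found by minimizing the sum of -log M[i][j].

    Brute force over permutations: fine for a handful of atoms only.
    Requires at least as many peaks as atoms.
    """
    n_atoms, n_peaks = len(M), len(M[0])
    best_cost, best = math.inf, None
    for perm in permutations(range(n_peaks), n_atoms):
        # Clamp to avoid log(0) when the Gaussian underflows.
        cost = sum(-math.log(max(M[i][j], 1e-300)) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)


# Hypothetical 13C data: three calculated shifts, four picked peaks (one is spurious).
calc = [128.4, 21.0, 77.5]
peaks = [20.3, 76.9, 129.1, 55.0]
M = probability_matrix(calc, peaks, sigma=2.0)  # sigma is an assumed value
assignment = best_assignment(M)
print(assignment)  # assignment[i] is the index of the peak matched to atom i
```

Minimizing the summed negative log-probabilities is equivalent to maximizing the product of the selected Mij, which is why a linear sum solver applies to a probability matrix.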
The 13C NMR algorithm also takes the amplitude of the experimental peaks into account. Each element Mij of M is multiplied by a weighting factor Aj derived from the amplitude of experimental peak j. Peaks in a 13C NMR spectrum typically fall into three groups that can be distinguished by amplitude: noise, signals from single carbon atoms, and signals from several equivalent carbon atoms. To capture this variation, the researchers estimate the probability density function of the peak amplitudes in the spectrum. The peaks are split into groups wherever a minimum of the second derivative of this function lies between peak amplitudes, and the weights are calculated from the number of peaks in each group and the number of carbon atoms expected from the structure.
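The weighting itself is a column-wise scaling of M, which the short sketch below illustrates. Only the multiplication Mij × Aj is taken from the text; how the weights Aj are derived from the amplitude groups is not reproduced here, and the matrix and weight values are made up.

```python
def apply_amplitude_weights(M, weights):
    """Multiply column j of the probability matrix M by the weight A_j."""
    if any(len(row) != len(weights) for row in M):
        raise ValueError("need exactly one weight per experimental peak (column)")
    return [[m * a for m, a in zip(row, weights)] for row in M]


# Hypothetical matrix for two atoms and two peaks; peak 0 looks like noise.
M = [[0.8, 0.2],
     [0.5, 0.5]]
A = [0.1, 1.0]  # assumed weights: low for a noise-like peak, high for a real one
weighted = apply_amplitude_weights(M, A)
print(weighted)
```

Down-weighting a noise-like peak makes every atom less likely to be assigned to it, without removing the peak from consideration altogether.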
Evaluation of the performance of DP4-AI
To evaluate the performance of NMR-AI, the researchers constructed a test set of 47 molecules (3.49 stereocenters per molecule on average) covering a variety of carbon skeletons. The test set contains natural product fragments, natural products and synthetic intermediates, so as to include as many types of organic structure as possible. To describe the probability distribution of the NMR prediction errors in DP4-AI, the researchers tested four different statistical models and found that a single-region, three-Gaussian model gave the best predictions.
In terms of correct prediction rate at the highest level of theory tested, DP4-AI was about as reliable as time-consuming manual assignment, which requires a trained chemist. For this test set, the probability of arriving at the correct stereochemical assignments by chance is about 3 × 10⁻⁸, so DP4-AI showed very reliable performance. Most impressively, DP4-AI assigned the correct stereochemistry of NP1 and NP2 from among 32 and 64 candidate diastereomers, respectively.
NMR-AI can process the NMR data in about 1 minute, whereas the same task previously took about 8 hours. This corresponds to a 60-fold increase in the number of molecules that can be processed per day.
To process raw NMR data quickly and efficiently, the group of Professor Jonathan M. Goodman at Cambridge proposed DP4-AI, a method for automatic spectral processing and assignment. It consists of two parts, NMR-AI and PyDP4. The user only needs to supply the raw NMR data; the program automatically picks the peaks, assigns them, and directly reports the most likely molecular structure together with the probability of the assignment. The researchers tested the program on a set of 47 molecules, for which the probability of obtaining the correct stereochemistry by chance is about 3 × 10⁻⁸, and it assigned the correct stereochemistry of NP1 and NP2. NMR-AI needs only about 1 minute to process the NMR data, a 480-fold speed-up over previous methods, and the number of molecules processed per day can increase 60-fold.
Original link: https://pubs.rsc.org/en/content/articlehtml/2020/sc/d0sc00442a