Chemical shift prediction in 13C NMR spectroscopy using ensembles of message passing neural networks (MPNNs)

David Williamson, Santiago Ponte, Isaac Iglesias, Carlos Cobas, Nicola M. Tonge, E. Kate Kemsley

Research output: Working paper (Preprint)

Abstract

This study reports a deep learning approach utilising graph convolutional neural networks with four message-passing layers for predicting chemical shifts in 13C NMR spectra of small molecules. The networks were trained on two distinct datasets: one with approximately 4,000 labelled structures and another with over 40,000. To mitigate stochastic variation, an ensemble framework was implemented, which is simple to deploy on multiple nodes of a high-performance computing facility.
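As a minimal sketch of the ensemble idea, the example below combines per-atom predictions from several independently trained models. It assumes the members are combined by a simple mean, which the abstract does not specify. The function name `ensemble_shifts` and all shift values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def ensemble_shifts(member_predictions):
    """Combine per-atom 13C shift predictions (ppm) from several models.

    member_predictions holds one list per ensemble member, each with one
    predicted shift per carbon atom, in the same atom order.
    Returns (mean_shifts, spreads); the spread is the sample standard
    deviation across members, a rough indicator of run-to-run variation.
    Assumption: members are combined by an unweighted mean.
    """
    if len(member_predictions) < 2:
        raise ValueError("need at least two ensemble members")
    n_atoms = len(member_predictions[0])
    if any(len(p) != n_atoms for p in member_predictions):
        raise ValueError("every member must predict the same atoms")

    # Transpose so that each tuple holds all members' values for one atom.
    per_atom = list(zip(*member_predictions))
    mean_shifts = [mean(values) for values in per_atom]
    spreads = [stdev(values) for values in per_atom]
    return mean_shifts, spreads


# Illustrative values only: three members, three carbon atoms.
members = [
    [128.0, 77.0, 21.0],
    [129.0, 76.0, 21.5],
    [127.0, 78.0, 20.5],
]
mean_shifts, spreads = ensemble_shifts(members)
```

Because each member is trained and evaluated independently, the members could in principle run as separate jobs on separate compute nodes, with only this final aggregation step needing all predictions together.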
The results emphasise the critical role of training set size and diversity. While prediction performance was comparable on test partitions from within each dataset, networks trained on the larger dataset maintained their accuracy when challenged with crossover holdout sets, whereas those trained on the smaller dataset showed a notable decline. This difference is attributed to the greater diversity of atomic environments in the larger dataset.
The larger dataset also enabled more robust modelling of various error properties, providing a quantitative foundation for spectral assignment and verification in two ways. First, a clear relationship was identified between prediction errors and the frequency of different node feature vectors in the training data, from which an estimated error can be associated with any node given its node type. Such estimates may be used as weights in a modified city-block distance metric during the assignment of observed to predicted shifts. Second, the mean absolute prediction error calculated at the structure level is well fitted by a Gaussian kernel cumulative distribution. This leads to a probabilistic assessment of whether the predicted shifts and assigned observations are consistent with originating from the same molecular structure.
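The two uses described above can be sketched roughly as follows. The abstract gives neither the exact form of the modified metric nor the assignment algorithm, so this example makes several assumptions: each absolute difference is divided by the atom's estimated error, the assignment is found by brute force over permutations (practical only for a few atoms), and the Gaussian kernel cumulative distribution is taken to be the average of normal CDFs centred on reference structure-level errors. All function names, shift values, error estimates, reference errors, and the bandwidth are hypothetical.

```python
from itertools import permutations
from math import erf, sqrt


def weighted_cityblock(observed, predicted, est_errors):
    """City-block distance with each term scaled by an estimated error.

    Assumed weighting: |observed - predicted| / estimated_error, so atoms
    whose predictions are expected to be less reliable count for less.
    """
    return sum(abs(o - p) / e
               for o, p, e in zip(observed, predicted, est_errors))


def assign_shifts(observed, predicted, est_errors):
    """Find the ordering of observed shifts that minimises the distance.

    Brute force over all permutations; a stand-in for whatever assignment
    procedure the authors actually use.
    """
    if not (len(observed) == len(predicted) == len(est_errors)):
        raise ValueError("all inputs need one entry per carbon atom")
    best = min(permutations(observed),
               key=lambda perm: weighted_cityblock(perm, predicted, est_errors))
    return list(best), weighted_cityblock(best, predicted, est_errors)


def gaussian_kernel_cdf(x, samples, bandwidth):
    """CDF of a Gaussian kernel density estimate built on `samples`."""
    return sum(0.5 * (1.0 + erf((x - s) / (bandwidth * sqrt(2.0))))
               for s in samples) / len(samples)


# Illustrative values only (ppm).
predicted = [170.2, 128.5, 21.0]   # model output, one value per carbon
est_errors = [3.0, 1.5, 1.0]       # hypothetical per-node error estimates
observed = [20.6, 171.0, 129.4]    # unassigned peak list

assigned, distance = assign_shifts(observed, predicted, est_errors)

# Structure-level mean absolute error after assignment.
structure_mae = sum(abs(o - p)
                    for o, p in zip(assigned, predicted)) / len(predicted)

# Hypothetical structure-level errors for known-correct structures.
reference_maes = [0.8, 1.1, 1.3, 1.6, 2.4]

# Fraction of the fitted distribution lying above the observed error:
# a small value suggests the structure and spectrum may not match.
tail_probability = 1.0 - gaussian_kernel_cdf(structure_mae, reference_maes,
                                             bandwidth=0.3)
```

For molecules with more than a handful of carbons, the permutation search would need replacing with a proper linear assignment solver; the weighting and the tail-probability step would carry over unchanged.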
Original language: English
Publisher: SSRN
Publication status: Published - 12 Sep 2024
