Chemical Shift Prediction using Message Passing Neural Networks

Carlos Cobas, Isaac Iglesias, E. Kate Kemsley (Lead Author), Marcel Lachenmann, Santi Ponte, Nicola Tonge, David Williamson

Research output: Contribution to conference > Poster > peer-review

Abstract

We are using advanced artificial intelligence approaches - artificial neural networks, ensembles, and deep learning - to enhance chemical shift prediction, spectral assignment, and automated structural verification in NMR spectroscopy, collectively known as the ‘forward’ problems. Message Passing Neural Networks (MPNNs) have emerged as a promising architecture for this purpose. These networks naturally handle molecular structures as graphs, with atoms as nodes and bonds as edges. The key advantage of MPNNs is that they use node feature information and node connectivity simultaneously, the latter as described by the graph adjacency matrix.
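As a rough illustration of how node features and the adjacency matrix are used together, the sketch below runs two rounds of message passing on a three-atom graph (the C-C-O heavy-atom skeleton of ethanol). The one-hot feature encoding, the sum aggregation, and the averaging update are assumptions chosen for simplicity; a trained MPNN would use learned message and update functions, and this is not the architecture used in the work described here.

```python
# Toy sketch only: fixed (non-learned) message passing on a small molecular graph.
# The feature encoding and update rule are illustrative assumptions.

def message_passing_step(features, adjacency):
    """Update each node by averaging its own vector with the sum of its neighbours' vectors."""
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        # Aggregate messages from the neighbours listed in row i of the adjacency matrix.
        message = [0.0] * dim
        for j in range(n):
            if adjacency[i][j]:
                for k in range(dim):
                    message[k] += features[j][k]
        # Combine the node's current state with the aggregated message.
        updated.append([(features[i][k] + message[k]) / 2.0 for k in range(dim)])
    return updated

# Hypothetical node features: one-hot [is_carbon, is_oxygen] for atoms C0, C1, O2.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Adjacency matrix for the chain C0-C1-O2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

h1 = message_passing_step(features, adjacency)  # after one round
h2 = message_passing_step(h1, adjacency)        # after two rounds
```

The two carbons start with identical features. After one round, C1 differs from C0 because it is bonded to the oxygen; after two rounds, C0 also carries information about the oxygen two bonds away. This is the sense in which connectivity lets the network distinguish chemically different environments of the same element.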

Our ongoing work involves training MPNNs on large collections (tens of thousands) of molecular structures, fully annotated with experimentally observed proton (1H) and carbon (13C) chemical shifts. Because training is stochastic, performance can be improved by pooling predictions from ensembles of trained MPNNs for each target nucleus. This is conveniently executed in parallel on the multiple GPU nodes of an HPC facility. Initial results for both nuclei have yielded prediction errors that compare favourably with those reported in the literature. For example, from application to a large test set (n ~ 28,000 nodes) of previously unseen structures, the median absolute error in prediction is ~1.2 ppm for 13C. For 1H, the median absolute error is 0.09 ppm. The error distributions are fat-tailed compared to the normal distribution but are smooth, symmetric, and can be well-represented by a Gaussian kernel density method. This suggests a data-driven, probabilistic route to structural assignment and verification.
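The pooling and error-modelling steps can be sketched as follows. This is a minimal example under stated assumptions: the abstract does not say how ensemble predictions are pooled, so a simple per-atom mean is assumed; the kernel bandwidth is arbitrary; and all shift values are invented for illustration and are not results from this work.

```python
import math
import statistics

def pool_ensemble(predictions_per_model):
    """Pool per-atom predictions across models (assumed here: arithmetic mean)."""
    return [statistics.mean(atom_preds) for atom_preds in zip(*predictions_per_model)]

def median_absolute_error(predicted, observed):
    """Median of |predicted - observed| over all atoms."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))

def gaussian_kde(errors, bandwidth):
    """Return a density function built from Gaussian kernels centred on each error."""
    norm = 1.0 / (len(errors) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - e) / bandwidth) ** 2) for e in errors)

    return density

# Invented 13C predictions (ppm) from three hypothetical ensemble members, three atoms each.
ensemble = [
    [28.0, 129.5, 171.0],
    [30.0, 128.5, 170.0],
    [29.0, 129.0, 172.0],
]
observed = [29.5, 128.0, 171.5]  # invented experimental shifts (ppm)

pooled = pool_ensemble(ensemble)
errors = [p - o for p, o in zip(pooled, observed)]
mae = median_absolute_error(pooled, observed)

# Density of signed prediction errors; the bandwidth of 0.5 ppm is an arbitrary choice.
density = gaussian_kde(errors, bandwidth=0.5)
```

With a density of this kind estimated from a large test set, `density(d)` gives an empirical measure of how plausible a deviation `d` between predicted and observed shift is. That is one way a candidate assignment or structure could be scored probabilistically, although the abstract does not specify the scoring scheme.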

Key areas for further research include: investigating the balance of node subgraph representations in the training set and their impact on prediction performance; exploring alternative graph-theoretical representations of molecular structures to better characterize molecular diversity; and extending the capabilities of the model beyond diastereotopic protons to address stereoisomerism more widely.
Original language: English
Publication status: Published - 15 Sep 2024
Event: SMASH Small Molecule Conference - Hotel Champlain, Burlington, United States
Duration: 15 Sep 2024 to 18 Sep 2024
Conference number: 23rd
https://smashnmr.org/

Conference

Conference: SMASH Small Molecule Conference
Abbreviated title: SMASH
Country/Territory: United States
City: Burlington
Period: 15/09/24 to 18/09/24
