Leveraging Deep Learning to Simulate Coronavirus Spike proteins has the potential to predict future Zoonotic sequences

Research output: Working paper › Preprint


Motivation: Coronaviridae are a family of positive-sense RNA viruses capable of infecting humans and animals. These viruses usually cause a mild to moderate upper respiratory tract infection; however, they can also cause more severe symptoms, as well as gastrointestinal and central nervous system diseases. Because they adapt flexibly to new environments, the health threat from coronaviruses is constant and long-term. Immunogenic spike proteins are glycoproteins found on the surface of Coronaviridae particles that mediate entry into host cells. The aim of this study was to train deep learning neural networks to produce simulated spike protein sequences, which may aid understanding and/or vaccine design by providing alternative spike sequences that could arise from zoonotic sources in the future.

Results: We trained deep learning recurrent neural networks (RNNs) to generate computer-simulated coronavirus spike protein sequences in the style of previously known sequences, and examined their characteristics. Training used a dataset of alpha-, beta-, gamma- and deltacoronavirus spike sequences. In a test set of 100 simulated sequences, all 100 had their most significant BLAST matches to spike proteins in searches against the NCBI non-redundant (NR) database, and all also had concomitant Pfam domain matches.
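As an illustration of the character-level setup that such sequence generation typically relies on, the following minimal sketch shows how a protein sequence could be turned into next-residue training pairs and how a residue could be sampled from a model's output scores. It is not the author's implementation: the abstract does not specify the architecture or preprocessing, so the window size, the 20-letter vocabulary, and the helper names `make_training_pairs` and `sample_with_temperature` are all assumptions made for this example.

```python
import math
import random

# Assumed vocabulary: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHAR_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_training_pairs(sequence, window=5):
    """Slide a fixed window over a protein sequence (hypothetical helper).

    Returns (context, next_residue) pairs encoded as integer indices,
    a common input/target layout for a next-character model.
    """
    encoded = [CHAR_TO_INDEX[aa] for aa in sequence]
    return [
        (encoded[i:i + window], encoded[i + window])
        for i in range(len(encoded) - window)
    ]


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Draw one index from softmax(logits / temperature) (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Arbitrary toy string for illustration only, not a real spike sequence.
pairs = make_training_pairs("ACDEFGHIKLMNP", window=5)
```

In a full pipeline, a trained network would supply the `logits` for each step, and the sampled residue would be appended to the context to generate the next one. Lower temperatures make sampling more conservative; higher ones produce more varied sequences.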

Conclusions: Simulated sequences from the neural network may be able to guide us towards prospective targets for vaccine discovery in advance of a potential novel zoonosis. In effect, neural networks may let us fast-forward through evolution to investigate sequences that could arise.
Original language: English
Publication status: Published - 20 Apr 2020
  • Deep Recurrent Neural Networks for the Generation of Synthetic Coronavirus Spike Protein Sequences

    Crossman, L. C., 2022, Computational Intelligence Methods for Bioinformatics and Biostatistics: 17th International Meeting, CIBB 2021, Virtual Event, November 15–17, 2021, Revised Selected Papers. Chicco, D., Facchiano, A., Tavazzi, E., Longato, E., Vettoretti, M., Bernasconi, A., Avesani, S. & Cazzaniga, P. (eds.). Springer, p. 217-226 10 p. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); vol. 13483 LNBI).

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
