Abstract
In this paper, we propose an attention-based approach to short text classification, developed for the practical application of Twitter mining for public health monitoring. Our goal is to automatically filter Tweets that are relevant to the syndrome of asthma/difficulty breathing. We describe a bi-directional Recurrent Neural Network architecture with an attention layer (termed ABRNN), which allows the network to weigh words in a Tweet differently based on their perceived importance. We further distinguish between two variants of the ABRNN, based on the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures and termed the ABLSTM and ABGRU respectively. We apply the ABLSTM and ABGRU, along with popular deep learning text classification models, to a Tweet relevance classification problem and compare their performance. We find that the ABLSTM outperforms the other models, achieving an accuracy of 0.906 and an F1-score of 0.710. The attention vectors computed as a by-product of our models were also found to be meaningful representations of the input Tweets. As such, the described models have the added utility of computing document embeddings that could be used for tasks other than classification. To further validate the approach, we demonstrate the ABLSTM’s performance in the real-world application of public health surveillance and compare the results with real-world syndromic surveillance data provided by Public Health England (PHE). A strong positive correlation was observed between the ABLSTM surveillance signal and the real-world asthma/difficulty breathing syndromic surveillance data. The ABLSTM is a useful tool for the task of public health surveillance.
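As a rough illustration of the attention idea described in the abstract, the sketch below pools a sequence of per-word hidden states into a single Tweet vector using softmax weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the scoring function (a dot product between a context vector and the tanh of each hidden state), the function names `softmax` and `attention_pool`, and the toy numbers are all hypothetical, and the hidden states stand in for the outputs of a bi-directional LSTM or GRU.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weigh each word's hidden state and return (weights, pooled vector).

    hidden_states: one vector per word (assumed to come from a
                   bi-directional RNN; here they are just toy numbers).
    context:       a vector scoring how important each hidden state is
                   (learned in a real model; fixed here for illustration).
    """
    # Assumed scoring function: context . tanh(h_t) for each word t.
    scores = [
        sum(c * math.tanh(h_i) for c, h_i in zip(context, h))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states: a fixed-size Tweet representation.
    pooled = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, pooled


# Toy example: three "words", each with a 2-dimensional hidden state.
hidden_states = [[0.1, -0.2], [0.9, 0.8], [0.0, 0.1]]
context = [1.0, 1.0]
weights, tweet_vector = attention_pool(hidden_states, context)
print(weights)
print(tweet_vector)
```

In this toy input the second word receives the largest weight, so it contributes most to the pooled vector. In a trained model, a vector of this kind would feed the relevance classifier and could also serve as a document embedding for other tasks.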
Original language | English |
---|---|
Publication status | Published - 16 May 2019 |
Event | International Work-Conference on Artificial Neural Networks, Spain. Duration: 12 Jun 2019 → 14 Jun 2019 |
Conference
Conference | International Work-Conference on Artificial Neural Networks |
---|---|
Abbreviated title | IWANN |
Country/Territory | Spain |
Period | 12/06/19 → 14/06/19 |
Keywords
- Syndromic Surveillance
- Sequence modelling
- Machine Learning
- Deep Learning
- Natural Language Processing
- Short-Text Classification
Profiles
- Beatriz De La Iglesia
  - School of Computing Sciences - Professor & Head of School
  - Norwich Institute for Healthy Aging - Member
  - Norwich Epidemiology Centre - Member
  - Data Science and AI - Member
  - Person: Research Group Member, Research Centre Member, Academic, Teaching & Research
- Iain Lake
  - School of Environmental Sciences - Professor
  - Tyndall Centre for Climate Change Research - Member
  - Environmental Social Sciences - Member
  - ClimateUEA - Member
  - Person: Research Group Member, Academic, Teaching & Research