TY - GEN
T1 - Deep Learning for Relevance Filtering in Syndromic Surveillance: A Case Study in Asthma/Difficulty Breathing
AU - Edo-Osagie, Oduwa
AU - De La Iglesia, Beatriz
AU - Lake, Iain
AU - Edeghere, Obaghe
N1 - Funding Information:
We acknowledge support from NHS 111 and NHS Digital for their assistance and support with the NHS 111 system; Out-of-Hours providers submitting data to the GPOOH syndromic surveillance and Advanced Health & Care. The authors also acknowledge support from the Public Health England Real-time Syndromic Surveillance Team. Beatriz De La Iglesia and Iain Lake receive support from the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Emergency Preparedness and Response.
Publisher Copyright:
© 2019 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
PY - 2019
Y1 - 2019
AB - In this paper, we investigate deep learning methods that may extract some word context for Twitter mining for syndromic surveillance. Most of the work on syndromic surveillance has been done on the flu or Influenza-Like Illnesses (ILIs). For this reason, we decided to look at a different but equally important syndrome, asthma/difficulty breathing, as this is quite topical given global concerns about the impact of air pollution. We also compare deep learning algorithms for the purpose of filtering Tweets relevant to our syndrome of interest, asthma/difficulty breathing. We make our comparisons using different variants of the F-measure as our evaluation metric because they allow us to emphasise recall over precision, which is important in the context of syndromic surveillance so that we do not lose relevant Tweets in the classification. We then apply our relevance filtering systems, based on deep learning algorithms, to the task of syndromic surveillance and compare the results with real-world syndromic surveillance data provided by Public Health England (PHE). We find that the RNN performs best at relevance filtering but can also be slower than other architectures, which is an important consideration for real-time application. We also found that the correlation between Twitter and the real-world asthma syndromic surveillance data was positive and improved with the use of the deep-learning-powered relevance filtering. Finally, the deep learning methods enabled us to gather context and word similarity information, which we can use to fine-tune the vocabulary we employ to extract relevant Tweets in the first place.
KW - Syndromic Surveillance
KW - Machine Learning
KW - Text Classification
KW - Tweet Classification
KW - Deep Learning
UR - http://www.scopus.com/inward/record.url?scp=85064629501&partnerID=8YFLogxK
U2 - 10.5220/0007366904910500
DO - 10.5220/0007366904910500
M3 - Conference contribution
AN - SCOPUS:85174822754
SN - 9789897583513
T3 - International Conference on Pattern Recognition Applications and Methods
SP - 491
EP - 500
BT - Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods ICPRAM
A2 - De Marsico, Maria
A2 - Sanniti di Baja, Gabriella
A2 - Fred, Ana L. N.
PB - Science and Technology Publications, Lda
T2 - 8th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2019
Y2 - 19 February 2019 through 21 February 2019
ER -