Incorporating topic membership in review rating prediction from unstructured data: A gradient boosting approach

Nan Yang, Nikolaos Korfiatis, Dimitris Zissis, Konstantina Spanaki

Research output: Contribution to journal › Article › peer-review

Abstract

Rating prediction is a crucial element of business analytics because it enables decision-makers to assess service performance from expressive customer feedback. Enhancing rating score prediction and demand forecasting by incorporating performance features from verbatim text fields, particularly in service quality measurement and customer satisfaction modelling, is a key objective in various areas of analytics. The literature identifies a range of methods for improving the predictability of customer feedback, from simple bag-of-words approaches to advanced supervised machine learning models designed to work with response variables such as Likert-based rating scores. This paper presents a dynamic model that combines topic membership values, an output of Latent Dirichlet Allocation (LDA), with sentiment analysis in an Extreme Gradient Boosting (XGBoost) model used for rating prediction. The results show that incorporating features from simple unsupervised machine learning approaches (LDA-based) achieves 86% prediction accuracy (AUC-based) on objective rating values. At the same time, a combination of polarity and single-topic membership can yield even higher accuracy when compared with sentiment text detection tasks at both the document and sentence levels. This finding carries significant practical implications, since sentiment analysis tasks often require dictionary coverage and domain-specific adjustments depending on the task at hand. To investigate this result further, we used Shapley Additive Values to determine the additive predictability of topic membership values in combination with sentiment-based methods, using a dataset of customer reviews from food delivery services.
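
The feature construction described in the abstract (topic membership from LDA combined with a sentiment polarity score, passed to a gradient-boosted classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available and substitutes `GradientBoostingClassifier` for XGBoost; the toy reviews, the labels, the tiny polarity lexicon, and the helper names `polarity` and `build_features` are all hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy food-delivery reviews and binary labels (1 = high rating).
# Invented for illustration; not data from the paper.
reviews = [
    "great food and fast delivery",
    "tasty pizza friendly rider",
    "order arrived late and cold",
    "rude driver wrong order",
    "fast service tasty burger",
    "slow delivery cold fries",
    "friendly staff great meal",
    "late again wrong items",
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

# Hypothetical mini lexicon standing in for a real sentiment dictionary.
POSITIVE = {"great", "fast", "tasty", "friendly"}
NEGATIVE = {"late", "cold", "rude", "slow", "wrong"}


def polarity(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def build_features(texts, n_topics=2, seed=0):
    """Stack per-document LDA topic membership with a polarity column."""
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topic_membership = lda.fit_transform(counts)  # each row sums to 1
    sentiment = np.array([[polarity(t)] for t in texts])
    return np.hstack([topic_membership, sentiment])


X = build_features(reviews)

# Gradient boosting on the combined features (stand-in for XGBoost).
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)
predicted = model.predict(X)
```

The sketch fits and predicts on the same eight reviews, so it says nothing about predictive accuracy; a real evaluation would hold out data and report a metric such as AUC.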

Original language: English
Pages (from-to): 631-662
Number of pages: 32
Journal: Annals of Operations Research
Volume: 339
Issue number: 1-2
Early online date: 19 Jun 2023
Publication status: Published - Aug 2024

Keywords

  • Latent Dirichlet allocation
  • Machine learning
  • Online reviews
  • Sentiment analysis
  • XGBoost
