Heterogeneous machine learning ensembles for predicting train delays

Mostafa Al Ghamdi, Gerard Parr, Wenjia Wang

Research output: Contribution to journal › Article › peer-review



Train delays have been a serious and persistent problem in the UK and many other countries. Because of increasing demand, rail networks are running close to full capacity. As a consequence, an initial delay can cause many knock-on delays to other trains, and this is the main reason for the overall deterioration in rail network performance. An AI-based method that predicts delays accurately and reliably would therefore be valuable: when a delay occurs, it would help train controllers make and apply alternative plans in time to reduce or prevent further delays. However, existing machine learning models are not only inaccurate but, more importantly, unreliable. In this study, we propose a new approach to building heterogeneous ensembles, with two novel model selection methods based on accuracy and diversity. We tested our heterogeneous ensembles on real-world data, and the results indicate that they are more accurate and robust than single models and state-of-the-art homogeneous ensembles such as Random Forest and XGBoost. We then verified their performance on an independent dataset from a different train operating company and found that they achieved consistent and accurate results.
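The abstract does not describe the two selection methods themselves, so the following is only a generic sketch of the accuracy-versus-diversity trade-off that heterogeneous ensemble selection typically balances. It is not the authors' method. It assumes that each candidate model's class predictions (for example 1 = delayed, 0 = on time) on a held-out validation set are already available. It uses pairwise disagreement as the diversity measure, a greedy weighted score for selection, and majority voting to combine members. The functions `accuracy`, `disagreement`, `select_ensemble` and `majority_vote`, the weight `alpha`, and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(preds: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of validation samples a model labels correctly."""
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise diversity: fraction of samples on which two models differ."""
    return sum(p != q for p, q in zip(a, b)) / len(a)


def select_ensemble(preds: Dict[str, List[int]], y: List[int],
                    k: int, alpha: float = 0.5) -> List[str]:
    """Hypothetical greedy selection trading accuracy against diversity.

    Starts from the most accurate model, then repeatedly adds the candidate
    maximising alpha * accuracy + (1 - alpha) * mean disagreement with the
    models already selected.
    """
    acc = {name: accuracy(p, y) for name, p in preds.items()}
    selected = [max(acc, key=acc.get)]
    while len(selected) < min(k, len(preds)):
        def score(name: str) -> float:
            div = sum(disagreement(preds[name], preds[s]) for s in selected)
            return alpha * acc[name] + (1 - alpha) * div / len(selected)
        candidates = [n for n in preds if n not in selected]
        selected.append(max(candidates, key=score))
    return selected


def majority_vote(preds: Dict[str, List[int]], members: List[str]) -> List[int]:
    """Combine members' labels by plurality vote (use an odd count to avoid ties)."""
    n = len(preds[members[0]])
    return [Counter(preds[m][i] for m in members).most_common(1)[0][0]
            for i in range(n)]


if __name__ == "__main__":
    # Toy validation labels and predictions from four hypothetical base models.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = {
        "model_a": [1, 0, 1, 1, 0, 0, 0, 0],
        "model_b": [1, 0, 1, 1, 0, 0, 0, 1],
        "model_c": [1, 1, 1, 0, 0, 0, 1, 0],
        "model_d": [0, 0, 0, 1, 1, 0, 1, 0],
    }
    members = select_ensemble(preds, y, k=3, alpha=0.6)
    vote = majority_vote(preds, members)
    print(members)             # ['model_a', 'model_c', 'model_d']
    print(accuracy(vote, y))   # 1.0
```

In this toy case, `model_b` is as accurate as `model_c` but almost always agrees with `model_a`, so the diversity term favours `model_c` and `model_d`. Their errors fall on different samples, so the vote corrects `model_a`'s single mistake. Real data will not be this tidy, and `alpha` would need tuning.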
Original language: English
Journal: IEEE Transactions on Intelligent Transportation Systems
Early online date: 1 Jan 2024
Publication status: E-pub ahead of print, 1 Jan 2024

Keywords
  • Train delay prediction
  • Machine learning
  • Ensemble
  • Boosting classifiers
  • Random forest
  • Atmospheric modeling
  • Predictive models
  • heterogeneous ensemble
  • Rails
  • diversity
  • Data models
  • Delays
  • Ensemble learning
