Abstract
Train delays have been a serious and persistent problem in the UK and many other countries. Due to increasing demand, rail networks are running close to their full capacity. As a consequence, an initial delay can cause many knock-on delays to other trains, and this is the main reason for the overall deterioration in the performance of rail networks. It is therefore useful to have an AI-based method that can predict delays accurately and reliably, so that when a delay occurs, train controllers can make and apply alternative plans in time to reduce or prevent further delays. However, existing machine learning models are not only inaccurate but, more importantly, unreliable. In this study, we propose a new approach to building heterogeneous ensembles with two novel model selection methods based on accuracy and diversity. We tested our heterogeneous ensembles on real-world data, and the results indicate that they are more accurate and robust than single models and state-of-the-art homogeneous ensembles, e.g. Random Forest and XGBoost. We then verified their performance on an independent dataset from a different train operating company and found that they achieved consistent and accurate results.
Original language | English |
---|---|
Pages (from-to) | 5138-5153 |
Number of pages | 16 |
Journal | IEEE Transactions on Intelligent Transportation Systems |
Volume | 25 |
Issue number | 6 |
Early online date | 1 Jan 2024 |
Publication status | Published - Jun 2024 |
Keywords
- Train delay prediction
- Machine learning
- Ensemble
- Boosting classifiers
- Random forest
- Atmospheric modeling
- Predictive models
- Heterogeneous ensemble
- Rails
- Diversity
- Data models
- Delays
- Ensemble learning