Methods: A deep learning model for contouring RA area on CMR was trained in a multicentre cohort of 365 subjects comprising patients with pulmonary hypertension, patients with left ventricular pathology and healthy subjects. Inter-study repeatability (intraclass correlation coefficient (ICC)) and agreement of contours (Dice similarity coefficient (DSC)) were assessed in a prospective cohort (n = 36). Clinical testing and mortality prediction were performed in 400 patients who were not included in either the training or the prospective cohort, and the correlation of automatic and manual RA measurements with invasive haemodynamics was assessed in 212 of these 400 patients. Radiologist quality control (QC) was performed in the ASPIRE registry (n = 3795 patients). The primary QC observer evaluated all segmentations and graded each as satisfactory, suboptimal or failure. A second QC observer analysed a random subcohort (n = 1018) to assess QC agreement.
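As an illustration of the contour agreement metric named above, the following is a minimal sketch of the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), computed on two binary masks. It is not the study's implementation: the function name `dice_coefficient`, the nested-list mask representation, the toy masks and the convention for two empty masks are all assumptions made for this example.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are nested lists of 0/1 values with identical shape
    (a hypothetical representation chosen for this sketch).
    """
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same shape")

    # Pixels labelled as right atrium in both masks.
    overlap = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)

    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (not study data): the automatic contour includes one extra pixel.
manual = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
automatic = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dsc = dice_coefficient(manual, automatic)
print(f"DSC = {100 * dsc:.1f}%")
```

The coefficient is a dimensionless ratio between 0 and 1 and is often reported as a percentage; here the score is scaled by 100 only for printing.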
Results: All deep learning RA measurements showed higher inter-study repeatability (ICC 0.91 to 0.95) than manual RA measurements (1st observer ICC 0.82 to 0.88, 2nd observer ICC 0.88 to 0.91). DSC showed high agreement between automatic artificial intelligence contours and manual CMR readers. For maximal RA area, the mean ± standard deviation (SD) DSC for observer 1 vs observer 2, automatic measurements vs observer 1 and automatic measurements vs observer 2 was 92.4 ± 3.5%, 91.2 ± 4.5% and 93.2 ± 3.2%, respectively. For minimal RA area, the corresponding DSC values were 89.8 ± 3.9%, 87.0 ± 5.8% and 91.8 ± 4.8%. Automatic RA area measurements all showed moderate correlation with invasive parameters (r = 0.45 to 0.66), compared with r = 0.36 to 0.57 for manual measurements. Maximal RA area accurately identified elevated mean RA pressure at the low-risk and high-risk thresholds (area under the receiver operating characteristic curve: artificial intelligence 0.82/0.87 vs manual 0.78/0.83) and predicted mortality similarly to manual measurements (both p < 0.01). In the QC evaluation, artificial intelligence segmentations were graded suboptimal in 108/3795 cases (2.8%) and failed in 16/3795 (0.4%). In the subcohort (n = 1018), agreement between the two QC observers was excellent (kappa 0.84).
Conclusion: Automatic artificial intelligence CMR-derived measurements of RA size and function are accurate, have excellent repeatability, show moderate associations with invasive haemodynamics and predict mortality.
- Artificial intelligence
- Cardiovascular magnetic resonance
- Clinical testing
- Convolutional neural networks
- Deep learning training
- Mortality prediction
- Repeatability assessment
- Right atrial area