Objectives
We have developed a new grading system for hip osteoarthritis using clinical computed tomography (CT). This technique was compared with Kellgren and Lawrence (K&L) grading and minimum joint space width (JSW) measurement in digitally reconstructed radiographs (DRRs) from the same CT data. In this paper we evaluate and compare the accuracy and reliability of these measures in the assessment of radiological disease.

Design
CT imaging of hips from 30 female volunteers aged 66 ± 17 years was used in two reproducibility studies, one testing the reliability of the new system, the other testing K&L grading and minimum JSW measurement in DRRs.

Results
Intra- and inter-observer reliability was substantial for CT grading according to weighted kappa (0.74 and 0.75 respectively), while for DRR K&L grading intra-observer reliability was at worst moderate (0.57) and inter-observer reliability at worst substantial (0.63). Bland–Altman analysis showed a systematic difference in minimum JSW measurement of 0.82 mm between reviewers, with a least detectable difference of 1.06 mm. The area under the curve from ROC analysis was 0.91 for our CT composite score.

Conclusions
CT grading of hip osteoarthritis (categorised as none, developing and established) has substantial reliability. Assigning CT features of osteoarthritis a composite score (0 = none to 7 = severest) increased sensitivity, at the cost of reliability, and the score also performed well as a diagnostic test. Having established feasibility and reliability for this new CT system, sensitivity testing and validation against clinical measures of hip osteoarthritis will now be performed.
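
For readers unfamiliar with the agreement statistics reported above, the following is a minimal Python sketch of how a weighted kappa and a Bland–Altman bias with 95% limits of agreement can be computed from paired readings. It is illustrative only and is not the analysis code used in the study. The function names, the choice of linear weights (the abstract does not state the weighting scheme) and the toy readings are all assumptions introduced here. The least detectable difference is not computed, because its exact definition is not given in the abstract.

```python
from statistics import mean, stdev


def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Cohen's weighted kappa for ordinal grades coded 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic.
    Linear weighting is an assumption; the source does not say which was used.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must grade the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    k = n_categories

    # Observed joint proportions of (grade by A, grade by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]

    if exp_dis == 0.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - obs_dis / exp_dis


def bland_altman(reader_1, reader_2):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean difference; the limits are bias +/- 1.96 sample SDs.
    """
    if len(reader_1) != len(reader_2) or len(reader_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(reader_1, reader_2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Made-up grades (0 = none, 1 = developing, 2 = established), not study data.
    grades_a = [0, 0, 1, 1, 2, 2]
    grades_b = [0, 1, 1, 1, 2, 1]
    print(weighted_kappa(grades_a, grades_b, n_categories=3))

    # Made-up minimum JSW readings in mm, not study data.
    jsw_1 = [3.0, 2.5, 4.0, 3.5]
    jsw_2 = [2.0, 2.0, 3.0, 2.5]
    print(bland_altman(jsw_1, jsw_2))
```

A non-zero bias in the second function corresponds to what the Results call a systematic difference between reviewers, while the width of the limits reflects random disagreement.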