Background: To develop patient-reported outcome instruments, statistical techniques such as principal components analysis (PCA) are used to examine correlations among items and identify interrelated item subsets (empirical factors). However, interpreting and labelling empirical factors is a subjective process that lacks precision and a conceptual basis. We report a novel and reproducible method for mapping between theoretical and empirical factor structures. We illustrate the method using the pilot Aberdeen Glaucoma Questionnaire (AGQ), a new measure of glaucoma-related disability developed using the International Classification of Functioning, Disability and Health (ICF) as a theoretical framework and tested in a sample representing the spectrum of glaucoma severity. Methods: We used the ICF to code AGQ item content before mailing the AGQ to a UK sample (N = 1349) selected to represent people with a risk factor for glaucoma and people with glaucoma across a range of severity. Because some items carried multiple ICF codes, leaving the theoretical structure uncertain, we conducted an exploratory PCA. The theoretical structure informed our interpretation of the empirical structure and guided the selection of theoretically derived factor labels. We also explored how well the AGQ discriminated between glaucoma severity groups. Results: Completed questionnaires were returned by 656 participants (49%). The data yielded a 7-factor solution with a simple structure (using a loading cut-off of 0.5); together the factors accounted for 63% of the variance in scores. The mapping process resulted in allocation of the following theoretically derived factor labels: 1) Seeing Functions: Participation; 2) Moving Around & Communication; 3) Emotional Functions; 4) Walking Around Obstacles; 5) Light; 6) Seeing Functions: Domestic & Social Life; 7) Mobility.
In a discriminant function analysis with the seven factor scores as independent variables, the AGQ correctly graded glaucoma severity for 32.5% of participants (p < 0.001). Conclusions: This paper addresses a methodological gap in the application of classical test theory (CTT) techniques, such as PCA, in instrument development. Labels for empirically derived factors are often selected intuitively. If they are instead selected on the basis of theoretical construct labels, which are more explicitly defined and which relate to each other in evidence-based ways, they can inform existing bodies of knowledge.
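
As a minimal sketch of the mapping idea described above (not the authors' actual procedure), the following Python shows one way to apply a 0.5 loading cut-off to a rotated loading matrix, check for simple structure, and tally the theoretical codes of the items that define each factor. The item names, loadings, and code labels are all hypothetical and invented for illustration; they are not AGQ data, and the helper names `salient_items`, `has_simple_structure`, and `code_profiles` are assumptions.

```python
from collections import Counter

CUTOFF = 0.5  # loading cut-off, as reported in the Results

# Hypothetical rotated loadings: one row per item, one column per factor.
LOADINGS = {
    "item_reading": [0.72, 0.10, 0.05],
    "item_tv":      [0.65, 0.21, 0.08],
    "item_stairs":  [0.12, 0.70, 0.15],
    "item_kerbs":   [0.18, 0.61, 0.09],
    "item_worry":   [0.07, 0.11, 0.80],
    "item_glare":   [0.45, 0.40, 0.20],  # no loading reaches the cut-off
}

# Hypothetical theoretical codes assigned to items before data collection.
# An item may carry more than one code.
ITEM_CODES = {
    "item_reading": ["seeing functions", "domestic life"],
    "item_tv":      ["seeing functions", "recreation"],
    "item_stairs":  ["mobility"],
    "item_kerbs":   ["mobility", "seeing functions"],
    "item_worry":   ["emotional functions"],
    "item_glare":   ["seeing functions"],
}


def salient_items(loadings, cutoff=CUTOFF):
    """Return {factor index: items whose absolute loading is >= cutoff}."""
    n_factors = len(next(iter(loadings.values())))
    by_factor = {k: [] for k in range(n_factors)}
    for item, row in loadings.items():
        for k, value in enumerate(row):
            if abs(value) >= cutoff:
                by_factor[k].append(item)
    return by_factor


def has_simple_structure(loadings, cutoff=CUTOFF):
    """True if no item reaches the cut-off on more than one factor."""
    return all(
        sum(abs(v) >= cutoff for v in row) <= 1 for row in loadings.values()
    )


def code_profiles(loadings, item_codes, cutoff=CUTOFF):
    """Count the theoretical codes among the salient items of each factor."""
    profiles = {}
    for k, items in salient_items(loadings, cutoff).items():
        profiles[k] = Counter(
            code for item in items for code in item_codes[item]
        )
    return profiles


if __name__ == "__main__":
    print("simple structure:", has_simple_structure(LOADINGS))
    for k, profile in code_profiles(LOADINGS, ITEM_CODES).items():
        print(f"factor {k + 1}:", profile.most_common())
```

The code profile for each factor is only a starting point for labelling: a factor dominated by one theoretical code suggests that code as its label, whereas a mixed profile signals that the label needs to combine constructs or that the item coding should be revisited.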