Vocabulary differences early in development are highly predictive of later language learning as well as achievement in school. Early word learning emerges in the context of tightly coupled social interactions between the early learner and a mature partner. In the present study, we develop and apply a novel paradigm, dual head-mounted eye tracking, to record momentary gaze data from both parents and infants during free-flowing toy-play contexts. With fine-grained sequential patterns extracted from continuous gaze streams, we objectively measure both joint attention and sustained attention as parents and 9-month-old infants play with objects and as parents name objects during play. We show that both joint attention and infant sustained attention predict vocabulary size at 12 and 15 months, but that infant sustained attention in the context of joint attention, not joint attention itself, is the stronger unique predictor of later vocabulary size. Joint attention may predict word learning because it supports infant attention to the named object.