Infants, children and adults have been shown to track co-occurrence across ambiguous naming situations to infer the referents of new words. The extensive literature on this cross-situational word learning (CSWL) ability has produced support for two theoretical accounts, associative learning (AL) and hypothesis testing (HT), but no comprehensive model of the behaviour. We propose WOLVES, a formal account of CSWL grounded in psychological processes of memory and attention that explicitly models the dynamics of looking at a moment-to-moment scale and learning across trials. Here we use WOLVES to capture data from 12 studies of CSWL with adults and children, thereby providing a comprehensive account of data purported to support both AL and HT accounts. Moreover, we offer the first developmental account of CSWL, yielding insights into how underlying processes change from infancy through adulthood. WOLVES shows that selective attention in CSWL is both dependent on and indicative of learning. Further, learning is driven by real-time synchrony of words and gaze fixations and constrained by memory processes operating over multiple timescales. Additionally, WOLVES explains a) how performance is affected by the structure of test paradigms, b) how partial knowledge boosts learning of new words, c) how within- and across-trial competition produces mutual exclusivity, and d) how previously observed individual differences can emerge from learning in the task. The larger theoretical framework in which WOLVES is situated, Dynamic Field Theory, provides neural grounding and ties to other visual processing phenomena such as novelty detection and habituation, as well as to multiple early word learning behaviours.