Real-world environments are multisensory, meaningful, and highly complex. To parse these environments in a highly efficient manner, a subset of this information must be selected both within and across modalities. However, the bulk of attention research has been conducted within sensory modalities, with a particular focus on vision. Visual attention research has made great strides, with over a century of research methodically identifying the underlying mechanisms that allow us to select critical visual information. Spatial attention, attention to features, and object-based attention have all been studied extensively. More recently, research has established semantics (meaning) as a key component to allocating attention in real-world scenes, with the meaning of an item or environment affecting visual attentional selection. However, a full understanding of how semantic information modulates real-world attention requires studying more than vision in isolation. The world provides semantic information across all senses, but with this extra information comes greater complexity. Here, we summarize visual attention (including semantic-based visual attention), crossmodal attention, and argue for the importance of studying crossmodal semantic guidance of attention.
- attentional prioritization
- crossmodal attention