Empirical evaluation: Towards an automated index of lexical variety

Vander Viana, A. Giordani, Sonia Zyngier

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › Peer-review


This chapter proposes an objective approach to the formal analysis of literary prose in English in order to investigate the relation between lexical density and judgments of canonicity. Drawing on the Russian Formalists' concept of literariness and on the notion of lexical variety, it designs a mathematical index that relates three variables reflecting the materiality of the text: (a) the relative frequency of lexical bundles, (b) the lexical bundle type/token ratio, and (c) the word type/token ratio. The index is described and illustrated with 46 canonical and non-canonical literary works. Statistical analysis shows no significant relation between lexical richness and which works have been classified as canonical, indicating that these judgments may be influenced by factors other than the text itself.
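The three variables named in the abstract can be computed from a plain text roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the tokenizer, the operationalization of a "lexical bundle" as any 3-word sequence recurring at least twice, and the normalization of bundle frequency per 1,000 words are all hypothetical choices, and the way the chapter combines the three variables into a single index is not reproduced here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphabetic word forms, allowing internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(items: list) -> float:
    # Number of distinct items divided by total items; 0.0 for empty input.
    return len(set(items)) / len(items) if items else 0.0


def lexical_bundles(tokens: list[str], n: int = 3, min_freq: int = 2) -> list[tuple]:
    # Hypothetical operationalization: every n-gram occurrence whose
    # n-gram recurs at least min_freq times in the text.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    return [g for g in grams if counts[g] >= min_freq]


def variety_components(text: str, n: int = 3, min_freq: int = 2, per: int = 1000) -> dict:
    # Returns the three component variables (a), (b), (c) from the abstract,
    # computed under the assumptions above (not combined into an index).
    tokens = tokenize(text)
    bundles = lexical_bundles(tokens, n, min_freq)
    return {
        # (a) bundle tokens per `per` running words (assumed normalization)
        "bundle_rel_freq": len(bundles) / len(tokens) * per if tokens else 0.0,
        # (b) distinct bundles / bundle tokens
        "bundle_ttr": type_token_ratio(bundles),
        # (c) distinct words / running words
        "word_ttr": type_token_ratio(tokens),
    }
```

For "the cat sat on the mat and the cat sat on the rug" (13 words, 7 word types), the sketch finds 3 recurring trigrams occurring 6 times in total, so the bundle type/token ratio is 0.5 and the word type/token ratio is 7/13. Raw type/token ratios fall as texts grow longer, so comparing works of very different lengths would in practice call for a length-controlled variant.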
Original language: English
Title of host publication: Directions in empirical literary studies
Subtitle of host publication: In honor of Willie van Peer
Editors: Sonia Zyngier, Marisa Bortolussi, Anna Chesnokova, Jan Auracher
Place of publication: Amsterdam
Publication status: Published, 2008

Publication series

Name: Linguistic Approaches to Literature
Publisher: John Benjamins
ISSN (Print): 1569-3112


  • Lexical variety
  • Corpus linguistics
  • Literary discourse
  • Lexical bundles
  • Empirical study
  • Canonicity
