Age tagging and word frequency for learners' dictionaries

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

In contemporary lexicography, particularly in learners' dictionaries, word frequency information from large corpora has been used for entry selection, sense ranking, and collocation identification, as well as for selecting defining vocabulary. However, age information in linguistic corpora has not been adequately highlighted or exploited. Early experiments have demonstrated that word retrieval from long-term memory is influenced much more by age of acquisition than by word frequency. EFL learners therefore need to know not only which words are frequent but also which words native speakers tend to use at different ages. Core vocabulary contains not simply words with high frequency but also words with an even distribution across age groups. Learners' dictionaries built on this kind of core vocabulary will be of much help for English learning and teaching, as well as for research on core vocabulary. Our research makes use of the age group information in the British National Corpus XML Edition (BNC XML 2007). We find that higher lexical coverage is achieved when core vocabulary is selected by combining a word's dispersion index with its distributed frequency across age groups rather than by raw frequency alone. Moreover, our study shows that speakers under 15 rely more on core vocabulary than adults do, owing to its fundamental role in language learning. For speakers over 15, core vocabulary occupies a stable proportion of vocabulary size regardless of increasing age. Another interesting finding is that each age group tends to acquire more of the core words selected on a frequency-age basis than of those selected on a raw-frequency basis.
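The idea of selecting core vocabulary by combining dispersion with frequency across age groups can be illustrated with a minimal sketch. The abstract does not say which dispersion index or combination rule the authors used, so the choices here are assumptions: Juilland's D as the dispersion index, and Juilland's usage coefficient (D multiplied by mean normalized frequency) as the combined score. The function names, the three age groups, and the word counts are invented for illustration and are not taken from the BNC.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(freqs):
    """Juilland's D over per-group frequencies: 1 means perfectly even,
    0 means every occurrence falls in a single group."""
    n = len(freqs)
    mu = mean(freqs)
    if n < 2 or mu == 0:
        return 0.0
    # Coefficient of variation (population SD / mean), scaled by sqrt(n - 1)
    return 1 - (pstdev(freqs) / mu) / sqrt(n - 1)


def rank_core_candidates(counts, group_sizes):
    """Rank words by dispersion-weighted frequency (assumed scoring rule).

    counts: {word: [raw count in each age group]}
    group_sizes: total tokens in each age group, in the same order
    """
    scores = {}
    for word, per_group in counts.items():
        # Normalize to occurrences per million tokens so that age groups
        # of different sizes are comparable.
        per_million = [c / size * 1_000_000
                       for c, size in zip(per_group, group_sizes)]
        scores[word] = juilland_d(per_million) * mean(per_million)
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy counts for three hypothetical age groups of equal size.
toy_counts = {"water": [50, 50, 50], "mortgage": [0, 0, 210]}
toy_sizes = [1_000_000, 1_000_000, 1_000_000]
print(rank_core_candidates(toy_counts, toy_sizes))
```

In this toy data "mortgage" has the higher raw frequency (210 against 150), but all of its occurrences fall in one age group, so its dispersion is close to zero and the evenly distributed "water" ranks first. This is the kind of reordering that a frequency-age criterion produces relative to raw frequency alone.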

Original language: English
Pages (from-to): 157-173
Number of pages: 17
Journal: Language and Computers
Volume: 73
Publication status: Published - 1 Dec 2011
