Exploring the BNC with SARA

British National Corpus HCU, 13 Banbury Road, Oxford OX2 6NN, UK

Lou Burnard HCU, 13 Banbury Road, Oxford OX2 6NN, UK


What's the plural of "corpus"? In what social situations is "wicked" a term of approval? Why does it "sound wrong" to say "The good weather set in on Thursday" although "The bad weather set in on Thursday" is perfectly acceptable? If I can say "I live a stone's throw away from here", can I also say "I'm going a stone's throw away from here"?

Large language corpora can help provide answers for these kinds of questions -- if only because they encourage linguists, lexicographers, and all who work with language to ask them. The purpose of a language corpus is to provide language workers with evidence of how language is really used, evidence that can then be used to inform and substantiate individual theories about what words might or should mean. Traditional grammars and dictionaries tell us what a word "ought to mean", but only experience can tell us what a word is used to mean. This is why dictionary publishers, grammar writers, language teachers, and developers of natural language processing software alike have been turning to corpus evidence as a means of extending and organizing that experience.

The British National Corpus (BNC) is a collection of over 4000 different text samples of all kinds, both written and spoken, containing in all six and a quarter million sentences and over 100 million words of current British English. Work on building the corpus began in 1991 and was completed in 1994. The project was funded by the Science and Engineering Council (now EPSRC) and the Department of Trade and Industry under the Joint Framework for Information Technology (JFIT) programme. The project was carried out and is managed by an industrial/academic consortium led by Oxford University Press, of which the other members are the major dictionary publishers Addison-Wesley Longman and Larousse Kingfisher Chambers; academic research centres at Oxford and Lancaster Universities; and the British Library's Research and Innovation Centre. A description of the way in which the corpus was built and a wealth of information about its contents are available from the project's web pages at http://info.ox.ac.uk/bnc.

This presentation will demonstrate the new BNC Online service provided by the British Library and managed by the OUCS. The service allows anyone with access to the internet to search the entire corpus for words, phrases, patterns, contextual details, etc., using SARA, a user-friendly but powerful Windows-based, SGML-aware retrieval application developed at Oxford with the needs of language teachers and researchers in mind.

The presentation will focus in particular on uses made of the SARA system by advanced language learners, and the pedagogic implications of the learning styles it encourages.


A PC running Windows (preferably Windows 95) with an internet connexion is needed.