Corpus Linguistics

Corpus research at UCSB is unique in the extent to which it is both theory-driven and pervasive across the subfields of the department. Most department faculty and graduate students are involved in some way in developing and using a corpus, that is, a body of attested instances of naturally occurring language use. The corpus may be large or small, written or spoken, automatically assembled from pre-existing online texts, or meticulously transcribed by the researcher in the hands-on process of documenting a previously unwritten language.

In line with the theoretical goal of seeking functional explanations for language, more and more linguists are demanding that explanatory generalizations about language be built on firm empirical foundations, and they have come to see corpus data and research methods as critical tools for serious research. Current research at UCSB makes use of corpora in studies of syntax, morphology, phonology, discourse, pragmatics, cognition, child language, interaction, sociocultural function, historical linguistics, and lexicography. Methods range from close qualitative analysis of interactional structures and their pragmatic functions in richly transcribed conversational data to the application of powerful new quantitative computational tools and innovative statistical techniques for systematically identifying the most important and reliable patterns of grammatical constructions in very large corpora.

Corpus-related resources and activities at UCSB include the Santa Barbara Corpus of Spoken American English and the editing of the journal Corpus Linguistics and Linguistic Theory.

Core Faculty

Wallace Chafe, John W. Du Bois, Stefan Th. Gries, Fermin Moscoso, Sandra A. Thompson

Courses

Linguistics 120: Corpus Linguistics
Linguistics 201: Research Methodology and Statistics in Linguistics
Linguistics 202: Advanced Research Methods and Statistics
Linguistics 204: Statistical Methodology
Linguistics 210: Computational Linguistics
Linguistics 212: Discourse Transcription
Linguistics 218: Corpus Linguistics
Linguistics 219: Corpus Construction

Links

Santa Barbara Corpus of Spoken American English
Corpus Linguistics and Linguistic Theory
Google Group: Corpus Linguistics with R
Google Group: Statistics for Linguists with R