University of California, Santa Barbara
Linguistics at UCSB
The Santa Barbara Corpus of Spoken American English

Obtaining the Santa Barbara Corpus

Parts 1-4 of the Santa Barbara Corpus of Spoken American English are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps that align the transcription with the audio at the level of individual intonation units.

The Santa Barbara Corpus of Spoken American English may be purchased from the Linguistic Data Consortium on CDs and DVDs, or downloaded from TalkBank, at the following web pages:

Part 1: LDC Catalog No. LDC2000S85

Part 2: LDC Catalog No. LDC2003S06

Part 3: LDC Catalog No. LDC2004S10

Part 4: LDC Catalog No. LDC2005S25

For Santa Barbara Corpus transcripts that have been reformatted into the TalkBank “CHAT” format, download the file SBCSAE.zip from the TalkBank website at http://www.talkbank.org/data/Conversation/

For the corresponding audio in WAV file format, go to http://www.talkbank.org/media/conversation/SBCSAE/
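
Once SBCSAE.zip has been downloaded, the transcripts can be inspected without unpacking the archive. The following is a minimal sketch in Python, not an official TalkBank tool. It assumes that the transcripts inside the archive carry the usual CHAT file extension (.cha) and are UTF-8 encoded; the function names and the local file name are hypothetical, and the internal folder layout of the archive is not assumed.

```python
import zipfile
from pathlib import PurePosixPath


def list_chat_transcripts(archive):
    """Return the sorted names of CHAT transcripts in a zip archive.

    `archive` is a path or a binary file object. The ".cha" extension
    is an assumption about how the transcripts are named.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if PurePosixPath(name).suffix.lower() == ".cha"
        )


def read_transcript(archive, member):
    """Return the text of one transcript, decoded as UTF-8 (assumed)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical local path: wherever the downloaded file was saved.
    local_copy = "SBCSAE.zip"
    for name in list_chat_transcripts(local_copy):
        print(name)
```

Reading members directly from the archive leaves the downloaded file untouched, which is convenient when only a few transcripts are needed.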