University of California, Santa Barbara
Linguistics at UCSB
The Santa Barbara Corpus of Spoken American English

Obtaining the Santa Barbara Corpus

Parts 1-4 of the Santa Barbara Corpus of Spoken American English are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps that align the transcription with the audio at the level of individual intonation units.

*NEW*
The easiest way to access the Santa Barbara Corpus of Spoken American English is to download it for free here [SBCorpus_1.0.zip]. Note: To keep the file size small, this version is text only. For the full version with audio, check back this fall, when we plan to make the audio available for free download as well.

SBCSAE by John W. Du Bois is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License.

The Santa Barbara Corpus of Spoken American English may be purchased from the Linguistic Data Consortium on CDs and DVDs, or downloaded from TalkBank, at the following web pages:

Part 1: LDC Catalog No. LDC2000S85

Part 2: LDC Catalog No. LDC2003S06

Part 3: LDC Catalog No. LDC2004S10

Part 4: LDC Catalog No. LDC2005S25

For Santa Barbara Corpus transcripts that have been reformatted in the TalkBank “CHAT” format, download the file SBCSAE.zip from the TalkBank website at http://www.talkbank.org/data/Conversation/

For the corresponding audio in WAV file format, go to http://www.talkbank.org/media/conversation/SBCSAE/