Obtaining the Santa Barbara Corpus
Parts 1-4 of the Santa Barbara Corpus of Spoken American English are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps which correlate transcription and audio at the level of individual intonation units.
*NEW*
The easiest way to access the Santa Barbara Corpus of Spoken American English is to download it for free here [SBCorpus_1.0.zip]. Note: To keep the file size small, this version is text only. For the full version with audio, check back this fall, when we plan to make the audio available for free download too.
The Santa Barbara Corpus of Spoken American English may be purchased from the Linguistic Data Consortium on CDs and DVDs, or downloaded from TalkBank, at the following web pages:
Part 1: LDC Catalog No. LDC2000S85
Part 2: LDC Catalog No. LDC2003S06
Part 3: LDC Catalog No. LDC2004S10
Part 4: LDC Catalog No. LDC2005S25
For Santa Barbara Corpus transcripts which have been reformatted in the TalkBank “CHAT” format, download the file SBCSAE.zip from the TalkBank website at http://www.talkbank.org/data/Conversation/
For the corresponding audio in WAV file format, go to http://www.talkbank.org/media/conversation/SBCSAE/