Most of the audio recordings were originally made on Digital Audio Tape (DAT), recorded in stereo at 32 kHz or 48 kHz, on Sony TCD-D6 or TCD-D7 portable DAT recorders, using small, high-quality stereo microphones. (A few early recordings were made on high-quality analog cassette recorders.)
The audio data as published by the Linguistic Data Consortium consist of 16-bit, stereo, 22.05 kHz audio files in WAV format (PCM).
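As a quick sanity check, the stated format can be compared against a file's WAV header. The following is a minimal sketch using Python's standard `wave` module; the function names and the `EXPECTED` dictionary are illustrative and are not part of the corpus distribution.

```python
import wave

# Format stated in the documentation: 16-bit (2 bytes), stereo, 22.05 kHz PCM.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "frame_rate": 22050}


def read_wav_format(path):
    """Read the basic format fields from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_published_format(info):
    """True if the header fields agree with the documented format."""
    return all(info[key] == value for key, value in EXPECTED.items())
```

For example, `matches_published_format(read_wav_format("SBC001.wav"))` would be expected to return `True` for a file in the published format.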
Personal names of speakers on the recordings, as well as other identifying information such as telephone numbers, have been replaced by pseudonyms in the transcripts. To preserve the anonymity of the speakers, the corresponding portions of the audio files have been filtered to make them unrecognizable. Pitch information is still recoverable from these filtered portions of the recordings, but the amplitude levels in these regions are reduced relative to the original signal. A separate filter list file (e.g. SBC001.flt) associated with each transcription/waveform file pair (e.g. SBC001.trn, SBC001.wav) lists the beginning and ending times of the filtered regions. (The file SBC040.flt is empty, indicating that there was no personal information to filter out.)
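The filter list files can be read programmatically, for instance to exclude the filtered regions from acoustic analysis. The sketch below is hypothetical: the layout of the .flt files is not described here, so it assumes each non-empty line starts with a begin time and an end time in seconds, separated by whitespace. Check an actual file before relying on it. The function names are illustrative.

```python
def parse_filter_list(text):
    """Parse the contents of a .flt file into (begin, end) pairs.

    ASSUMPTION: each non-empty line starts with two whitespace-separated
    numbers, the begin and end time of a filtered region in seconds.
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        regions.append((float(fields[0]), float(fields[1])))
    return regions


def to_sample_range(region, sample_rate=22050):
    """Convert a (begin, end) pair in seconds to sample indices."""
    begin, end = region
    return round(begin * sample_rate), round(end * sample_rate)
```

Under this assumption an empty file, such as SBC040.flt, simply yields an empty list of regions.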
The filtering was done using a digital FIR low-pass filter with the cut-off frequency set at 400 Hz. The effect of the filter was gradually faded in and out at the beginning and end of each region over a span of 1,000 samples (roughly 45 milliseconds at 22.05 kHz) to avoid abrupt transitions in the resulting waveform.
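The procedure described above can be illustrated with a small, self-contained sketch. It is not the code used to produce the corpus: only the 400 Hz cut-off and the 1,000-sample fade come from the description, while the filter length (201 taps), the Hamming-windowed sinc design, the linear fade shape, and the placement of the fades inside the region are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 22050   # Hz, rate of the published files
CUTOFF_HZ = 400.0     # cut-off frequency stated in the description
FADE_SAMPLES = 1000   # fade length stated in the description


def lowpass_taps(cutoff_hz, sample_rate, num_taps=201):
    """Hamming-windowed sinc low-pass taps (design and num_taps are assumed)."""
    fc = cutoff_hz / sample_rate            # cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]        # normalize to unity gain at DC


def filtered_sample(samples, i, taps):
    """Output of the FIR filter centered on sample i (zero-padded at edges)."""
    half = len(taps) // 2
    acc = 0.0
    for k, t in enumerate(taps):
        j = i + half - k
        if 0 <= j < len(samples):
            acc += t * samples[j]
    return acc


def mask_region(samples, start, end, taps, fade=FADE_SAMPLES):
    """Low-pass samples[start:end], cross-fading at both ends of the region.

    The weight w ramps linearly from 0 to 1 over `fade` samples after
    `start` and back to 0 before `end` (linear ramp is an assumption).
    """
    out = list(samples)
    for i in range(start, end):
        w = min(1.0, (i - start) / fade, (end - 1 - i) / fade)
        out[i] = (1.0 - w) * samples[i] + w * filtered_sample(samples, i, taps)
    return out
```

The fade duration follows from the sample rate: `1000 * FADE_SAMPLES / SAMPLE_RATE` is about 45.35 ms. For stereo files, each channel would be processed separately in the same way.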
The following additional files are included on the published CDs and DVDs from the Linguistic Data Consortium:
segment.txt: explanation of the information contained in segment.tbl
segment.tbl: information about the speech event context
segment_summaries.txt: brief summary of the content of each discourse segment
speaker.txt: explanation of the information in speaker.tbl
speaker.tbl: speaker demographic information
table.txt: description of file names and informal titles
annotations.txt: list of conventions and prosodic annotations