Stefan Th. Gries

Companion website of

Quantitative corpus linguistics with R: a practical introduction

published by Routledge, Taylor & Francis Group

Here is the companion newsgroup for this book

Last updated: 25 September 2013

Errors, broken links, feedback, suggestions ... please let me know!


General information (please read this first!)


This is the companion website of the following publication: Gries, Stefan Th. 2009. Quantitative corpus linguistics with R: a practical introduction. Routledge, Taylor and Francis Group. This website contains all the files you will need to make the most of the book and the case study assignments hosted here: all input files, all the R code, and all output files. You can download the files in two ways. The first is to download three separate zip files: all input files, all the code and exercise boxes, and all output files. The second lets you pick and choose: (i) create target directories as suggested below (or similar ones if you use a Macintosh or a Linux/Unix machine), (ii) download the files you want to your computer, and (iii) extract the zipped files into the specified target directories.
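
Step (i) can also be scripted. The following is a minimal sketch, not part of the book's materials: it is written in Python, the function name `create_target_dirs` is hypothetical, and the four subdirectory names are taken from the target directories listed in the main download area of this page. In R, `dir.create` with `recursive=TRUE` would serve the same purpose.

```python
from pathlib import Path

# Subdirectory names taken from the target directories listed in the
# main download area of this page.
SUBDIRS = ["_inputfiles", "_scripts", "_outputfiles", "_assignments"]


def create_target_dirs(base):
    """Create base/<subdir> for every name in SUBDIRS and return the paths.

    `base` is an assumption of this sketch: "C:/_qclwr" on Windows as
    suggested on this page, or any similar folder on other systems.
    """
    base = Path(base)
    created = []
    for name in SUBDIRS:
        target = base / name
        # parents=True also creates the base folder if it is missing;
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

On Windows, calling `create_target_dirs("C:/_qclwr")` would set up the layout suggested here; on a Macintosh or Linux/Unix machine, a folder such as `Path.home() / "_qclwr"` could be passed instead.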

To unzip the encrypted zip files, use the passwords provided in the book and, for example, 7-zip (Windows users), p7zip or zipeg (Macintosh users), or File-roller (Unix/Linux users). Ideally, you have also already downloaded and installed the other applications you will need most for the book. In case you haven't, here are links to the most relevant applications: R, Tinn-R or Notepad++ (for Windows users), and LibreOffice. Many more corpus-linguistically relevant links can be found here.

The instructors' corner, located below the main download area that provides the files for this book, offers additional assignments, for which instructors can request an answer key from me. Finally, a list of errata can be found here.


Main download area

File type 1 | File type 2 | File | Target directory
--- | --- | --- | ---
Input files (book & assignments) | corp_: corpus data to be searched | BNC corpus files and frequency list (SGML format) | C:/_qclwr/_inputfiles/
Input files (book & assignments) | corp_: corpus data to be searched | BNC corpus file (XML format) | C:/_qclwr/_inputfiles/
Input files (book & assignments) | corp_: corpus data to be searched | other corpus files | C:/_qclwr/_inputfiles/
Input files (book & assignments) | dat_: data for chapter 3/4 + Range | data files | C:/_qclwr/_inputfiles/
Input files (book & assignments) | stat_: data for chapter 5 | statistics data files | C:/_qclwr/_inputfiles/
Code & output files (book only) | all code and exercise boxes | code, exercise boxes, and answer keys to exercise boxes | C:/_qclwr/_scripts/
Code & output files (book only) | output files | output files | C:/_qclwr/_outputfiles/
Case studies (assignments only) | assignments | all 15 case studies listed below | C:/_qclwr/_assignments/
Case studies (assignments only) | scripts | all 15 case studies listed below | C:/_qclwr/_scripts/
Case studies (assignments only) | output files | all case studies listed below except other applications 3 and 4 | C:/_qclwr/_outputfiles/

Case studies:

morphology 1: -ic/-ical adjectives
morphology 2: morphophonological reduction processes
syntax 1: the order of temporal subordinate clauses
syntax 2: prenominal adjective order
syntax 3: the locative alternation
semantics 1: antonymous adjectives
semantics 2: semantic prosody
semantics 3: significant collocations
pragmatics/text linguistics 1: dispersion plots
pragmatics/text linguistics 2: key words
other applications 1: mean lengths of utterances
other applications 2: lexical frequency profiles
other applications 3: accessing known websites
other applications 4: accessing search engine's hits
other applications 5: compiling a web-based corpus

Instructors' corner (additional assignments; solutions are available upon request to instructors only)

Assignment | Files
--- | ---
N-grams | no files needed other than the ones coming with the book
Co-occurrence of alphabetical and order | BNC (with SGML or with XML annotation)
Average sentence and word lengths | no files needed other than the ones coming with the book
Split infinitives | no files needed other than the ones coming with the book
Web concordancing | no files needed other than the ones coming with the book
Indexing | Pages from the book
Zero-derivation: run(s) vs. walk(s) | no files needed other than the ones coming with the book
Chat files: extensions | CORAL-ROM files
BNC XML: lemma-tag frequency list | BNC (with XML annotation)
Replacing all sorts of numbers | none
Increase context in a previous concordance | these input files plus the file <corp_gpl_long.txt> from above
Tagging subordinating conjunctions | BNC (with XML annotation) plus the file <corp_gpl_long.txt> from above
Retrieving Spanish data from an XML-annotated parallel corpus | Version 3 of the United Nations General Assembly Resolutions Parallel Corpus
Fichtner's C | no files needed other than the ones coming with the book
Spanish verb paradigms | Spanish verbs and their forms
Authorship attribution 1 | Sample targets and reference texts
Authorship attribution 2 | Sample targets and reference texts
Translation alignment from Europarl | Europarl Version 5
Vocabulary growth / Type-token ratios | Hamlet and Macbeth
Larger concordance contexts | no files needed other than the ones coming with the book
Multiple concordance matches | no files needed other than the ones coming with the book
Retrieving a language from a dictionary file | see assignment
Adding info on speakers from header | BNC (with SGML annotation, at least file KB1)
Retrieving all (overlapping) matches | no files needed other than the ones coming with the book
(Non)-hyphenated words | no files needed other than the ones coming with the book
Pronunciations from dictionary | dictionary file

maybe more soon …