Stefan Th. Gries
Last updated: 8 July 2007

Teaching LSA.346 at the 2007 LSA Linguistic Institute at Stanford University


LSA.346: Quantitative corpus linguistics: a practical introduction with R

Overview

This course was a hands-on introduction to performing quantitative corpus-linguistic analyses with the open source software tool R. It was broadly based on my textbook Quantitative corpus linguistics with R: a practical introduction, scheduled to be published by Routledge next year. The course had three parts. The first was a short general introduction to corpus linguistics as a discipline. The second and largest part introduced the fundamentals of the programming language R: data structures, input and output, string/character operations and regular expressions, and some elementary programming. Participants then wrote their own small R scripts to process different corpora and generate various kinds of frequency lists, concordances, etc. The third part provided a necessarily brief introduction to some fundamental aspects of statistical analysis with R, in which participants performed simple distributional tests to evaluate the kinds of corpus-linguistic data obtained in the second part of the course.


Links


R (with Tinn-R and SciViews); OpenOffice.org; the CorpLing with R Google group, which I moderate and which will host the companion website of my book