Stefan Th. Gries 
Teaching at the University of California, Santa Barbara 
Ling 104: Statistical Methods in Linguistics (Fall 2017)
Syllabus and overview 
This course is a hands-on introduction to the fundamentals of statistical and data mining / machine learning methods in linguistics; it is based largely on the second edition (2013) of my Statistics for linguistics with R: a practical introduction. We begin by looking at a few basic notions of statistical analysis (e.g., variables, hypotheses, significance, etc.) and then discuss the logic of quantitative studies using the null-hypothesis falsification approach as well as how data should be set up for subsequent statistical evaluation. Then, we will explore data preparation and processing with the open-source programming language and environment R. The largest part of the course is concerned with a variety of classification and regression tools, such as different kinds of regression models, classification and regression trees, missing data imputation, and unsupervised learning. Because we use the open-source software tool R, note that the course requires computer literacy beyond swiping, pinching, long-tapping, and uploading/sending something to/via Facebook, Instagram, Snapchat, or whatever: if you install a program or download a file and then don't know 'where the program/file is', or if you don't know what unzipping a file means, this is the wrong course for you.



