Stefan Th. Gries

Links

Last updated: 31 December 2011


General information

General overview


The LinguistList
The Linguistic Society of America (LSA)


Corpus linguistics

General overview


David Lee, Manuel Barbera, a corpus survey


List of references



Przemek Kaszubski (Adam Mickiewicz University)


Online books



A companion to digital humanities, Developing linguistic corpora: a guide to good practice


Journals



Corpus Linguistics and Linguistic Theory, Corpora, International Journal of Corpus Linguistics, ICAME Journal, Computational Linguistics, Literary and Linguistic Computing, Language Resources and Evaluation (formerly known as Computers and the Humanities), Computer Speech and Language, Empirical Language Research, Corpus; cf. also Citeseer and the ACL Anthology




Data




Corpora, databases, and web interfaces (English)



British National Corpus (BNC), Phrases in English (BNC), Variation in English words and phrases (BNC), BYU Corpus of American English, Christine, Collins Bank of English and Corpus Concordance Sampler: Free Demo, American National Corpus, Business Letter Corpus (BLC), Corpus of Late Modern English Texts, ICAME (incl. Brown and Frown, LOB and FLOB, Helsinki, and others), ICE, International Corpus of Learner English (ICLE), Just the Word, MICASE, MICUSP, (Parsed) Corpus of Early English Correspondence, Susanne, Santa Barbara Corpus of Spoken American English, The Switchboard Corpus, Time Magazine Corpus, Word Neighbours


Corpora, data, and web interfaces (other languages)



Croatian: Croatian National Corpus; Czech: Czech National Corpus; German: Cosmas German corpora, The NEGRA Corpus, The TIGER corpus, and the Leipzig Corpora Collection; Greek: Greek National Corpus; Hungarian: Hungarian National Corpus; Italian: La Repubblica Corpus; Polish: IPI PAN corpus of Polish and Polish subcorpus of the ICLE; Portuguese: Corpus do Português; Russian: Russian National Corpus; Scottish: Scottish Corpus of Texts and Speech; Spanish: Corpus del Español


(Specialized) Corpora, data, and web interfaces (multilingual)



The CELEX Database, CHILDES, JRC Acquis Multilingual Parallel Corpus, Linguistic Data Consortium (LDC) (commercial!), Corpus-based Multilingual Dictionaries, TalkBank, WaCKy corpora


Text collections



Etext center at the University of Virginia, FullBooks.com, Oxford Text Archive (OTA), Project Gutenberg, ReadPrint




Software




Overview



Kenji Kita (Tokushima University)


Concordancing software (freeware)



AntConc (Unix/Linux, Win, and Mac), ConcApp (Win), Corpus Wizard (Win), KWiCFinder (Win), Simple Concordance Program (Win), TextSTAT (Win), Xaira (Win), Concordance line for DOS, Poliqarp (Win, Linux/Unix), Multilingual concordancer (Java), Corpus Search 2 (Java), aConCorde (Java), Conc (MacOS), Concorder (MacOS); cf. Corpus Linguistics and Linguistic Theory 2.1:107-27 for a comparative review of many concordance programs


Taggers



ApplePieParser (Win, freeware), Morphy (Win, freeware), QTag, Sparse 2 (Win, freeware), WinBrill Tagger (Win, freeware)


Various



My Corpus Linguistics with R Google group, a good overview of a lot of annotation software, Bonito, Compleat Lexical Tutor, Culler corpus tool, Dexter annotator, ELAN, EXMARaLDA annotator, JBootTag, KfNgram, Linguistic Tree Constructor, MMAX2, N-gram software, Natural Language Software Registry, Natural Language Toolkit, NITE XML Toolkit, Range and WordCounter, Sense clusters, Summer Institute of Linguistics, Toolbox, UCS tool kit, UAM corpus tool, Web Concordancer
This is not really corpus linguistics, but I include it anyway: Speech error database at the MPI-PsyLing


Statistics

Software (freeware)


the all-purpose tool R and RStudio


Information



My Statistics for linguists with R Google group, Handbook of Statistics (Tulsa, OK, USA), Virtual Statistics Lab (Rice University), Statistics.com, Simple Interactive Statistical Analysis


My own scripts



cf. here


General software

Kubuntu, Linux Mint, LibreOffice, KeepNote, VUE