Stefan Th. Gries
Last updated: 11 July 2008

Links
General info
Corpus linguistics
Statistics
General software


General information

General overview

The LinguistList
The Linguistic Society of America (LSA)


Corpus linguistics

General overview

David Lee, Manuel Barbera, a corpus survey


List of references


Przemek Kaszubski (Adam Mickiewicz University)


Journals


Corpus Linguistics and Linguistic Theory, Corpora, International Journal of Corpus Linguistics, ICAME Journal, Computational Linguistics, Literary and Linguistic Computing, Language Resources and Evaluation (formerly known as Computers and the Humanities), Computer Speech and Language, Empirical Language Research; cf. also Citeseer and the ACL Anthology




Data



Corpora, databases, and web interfaces (English)


British National Corpus (BNC), Phrases in English (BNC), Variation in English words and phrases (BNC), BYU Corpus of American English, Collins Bank of English and Corpus Concordance Sampler: Free Demo, American National Corpus, Business Letter Corpus (BLC), Corpus of Late Modern English Texts, ICAME (incl. Brown and Frown, LOB and FLOB, Helsinki, and others), ICE, International Corpus of Learner English (ICLE), Just the Word, MICASE, MICUSP, (Parsed) Corpus of Early English Correspondence, The Switchboard Corpus, Time Magazine Corpus, Word Neighbours


Corpora, data, and web interfaces (other languages)


Croatian: Croatian National Corpus; Czech: Czech National Corpus; German: Cosmas German corpora, The NEGRA Corpus, The TIGER corpus, and the Leipzig Corpora Collection; Greek: Greek National Corpus; Hungarian: Hungarian National Corpus; Italian: La Repubblica Corpus; Polish: IPI PAN corpus of Polish and Polish subcorpus of the ICLE; Portuguese: Corpus do Português; Russian: Russian National Corpus; Scottish: Scottish Corpus of Texts and Speech; Spanish: Corpus del Español


(Specialized) Corpora, data, and web interfaces (multilingual)


The CELEX Database, CHILDES, JRC Acquis Multilingual Parallel Corpus, Linguistic Data Consortium (LDC) (commercial!), Corpus-based Multilingual Dictionaries, TalkBank, WaCKy corpora


Text collections


Etext center at the University of Virginia, FullBooks.com, Oxford Text Archive (OTA), Project Gutenberg, ReadPrint




Software



Overview


Kenji Kita (Tokushima University)


Concordancing software - freeware :-)


AntConc (Win), ConcApp (Win), Corpus Wizard (Win), KWiCFinder (Win), Simple Concordance Program (Win), TextSTAT (Win), WinConcord (Win), Xaira (Win), Concordance line for DOS, Poliqarp (Win, Linux/Unix), Multilingual concordancer (Java), Corpus Search 2 (Java), aConCorde (Java), Conc (MacOS), Concorder (MacOS); cf. Corpus Linguistics and Linguistic Theory 2.1:107-27 for a comparative review of many concordance programs


Concordancing software - commercial :-(


Collocate (Win), Concgram (Win), Concordance (Win), MonoConc Pro (Win), ParaConc (Win), WordCruncher (Win), WordSmith Tools (Win), Corpus Presenter (Win; here's a review with further 'comments'); cf. Corpus Linguistics and Linguistic Theory 2.1:107-27 for a comparative review of many concordance programs


Taggers


ApplePieParser (Win, freeware), Morphy (Win, freeware), QTag, Sparse 2 (Win, freeware), WinBrill Tagger (Win, freeware)


Various


My Corpus Linguistics with R Google group, a good overview of a lot of annotation software, Bonito, Compleat Lexical Tutor, Culler corpus tool, Dexter annotator, ELAN, EXMARaLDA annotator, JBootTag, KfNgram, Linguistic Tree Constructor, MMAX2, N-gram software, Natural Language Software Registry, Natural Language Toolkit, NITE XML Toolkit, Range and WordCounter, Sense clusters, Summer Institute of Linguistics, Toolbox, UCS tool kit, UAM corpus tool, Web Concordancer
This is not really corpus linguistics but I include it anyway: Speech error database at the MPI-PsyLing


Statistics

Software - freeware :-)

the all-purpose tool R, R web, Free Statistical Software, SciLab, smaller web-based analyses at Simple Interactive Statistical Analysis; PHYLIP, Cluto and the Windows version gCluto, Cluster 3.0, and GeneCluster 3.0 for various kinds of cluster analysis; the SPSS-like package Stats4U (formerly known as OpenStat 4)


Information


My Statistics for linguists with R Google group, Handbook of Statistics (Tulsa, OK, USA), Virtual Statistics Lab (Rice University), Statistics.com, Statistics Glossary (Lancaster, UK), Simple Interactive Statistical Analysis


Scripts by myself


cf. here


General software

This is a list of my links to (a) useful websites and (b) some of my favorite software programs for Windows PCs. I recommend those (especially to students) because
(i) the websites and programs are extremely useful and sometimes even superior to what expensive proprietary software can do;
(ii) as freeware or shareware (sometimes even open source), the programs are extremely cheap to get;
(iii) as freeware or shareware (sometimes even open source), the programs are legal to download and use;
(iv) they are often less susceptible to malware than proprietary programs.

Your PC on a USB stick

PortableApps.com and John Haller as well as FOSS tools


Open source software / freeware


Freeware utilities, Open source Windows applications, Open source alternatives to commercial software, Open source God, TinyApps, Webi


Office


OpenOffice, EasyOffice 7, Google Docs and Spreadsheets


Internet


Internet browser: Firefox (my favorite extensions: CustomizeGoogle, DOM Inspector, FasterFox, FireFTP, GreaseMonkey, Gspace, LinkChecker, PDF Download, RSiteSearch, Session Manager, ShowIP, Zotero); Email software: Thunderbird; FTP software: FileZilla; Download manager: FlashGet, FreshDownload; Website saver: Webspider; Voice-over-IP phone: Skype; Instant messaging: Gaim, Miranda, Trillian; Website designer: NVU; Decoding winmail.dat attachments: WMDecode


Security


Antivirus: AVG, AntiVir, avast! 4 home edition, bitdefender home edition; Spyware: AdAware; Encryption: PGP and TrueCrypt; Firewall: Sygate Personal Firewall, Kerio Personal Firewall (lim. ed.), and ZoneAlarm; Anonymous surfing: Torpark, JAP; Shields Up - test your PC's security; secure passwords


System tuning/monitoring


SiSoft Sandra, ProcessExplorer


File compression


7-Zip, WiZ, Zipeg


File handling


renaming: EasyRename, FileRename; splitting: MaxSplitter, Chainsaw; merging: AF Merge your files; imaging: SelfImage, DriveImage XML


Text editing/processing


LyX, Notepad++, NoteTab Light, TextPad, jEdit, EditPlus 2, Tinn-R (esp. useful for R), SciTE (esp. useful for Perl), The Regulator, Regulazy


PDF generation/handling


Foxit Reader, CutePDF Writer, FreePDF, eXPert PDF reader, Ghostscript, and Ghostview


Programming


Strawberry Perl (but cf. also Active Perl, and more generally Perl.com, Perl.org and CPAN), Active Python with NumPy and SciPy, and R (language)


Hard drive simulation


Daemon


Multimedia


IrfanView, Gimp, Songbird


Various


Mozilla/Sunbird Calendar, Google Calendar, TurboNote