Stefan Th. Gries

Dispersion / adjusted frequency resources

Last updated: 8 July 2011


R scripts to compute measures of dispersion and adjusted frequencies

R scripts


Script 1: dispersions1
Script 2: dispersions2
(The scripts incorporate the correction of a mistake in the computation of the normalized version of DP, for which I am very much indebted to Jefrey Lijffijt.)
Enter the following into R to load the function from the first script (change the "1" to "2" to load the second one):

source("http://www.linguistics.ucsb.edu/faculty/stgries/research/dispersion/_dispersions1.r")
Then you can call the function as described in the article.
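
For readers who want a feel for what the normalized version of DP measures, here is a minimal sketch. It assumes the usual definition of DP (half the sum of the absolute differences between a word's observed proportions and the corpus parts' expected proportions) and the normalization DP / (1 - smallest part proportion). The function name dp.sketch and the toy data are hypothetical; this is not the code in the scripts above, and the article's description remains the authoritative one.

```r
# Hypothetical helper for illustration only; NOT the function defined in
# dispersions1 or dispersions2.
# v: observed frequencies of a word in each corpus part
# s: sizes of the corpus parts as proportions of the whole corpus (sum to 1)
dp.sketch <- function(v, s) {
  stopifnot(length(v) == length(s), sum(v) > 0, isTRUE(all.equal(sum(s), 1)))
  observed <- v / sum(v)             # share of the word's occurrences per part
  dp <- sum(abs(observed - s)) / 2   # 0 = perfectly even distribution
  dp.norm <- dp / (1 - min(s))       # rescaled so that the maximum is 1
  c(DP = dp, DPnorm = dp.norm)
}

# Toy data (invented): five equally sized corpus parts
even   <- dp.sketch(v = c(2, 2, 2, 2, 2),  s = rep(0.2, 5))
clumpy <- dp.sketch(v = c(10, 0, 0, 0, 0), s = rep(0.2, 5))
print(even)
print(clumpy)
```

With the toy data, the evenly distributed word should receive values at the low end of the scale and the word confined to a single part values at the high end, which is the contrast the measure is meant to capture.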


Reference files to download


If you use any of these lists, please cite the following paper as a reference: Gries, Stefan Th. to appear. Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics.


The British National Corpus Sampler



Here you can download files containing all words that occur at least 10 times in the BNC Sampler, together with their dispersion measures and adjusted frequencies; see bncsampler_readme.txt for details:
01q_bncsampler_output.zip (a zipped .txt file)
01q_bncsampler_output.ods (.ods file)


The British National Corpus Baby



Here you can download files containing all words that occur at least 10 times in the BNC Baby, together with their dispersion measures and adjusted frequencies; see bncbaby_readme.txt for details:
01q_bncbaby_output.zip (a zipped .txt file)
01q_bncbaby_output.ods (.ods file)


The spoken part of the British National Corpus World Edition (XML)



Here you can download files containing all words that occur at least 10 times in the spoken part of the BNC WE, together with their dispersion measures and adjusted frequencies; see bncxml-spoken_readme.txt for details:
01q_bncxmls_output.zip (a zipped .txt file)
01q_bncxmls_output.ods (.ods file)


The British Component of the International Corpus of English



Here you can download files containing all words that occur at least 10 times in the ICE-GB, together with their dispersion measures and adjusted frequencies; see icegb_readme.txt for details:
01q_icegb_output.zip (a zipped .txt file)
01q_icegb_output.ods (.ods file)


Graphs to download


If you use any of these graphs, please cite the following paper as a reference: Gries, Stefan Th. 2010. Dispersions and adjusted frequencies in corpora: further explorations. In Stefan Th. Gries, Stefanie Wulff, & Mark Davies (eds.), /Corpus linguistic applications: current studies, new directions/, 197-212. Amsterdam: Rodopi.


The spoken part of the British National Corpus World Edition (XML)



scatterplot comparing different measures of dispersion (a .png file, dimensions: 1600x1200)
scatterplot comparing different adjusted frequencies (a .png file, dimensions: 1600x1200)