#### Chapter 3: Descriptive statistics
# (1) How many classes would you approximately want for a histogram of 4000 data points?
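# A rough sketch for (1): one common rule of thumb (Sturges' rule, chosen here as an assumption; other heuristics exist) puts the number of classes near 1 + log2(n). The object names are illustrative.
n.points <- 4000
n.classes <- ceiling(1 + log2(n.points)) # roughly a dozen classes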
# (2) Compute the mean of the following numbers: 1, 2, 4, 8, 10, 12.
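# A minimal sketch for (2); the vector name some.numbers is only illustrative.
some.numbers <- c(1, 2, 4, 8, 10, 12)
mean.value <- mean(some.numbers) # same as sum(some.numbers)/length(some.numbers)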
# (3) Based on their performance in a test, students were awarded the following grades (1=best, 6=worst):
grades.in.test <- rep(1:6, c(2, 15, 10, 4, 4, 2))
# (a) Compute the most appropriate measure of central tendency.
# (b) Compute the most appropriate measure of dispersion.
# (c) Represent the result graphically in a bar plot.
# (d) Represent the result graphically in a Pareto chart.
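# A possible sketch for (3), assuming the grades are treated as ordinal data, so that the median and the interquartile range are used. All object names below are illustrative, and the Pareto chart is approximated with base R only.
grades <- rep(1:6, c(2, 15, 10, 4, 4, 2)) # same values as grades.in.test
grade.median <- median(grades) # (a) central tendency for ordinal data
grade.iqr <- IQR(grades) # (b) dispersion for ordinal data
grade.freqs <- table(grades) # frequency of each grade
barplot(grade.freqs, xlab="Grade", ylab="Frequency") # (c) bar plot
grade.sorted <- sort(grade.freqs, decreasing=TRUE) # (d) a Pareto chart orders the bars by descending frequency
barplot(grade.sorted, xlab="Grade", ylab="Frequency")
grade.cumprop <- cumsum(grade.sorted)/sum(grade.sorted) # cumulative proportions, often added to a Pareto chart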
# (4) Load the file <201_03-04_uh(m).csv> into a data frame UHM.
# (5) Determine how many disfluencies occurred in each genre.
# (6) Represent in a graph how many disfluencies occurred in each genre.
# (7) Was the 990th disfluency produced by a man or a woman?
# (8) Generate separate summary statistics for the lengths in the two genres.
# (9) Determine how many disfluencies are longer than average.
# (10) Determine whether men or women produce more of the disfluencies that are longer than average.
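# A sketch of the logic behind (9) and (10) on a made-up toy data frame, NOT the real file. The column name LENGTH and all values are assumptions for illustration; with the real data, replace toy by UHM.
toy <- data.frame(SEX=factor(c("male", "female", "female", "male", "female", "male")), LENGTH=c(900, 700, 1100, 500, 400, 600))
longer <- toy$LENGTH > mean(toy$LENGTH) # TRUE for every case above the mean length
n.longer <- sum(longer) # (9) TRUE counts as 1, so the sum is the number of such cases
longer.by.sex <- table(toy$SEX[longer]) # (10) the above-average cases broken down by sex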
# (11) What does this do? (If you cannot see that immediately from the plot, execute the function without the plotting.)
barplot(prop.table(table(FILLER, GENRE),2))
# (12) Sort the data frame according to the factor SEX (ascending) and, within SEX, according to the disfluencies (descending) and, within disfluencies, according to the lengths of the disfluencies (ascending).
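# A sketch of the sorting logic in (12) on a made-up toy data frame, NOT the real file. The column name LENGTH and all values are assumptions for illustration. order() sorts ascending by default; negating the integer codes of a factor is one way to get a descending sort for that key only.
toy <- data.frame(SEX=factor(c("male", "female", "female", "male", "female", "male")), FILLER=factor(c("uh", "uhm", "uh", "uhm", "uhm", "uh")), LENGTH=c(900, 700, 1100, 500, 400, 600))
sort.index <- order(toy$SEX, -as.integer(toy$FILLER), toy$LENGTH) # SEX ascending, FILLER descending, LENGTH ascending
toy.sorted <- toy[sort.index, ] # reorder the rows, keep all columns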
# (13) Compute the 95% confidence intervals for the proportions of the three disfluencies and discuss briefly what the confidence intervals suggest concerning the different frequencies of the disfluency markers.
# (14) Compute the average positions in sentences for "uh" and "uhm" and their 95% confidence intervals, and discuss briefly what the confidence intervals suggest concerning the different average positions of these two disfluency markers.
# (15) Ten bilingual students (English/German) took one dictation in English and one in German. They made the following numbers of mistakes in English and German respectively:
# 29, 20, 10, 16, 12, 15, 25, 22, 20, 23
# 21, 19, 28, 28, 26, 18, 16, 22, 20, 28
# (a) Compute a measure of correlation to quantify the association between the numbers of errors.
# (b) Illustrate the correlation in a graph and interpret the results (in one sentence).
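# A sketch for (15). english.dict is a name chosen here by analogy with german.dict in (16). Pearson's r is shown first; with only ten data points, the rank-based Kendall's tau may be the safer choice, and which one is intended is an assumption.
english.dict <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)
r.pearson <- cor(english.dict, german.dict) # (a) Pearson's product-moment correlation
tau.kendall <- cor(english.dict, german.dict, method="kendall") # (a) rank-based alternative
plot(english.dict, german.dict, xlab="Mistakes in English", ylab="Mistakes in German") # (b) scatterplot
abline(lm(german.dict ~ english.dict)) # adds the regression line to the scatterplot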
# (16) Compute the number of mistakes expected from a student in the German dictation, if that student made 12 mistakes in the English dictation (i.e., use german.dict as the dependent variable).
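# A sketch for (16) using a simple linear regression. english.dict, dict.model, and predicted.german are names assumed here.
english.dict <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)
dict.model <- lm(german.dict ~ english.dict) # german.dict is the dependent variable
predicted.german <- predict(dict.model, newdata=data.frame(english.dict=12)) # expected German mistakes for 12 English mistakes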
# (17) Now you also obtained the sexes of the students: students 2 to 6 were girls, the rest boys.
# (a) Enter this into R
# (b) Compute the average numbers of errors in the German dictation for boys and girls.
# (c) Represent the numbers of mistakes in the German dictation as a function of the sex of the students graphically.
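# A sketch for (17). The object name sex and the level labels "boy" and "girl" are choices made here.
german.dict <- c(21, 19, 28, 28, 26, 18, 16, 22, 20, 28)
sex <- factor(c("boy", rep("girl", 5), rep("boy", 4))) # (a) students 2 to 6 are girls
mean.by.sex <- tapply(german.dict, sex, mean) # (b) mean number of mistakes per sex
boxplot(german.dict ~ sex, xlab="Sex", ylab="Mistakes in German") # (c) one box per sex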
# (18) Standardize the numbers of errors in the English dictation.
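# A sketch for (18): standardizing means subtracting the mean and dividing by the standard deviation (z-scores). english.dict and english.z are names assumed here.
english.dict <- c(29, 20, 10, 16, 12, 15, 25, 22, 20, 23)
english.z <- as.numeric(scale(english.dict)) # scale() returns a one-column matrix, hence as.numeric()
english.z.manual <- (english.dict - mean(english.dict)) / sd(english.dict) # the same computation by hand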
# (19) The file <201_03-04_vpcs.csv> contains data from a corpus study on the alternation of particle placement that was introduced in Section 1.3.1.
# - column 1: the number of the data point
# - column 2: whether the example is from spoken or written language
# - column 3: which construction is used
# - column 4: how complex the direct object is (3 levels)
# - column 5: how long the direct object is (in syllables)
# - column 6: whether the verb-particle construction is followed by a directional PP (2 levels)
# - column 7: whether the referent of the direct object is animate or inanimate (2 levels)
# - column 8: whether the referent of the direct object is concrete or abstract (2 levels)
# (a) Read in this file, make the column names available, and test whether the input was successful.
# (b) Represent the correlation between the choice of construction and the complexity of the direct object graphically.
# (c) Create a table representing the correlation between the choice of construction and the complexity of the direct object and briefly summarize the result.
# (d) Represent the correlation between the choice of construction and the length of the direct object graphically and briefly summarize the result.
# (e) Compute whether the choice of construction is influenced by the length of the direct object.
# (f) Investigate whether the choice of construction depends on the animacy of the referent of the direct objects and the presence/absence of a directional prepositional phrase.
# (20) 50 students took a statistics exam, and 80% passed. What is the 95% confidence interval for this result?
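# A sketch for (20), reading 80% of 50 as 40 passes. Two common intervals are shown; which one is intended here is an assumption, and all object names are illustrative.
n.students <- 50
n.passed <- 40 # 80% of 50
p.passed <- n.passed / n.students
se.passed <- sqrt(p.passed * (1 - p.passed) / n.students) # standard error of a proportion
ci.wald <- p.passed + c(-1, 1) * qnorm(0.975) * se.passed # normal-approximation interval
ci.score <- prop.test(n.passed, n.students, conf.level=0.95, correct=FALSE)$conf.int # score interval without continuity correction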