#+ fig.width=12, fig.height=8
### Sometimes you need functionality that R does not provide out of the box but that you would know how to program. If what you want to do is very simple, writing and naming a real function may not seem worthwhile. An alternative is a so-called anonymous function, i.e. a (typically very short) function that is defined on the fly, used once, never named, and immediately forgotten.
x <- read.delim("104_06b_clauseorders.csv")
summary(x)
head(x)
# Imagine you want to know how many different entries each column has. There is a function called apply, which takes three arguments:
# - a (typically two-dimensional) data structure such as a matrix or a data frame;
# - a 1 or a 2 depending on whether you want to do something to the rows (1) or the columns (2) of a matrix / data frame;
# - a function that you want to apply to the rows or columns of the matrix / data frame
# Thus, you start with apply(x, 2, ...), but there is no ready-made function that counts the number of different types that you could put into that ... slot. If there were one (called typecounter), you'd just write apply(x, 2, typecounter) and be done with it. However, you know what such a function typecounter would be doing, namely what you would do if you had a single vector of which you wanted the number of different types: length(unique(...)). Thus, you can just define this function anonymously in that slot:
apply(x, 2, function(qwe) length(unique(qwe)))
# What happens here is that R sees 'oh, a function is defined (anonymously), which takes one argument called qwe, and what it does with qwe is count the number of unique elements', and that function is then applied to every column of x.
# How about a function call that returns for each column a frequency table?
apply(x, 2, function(qwe) sort(table(qwe), decreasing=TRUE))
# How about a function call that returns for each column the number of types occurring only once?
apply(x, 2, function(qwe) sum(table(qwe)==1))
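# A minimal self-contained sketch of the calls above for readers who do not have the input file at hand.
# Assumption: toy and its columns ORDER and CONJ are made-up stand-ins, not the actual clause-order data.
toy <- data.frame(ORDER=c("mc-sc", "sc-mc", "mc-sc", "mc-sc"),
                  CONJ=c("weil", "bevor", "als", "weil"))
types.per.column <- apply(toy, 2, function(qwe) length(unique(qwe)))
types.per.column # ORDER has 2 types, CONJ has 3
hapaxes.per.column <- apply(toy, 2, function(qwe) sum(table(qwe)==1))
hapaxes.per.column # ORDER has 1 type occurring once, CONJ has 2
# sapply applies the same anonymous function to each column without first converting the data frame into a matrix:
sapply(toy, function(qwe) length(unique(qwe)))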
# another data set
x <- read.delim("104_04b_reactiontimes.csv")
summary(x) # 77 rows, but maybe cases with missing data (NA)
head(x) # e.g. there are three missing data points in rows 1 and 6 in the data
# this is how we find out how many complete and incomplete cases there are
table(complete.cases(x)) # there are only 48 complete cases, i.e. cases with no missing data at all
# but how about finding out how many NAs, if any, each case has?
table(apply(x, 1, function(qwe) sum(is.na(qwe))))
# 48 complete cases, 5 cases where 1 data point is missing, 2 cases where 2 data points are missing, and 22 cases where 3 data points are missing
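# A small sketch of the same row-wise logic on made-up data.
# Assumption: toy.rt and its columns are hypothetical and only illustrate the technique.
toy.rt <- data.frame(RT=c(500, NA, 620), FREQ=c(NA, NA, 3), LEN=c(4, 5, 6))
nas.per.row <- apply(toy.rt, 1, function(qwe) sum(is.na(qwe)))
nas.per.row # number of NAs in rows 1, 2, and 3: 1 2 0
table(nas.per.row) # how many rows have 0, 1, or 2 NAs
# For this particular task, the built-in rowSums gives the same counts without an anonymous function:
all(nas.per.row == rowSums(is.na(toy.rt)))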
# (1) You want to write a function summarizer that takes a numeric vector and returns everything that summary returns, but also the standard deviation.
# In other words, this is the desired behavior:
# summarizer(1:10) # should return
# Min. 1st Qu. Median Mean 3rd Qu. Max. SD
# 1.00000 3.25000 5.50000 5.50000 7.75000 10.00000 3.02765
# How would you do this 'manually'?
c(summary(1:10), "SD"=sd(1:10))
# step 1: what information/data structures does the above need to work?
# 1:10
# step 2: wrap a function definition around the code that introduces all required information/data structures!
summarizer <- function (...) { # define what the function will do and plan slots for the required information
return( # return as a result of the function
c( # the combination of
summary(...), # the summary of the relevant vector
"SD"=sd(...) # the standard deviation you want to add
) # end of c(...)
) # end of return(...)
} # end of function definition
# step 3: replace the (names of the) data structures the function needs by (more) generic ones!
summarizer <- function (numeric.vector) {
return(c(summary(numeric.vector), "SD"=sd(numeric.vector)))
}
summarizer(1:10)
# (2) You want to write a function called to that takes two numeric arguments and returns how the first argument compares to the second.
# In other words, this is the desired behavior:
# to(5, 3) # should return ">"
# to(3, 5) # should return "<"
# to(3, 3) # should return "="
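# One possible sketch, not the only solution; the argument names a and b are assumptions.
# sign(a - b) is -1, 0, or 1, so adding 2 turns it into a position (1, 2, or 3) in the vector of symbols:
to <- function (a, b) {
   return(c("<", "=", ">")[sign(a - b) + 2])
}
to(5, 3) # ">"
to(3, 5) # "<"
to(3, 3) # "="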
# (3) You want to write a function cumul.summer that takes a numeric vector of n values and returns its cumulative sums, i.e. the sum of the first value, of the first two values, and so on up to the sum of all n values.
# In other words, this is the desired behavior:
# cumul.summer(1:5) # 1 3 6 10 15
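# One possible sketch using a loop and a running total; the names output and running.total are assumptions.
cumul.summer <- function (numeric.vector) {
   output <- numeric(length(numeric.vector)) # one slot per input value
   running.total <- 0
   for (i in seq_along(numeric.vector)) {
      running.total <- running.total + numeric.vector[i] # add the current value
      output[i] <- running.total # store the sum of values 1 to i
   }
   return(output)
}
cumul.summer(1:5) # 1 3 6 10 15
# R's built-in cumsum does the same thing, which is useful for checking the result:
all(cumul.summer(1:5) == cumsum(1:5))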
# (4) You want to write a function table.sorty that takes a two-dimensional frequency table as input and generates as output a table (i) whose rows are sorted by row totals and (ii) whose columns are sorted by column totals, both in decreasing order, such that the top left of the table is likely to contain the highest frequencies. Also, you want an option to include row & column totals.
# In other words, this is the desired behavior:
set.seed(1)
a <- sample(letters[1:4], 100, replace=TRUE)
b <- sample(letters[5:8], 100, replace=TRUE)
# table.sorty(table(a, b))
# b
# a g f h e Sum
# b 10 10 6 6 32
# d 7 6 8 6 27
# c 4 6 6 5 21
# a 11 3 2 4 20
# Sum 32 25 22 21 100
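# One possible sketch, assuming a two-dimensional table; the argument names some.table and totals are assumptions.
table.sorty <- function (some.table, totals=TRUE) {
   output <- some.table[
      order(rowSums(some.table), decreasing=TRUE), # rows by decreasing row totals
      order(colSums(some.table), decreasing=TRUE)] # columns by decreasing column totals
   if (totals) {
      output <- addmargins(output) # add a Sum row and a Sum column
   }
   return(output)
}
# A tiny made-up table (toy.tab is hypothetical) to try it out:
toy.tab <- as.table(matrix(c(1, 5, 2, 8), nrow=2, dimnames=list(a=c("x", "y"), b=c("m", "n"))))
table.sorty(toy.tab) # rows in the order y, x, Sum; columns in the order n, m, Sum
table.sorty(toy.tab, totals=FALSE) # the same without the totals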
# (5) You want to write a function significance.tester that takes a p-value and returns the corresponding number of asterisks, or "ns" if the p-value is not significant.
# In other words, this is the desired behavior:
# significance.tester(0.07) # "ns"
# significance.tester(0.04) # "*"
# significance.tester(0.009) # "**"
# significance.tester(0.0009) # "***"
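# One possible sketch. Assumption: the cut-off points are the conventional 0.05, 0.01, and 0.001, which fit the examples above but are not stated in the task; the argument name p is also an assumption.
significance.tester <- function (p) {
   if (p < 0.001) {
      return("***")
   } else if (p < 0.01) {
      return("**")
   } else if (p < 0.05) {
      return("*")
   } else {
      return("ns")
   }
}
significance.tester(0.07) # "ns"
significance.tester(0.0009) # "***"
# Because of if, this version handles one p-value at a time; for a whole vector, wrap it in sapply.
sapply(c(0.07, 0.04, 0.009, 0.0009), significance.tester)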
# (6) Use R's function log (see ?log) to write a function logster that
# - takes as input an argument called x: the number of which to take a log;
# - takes as input an argument mybase: the base of the log (the default should be 10);
# - returns 0 when x is 0, returns the usual log to the base of mybase when x is positive, and, when x is negative (so that the function log does not work), returns the negative of the log of -x.
# In other words, this is the desired behavior (which can be very useful for plotting frequencies but should not be used for values of x between 0 and 1):
# logster(100) # should return 2
# logster(100, 2) # should return 6.643856
# logster(0) # should return 0
# logster(-100) # should return -2
# logster(-100, 2) # should return -6.643856
# you may find the function sign useful, too: see ?sign.
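# One possible sketch using sign and abs, not the only solution.
# sign(x) restores the original sign after the log of the absolute value has been taken; ifelse takes care of x being 0.
logster <- function (x, mybase=10) {
   return(ifelse(x == 0, 0, sign(x) * log(abs(x), base=mybase)))
}
logster(100) # 2
logster(0) # 0
logster(-100) # -2
logster(-100, 2) # approximately -6.643856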
# (7) Use a loop to write a function revster that takes as input a vector/factor x and returns x with its elements reversed.
# In other words, this is the desired behavior:
# revster(1:5) # should return 5 4 3 2 1
# revster(c("q", "w", "e")) # should return "e" "w" "q"
# (Using a loop for this is not really smart since one could just say revster <- function (x) { x[length(x):1] } but I want you to practice more involved approaches.)
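# One possible loop-based sketch; the name output is an assumption.
# Starting from a copy of x keeps the type of x (numbers, character strings, or factor levels) intact:
revster <- function (x) {
   output <- x
   for (i in seq_along(x)) {
      output[i] <- x[length(x) + 1 - i] # position 1 gets the last element, position 2 the second-to-last, etc.
   }
   return(output)
}
revster(1:5) # 5 4 3 2 1
revster(c("q", "w", "e")) # "e" "w" "q"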
# (8) Different countries use different ways to quantify cars' fuel efficiency. In the USA, you usually find mpg, i.e. how many miles you can drive with one gallon of gas. In countries in Europe, you often find liters/100km, i.e. how many liters of gas you need to drive 100km. Write a function mileage.converter that
# - takes as input an argument some.number: the measure of fuel efficiency to convert from one scale into the other
# - returns the corresponding fuel efficiency on the other scale.
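# One possible sketch. Assumptions: 'gallon' means the US gallon (3.785411784 liters) and a mile is 1.609344 km; the names of the two constants are made up for illustration.
# Both directions use the same formula (a constant divided by the input), so one function without a 'direction' argument suffices:
mileage.converter <- function (some.number) {
   liters.per.gallon <- 3.785411784
   km.per.mile <- 1.609344
   conversion.constant <- 100 * liters.per.gallon / km.per.mile # approximately 235.2146
   return(conversion.constant / some.number)
}
mileage.converter(30) # 30 mpg in liters/100km
mileage.converter(mileage.converter(30)) # converting twice returns the input, 30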