A rather dull worKLOG. This is just a scratchpad for solutions to IT problems that might be useful to someone else. Expect no opinions, no brilliant insights and definitely no pictures of pets or children. Expect stack traces, code snippets and other hints for the Google Indexer.

Thursday, June 07, 2007

Boxplots for beginners

How to create Boxplots in R
Refs:
http://en.wikipedia.org/wiki/Boxplot
http://www.r-project.org/

Suppose you have data such as:
number_of_threads thread time
1 1 10
2 1 20.5
2 2 19.5
3 1 30
3 2 25
3 3 35

This table holds made-up data from some system testing, where the system is exercised with 1, 2, 3, ... concurrent users. You want to produce a boxplot showing the timings for each number of concurrent users. Here's the magic incantation in R:
Assuming the above file is saved to data.txt
1) Use R's Misc menu to change to the correct working directory (or call setwd() with the directory's path)
2) Load the data into a dataframe with
myresults = read.table("data.txt", header=TRUE)
3) Generate the boxplot with
boxplot( time ~ number_of_threads, myresults)
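If you want to check the numbers behind the boxes before drawing anything, something like the following should work. It is only a sketch: the data is inlined with read.table's text argument so the snippet runs without data.txt, and the variable names stats and medians are my own.

```r
# Same made-up data as data.txt, inlined so the snippet runs on its own
myresults <- read.table(header = TRUE, text = "
number_of_threads thread time
1 1 10
2 1 20.5
2 2 19.5
3 1 30
3 2 25
3 3 35
")

# plot = FALSE returns the summary statistics instead of drawing
stats <- boxplot(time ~ number_of_threads, myresults, plot = FALSE)

# Row 3 of stats$stats holds the median of each group
medians <- stats$stats[3, ]
names(medians) <- stats$names
print(medians)
```

The rows of stats$stats are, in order, the lower whisker, lower hinge, median, upper hinge and upper whisker for each group.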

To jazz it up with some axis labels:

boxplot( time ~ number_of_threads, myresults, xlab="Concurrent users", ylab="time")

Adding several boxplots on the same axis:
boxplot( time ~ number_of_threads, myresults, add=TRUE)
*Warning* - if you don't see your new boxplot, it's because add=TRUE draws onto the existing axes and the plot doesn't automatically rescale. You need to set the scale explicitly in the first boxplot call with ylim=c(0,40) or similar.
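One way to avoid guessing the ylim values is to compute them from all the data sets before drawing the first plot. A rough sketch, assuming a second data frame with the same columns; otherresults here is invented purely for illustration (the original timings plus 5):

```r
# Same made-up data as data.txt, inlined so the snippet runs on its own
myresults <- read.table(header = TRUE, text = "
number_of_threads thread time
1 1 10
2 1 20.5
2 2 19.5
3 1 30
3 2 25
3 3 35
")

# Hypothetical second run, invented for illustration: every timing 5 slower
otherresults <- transform(myresults, time = time + 5)

# y range that covers both data sets
yrange <- range(myresults$time, otherresults$time)

# The first call fixes the axes, so ylim has to go here
boxplot(time ~ number_of_threads, myresults, ylim = yrange)

# The second call draws onto the existing axes
boxplot(time ~ number_of_threads, otherresults, add = TRUE)
```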
Making the boxes a particular colour:
boxplot( time ~ number_of_threads, myresults, col="yellow")
and adding a legend (the first two arguments are the x and y position of its top-left corner, in plot coordinates):
legend(2, 9, c("plot1","plot2"), fill=c("red","yellow"))
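Putting colours and legend together: the sketch below keeps the labels and fill colours in one place so the legend can't drift out of sync with the boxes, and uses the "topleft" keyword instead of x,y coordinates. The second data set and the names runs and fills are made up for the example.

```r
# Same made-up data as data.txt, inlined so the snippet runs on its own
myresults <- read.table(header = TRUE, text = "
number_of_threads thread time
1 1 10
2 1 20.5
2 2 19.5
3 1 30
3 2 25
3 3 35
")

# Hypothetical second run, invented for illustration: every timing 5 slower
runs <- list(plot1 = myresults,
             plot2 = transform(myresults, time = time + 5))

# One fill colour per run, in the same order as runs
fills <- c(plot1 = "red", plot2 = "yellow")

# Draw the first run normally, then add the rest onto the same axes
for (i in seq_along(runs)) {
  boxplot(time ~ number_of_threads, runs[[i]],
          col = fills[[i]], ylim = c(0, 40), add = i > 1)
}

# A keyword position saves working out coordinates by hand
legend("topleft", legend = names(runs), fill = fills)
```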