Wednesday 29 April 2015

Comparing two drugs side by side

You might prefer to show your graphs independently rather than overlay them.
One way to do this is to use the par() function to divide your plotting area into two pieces.
par() is discussed here in the R Function of the Day archive and is mentioned here on the Quick R pages.
As with all functions, the syntax is important.
To plot two graphs, side by side this is the code: par(mfrow=c(1,2))
To plot four graphs, arranged in 2 rows and 2 columns, this is the code: par(mfrow=c(2,2))
It's good to know how to reset the layout: par(mfrow=c(1,1)) # back to one graph
So here's the output:



Here is the script (again with repetition for understanding):

###  I have detailed lots of things separately so that it's obvious what I'm doing. 
### The code could be made shorter and less repetitive.
###  This is the data   ###
# Data from multiple experiments

conc <- c(1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05) 

# First drug is called Drug 49
dead.cells49 <- c(28.6, 30.1, 34.6, 47.7, 81.7, 
                  27.1, 27.2, 35, 47.9, 80.6, 
                  27.8, 27.9, 34.2, 47.9, 82.4, 
                  31.1, 37.6, 55.7, 89.1, 84.3, 85.2, 
                  30.8, 35.6, 55.9, 90.5, 85.2, 86.3, 
                  32.6, 34.7, 54.1, 89.8, 84.2, 87.1, 
                  32.2, 34.4, 46.1, 76.2, 84.3, 84.1, 
                  31.4, 37.4, 50.5, 75, 85.3, 84.8, 
                  27.6, 37.3, 46.3, 74.1, 81.6, 82.7, 
                  26.7, 28.4, 27.9, 59.6, 87, 86.1)

# Second Drug is called Drug 11
dead.cells11 <- c(28.4, 28.9, 28.6, 30.1, 90, 
                  27.8, 28.5, 29.2, 29.3, 89.9, 
                  28.6, 26.6, 27.3, 29, 89.7, 
                  25.8, 28.7, 31, 80.7, 85, 85.2, 
                  25.3, 28.2, 30, 81.4, 83.5, 87.2, 
                  25.9, 25.4, 28.4, 81.2, 81.4, 85.6, 
                  36.4, 36.2, 33.4, 52.3, 87.2, 97, 
                  33.9, 29.5, 32, 44.8, 84.9, 97.2, 
                  30.1, 30.1, 31, 46.5, 83.5, 96.1, 
                  22.8, 23.7, 25.8, 37.9, 73.8, 86.7) 

# Put the data into a data frame and transform the data to make it postive for fitting
data49 <- data.frame(conc,dead.cells49)
data49$nM <- data49$conc * 1000000000
data49$log.nM <- log10(data49$nM) 

data11 <- data.frame(conc,dead.cells11)
data11$nM <- data11$conc * 1000000000
data11$log.nM <- log10(data11$nM) 

##################  fit the data  ##################
# make sure logconc remains positive, otherwise multiply to keep positive values
# (such as in this example where the iconc is multiplied by 1000

fit49 <- nls(dead.cells49 ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
           data = data49,
           start=list(bot=20, top=95, LD50=4, slope=-12))

fit11 <- nls(dead.cells11 ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
             data = data11,
             start=list(bot=20, top=95, LD50=4, slope=-12))


##################  Plot the results  ##################
#this lets you graph your calculated equations nice and pretty
x <- seq(0,6, length=100)

y49 <- (coef(fit49)["bot"]+(coef(fit49)["top"]-coef(fit49)["bot"])/(1+(x/coef(fit49)["LD50"])^coef(fit49)["slope"]))
m49 <- coef(fit49)

y11 <- (coef(fit11)["bot"]+(coef(fit11)["top"]-coef(fit11)["bot"])/(1+(x/coef(fit11)["LD50"])^coef(fit11)["slope"]))
m11 <- coef(fit11)

# This is the key bit of code that divides the plot area in two
par(mfrow=c(1,2))

# plot the first graph - goes on the left side
plot(data49$nM, data49$dead.cells49, 
     log="x", 
     main="Drug 49", 
     xlab="Drug concentration (microM)", 
     ylab="Dead cells (% of cells)",
     xlim= c(500, 10000),
     ylim= c(20,100),
     xaxt = "n")   # suppresses the labels on the x-axis

# order here is important
# you have to complete the plot before you start the next one
# adds the axis they way we want it to look
axis(1, at=c(500, 1000, 2500, 5000, 10000),  lab=c("0.5","1","2.5","5","10"))

# adds the non-linear fitted line to the first graph
lines(10^x,y49, lty="dotted", col="blue")

# add the LD50s in the legend which allows nice positioning. 
rp = vector('expression',1)
rp[1] = substitute(expression(LD50(microM) == MYVALUE), 
                   list(MYVALUE = format((10^m49[3])/1000,dig=3)))[2]
legend("topleft", legend = rp, bty = 'n', cex = 0.75)


# plot the second graph - goes on the right side
plot(data11$nM, data11$dead.cells11, 
     log="x", 
     main="Drug 11", 
     xlab="Drug concentration (microM)", 
     ylab="Dead cells (% of cells)",
     xlim= c(500, 10000),
     ylim= c(20,100),
     xaxt = "n")   # suppresses the labels on the x-axis

# adds the axis they way we want it to look
axis(1, at=c(500, 1000, 2500, 5000, 10000),  lab=c("0.5","1","2.5","5","10"))

# adds the non-linear fitted line to the second graph
lines(10^x,y11, lty="dotted", col="blue")

# add the LD50s in the legend which allows nice positioning. 
rp = vector('expression',1)
rp[1] = substitute(expression(LD50(microM) == MYVALUE), 
                   list(MYVALUE = format((10^m11[3])/1000,dig=3)))[2]
legend("topleft", legend = rp, bty = 'n', cex = 0.75)



Comparing two drugs on the same plot

I really like the range of options for visualisation that are available in R. Today, I want to describe one of the options to compare two drugs.

The experimental set up is the same as we used previously, with the exception that we now have data from two drugs not one:
  • add various concentrations of a two novels drug to the cells in culture
  • leave for 48 hours
  • then measure how many cells are dead
  • the experiment was done multiple times with the doses improved as it progresses
The objective today is to draw a graph and calculate LD50s that allow us to compare the two drugs.  

Here is an illustration of a graph that contains the data from both drugs:


Here is the script I used to generate this graph:

###  I have detailed the code separately for each set of data
###  to make it more obvious what I'm doing. 
### The code could be made shorter and less repetitive.

###  This is the data   ###
# Data from multiple experiments

conc <- c(1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-08, 1.00E-07, 5.00E-07, 1.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          1.00E-07, 5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05) 

# First drug is called Drug 49
dead.cells49 <- c(28.6, 30.1, 34.6, 47.7, 81.7, 
                  27.1, 27.2, 35, 47.9, 80.6, 
                  27.8, 27.9, 34.2, 47.9, 82.4, 
                  31.1, 37.6, 55.7, 89.1, 84.3, 85.2, 
                  30.8, 35.6, 55.9, 90.5, 85.2, 86.3, 
                  32.6, 34.7, 54.1, 89.8, 84.2, 87.1, 
                  32.2, 34.4, 46.1, 76.2, 84.3, 84.1, 
                  31.4, 37.4, 50.5, 75, 85.3, 84.8, 
                  27.6, 37.3, 46.3, 74.1, 81.6, 82.7, 
                  26.7, 28.4, 27.9, 59.6, 87, 86.1)

# Second Drug is called Drug 11
dead.cells11 <- c(28.4, 28.9, 28.6, 30.1, 90, 
                  27.8, 28.5, 29.2, 29.3, 89.9, 
                  28.6, 26.6, 27.3, 29, 89.7, 
                  25.8, 28.7, 31, 80.7, 85, 85.2, 
                  25.3, 28.2, 30, 81.4, 83.5, 87.2, 
                  25.9, 25.4, 28.4, 81.2, 81.4, 85.6, 
                  36.4, 36.2, 33.4, 52.3, 87.2, 97, 
                  33.9, 29.5, 32, 44.8, 84.9, 97.2, 
                  30.1, 30.1, 31, 46.5, 83.5, 96.1, 
                  22.8, 23.7, 25.8, 37.9, 73.8, 86.7) 

# Put the data into data frames and 
# transform the data to make it postive for fitting
data49 <- data.frame(conc,dead.cells49)
data49$nM <- data49$conc * 1000000000
data49$log.nM <- log10(data49$nM) 

data11 <- data.frame(conc,dead.cells11)
data11$nM <- data11$conc * 1000000000
data11$log.nM <- log10(data11$nM) 

##################  fit the data  ##################
# make sure logconc remains positive, otherwise multiply to keep positive values
# (such as in this example where the iconc is multiplied by 1000

fit49 <- nls(dead.cells49 ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
           data = data49,
           start=list(bot=20, top=95, LD50=4, slope=-12))

fit11 <- nls(dead.cells11 ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
             data = data11,
             start=list(bot=20, top=95, LD50=4, slope=-12))


##################  Plot the results  ##################
#this lets you graph your calculated equations nice and pretty
x <- seq(0,6, length=100)

y49 <- (coef(fit49)["bot"]+(coef(fit49)["top"]-coef(fit49)["bot"])/(1+(x/coef(fit49)["LD50"])^coef(fit49)["slope"]))
m49 <- coef(fit49)  #this extracts the fitted LD50 for drug 49

y11 <- (coef(fit11)["bot"]+(coef(fit11)["top"]-coef(fit11)["bot"])/(1+(x/coef(fit11)["LD50"])^coef(fit11)["slope"]))
m11 <- coef(fit11)



# plot the first set of points
plot(data49$nM, data49$dead.cells49, 
     log="x", 
     main="Comparing LD50 of two drugs ", 
     xlab="Drug concentration (microM)", 
     ylab="Dead cells (% of cells)",
     xlim= c(500, 10000),
     ylim= c(20,100),
     xaxt = "n")   # suppresses the labels on the x-axis

# add the points for the second set of data
points(data11$nM, data11$dead.cells11, col = "red")

# add the axis they way we want it to look
axis(1, at=c(500, 1000, 2500, 5000, 10000),  lab=c("0.5","1","2.5","5","10"))

# adds the two non-linear fitted lines 
lines(10^x,y49, lty="dotted", col="black")
lines(10^x,y11, lty="dotted", col="red")

# adds the legend
legend("topleft", 
       inset=.02,
       c("Drug 49","Drug 11"),
       lty=c("dotted","dotted"),
       lwd=c(2.5,2.5),
       col=c("black","red"),
       cex = 0.75)

# add the LD50s in the legend which allows nice positioning. 
rp = vector('expression',2)
rp[1] = substitute(expression(Drug49-LD50(microM) == MYVALUE), 
                   list(MYVALUE = format((10^m49[3])/1000,dig=3)))[2]
rp[2] = substitute(expression(Drug11-LD50(microM) == MYVALUE), 
                   list(MYVALUE = format((10^m11[3])/1000,dig=3)))[2]
legend('bottomright', legend = rp, bty = 'n', cex = 0.75)


Thursday 23 April 2015

Functions - the work horses of R

By way of a disclaimer, I’m not an R expert. I’m an experienced biochemist that believes R is a very valuable tool. As part of my learning curve, I’ve been trying to understand the fundamentals of R. Two core fundamentals are functions and objects. I’m going to write about functions here.


Functions do things in R. You often know something is a function by the presence of a set of round brackets. Looking at the script I wrote to analyse a protein assay, there are two functions with brackets in the first three lines of code.


The first function is denoted by
c(0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000, 0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000)
- this is the combine function. 

The code:


prot <- c(0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000, 0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000)


combines all the values into the object “prot”.


The second function is denoted by plot(abs~prot). This function draws a graph.


There is also another function <-. This is an assignment operator and is also a function.


Examples of other functions are also used in the Protein Assay script:
  • lm()- a function for fitting linear models.
  • abline()- a function for adding a straight line to a plot
  • text()- a function for adding text to a plot
  • round()- a function for rounding numbers


To make the function do what we want, we add arguments inside the brackets. Then close the bracket. The plot(abs~prot) function has the arguments abs and prot which provide the data for the x and y coordinates of the points on the plot.

To improve the plot, we add more arguments:

plot(abs~prot,
    xlab = "[Protein] (microg/ml)",
    ylab = "Absorbance (570nm)",
    main = "Protein Assay 20th April 2015")

In case you hadn’t worked it out, xlab gives a label for the x-axis, ylab gives the label for the y-axis and main gives the heading for the graph. The syntax is important and takes a bit of time to learn but R will give you an error message or produce a result that you don’t want if you get it incorrect. Then you can correct it.


Finding out more about functions

To find out more about a particular function, you can use the help in R by typing:

?plot
or
help(plot)

Either of these will give you an output in the help window in R-Studio. The help documentation can be complicated and can take a bit of time to work out. Be patient with yourself.

You can try the example function:
example(plot) - this will run examples if they have been coded as part of the documentation for the function.

You can search the internet and play with the code that you find. If you search
“plot r” in Google, you will find lots of useful resource. I like these two:

The official "Introduction to R" is available here.

Wednesday 22 April 2015

Drawing a cell death curve and calculating an LD50

I'm going to show you the this analysis before the script.
We do a lot of drug testing in our laboratory at Cardiff and it's a valuable thing for biochemists and pharmacologists. To do the analysis, most of the lab usually use a different (expensive) statistical package but I like R.
The experimental set up, done by a student in the lab, is as follows:
  • using some cells and add various concentrations of a novel drug
  • leave for 48 hours
  • then measure how many cells are dead
  • the experiment was done four times with the doses improved each time
Here is the graph and calculated LD50 - a measure of how good the drug is:


Here is the script I used to generate this graph:

### START of SCRIPT 

###  this is the data   ###
# data from four experiments
conc <- c(5.00E-07, 1.00E-06, 1.00E-05, 
          5.00E-07, 1.00E-06, 5.00E-06, 1.00E-05, 2.00E-05, 
          5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05, 
          5.00E-07, 1.00E-06, 2.50E-06, 5.00E-06, 1.00E-05)
dead.cells <- c(34.6, 47.7, 81.7, 
                37.6, 55.7, 89.1, 84.3, 85.2, 
                34.4, 46.1, 76.2, 84.3, 84.1, 
                24.5, 26.1, 60.6, 82.7, 87)

# transform the data to make it postive and put into a data frame for fitting 
data <- as.data.frame(conc)   # create the data frame
data$dead.cells <- dead.cells
data$nM <- data$conc * 1000000000
data$log.nM <- log10(data$nM) 

###  fit the data  ###
# make sure logconc remains positive, otherwise multiply to keep positive values
# (such as in this example where the conc is multiplied by 1000000

fit <- nls(dead.cells ~ bot+(top-bot)/(1+(log.nM/LD50)^slope),
             data = data,
             start=list(bot=20, top=95, LD50=3, slope=-12))

###  Plot the results  ###
#this lets you graph your calculated equations nice and pretty

x <- seq(0,6, length=100)
y <- (coef(fit)["bot"]+(coef(fit)["top"]-coef(fit)["bot"])/(1+(x/coef(fit)["LD50"])^coef(fit)["slope"]))
m <- coef(fit)

# plot the points first
plot(data$nM, data$dead.cells, 
     log="x", 
     main="Drug Dose Response and LD50", 
     xlab="Drug concentration (microM)", 
     ylab="Dead cells (% of cells)",
     xlim= c(500, 10000),
     ylim= c(20,100),
     xaxt = "n")   # suppresses the labels on the x-axis

# adds the x-axis labels they way I want it to look
axis(1, at=c(500, 1000, 2500, 5000, 10000),  lab=c("0.5","1","2.5","5","10"))

# adds the fitted non-linear line 
lines(10^x,y, lty="dotted", col="red", lwd =2)

# add the LD50 in the legend which allows nice positioning. 
rp = vector('expression',1)
rp[1] = substitute(expression(LD50(microM) == MYVALUE), 
                   list(MYVALUE = format((10^m[3])/1000,dig=3)))[2]
legend('topleft', legend = rp, bty = 'n')

# END OF SCRIPT

To prepare this script, I used the following sources:
Thanks to Mel for the data. 

Monday 20 April 2015

Let's analyse a protein assay....

Hopefully, you've already downloaded R and R-Studio. R-Studio is not essential.

Here is a script to do a protein assay:

# Protein Concentrations
prot <- c(0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000, 0.000, 0.016, 0.031, 0.063, 0.125, 0.250, 0.500, 1.000) 

# Absorbance from my protein assay
abs <- c(0.329, 0.352, 0.349, 0.379, 0.417, 0.491, 0.668, 0.956, 0.327, 0.341, 0.355, 0.383, 0.417, 0.446, 0.655, 0.905)

#Plot the data simply
plot(abs~prot)

#Calculate the line using the linear model function
line <- lm(abs~prot)

#Draw the line
abline(line)

#Improve the graph:
plot(abs~prot, 
     xlab = "[Protein] (microg/ml)",
     ylab = "Absorbance (570nm)",
     main = "Protein Assay 20th April 2015")
abline(line)

r2 <- round(summary(line)$r.squared, 3)
mylabel = bquote(italic(R)^2 == .(format(r2, digits = 3)))
text(x = 0.2, y = 0.9, labels = mylabel)


#Equation of a line y = mx + c
#In our case abs = slope * prot + intercept
# ukn.prot = (abs - intercept)/slope
int <- summary(line)$coefficients[1]
slope <- summary(line)$coefficients[2]
mylabel = bquote(y == .(format(slope, digits = 3))*x + .(format(int, digits = 3)))
text(x = 0.2, y = 0.8, labels = mylabel)

#now calculate some unknown protein concs from absorbances
#put the unknowns into a vector
abs.unknowns <- c(0.554, 0.568, 0.705)
#rearrange the equation of the line to ukn.prot = (abs - intercept)/slope
prot.unknowns <- (abs.unknowns - int)/slope

#put the answers on the graph
text(x = 0.8, y = (0.6), "Abs")
text(x = 0.92, y = (0.6), "Prot")
for (i in 1:length(abs.unknowns)){
  text(x = 0.8, y = (0.6 - i/20), abs.unknowns[i])
  text(x = 0.92, y = (0.6 - i/20), round(prot.unknowns[i], 3))
}

# END OF SCRIPT



This can be copy and pasted into the script window of R-Studio. 




Then each line can be run one by one by pressing the Run button:



This will generate the following output:


Getting Started with R-Studio...

I think R-Studio makes R easier

For me, finding and using R-Studio was a key step to coming to grips with R. R-Studio is just one of the IDEs - an integrated development environments - available for R. I use it because:

  • it was the first one I found
  • it's free
  • the interface looks similar in both Mac and PC

For these reasons, I will be introducing and using R-Studio in the one day course. Prior to the course, it's important to:

  • download R - it's here.
  • download R-studio - here is the R-studio home page.
  • ideally check that it all works by following the rest of the information on this page. 

Starting R-Studio

Please start the R-studio software.
This should give you a window like this:


Choose "New Directory"

You get this:


Choose "Empty Project"


You need to give a name to your project and choose a location to save the files into. 
Try to choose a name that is meaningful to you. 

The name I chose is "MyAnal20150420"

This now provides this kind of window:


This area allows you to use R in a slightly easier way than just the prompt. That is a bit more interactively. 

Orienting you in R-Studio

The window on the left that makes up most of this screen is the Console. 
Please press the top right hand corner of this window:


This changes the window to this:


This is the same window with some explanations:


Don't worry if this doesn't make much sense yet.
To make R do stuff you can write directly in the console window, or you can write some "scripts", you will run them to add data and to make graphs and then you will get the output.

To check everything is working go to the window on the bottom and type:
2 + 2
and then press return....

You should get 4, as per this picture:




CONGRATULATIONS - YOU'VE DONE YOUR FIRST CALCULATION WITH R.
You need to have done this BEFORE you join us on the R for Biochemists Training Day.
I will be adding a post about how to draw a simple graph over the next few days.





Wednesday 8 April 2015

Why, as a biochemist, I use R...

I started using R because I needed more functionality than Excel. I didn't want to pay for Prism or SPSS. I decided the best way forward was to learn how to use R.

My key aim was to learn some advanced visualisation skills. I have been a fan of flowingdata.com for many years. I bought Nathan Yau's book, Visualise This as a new year gift for 2014. It showed me how to use R to make heat maps, nicely stacked bar charts and interesting charts.

Then, I found R-Studio and I can honestly say that I haven't look back. R, within R-Studio, is my goto program for data analysis and visualisation. I published my first paper using R to analyse proteomic data last year and I will publish more in the future.

Here are some of the key advantages of using R for academic research: 

R can do analysis and visualisations that Excel and Prism can’t. 

These include generating lots of pairwise graphs to detect correlations, overplotting scatter plots, performing cluster analysis and heat maps.

R can record your workflow and then you can share it. 

R is more difficult because it uses command line to perform data analysis. However, this is also one of the key advantages. Because it uses command line, it also records a history of what you have done and allows you to write scripts. These scripts can be shared, easily reproduced and automated.

R is free! 

R is open source. It costs nothing to buy or update. I really like this.

There are free programs that interact with R that make it easier to use. 

I use R-Studio. These programs allow you to work with R in a slightly easier way. Again these are free and they are cross platform. They look the same on a PC or a Mac. 

There is lots of free help!
There is a large community that provides advice and help about R. Put a search into your web browser and you come across lots of people who have experienced similar problems. With a bit of work, you can apply these to your data.
I used my three favourite recently to find the code to draw a figure legend:
  1. Quick R - a site put together by Rob Kabacoff.
  2. StackOverflow - a place with LOTS of good answers - read before asking.
  3. R-bloggers - an aggregate of R blog sites - lots of new info every day!