Friday 27 May 2016

Working with multiple Image Files....

I showcased this script, analysing multiple images, during our Image Analysis Workshop today. It was a very interesting morning. Dr Joaquín de Navascués supplied the Drosophila embryo images that we are analysing here.

In my previous two scripts, I have looked at analysing an image step by step. However, what we really want to do is to analyse multiple images and run the same analysis across all of them. One good way to do this in R is to bring all the images into one object - a list.

This allows us to write functions and then use the lapply( ) function to apply them to each element of the list.
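As a toy example (not part of the analysis below), lapply( ) takes a list and a function and returns a new list with the function applied to each element:

squares <- lapply(list(1, 2, 3), function(x) x^2)
squares   # a list containing 1, 4 and 9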

The script below downloads 48 images from a folder on Github. These images are converted to greyscale to allow masking. Then the regions of fluorescence are identified and the number of regions found with each stain is counted. We can then look at the distribution of cells through the different layers.

The layers show the staining of cells in a Drosophila embryo with different methods. These include fluorescence markers and nuclear stains.

Then I have extracted some data into a data frame and drawn graphs with ggplot2. Here are the graphs:


The graphs imply that the cell number, as determined by the nuclear staining (blue graph), is similar in all the layers. The layers are generated as the fly embryo is imaged horizontally, from top to bottom. The number of cells detected with Stain number 1 also stays constant, while the number of cells detected with Stain number 2 diminishes as we move down the embryo.

Here is an example of the images:
Three of the images from the stack of 48. The left hand panel (green) is stain 1, the middle panel is the nuclear staining and the right hand panel (white) is stain 2.


Here is the script. It's a bit long - 207 lines! It might be easier to cut and paste it from Github.
SCRIPT START
# install.packages(c("ggplot2", "RCurl"))  # if required
# note: EBImage is installed from Bioconductor rather than CRAN

library(EBImage)
library(ggplot2)
library(RCurl)

# this is a link to the list of files in our folder
link <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/images/imageSeq/filenameslist"
# the download.file() function downloads and saves the file with the name given
download.file(url=link, destfile="filenameslist", mode="wb")
load("filenameslist")

# how many files are in this list?
length(filelist)
# 48

# this creates the first function called downloadImages
downloadImages <- function(x){
  # assemble the URL for this file
  url <- paste0("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/images/imageSeq/", x)
  img <- readImage(url)
  cat("Image downloaded\n") # so that I know something is happening. 
  return(img)
}

# read in all the files into a list using the lapply( ) function
# lapply - applies a function to a list or a vector. 
# in this case, it applies the function downloadImages() to each element in the 
# object filelist
# lapply( ) syntax: name of list, name of function
images <- lapply(filelist, downloadImages)

# it takes a while to download 48 images - be patient

# we now have all 48 images in a list 
# to show us that R knows this is a list we can use the function mode( )
mode(images) # output to console says "list"

# to access a single image in the list we use double square brackets
display(images[[2]])  # display the second image in the list

# we can display the image brighter
display(images[[2]]*4, method = "raster")




# it might be good to save this locally so that it doesn't need 
# to be downloaded again
# CHECK where you want to save it first
# setwd("SomeWhereYouWantToSaveTheData")
# setwd("/Users/paulbrennan/Desktop")  # example
save(images, file = "Dros_Images")

# this can then be reloaded using the load( ) function
load("Dros_Images")

# Next step is to alter the list of images as a group
# make all the images greyscale
# http://stackoverflow.com/questions/20347025/convert-rgb-image-to-one-channel-gray-image-in-r-ebimage
# create another function...
makeGrey <- function(img){
  print("One image grey")
  channel(img, mode = "grey")
}

# then apply the function to the list of images using lapply
# lapply( ) syntax: (name of list, name of function)
images.grey <-lapply(images, makeGrey)

# we have now created a list of grey scale images. 
display(images.grey[[2]]*4, method = "raster")  # check one of the greyscale images




# write a function to create image masks
maskCells <- function(img1){   
  # blur the image
  img1 <- gblur(img1*2, sigma = 5)
  
  # apply an Otsu threshold 
  img1.mask <- img1 > otsu(img1)  
  
  # generate an image with different values for each connected region
  # key here is the bwlabel( ) function
  img1.mask <- bwlabel(img1.mask)
  
  # show this as coloured blobs 
  display(colorLabels(img1.mask), method = "raster")
  
  
  # the bwlabel() function 'counts' the blobs
  count <- max(bwlabel(img1.mask))
  
  # this outputs the count to us
  cat('Number of cells in this image =', count,'\n')
  
  return(img1.mask)
}
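
# (optional, not in the original workflow) it can be worth testing the function
# on a single image first, before running it over the whole list with lapply, e.g.
# test.mask <- maskCells(images.grey[[2]])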

# make the image masks using the function and the lapply
images.mask <- lapply(images.grey, maskCells)
# result is a list of image masks

# write function to extract number of objects masked
countRegions <- function(img1.mask){return(max(bwlabel(img1.mask)))}

# to extract numbers from the image masks use lapply and function again
numbers <- lapply(images.mask, countRegions)
# result is  a list of 48 numbers in this case

# use the numbers to make some plots
# in order to graph the numbers we put them into a data frame
numb <- unlist(numbers)
df <- data.frame(numb)
df$type <- rep(c("memb", "st1", "nuc", "st2"), times = 12)  # staining type for each image
df$layer <- rep(1:12, each = 4)   # 12 layers, 4 images per layer

df.st1 <- subset(df, df$type == "st1")
df.st2 <- subset(df, df$type == "st2")
df.nuc <- subset(df, df$type == "nuc")

df.2 <- as.data.frame(cbind(df.nuc$layer, df.nuc$numb, df.st1$numb, df.st2$numb))
colnames(df.2) <- c("layer", "nuc", "st1", "st2")
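
# (optional check) look at the first few rows of the reshaped data frame
# head(df.2)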


# using the dataframe, I am making plots of the number of stained cells across the various layers
# make the plot for the numbers of nuclei
nucHist <- ggplot(df.2, 
                   aes(x=as.factor(layer), 
                       y=nuc)) +
            geom_bar(stat="identity", fill = "blue") +
            ggtitle("Nuclei per layer") +
            ylab("Number of regions") +
            xlab("Layers from top to bottom") +
            theme_bw()
# show the plot
nucHist



# make the plot for the numbers of regions stained with stain number 1
stain1Hist <- ggplot(df.2, 
                     aes(x=as.factor(layer), 
                         y=st1)) +
              geom_bar(stat="identity", fill = "green") +
              ggtitle("Stain number 1") +
              ylab("Number of regions") +
              xlab("Layers from top to bottom") +
              theme_bw()

# make the plot for the numbers of regions stained with stain number 2
stain2Hist <- ggplot(df.2, 
                     aes(x=as.factor(layer),
                         y=st2)) +
              geom_bar(stat="identity", fill = "grey") + 
              ggtitle("Stain number 2") +
              ylab("Number of regions") +
              xlab("Layers from top to bottom") + 
              theme_bw()



# from: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)
  
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  
  numPlots = length(plots)
  
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }
  
  if (numPlots==1) {
    print(plots[[1]])
    
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


# run code above to make multiplot function first
multiplot(stain1Hist, nucHist, stain2Hist, cols = 3)
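
# (optional, not in the original post) to save the combined figure to a file,
# one approach is to open a graphics device around the multiplot() call;
# the filename here is just an example
# png("Dros_cell_counts.png", width = 900, height = 300)
# multiplot(stain1Hist, nucHist, stain2Hist, cols = 3)
# dev.off()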
SCRIPT END





Friday 20 May 2016

Extracting fluorescence of objects from an Image with R...

In the last blog post, I showed how to identify regions in an image of cells.
This is the image:

It shows about 40 or so cells stained for a protein of interest.

Using the R package EBImage, it's possible to identify these bright regions, count them and then extract and calculate the fluorescence signal in each region.

Here is a graph of the fluorescence signal for each of the 41 regions:



The script that does this is here:

SCRIPT START
# showing how to extract features from an image with EBImage

library(EBImage)
library(ggplot2)

# load up image
# the readImage( ) function will work on URLs to import image into R
c2 <- readImage("https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/images/Dros_c2.tif") 
# creates an object called c2

# show a brighter version of the image in the R graphics window
display(c2*4, method = "raster")  #"raster" method means within R

# create the "mask" of this image by blur, threshold and counting
c2.b <- gblur(c2*2, sigma = 5)  #blur
c2.t <- c2.b > otsu(c2.b) # apply otsu threshold 
c2.t.cnt <- bwlabel(c2.t) # count the 'regions'

# show this as coloured blobs 
display(colorLabels(c2.t.cnt), method = "raster") 
display(colorLabels(c2.t.cnt))  # in a browser




# next step is to extract some of the features about these objects. 
# first extract the basic features (as defined by EBImage)
# the first of these is the mean intensity (fluorescence per pixel for the object)
# this function creates a matrix with each object in a row
# these are calculated using the thresholded image and the original image
ftb <- computeFeatures.basic(c2.t.cnt, c2)

# to get a vector of the mean intensities
m.intent <- ftb[,1]
m.intent # show in Console
# there is one value for each shape...
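# (optional) the matrix has named columns - to see all the basic features calculated:
# colnames(ftb)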

# next category is shapes
# these are calculated from the thresholded image
fts <- computeFeatures.shape(c2.t.cnt)
area <- fts[,1]  # select first column from object called fts
perimeter <- fts[,2] # select second column from object called fts

# put the data into a dataframe (a different kind of R object)
df <- as.data.frame(m.intent)
df$area <- area
df$signal <- df$m.intent * df$area
df$regions <- seq_len(nrow(df))  # a region number for each row

# lots of ways to plot but one of the best is ggplot2
# plot the area of each region on the mask
ggplot(df, 
       aes(x = regions, y=area)) +
  geom_bar(stat="identity")


# plot the signal of each region on the mask
ggplot(df, 
       aes(x = regions, y=signal)) +
  geom_bar(stat="identity")



# plot mean intensity
ggplot(df, 
       aes(x = regions, y = m.intent)) +
  geom_bar(stat="identity")



# dot plot of area v signal
ggplot(df, 
       aes(x=area, y=signal)) +
  geom_point()





# plot the signal of each region on the mask 
# in a slightly nicer way
ggplot(df, 
       aes(x = regions, y=signal)) +
  geom_bar(stat="identity") +
  ggtitle("Fluorescence within each region of the image") + 
  xlab("Regions from top right to bottom left of the image") +
  ylab("Total fluorescence in the region") +
  theme_bw()

# for some help on ggplot2, this R-Cookbook site is good. 
# http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
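# (not used above) EBImage can calculate further feature sets, for example
# computeFeatures.moment() for image moments and computeFeatures.haralick()
# for texture features - see ?computeFeatures for details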
SCRIPT END

There is lots of extra information about the other features that can be extracted in the EBImage documentation (see ?computeFeatures).

Sunday 15 May 2016

Counting and identifying stained cells step by step

On Friday, Dr Joaquín de Navascués and I held the first of two Image Analysis Workshops at Cardiff University. We had a good audience with PhD students and researchers. Joaquin led the first session which explored the fundamentals of images and using Image J or Fiji to open and explore those images.

Key points included:

  • good images start at the microscope - try not to over saturate your images during collection.
  • protect your original files as it can be easy to alter them permanently and lose data.
  • digital pictures are arrays of numbers. 
  • these numbers can be visualised and transformed in lots of different ways. 

In just under two weeks' time, it's my turn. I am going to talk about using R to analyse images. I like R because it allows the development of reproducible, sharable and scalable workflows. The aim of the second workshop is to show an automated workflow that attendees can adapt to their own work if they want.

The key package that is useful for analysing images in R is EBImage developed by members of the EMBL and the EBI. It's a powerful and useful package.
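If you don't already have it, note that EBImage is installed from Bioconductor rather than CRAN; at the time of writing, a minimal install looks like this:

source("https://bioconductor.org/biocLite.R")
biocLite("EBImage")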

The script below is one that will probably be used during the workshop. It is about using R to count the number of cells in an image. I'm going to go through the process step by step; a condensed sketch of the whole pipeline follows the list. The steps are as follows:

  1. Download the image from Github
  2. Make the image a little brighter
  3. Blur the image using a Gaussian filter
  4. Apply an Otsu threshold to the image to convert it to a binary image
  5. Count the connected regions - i.e. the cells.
  6. Display the regions in a multi-coloured output. 
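
Condensed into a few lines, the pipeline looks roughly like this (just a sketch - the full, commented script follows below):

img   <- readImage("file.tif")                   # step 1: read the downloaded file
blur  <- gblur(img * 2, sigma = 5)               # steps 2 and 3: brighten and blur
mask  <- blur > otsu(blur)                       # step 4: apply the Otsu threshold
cells <- bwlabel(mask)                           # step 5: label the connected regions
max(cells)                                       # ...and count them
display(colorLabels(cells), method = "raster")   # step 6: multi-coloured display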

Here is a confocal microscope picture and the image with the cell regions identified:




Here is the script that identifies and counts the regions:

SCRIPT START
# aim of this script is to identify and count cells....
library(EBImage)
library(RCurl)

# this is the link to the image
link <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/images/Dros_c2.tif"
# the download.file() function downloads and saves the file with the name given
download.file(url=link, destfile="file.tif", mode="wb")

# import image of stained cells into R
c2 <- readImage("file.tif")
# creates an object called c2

# show the image in the R graphics window
display(c2, method = "raster")  #"raster" method means within R

# display a brighter image
display(c2*4, method = "raster") 
# because an image is numbers we just multiply to make it brighter
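# (optional check) the pixel values really are just numbers - look at their range
# range(c2)   # values should lie between 0 and 1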




# check details by writing the name of the object
c2 # shows some information
# it's a greyscale image 

# make a brighter image by multiplying all the values by 2
c2.b <- c2*2

# gaussian blur
c2.b.blur <- gblur(c2.b, sigma = 5)
display(c2.b.blur, method = "raster")


# Can you spot the difference?

## threshold using Otsu's method 
otsu(c2.b.blur) # gives a threshold value using Otsu algorithm
# value = 0.1777344
c2.b.blur.thres <- c2.b.blur > otsu(c2.b.blur) # apply this value 
display(c2.b.blur.thres, method = "raster")



# we see that the image is starkly black and white
# all pixels have been turned into either 0 or 1 - a binary image.
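# (optional check) tabulate the pixel values - only 0s and 1s should remain
# table(imageData(c2.b.blur.thres))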

# generate an image with different values for each connected region
# key here is the bwlabel( ) function
c2.b.blur.thres.cnt <- bwlabel(c2.b.blur.thres)

# show this as coloured blobs 
display(colorLabels(c2.b.blur.thres.cnt), method = "raster")

# count by taking the max value of the bwlabel() output
nucNo <- max(bwlabel(c2.b.blur.thres))
# output this number to the Console
nucNo  # count = 41. 

# do we need to brighten the image (probably not in this case)?
c2.blur.thres <- gblur(c2, sigma = 5) > otsu(gblur(c2, sigma = 5))
display(c2.blur.thres, method = "raster")
max(bwlabel(c2.blur.thres))
# answer is 42 - so not a big difference - good staining. 

# what happens if we don't blur?
# we can calculate the Otsu threshold to the original image
otsu(c2)
c2.thres <- c2 > otsu(c2)
display(colorLabels(bwlabel(c2.thres)), method = "raster")



# doesn't look that different to the eye but...
# if we try counting....
max(bwlabel(c2.thres))
# answer is 1398 - lots of dots causing problems... 

# there are other ways to apply thresholds but that's for later
SCRIPT END


Resources:

Thursday 5 May 2016

Overview of gender breakdown across types of staff

I have spent a large part of the last month grappling with Athena SWAN data for the School of Medicine at Cardiff University. To apply for this equality award, the Athena SWAN Self Assessment Team needed to look at the gender breakdown of students and staff in the School.

Here is a bar chart I generated with the data to give an overview of the School:






Here's the script I used to generate this graph and some other graphs that explore using ggplot2 to make bar charts.

SCRIPT START
# overview of the School in terms of gender for our Athena SWAN application
library(readxl)
library(ggplot2)
library(reshape2)

# the data is here
link <- "https://raw.githubusercontent.com/brennanpincardiff/AthenaSWANBenchmarkData/master/asStaffOverview_comparedtype_20160505.xlsx"

# the download.file() function downloads and saves the file with the name given
download.file(url=link, destfile="file.xlsx", mode="wb")

# read in the data...
data <- read_excel("file.xlsx")

# calculate Female and Male as a percentage
data$tot <- data$f + data$m
data$Female <- (data$f / data$tot) *100 
data$Male <- (data$m / data$tot) *100

# first a simple plot of the Cardiff School of Medicine Data
# plot the number of females of each type of member of the School
# pull out the Cardiff data
data.c <- data[data$uni == "Cardiff",2:5]

# make the object with the Cardiff subset  
g <- ggplot(data.c,
       aes(x = type,
           y = f)) +
     geom_bar(stat="identity")+
     theme(axis.text.x = element_text(size = 12))

# show the graph
g

# The order of the bars is not what I want.
# take the order from the data file
types <- data.c$type

# apply this order to the graph
# http://stackoverflow.com/questions/5208679/order-bars-in-ggplot2-bar-graph
g + scale_x_discrete(limits = types)


# I want to put female and male on the same graph so...
# pull out and melt the Cardiff data so that we have f and m for each type in long format
data.melt <- melt(data[data$uni == "Cardiff",2:5], 
                  id.var = c("type", "year"))
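
# (optional) look at the long format - one row per type and gender combination
# head(data.melt)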

# put the data in an object
g <- ggplot(data.melt, 
       aes(x = type, 
           y = value,
           fill = variable)) + 
      scale_x_discrete(limits = types)

# show with geom_bar
# this is a stacked bar chart showing both female and male numbers
g + geom_bar(stat="identity")




# show female and male in separate bars using the position argument
g + geom_bar(stat="identity", position="dodge")


# show as a stacked bar chart but normalised to one 
g +  geom_bar(stat="identity", position="fill")


# I really want a stacked bar chart of percentages 
# to allow different parts of the School to be compared
# remove the raw numbers
data.s <- data[, -c(3:6)]
data.melt.s <- melt(data.s[data.s$uni == "Cardiff",2:4], 
                  id.var = c("type"))  # id.var is matched to melt's id.vars argument
colnames(data.melt.s) <- c("type", "Gender", "Percent")

# make a new graph
p <- ggplot(data.melt.s, 
           aes(x = type, 
               y = Percent,
               fill = Gender)) +
      geom_bar(stat="identity") + 
      scale_x_discrete(limits = types) +
      ylab("Percentage of staff") +
      xlab("") +
      theme_bw() +
      theme(axis.text.x = element_text(size = 14))
p # show the graph



# adjust the colours using scale_fill_brewer()
p + scale_fill_brewer(palette = 16, direction=-1) +
  # and add a title
    ggtitle("Gender breakdown of different members of the School of Medicine") +
    theme(axis.title.y = element_text(size = 14 ))


# use data from Imperial & Leeds as comparison
# use the facet_wrap function to make all the plots look the same 
# remove Prof & Support as we only have these for Cardiff
data.s <- data.s[-5,]
types <- types[-5]

# melt the data into the format for the bar chart. 
data.melt.s <- melt(data.s, 
                    id.var = c("uni", "type"))
colnames(data.melt.s) <- c("uni", "type", "Gender", "Percent")

# this is a final plot with three Medical Schools (it's a bit big). 
p <- ggplot(data.melt.s, 
            aes(x = type, 
                y = Percent,
                group = uni,
                fill = Gender)) +
  geom_bar(stat="identity") +
  facet_wrap(~uni) +    # this function gives three plots
  scale_x_discrete(limits = types) +
  ylab("Percentage of staff") +
  xlab("") +
  theme_bw() +
  scale_fill_brewer(palette = 16, direction=-1) +
  ggtitle("Gender breakdown of three Schools of Medicine") +
  theme(strip.text.x = element_text(size = 14, colour = "black")) +
  theme(axis.text.x = element_text(size = 11))

p # show the plot



SCRIPT END

I looked up these pages for help: