Thursday 18 June 2015

Drawing a proteomic data volcano plot....

I really like the data produced by this study from Liverpool (Eagle et al. (2015) Mol Cell Proteomics, 14, 933-945). It is a proteomic study of two types of leukaemic cell. I have already used it to compare their protein list with some of our data. Today, I have used it to draw a volcano plot, which shows the change in protein expression against the significance of that change (p-value). These graphs are popular in genomic and proteomic studies.
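
To make the two axes concrete, here is a minimal sketch with three made-up proteins. The data frame `toy` and its values are invented for illustration and are not taken from the Eagle et al. data. It shows how a fold change and a p-value turn into the x and y positions on a volcano plot.

```r
# Toy values, invented for illustration (not from the published data)
toy <- data.frame(
  protein     = c("A", "B", "C"),
  fold_change = c(4, 1, 0.25),      # ratio between the two cell types
  p_value     = c(0.001, 0.5, 0.01)
)

# x axis: log2 fold change, so a 4-fold increase and a 4-fold decrease
# sit the same distance either side of zero
toy$log2_fc <- log2(toy$fold_change)

# y axis: -log10 p-value, so smaller p-values plot higher up
toy$neg_log10_p <- -log10(toy$p_value)

toy
```

Protein A ends up top right (up and significant), protein C on the left (down and significant), and protein B at the bottom centre (no change).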

Here is the graph, drawn with ggplot:


Updated 22nd July 2021: The data should be available from Mol Cell Proteomics, but it is not there any more. The file is available on GitHub, and this script links to it and downloads it directly.

START
library(ggplot2)

link <- "https://raw.githubusercontent.com/brennanpincardiff/RforBiochemists/master/data/mcp.M114.044479.csv"
data <- read.csv(link, header = TRUE)

## Flag the proteins that have a p-value < 0.05
data$threshold <- as.factor(data$P.Value < 0.05)

##Construct the plot object
g <- ggplot(data=data, 
            aes(x=Log2.Fold.Change, y =-log10(P.Value), 
            colour=threshold)) +
  geom_point(alpha=0.4, size=1.75) +
  xlim(c(-6, 6)) +
  xlab("log2 fold change") + ylab("-log10 p-value") +
  theme_bw() +
  theme(legend.position="none")

g
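# The script gives a warning message: Removed 1 rows containing missing values (geom_point).
# ggplot2 drops any point that falls outside xlim(c(-6, 6)) or has a missing value,
# so one protein is left off the plot, but the plot is still drawn.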
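
If you want to see which rows cause that warning, a check along these lines should do it. This is only a sketch: `toy` is an invented data frame that borrows the column names from the script above, and I am assuming the dropped row is either outside the x-axis limits or has a missing value.

```r
# Toy data frame using the same column names as the script above;
# the values are invented for illustration
toy <- data.frame(
  Log2.Fold.Change = c(-1.2, 0.4, 7.1, NA),
  P.Value          = c(0.01, 0.60, 0.001, 0.20)
)

# Rows that would be dropped with xlim(c(-6, 6)):
# a missing value in either column, or a fold change outside the limits
dropped <- is.na(toy$Log2.Fold.Change) | is.na(toy$P.Value) |
  abs(toy$Log2.Fold.Change) > 6

sum(dropped)     # how many rows would be removed
toy[dropped, ]   # the rows themselves
```

Swapping `toy` for `data` should show which protein is missing from the plot.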

4 comments:

  1. Hello, I am trying to test your code on proteomic data in which the first column is named Protein Accession.
    I would be very grateful if you could tell me how to label the dots according to Protein Accession (if possible with no overlapping, but that is secondary).

    Thank you very much in advance,

    By the way, the code works perfectly!

    Julia Bauzá

    1. Sorry Julia,
      I missed your comment in my email overload in November. Do you still want help with this?
      Best wishes,
      Paul

Comments and suggestions are welcome.