EP Manuscript “Highly Accessed”

The final PDF of our manuscript is now available on the Genome Biology website.

Header of Genome Biology manuscript

It’s been tagged as a “highly accessed” article! From the Genome Biology website:

The ‘Highly accessed’ graphic appears on journal table of contents and search results to identify those articles that have been especially highly accessed, relative to their age, and the journal in which they were published.

X. laevis

One of our users asked for the files necessary to run ExpressionPlot on Affymetrix Xenopus laevis arrays:

Xenopus laevis

I was surprised to learn that there is no X. laevis genome yet. I took Affymetrix’s netaffx data file and probe tables and created some ExpressionPlot-compatible files by pretending each consensus sequence (basically Affymetrix’s idea of a transcript) was a chromosome. You can get the files using util/EP-manage.pl:

util/EP-manage.pl get_array X_laevis_2
util/EP-manage.pl get_annot Xl2

The name “Xl2” for the annotation might be a little confusing, since most of the other annotations are named after genome assemblies. Here it refers to being based on Affymetrix’s X_laevis_2 consensus sequences.

If anyone needs the files for other arrays, please don’t hesitate to put in a request; someone else may also find them useful! Also, if you have any problems with these files, please contact the ExpressionPlot discussion group so we can help you or fix the problems.

Prototype server returns

The prototype server is back up and running. Point your web browsers to als-research.expressionplot.com. Please let me know if you have any problems connecting and I will do my best to solve them expeditiously.

Mission Bay, San Francisco: the new (temporary) home of the ExpressionPlot prototype server

Forthcoming in Genome Biology!!!

Our manuscript has been accepted by Genome Biology. Thanks to everyone who has contributed to the process.

Prototype down

Today I shut down the prototype server, als-research.mit.edu. It is in the caring hands of FedEx Ground until next week when I set it up in California. Please use the URL als-research.expressionplot.com from now on.

Canonical Distribution Tool

Just finished creating a new tool for quality control. It is available starting in EP version 1.6. I am calling it the “canonical distribution” tool, because it analyzes the positional distribution of reads near genomic landmarks (such as splice sites, start/stop codons etc) of the UCSC “canonical” transcripts. Here’s an example:

Example Canonical Distribution
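
To make the idea of a positional distribution concrete, here is a minimal Python sketch. It is not the ExpressionPlot implementation: the function name, the input layout (read start coordinates and a per-chromosome list of landmark positions), and the window size are all assumptions made for illustration, and strand is ignored.

```python
import bisect
from collections import Counter


def positional_distribution(read_starts, landmarks, window=50):
    """Count reads at each offset in [-window, window] from the nearest landmark.

    read_starts: iterable of (chrom, position) pairs (hypothetical input layout).
    landmarks:   dict mapping chrom -> sorted list of landmark positions,
                 e.g. splice sites or start codons.
    Strand is ignored in this sketch.
    """
    counts = Counter()
    for chrom, pos in read_starts:
        sites = landmarks.get(chrom, [])
        if not sites:
            continue
        # Find the landmarks immediately at or around the read position.
        i = bisect.bisect_left(sites, pos)
        candidates = sites[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda s: abs(pos - s))
        offset = pos - nearest
        if abs(offset) <= window:
            counts[offset] += 1
    # One count per offset, ordered from -window to +window.
    return [counts.get(o, 0) for o in range(-window, window + 1)]


if __name__ == "__main__":
    landmarks = {"chr1": [100, 200]}
    reads = [("chr1", 98), ("chr1", 100), ("chr1", 103),
             ("chr1", 197), ("chr1", 150), ("chr2", 5)]
    print(positional_distribution(reads, landmarks, window=5))
```

Plotting the returned list against the offsets gives a curve like the one in the example: a well-behaved library should show the expected pile-up or drop-off of reads around each class of landmark.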

Using this tool requires that you run the RNA-Seq pipeline of version 1.6 or later. If you have data run through an older version of the pipeline, just upgrade your EP installation and re-run the pipeline (you should always save the invocation of the pipeline for just such an occasion). It won’t redo the whole pipeline; it will only create the missing files, which are the ones needed to generate the canonical distribution plots.

To get to the tool on the website, look under the “read_type” page. If you choose “positional” under the “readclass” dropdown menu then it will generate a canonical distribution plot. This option will be disabled on projects that don’t have the right files yet.

For more information, see the read types page of the User’s Guide.

Manuscript Revision 2

Finished revision 2 of the paper last week. The associated version of ExpressionPlot is 1.3. There are several improvements over the previous revision.

  • Limma is now used to calculate differential expression statistics for microarrays.
  • User-defined event types are now available through the event_types.tsv interface (described in the User’s Guide)
  • Annotation pipeline is now part of ExpressionPlot
  • A few other bug fixes, mostly due to Sean O’Keeffe and Mike Muratet

Here are the links for the submitted manuscript:

Paper Revision

This week I submitted the revision to my manuscript. At the risk of sounding cheesy, I really think that the software improved as a result of the review process. For example, it is now possible to run DESeq as an option for calculating gene expression statistics. This package makes it possible to do population-based statistics.

Revised manuscript:

There was one thing that I really couldn’t address, and that was a method for population-based statistics for alternative splicing. This is on my to-do list. Right now I am working on getting MISO running and plugging it in as an alternative back-end for the splicing analysis. As I understand it, what MISO calculates is really a technical P-value (well, a technical Bayes factor). It might not be too hard to use the a posteriori PSI distributions that it calculates to generate a population-based P-value or Bayes factor. But first I have to make the adapter script to serve the output through ExpressionPlot.
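
As a rough illustration of that idea, the sketch below takes posterior PSI samples for each replicate in two groups and estimates, by Monte Carlo, the probability that the group-mean PSI values differ by at least some threshold. Everything here is an assumption for illustration: it does not use MISO’s actual output format or API, the function name and parameters are made up, and averaging over replicates is only a crude stand-in for a real population model.

```python
import random


def prob_group_psi_shift(group_a, group_b, min_delta=0.1, n_draws=2000, seed=0):
    """Estimate P(|mean PSI of A - mean PSI of B| >= min_delta).

    group_a, group_b: lists with one entry per replicate; each entry is a
    list of posterior PSI samples for that replicate (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_draws):
        # Draw one PSI value from each replicate's posterior, then average.
        mean_a = sum(rng.choice(samples) for samples in group_a) / len(group_a)
        mean_b = sum(rng.choice(samples) for samples in group_b) / len(group_b)
        if abs(mean_a - mean_b) >= min_delta:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    controls = [[0.80, 0.82, 0.79], [0.85, 0.83, 0.81]]
    treated = [[0.20, 0.22, 0.19], [0.25, 0.21, 0.20]]
    print(prob_group_psi_shift(controls, treated))
```

A value near 1 says the shift is well supported given the per-replicate uncertainty; turning this into a proper population-based P-value or Bayes factor would still require modeling the variation between replicates explicitly.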

Columbia Medical School and Annotation Factory

Just got back from a really fun trip to New York City. I visited my colleagues in the Maniatis Lab at Columbia Medical School. I got a chance to work with Sean O’Keeffe, who is running ExpressionPlot there. He has set up GBrowse for some of the data over there. It is a fantastic user interface. I wonder how hard it would be to install it along with ExpressionPlot and have an option to link seqview to it instead.

Sean also encouraged me to automate the part of EP that generates annotation files. I hadn’t bothered when I made the original version, seeing as how it was already quite complicated, but I realize now that people will want to be able to make more annotations without consulting me. So that is now in the works. I’ve got a few steps mapped out and described in the User’s Guide, and once it is done I will release version 0.7, which will include the bunch of scripts necessary to make a new annotation.

If you make a new annotation, be sure to post what you did to the ExpressionPlot Google group so we can upload your files to the repository and make them available to others.

iDEA Challenge 2011: Illumina’s Data Excellence Award

Illumina is holding a data visualization contest to promote the development of new ideas for visualizing high throughput sequencing data. I think ExpressionPlot fits the bill well since it makes it easy for all biologists to create the types of plots necessary for interpreting their RNA-Seq data and comparing it with other data sets.

A web-based framework for analysis of RNA-Seq and microarray gene expression data