Monday, June 25, 2012

Adapting the R AgiMicroRna Package to work with COMPACT AFE output files

This isn't a Dollhouse recap, but it's something I spent some time on this week and this seems like my best place to publish it.

I''m going through some old miRNA array data my lab generated before I got here, but never really analyzed. I found the AgiMicroRna package, a great open-source tool for analyzing Agilent miRNA array data in R.

But when I started using it for my project, following the guidelines in the BMC Genomics paper, it started throwing some pretty confusing errors. After digging through some old news groups and the package source code, I was able to get it working again, and I thought I'd share all the steps in one place so that anyone else who was having the same problems could solve it more quickly than I did.

The problems all stemmed from a single issue: When Agilent Feature Extraction interprets the scanned arrays, it has two possible output formats: "full" and "compact". Agilent's GeneSpring Analysis software only needs the compact version, so that's the default. The AgiMicroRna package is written to work with the full versions.

Your first hint that you're having this problem will be that when you try to read your data files with readMicroRnaAFE(), you'll see the error
Error in `[.data.frame`(obj, , columns[[a]]) : undefined columns selected
Here's the easiest way to fix it: Pull out your raw data files, go back to Feature Extraction, tell it to use the full  output style, and get on with your life. If that's not an option, though-- say, you're pulling the data from GEO-- you can still salvage it.

The root of the problem is that the AgiMicroRna package expects to find columns in the data file that aren't there. The fixes I outline here involve changing the file-reading functions to skip over parts that aren't present.

So, the first problem is with the readMicroRnaAFE() function. If you open up readMicroRnaAFE(), you'll see that this is a wrapper for a more general internal function called read.AgiMicroRna(), and the solution is to call that internal function directly with some more specific inputs.

To bring the internal function out where we can work with it, download the source code for the package, and find the read.AgiMicroRna.R file, and source() it.

Now instead of calling readMicroRnaAFE() on your target files, call read.AgiMicroRna() with all the same settings as are used inside readMicroRnaAFE(), but where the original says:
columns=list(TGS="gTotalGeneSignal",                            TPS="gTotalProbeSignal",                             meanS="gMeanSignal",                         procS="gProcessedSignal")

Change it to remove the reference to the non-existent gMeanSignal column, so:
columns=list(TGS="gTotalGeneSignal",                              TPS="gTotalProbeSignal",                                                         procS="gProcessedSignal")

Similarly, the column with the name of the miRNA associated with each probe is names "SystematicName", not "GeneName" as the package assumes. So, when calling read.AgiMicroRna(), make the annotation variable
annotation = c( "ControlType", "ProbeName","SystematicName")
Instead of the default
annotation = c( "ControlType", "ProbeName","GeneName") 
Now the file reading function should work as normal.

There are consequences for the normalization process, though. First, the RMA normalization method is just not possible without the mean signal data, so you have to use the Total Gene Signal (TGS) method, based on the functions tgsMicroRna() and tgsNormalization(). Second, the tgsMicroRna() function expects the annotation column of the input object to be called GeneName, when it's actually called SystematicName. So, make an R script file with the text of the tgsMicroRNA() function in it, and go through and swap "SystematicName" for "GeneName" throughout. Save the fixed version and source() it.

From there, you should be able to follow the package instructions to make an expression set and do your differential analysis. 

3 comments:

  1. Dear Russell,

    Thanks a lot for your very useful know how. I have implemented your method and successfully completed the analysis.

    Thanks once again.

    Regards,
    Avisek Deyati

    ReplyDelete
  2. Thanks Russell. It helped me out, although I'd like to mentiona that one needs to modify directly the read.agiMicroRna so it will actually read the "SystematicName", otherwise, it exclude this column even if one specifies it when calling read.agiMicroRna().

    Regards,

    Pablo R.

    ReplyDelete
    Replies
    1. read.agiMicroRNA is not able to read genename as well as systematic name. It is giving only control type and agilent probenames. How did you rectify this? How did you get the genenames? Please let me know

      Delete