Wednesday, March 10, 2010

Use local instead of my in Perl when using $$ (double dollar sign) inside an enclosed block

I thought I would be all clever and initialize several hash indices at once using the $$ notation in Perl - evaluating strings as variables. Unfortunately using "my" screws this up bigtime.

This must be another circumstance spelled out in the incomprehensible "my" treatise:
http://perldoc.perl.org/perlsub.html#Private-Variables-via-my()


my %foo;
$foo{'bar'}=1;
foreach $fooDex(qw(foo)){
$$fooDex{'bar'}=2;
}
print $foo{'bar'}."\n"; #prints 1




local %foo;
$foo{'bar'}=1;
foreach $fooDex(qw(foo)){
$$fooDex{'bar'}=2;
}
print $foo{'bar'}."\n"; #prints 2


This is one of those solutions that gives you a feeling of dread instead of accomplishment.

Tuesday, March 9, 2010

Getting the basics from readAligned

The UCR guide is a little sparse with regard to getting basic information from readAligned.

I'd like to add to the general cookbook. If some bioc people out there can contribute some alignment recipes can fill me in on some more basics please comment:


alignedReads <- readAligned("./", pattern="output.bowtie", type="Bowtie")

#how many reads did I attempt to align
#i don't think you can't get this from alignedReads

#how many reads aligned (one or more times)
length(unique(id(alignedReads)))

#how many hits were there?
length(alignedReads)

#how many reads produced multiple hits
length(unique(id(alignedReads[srduplicated(id(alignedReads))])))

#how many reads produced multiple hits at the best strata?
#please fill me in on this one

#how many reads aligned uniquely (with exactly one hit)
length(unique(id(alignedReads)))-length(unique(id(alignedReads[srduplicated(id(alignedReads))])))

#how many reads aligned uniquely at the best strata (the other hits were not as good)
#please fill me in on this one

#how many unique positions were hit? what if I ignore strand?
#please fill me in on this one

#how many converging hits were there (two query sequences aligned to the same genomic position)
#please fill me in on this one

Wednesday, March 3, 2010

Quality trimming in R using ShortRead and Biostrings

I wrote an R function to do soft-trimming, right clipping FastQ reads based on quality.

This function has the option of leaving out sequences trimmed to extinction and will do left-side fixed trimming as well.
#softTrim
#trim first position lower than minQuality and all subsequent positions
#omit sequences that after trimming are shorter than minLength
#left trim to firstBase, (1 implies no left trim)
#input: ShortReadQ reads
#       integer minQuality
#       integer firstBase
#       integer minLength
#output: ShortReadQ trimmed reads
library("ShortRead")
softTrim<-function(reads,minQuality,firstBase=1,minLength=5){
qualMat<-as(FastqQuality(quality(quality(reads))),'matrix')
qualList<-split(qualMat,row(qualMat))
ends<-as.integer(lapply(qualList,
function(x){which(x < minQuality)[1]-1}))
#length=end-start+1, so set start to no more than length+1 to avoid negative-length
starts<-as.integer(lapply(ends,function(x){min(x+1,firstBase)}))
#use whatever QualityScore subclass is sent
newQ<-ShortReadQ(sread=subseq(sread(reads),start=starts,end=ends),
quality=new(Class=class(quality(reads)),
quality=subseq(quality(quality(reads)),
start=starts,end=ends)),
id=id(reads))

#apply minLength using srFilter
lengthCutoff <- srFilter(function(x) {
width(x)>=minLength
},name="length cutoff")
newQ[lengthCutoff(newQ)]
}  


To use:
library("ShortRead")
source("softTrimFunction.R") #or whatever you want to name this
reads<-readFastq("myreads.fq") trimmedReads<-softTrim(reads=reads,minQuality=5,firstBase=4,minLength=3) writeFastq(trimmedReads,file="trimmed.fq") 
I strongly recommend reading the excellent UC Riverside HT-Sequencing Wiki cookbook and tutorial if you wish to venture into using R for NGS handling. Among other things, it will explain how to perform casting if you have Solexa scaled (base 64) fastq files. The function should respect that. http://manuals.bioinformatics.ucr.edu/home/ht-seq