Monday, October 31, 2011

Making R's paste act more like CONCAT

While vector-friendly, R's paste function has a few behaviors I don't particularly like.

One is using a space as the default separator:

> adjectives<-c("lean","fast","strong")
> paste(adjectives,"er")
[1] "lean er"   "fast er"   "strong er"  #d'oh
> paste(adjectives,"er",sep="")
[1] "leaner"   "faster"   "stronger"

Empty vectors get undeserved first-class treatment, producing a result as if they held an empty string:

> indelPositions<-c(5,6,7)
> paste(indelPositions,"i",sep="")
[1] "5i" "6i" "7i" #good

> indelPositions<-c()
> paste(indelPositions,"i",sep="")
[1] "i"  #not so good

And perhaps worst of all, NA values get converted to the literal string "NA":

> placing<-"1"
> paste(placing,"st",sep="")
[1] "1st" #awesome

> placing<-NA_integer_
> paste(placing,"st",sep="")
[1] "NAst" #ugh

This is inconvenient in situations where I don't know in advance whether I will get a value, a vector of length 0, or an NA.

Working from Hadley Wickham's str_c function in the stringr package, I decided to write a paste function that behaves more like CONCAT in SQL:

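#lapply is base R, so no extra packages are needed (llply lives in plyr, not stringr)
concat<-CONCAT<-function(...,sep="",collapse=NULL){
  strings<-list(...)
  #catch NULLs, NAs: bail out if any argument is empty or contains an NA
  if(
    all(unlist(lapply(strings,length))>0)
    &&
    all(!is.na(unlist(strings)))
    ){
    #every argument is usable, so hand everything to paste
    do.call("paste", c(strings, list(sep = sep, collapse = collapse)))
  }else{
    NULL
  }
}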

This function has the behaviors I expect:

> concat(adjectives,"er")
[1] "leaner"   "faster"   "stronger"

> concat(indelPositions,"i")
NULL

> concat(placing,"st")
NULL

That's more like it!
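
One consequence of the all-or-nothing check is worth spelling out: a single NA anywhere in the arguments turns the whole result into NULL, even when the other elements are fine. The sketch below redefines concat with base R so that it runs on its own; the test values are made up for illustration:

```r
# Standalone copy of concat, written with base R only.
concat <- function(..., sep = "", collapse = NULL) {
  strings <- list(...)
  has_empty <- any(unlist(lapply(strings, length)) == 0)
  has_na <- any(is.na(unlist(strings)))
  if (has_empty || has_na) {
    NULL
  } else {
    do.call("paste", c(strings, list(sep = sep, collapse = collapse)))
  }
}

# One NA in a longer vector makes the entire result NULL.
mixed <- concat(c("1", NA, "3"), "st")

# collapse is passed through to paste unchanged.
joined <- concat(c("a", "b"), "x", collapse = "+")

print(mixed)
print(joined)
```

If element-wise NA handling is what you need, this function is the wrong tool, since it never returns a partial result.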
