Monday, October 31, 2011

Making R's paste act more like CONCAT

While vector-friendly, R's paste function has a few behaviors I don't particularly like.

One is using a space as the default separator:

> adjectives<-c("lean","fast","strong")
> paste(adjectives,"er")
[1] "lean er"   "fast er"   "strong er"  #d'oh
> paste(adjectives,"er",sep="")
[1] "leaner"   "faster"   "stronger"

Empty vectors get an undeserved first class treatment:

> indelPositions<-c(5,6,7)
> paste(indelPositions,"i",sep="")
[1] "5i" "6i" "7i" #good

> indelPositions<-c()
> paste(indelPositions,"i",sep="")
[1] "i"  #not so good

And perhaps worst of all, NA values get converted to the string "NA":

> placing<-"1"
> paste(placing,"st",sep="")
[1] "1st" #awesome

> placing<-NA_integer_
> paste(placing,"st",sep="")
[1] "NAst" #ugh

This is inconvenient in situations where I don't know a priori if I will get a value, a vector of length 0, or an NA.

Working from Hadley Wickham's str_c function in the stringr package, I decided to write a paste function that behaves more like CONCAT in SQL:

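#base R only; the original llply() call needs plyr, which current stringr does not attach
concat<-CONCAT<-function(...,sep="",collapse=NULL){
  strings<-list(...)
  #catch NULLs and zero-length vectors first, then NAs
  if(
    all(vapply(strings,length,integer(1))>0)
    &&
    all(!is.na(unlist(strings)))
    ){
    #every argument has a value, so hand off to paste
    do.call("paste", c(strings, list(sep = sep, collapse = collapse)))
  }else{
    NULL
  }
}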

This function has the behaviors I expect:

> concat(adjectives,"er")
[1] "leaner"   "faster"   "stronger"

> concat(indelPositions,"i")
NULL

> concat(placing,"st")
NULL

That's more like it!

Thursday, October 6, 2011

SELinux for enhanced headaches


Security Enhanced Linux (SELinux) is a new extra hidden layer of permissions that makes configuring things more difficult, without ever identifying itself as the culprit - kind of like ACLs but more cryptic. Though it may be more secure, it is not an enhancing experience to deal with, and probably not worth it for the average user.

For example, to have Apache serve personal websites (e.g. http://server/~leipzig), it is no longer enough to alter httpd.conf: you will get mysterious 403 errors until you do this (as others have experienced):
chcon -R -t httpd_sys_content_t /home/leipzig

You forget about this change until xauth starts complaining about stuff for no apparent reason:
/usr/bin/xauth:  timeout in locking authority file /home/leipzig/.Xauthority

so of course you need to do this (thanks Madhav Diwan for this post):
chcon unconfined_u:object_r:user_home_dir_t:s0 /home/leipzig
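
For the record, the first argument to that chcon is an SELinux security context: four colon-separated fields giving user, role, type, and level. Here is a minimal sketch that pulls one apart; describe_context is a made-up helper name for illustration, not part of any SELinux tool.

```shell
#!/bin/sh
# Split an SELinux context of the form user:role:type:level into its parts.
# describe_context is a hypothetical helper, not a real SELinux command.
describe_context() {
  # The level field may itself contain colons (e.g. s0:c0.c1023), so it is
  # read last and receives whatever remains of the string.
  IFS=: read -r se_user se_role se_type se_level <<EOF
$1
EOF
  printf 'user=%s role=%s type=%s level=%s\n' \
    "$se_user" "$se_role" "$se_type" "$se_level"
}

describe_context "unconfined_u:object_r:user_home_dir_t:s0"
```

On a box with SELinux enabled, `ls -Zd /home/leipzig` should show the context currently attached to the directory, which is handy for checking what a chcon actually changed.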

I have no idea what these things actually mean, nor any real interest in learning. I'm sure this stuff is great for sysadmin cocktail chat, but at least for private servers it is just another brake on the wheel of getting things done. For the time being I have set the level to "permissive", which means it displays warnings but does not interfere, but I am leaning toward "disabled" or maybe something else:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=excoriated
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
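
A rough sketch for checking which mode a config file like this asks for, and which mode is actually running. It assumes the file lives at /etc/selinux/config (the usual spot on Red Hat style systems); configured_selinux_mode is a name I made up.

```shell
#!/bin/sh
# Print the value of the active (uncommented) SELINUX= line in a config file.
# configured_selinux_mode is a hypothetical helper name.
configured_selinux_mode() {
  # -n suppresses default output; only lines that begin with SELINUX= match,
  # so comment lines and the SELINUXTYPE= line are ignored.
  sed -n 's/^SELINUX=//p' "$1"
}

config=${1:-/etc/selinux/config}   # assumed default location
if [ -r "$config" ]; then
  echo "configured mode: $(configured_selinux_mode "$config")"
fi

# getenforce reports the mode currently in effect, when it is installed.
if command -v getenforce >/dev/null 2>&1; then
  echo "running mode: $(getenforce)"
fi
```

The config file only takes effect at boot. To drop to permissive on a running system, `sudo setenforce 0` does it until the next reboot.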

More on the pros and cons:
http://unix.stackexchange.com/questions/9163/does-selinux-provide-enough-extra-security-to-be-worth-the-hassle-of-learning-set

Thursday, August 18, 2011

Installing RStudio Server on Scientific Linux 6: My bash notebook

Granted, not a brilliant sysadmin mind at work here, but this might help someone someday.
Scientific Linux (SL) is built from Red Hat Enterprise Linux.

See installation instructions here:
http://rstudio.org/download/server
[leipzig@localhost ~]$ sudo rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm
[sudo] password for leipzig: 
Retrieving http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm
warning: /var/tmp/rpm-tmp.S2RQAH: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]
[leipzig@localhost ~]$ rpm -qa | grep epel
epel-release-6-5.noarch
[leipzig@localhost ~]$ which R
/usr/local/bin/R
[leipzig@localhost ~]$ R

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> q()
Save workspace image? [y/n/c]: n
[leipzig@localhost ~]$ wget https://s3.amazonaws.com/rstudio-server/rstudio-server-0.94.92-x86_64.rpm
--2011-08-17 13:06:36--  https://s3.amazonaws.com/rstudio-server/rstudio-server-0.94.92-x86_64.rpm
Resolving s3.amazonaws.com... 72.21.211.170
Connecting to s3.amazonaws.com|72.21.211.170|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11373769 (11M) [application/x-redhat-package-manager]
Saving to: “rstudio-server-0.94.92-x86_64.rpm”

100%[======================================>] 11,373,769  7.89M/s   in 1.4s

2011-08-17 13:06:37 (7.89 MB/s) - “rstudio-server-0.94.92-x86_64.rpm” saved [11373769/11373769]

[leipzig@localhost ~]$ sudo rpm -Uvh rstudio-server-0.94.92-x86_64.rpm
error: Failed dependencies:
	libR.so()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libRblas.so()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libRlapack.so()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libcrypto.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libgfortran.so.1()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libssl.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
[leipzig@localhost ~]$ sudo yum install R  
epel/metalink                                             |  14 kB     00:00
epel                                                      | 4.3 kB     00:00
epel/primary_db                                           | 4.0 MB     00:00
sl                                                        | 3.2 kB     00:00
sl-security                                               | 1.9 kB     00:00

Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package R.x86_64 0:2.13.1-1.el6 set to be updated
--> Processing Dependency: libRmath-devel = 2.13.1-1.el6 for package: R-2.13.1-1.el6.x86_64
--> Processing Dependency: R-devel = 2.13.1-1.el6 for package: R-2.13.1-1.el6.x86_64
--> Running transaction check
---> Package R-devel.x86_64 0:2.13.1-1.el6 set to be updated
--> Processing Dependency: R-core = 2.13.1-1.el6 for package: R-devel-2.13.1-1.el6.x86_64
--> Processing Dependency: bzip2-devel for package: R-devel-2.13.1-1.el6.x86_64
--> Processing Dependency: gcc-gfortran for package: R-devel-2.13.1-1.el6.x86_64
--> Processing Dependency: tk-devel for package: R-devel-2.13.1-1.el6.x86_64
--> Processing Dependency: pcre-devel for package: R-devel-2.13.1-1.el6.x86_64
--> Processing Dependency: tcl-devel for package: R-devel-2.13.1-1.el6.x86_64
---> Package libRmath-devel.x86_64 0:2.13.1-1.el6 set to be updated
--> Processing Dependency: libRmath = 2.13.1-1.el6 for package: libRmath-devel-2.13.1-1.el6.x86_64
--> Running transaction check
---> Package R-core.x86_64 0:2.13.1-1.el6 set to be updated
--> Processing Dependency: cups for package: R-core-2.13.1-1.el6.x86_64
--> Processing Dependency: libtk8.5.so()(64bit) for package: R-core-2.13.1-1.el6.x86_64
---> Package bzip2-devel.x86_64 0:1.0.5-7.el6_0 set to be updated
---> Package gcc-gfortran.x86_64 0:4.4.4-13.el6 set to be updated
---> Package libRmath.x86_64 0:2.13.1-1.el6 set to be updated
---> Package pcre-devel.x86_64 0:7.8-3.1.el6 set to be updated
---> Package tcl-devel.x86_64 1:8.5.7-6.el6 set to be updated
---> Package tk-devel.x86_64 1:8.5.7-5.el6 set to be updated
--> Running transaction check
---> Package cups.x86_64 1:1.4.2-35.el6_0.1 set to be updated
--> Processing Dependency: portreserve for package: 1:cups-1.4.2-35.el6_0.1.x86_64
--> Processing Dependency: poppler-utils for package: 1:cups-1.4.2-35.el6_0.1.x86_64
---> Package tk.x86_64 1:8.5.7-5.el6 set to be updated
--> Running transaction check
---> Package poppler-utils.x86_64 0:0.12.4-3.el6_0.1 set to be updated
---> Package portreserve.x86_64 0:0.0.4-4.el6 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package           Arch      Version               Repository       Size
================================================================================
Installing:
 R                 x86_64    2.13.1-1.el6          epel             17 k
Installing for dependencies:
 R-core            x86_64    2.13.1-1.el6          epel             33 M
 R-devel           x86_64    2.13.1-1.el6          epel             88 k
 bzip2-devel       x86_64    1.0.5-7.el6_0         sl-security     249 k
 cups              x86_64    1:1.4.2-35.el6_0.1    sl-security     2.3 M
 gcc-gfortran      x86_64    4.4.4-13.el6          sl              4.7 M
 libRmath          x86_64    2.13.1-1.el6          epel            111 k
 libRmath-devel    x86_64    2.13.1-1.el6          epel             21 k
 pcre-devel        x86_64    7.8-3.1.el6           sl              317 k
 poppler-utils     x86_64    0.12.4-3.el6_0.1      sl-security      72 k
 portreserve       x86_64    0.0.4-4.el6           sl               21 k
 tcl-devel         x86_64    1:8.5.7-6.el6         sl              161 k
 tk                x86_64    1:8.5.7-5.el6         sl              1.4 M
 tk-devel          x86_64    1:8.5.7-5.el6         sl              495 k

Transaction Summary
================================================================================
Install      14 Package(s)
Upgrade       0 Package(s)

Total download size: 43 M
Installed size: 89 M
Is this ok [y/N]: y
Downloading Packages:
(1/14): R-2.13.1-1.el6.x86_64.rpm                         |  17 kB     00:00
(2/14): R-core-2.13.1-1.el6.x86_64.rpm                    |  33 MB     00:05
(3/14): R-devel-2.13.1-1.el6.x86_64.rpm                   |  88 kB     00:00
(4/14): bzip2-devel-1.0.5-7.el6_0.x86_64.rpm              | 249 kB     00:00
(5/14): cups-1.4.2-35.el6_0.1.x86_64.rpm                  | 2.3 MB     00:01
(6/14): gcc-gfortran-4.4.4-13.el6.x86_64.rpm              | 4.7 MB     00:02
(7/14): libRmath-2.13.1-1.el6.x86_64.rpm                  | 111 kB     00:00
(8/14): libRmath-devel-2.13.1-1.el6.x86_64.rpm            |  21 kB     00:00
(9/14): pcre-devel-7.8-3.1.el6.x86_64.rpm                 | 317 kB     00:00
(10/14): poppler-utils-0.12.4-3.el6_0.1.x86_64.rpm        |  72 kB     00:00
(11/14): portreserve-0.0.4-4.el6.x86_64.rpm               |  21 kB     00:00
(12/14): tcl-devel-8.5.7-6.el6.x86_64.rpm                 | 161 kB     00:00
(13/14): tk-8.5.7-5.el6.x86_64.rpm                        | 1.4 MB     00:00
(14/14): tk-devel-8.5.7-5.el6.x86_64.rpm                  | 495 kB     00:00
--------------------------------------------------------------------------------
Total                                            3.1 MB/s |  43 MB     00:13
warning: rpmts_HdrFromFdno: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
epel/gpgkey                                               | 3.2 kB     00:00 ...
Importing GPG key 0x0608B895 "EPEL (6) epel@fedoraproject.org" from /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
Is this ok [y/N]: y
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Warning: RPMDB altered outside of yum.
  Installing     : 1:tk-8.5.7-5.el6.x86_64                                  1/14
  Installing     : portreserve-0.0.4-4.el6.x86_64                           2/14
  Installing     : poppler-utils-0.12.4-3.el6_0.1.x86_64                    3/14
  Installing     : 1:cups-1.4.2-35.el6_0.1.x86_64                           4/14
  Installing     : R-core-2.13.1-1.el6.x86_64                               5/14
  Installing     : gcc-gfortran-4.4.4-13.el6.x86_64                         6/14
  Installing     : libRmath-2.13.1-1.el6.x86_64                             7/14
  Installing     : 1:tcl-devel-8.5.7-6.el6.x86_64                           8/14
  Installing     : 1:tk-devel-8.5.7-5.el6.x86_64                            9/14
  Installing     : libRmath-devel-2.13.1-1.el6.x86_64                      10/14
  Installing     : bzip2-devel-1.0.5-7.el6_0.x86_64                        11/14
  Installing     : pcre-devel-7.8-3.1.el6.x86_64                           12/14
  Installing     : R-devel-2.13.1-1.el6.x86_64                             13/14
  Installing     : R-2.13.1-1.el6.x86_64                                   14/14

Installed:
  R.x86_64 0:2.13.1-1.el6


Dependency Installed:
  R-core.x86_64 0:2.13.1-1.el6                R-devel.x86_64 0:2.13.1-1.el6     
  bzip2-devel.x86_64 0:1.0.5-7.el6_0       cups.x86_64 1:1.4.2-35.el6_0.1     
  gcc-gfortran.x86_64 0:4.4.4-13.el6          libRmath.x86_64 0:2.13.1-1.el6    
  libRmath-devel.x86_64 0:2.13.1-1.el6     pcre-devel.x86_64 0:7.8-3.1.el6    
  poppler-utils.x86_64 0:0.12.4-3.el6_0.1     portreserve.x86_64 0:0.0.4-4.el6  
  tcl-devel.x86_64 1:8.5.7-6.el6           tk.x86_64 1:8.5.7-5.el6            
  tk-devel.x86_64 1:8.5.7-5.el6              

Complete!
[leipzig@localhost ~]$ sudo rpm -Uvh rstudio-server-0.94.92-x86_64.rpm
error: Failed dependencies:
	libcrypto.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libgfortran.so.1()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libssl.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
[leipzig@localhost ~]$ sudo yum install libcrypto.so.6
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package openssl098e.i686 0:0.9.8e-17.el6 set to be updated
--> Processing Dependency: libc.so.6(GLIBC_2.3.4) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libkrb5.so.3(krb5_3_MIT) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libcom_err.so.2 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.0) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libk5crypto.so.3 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libk5crypto.so.3(k5crypto_3_MIT) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libdl.so.2(GLIBC_2.0) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.7) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libkrb5.so.3 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.4) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libgssapi_krb5.so.2 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libdl.so.2(GLIBC_2.1) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.1.3) for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libresolv.so.2 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libz.so.1 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libdl.so.2 for package: openssl098e-0.9.8e-17.el6.i686
--> Processing Dependency: libc.so.6(GLIBC_2.3) for package: openssl098e-0.9.8e-17.el6.i686
--> Running transaction check
---> Package glibc.i686 0:2.12-1.7.el6_0.5 set to be updated
--> Processing Dependency: libfreebl3.so for package: glibc-2.12-1.7.el6_0.5.i686
--> Processing Dependency: libfreebl3.so(NSSRAWHASH_3.12.3) for package: glibc-2.12-1.7.el6_0.5.i686
---> Package krb5-libs.i686 0:1.9-9.el6_1.1 set to be updated
--> Processing Dependency: libkeyutils.so.1(KEYUTILS_0.3) for package: krb5-libs-1.9-9.el6_1.1.i686
--> Processing Dependency: libkeyutils.so.1 for package: krb5-libs-1.9-9.el6_1.1.i686
--> Processing Dependency: libselinux.so.1 for package: krb5-libs-1.9-9.el6_1.1.i686
---> Package libcom_err.i686 0:1.41.12-3.el6 set to be updated
---> Package zlib.i686 0:1.2.3-25.el6 set to be updated
--> Running transaction check
---> Package keyutils-libs.i686 0:1.4-1.el6 set to be updated
---> Package libselinux.i686 0:2.0.94-2.el6 set to be updated
---> Package nss-softokn-freebl.i686 0:3.12.8-1.el6_0 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package               Arch    Version             Repository      Size
================================================================================
Installing:
 openssl098e           i686    0.9.8e-17.el6       sl             772 k
Installing for dependencies:
 glibc                 i686    2.12-1.7.el6_0.5    sl-security    4.3 M
 keyutils-libs         i686    1.4-1.el6           sl              19 k
 krb5-libs             i686    1.9-9.el6_1.1       sl-security    711 k
 libcom_err            i686    1.41.12-3.el6       sl              33 k
 libselinux            i686    2.0.94-2.el6        sl             106 k
 nss-softokn-freebl    i686    3.12.8-1.el6_0      sl-security    108 k
 zlib                  i686    1.2.3-25.el6        sl              71 k

Transaction Summary
================================================================================
Install       8 Package(s)
Upgrade       0 Package(s)

Total download size: 6.0 M
Installed size: 18 M
Is this ok [y/N]: y
Downloading Packages:
(1/8): glibc-2.12-1.7.el6_0.5.i686.rpm                    | 4.3 MB     00:02
(2/8): keyutils-libs-1.4-1.el6.i686.rpm                   |  19 kB     00:00
(3/8): krb5-libs-1.9-9.el6_1.1.i686.rpm                   | 711 kB     00:00
(4/8): libcom_err-1.41.12-3.el6.i686.rpm                  |  33 kB     00:00
(5/8): libselinux-2.0.94-2.el6.i686.rpm                   | 106 kB     00:00
(6/8): nss-softokn-freebl-3.12.8-1.el6_0.i686.rpm         | 108 kB     00:00
(7/8): openssl098e-0.9.8e-17.el6.i686.rpm                 | 772 kB     00:00
(8/8): zlib-1.2.3-25.el6.i686.rpm                         |  71 kB     00:00
--------------------------------------------------------------------------------
Total                                            1.2 MB/s | 6.0 MB     00:04

Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : nss-softokn-freebl-3.12.8-1.el6_0.i686                    1/8
  Installing     : glibc-2.12-1.7.el6_0.5.i686                               2/8
  Installing     : libcom_err-1.41.12-3.el6.i686                             3/8
  Installing     : zlib-1.2.3-25.el6.i686                                    4/8
  Installing     : libselinux-2.0.94-2.el6.i686                              5/8
  Installing     : keyutils-libs-1.4-1.el6.i686                              6/8
  Installing     : krb5-libs-1.9-9.el6_1.1.i686                              7/8
  Installing     : openssl098e-0.9.8e-17.el6.i686                            8/8

Installed:
  openssl098e.i686 0:0.9.8e-17.el6

Dependency Installed:
  glibc.i686 0:2.12-1.7.el6_0.5        keyutils-libs.i686 0:1.4-1.el6
  krb5-libs.i686 0:1.9-9.el6_1.1       libcom_err.i686 0:1.41.12-3.el6
  libselinux.i686 0:2.0.94-2.el6       nss-softokn-freebl.i686 0:3.12.8-1.el6_0
  zlib.i686 0:1.2.3-25.el6

Complete!
[leipzig@localhost ~]$ sudo rpm -Uvh rstudio-server-0.94.92-x86_64.rpm
error: Failed dependencies:
	libcrypto.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libgfortran.so.1()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libssl.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
[leipzig@localhost ~]$ sudo yum install libcrypto.so.6
Setting up Install Process
Package openssl098e-0.9.8e-17.el6.i686 already installed and latest version
Nothing to do
[leipzig@localhost ~]$ sudo yum install libgfortran.so.1
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package compat-libgfortran-41.i686 0:4.1.2-39.el6 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                  Arch    Version         Repository    Size
================================================================================
Installing:
 compat-libgfortran-41    i686    4.1.2-39.el6    sl            99 k

Transaction Summary
================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total download size: 99 k
Installed size: 488 k
Is this ok [y/N]: y
Downloading Packages:
compat-libgfortran-41-4.1.2-39.el6.i686.rpm               |  99 kB     00:00

Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : compat-libgfortran-41-4.1.2-39.el6.i686                   1/1

Installed:
  compat-libgfortran-41.i686 0:4.1.2-39.el6


Complete!
[leipzig@localhost ~]$ sudo rpm -Uvh rstudio-server-0.94.92-x86_64.rpm
error: Failed dependencies:
	libcrypto.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libgfortran.so.1()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libssl.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
[leipzig@localhost ~]$ sudo yum install libssl.so.6
Setting up Install Process
Package openssl098e-0.9.8e-17.el6.i686 already installed and latest version
Nothing to do
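
Looking back at the log so far, rpm keeps asking for the `()(64bit)` flavor of each library, while `yum install libcrypto.so.6` (no suffix) keeps resolving to the i686 package. A minimal sketch of pulling the exact capability names out of rpm's complaint so they can be handed to yum verbatim. The helper name missing_capabilities is made up, and I am assuming yum will resolve a quoted `name()(64bit)` capability to the x86_64 package; I have not verified that on this box.

```shell
#!/bin/sh
# Extract the capability names from rpm's "Failed dependencies" report.
# missing_capabilities is a hypothetical helper; it reads rpm's error text
# on stdin and prints one capability per line, without duplicates.
missing_capabilities() {
  awk '/is needed by/ { print $1 }' | sort -u
}

# Sample report copied from the session above.
report='error: Failed dependencies:
	libcrypto.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libgfortran.so.1()(64bit) is needed by rstudio-server-0.94.92-1.x86_64
	libssl.so.6()(64bit) is needed by rstudio-server-0.94.92-1.x86_64'

printf '%s\n' "$report" | missing_capabilities | while read -r cap; do
  # Printed rather than executed so the sketch is safe to run anywhere.
  echo "sudo yum install '$cap'"
done
```

The single quotes matter: without them the shell would try to interpret the parentheses.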

[leipzig@localhost ~]$ sudo rpm -Uvh --nodeps rstudio-server-0.94.92-x86_64.rpm
Preparing...                ########################################### [100%]
   1:rstudio-server         ########################################### [100%]
rsession: no process killed
Starting rstudio-server: /usr/lib/rstudio-server/bin/rserver: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]

#trying some stuff recommended here
#http://support.rstudio.org/help/discussions/problems/839-installing-rstudio-from-source-after-installing-r-from-source

[leipzig@localhost ~]$ sudo yum install openssl098e-0.9.8e
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package openssl098e.x86_64 0:0.9.8e-17.el6 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch      Version          Repository    Size
================================================================================
Installing:
 openssl098e    x86_64    0.9.8e-17.el6    sl            762 k

Transaction Summary
================================================================================
Install       1 Package(s)
Upgrade       0 Package(s)

Total download size: 762 k
Installed size: 2.2 M
Is this ok [y/N]: y
Downloading Packages:
openssl098e-0.9.8e-17.el6.x86_64.rpm                      | 762 kB     00:00

Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Warning: RPMDB altered outside of yum.
rstudio-server-0.94.92-1.x86_64 has missing requires of libcrypto.so.6()(64bit)
rstudio-server-0.94.92-1.x86_64 has missing requires of libgfortran.so.1()(64bit)
rstudio-server-0.94.92-1.x86_64 has missing requires of libssl.so.6()(64bit)
  Installing     : openssl098e-0.9.8e-17.el6.x86_64                         1/1


Installed:
  openssl098e.x86_64 0:0.9.8e-17.el6                                            
                                                                                


Complete!
[leipzig@localhost ~]$ sudo yum install gcc41-libgfortran-4.1.2
Setting up Install Process
No package gcc41-libgfortran-4.1.2 available.
Error: Nothing to do
[leipzig@localhost ~]$ sudo yum install pango-1.28.1
Setting up Install Process
Package pango-1.28.1-3.el6_0.5.x86_64 already installed and latest version
Nothing to do
[leipzig@localhost ~]$ sudo rpm -Uvh --nodeps rstudio-server-0.94.92-x86_64.rpm
Preparing...                ########################################### [100%]
	package rstudio-server-0.94.92-1.x86_64 is already installed

[leipzig@localhost ~]$ sudo rstudio-server start
[leipzig@localhost ~]$ sudo rstudio-server verify-installation
Stopping rstudio-server:                                   [  OK  ]
/usr/lib/rstudio-server/bin/rsession: error while loading shared libraries: libgfortran.so.1: wrong ELF class: ELFCLASS32
Starting rstudio-server:                                   [  OK  ]
[leipzig@localhost ~]$ sudo yum install libgfortran.so.1
Setting up Install Process
Package compat-libgfortran-41-4.1.2-39.el6.i686 already installed and latest version
Nothing to do
[leipzig@localhost ~]$ sudo rpm -Uvh ftp.scientificlinux.org/linux/scientific/6.0/x86_64/os/Packages/compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
error: open of ftp.scientificlinux.org/linux/scientific/6.0/x86_64/os/Packages/compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm failed: No such file or directory
[leipzig@localhost ~]$ wget ftp.scientificlinux.org/linux/scientific/6.0/x86_64/os/Packages/compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
--2011-08-18 04:39:39--  http://ftp.scientificlinux.org/linux/scientific/6.0/x86_64/os/Packages/compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
Resolving ftp.scientificlinux.org... 131.225.110.147
Connecting to ftp.scientificlinux.org|131.225.110.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 128080 (125K) [application/x-rpm]
Saving to: “compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm”

100%[======================================>] 128,080      488K/s   in 0.3s

2011-08-18 04:39:39 (488 KB/s) - “compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm” saved [128080/128080]

[leipzig@localhost ~]$ sudo rpm -Uvh compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:compat-libgfortran-41  ########################################### [100%]
[leipzig@localhost ~]$ sudo rstudio-server verify-installation
Stopping rstudio-server:                                   [  OK  ]
Starting rstudio-server:                                   [  OK  ]
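
The "wrong ELF class: ELFCLASS32" error in the transcript above means the loader found the 32-bit (i686) build of libgfortran.so.1 when the 64-bit rsession needed the x86_64 one. Here is a minimal sketch for checking which flavor a library is before installing anything. It reads byte 4 of the ELF header (EI_CLASS), where 1 means 32-bit and 2 means 64-bit. The function name elf_class is made up for illustration, and the paths in the usage comment are assumptions about where the two builds land on this system.

```shell
# Report whether a file is a 32-bit or 64-bit ELF object.
# Byte 4 of the ELF header (EI_CLASS) is 1 for ELFCLASS32, 2 for ELFCLASS64.
elf_class() {
  # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return 1
  fi
  case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
    1) echo "ELFCLASS32" ;;
    2) echo "ELFCLASS64" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (paths are assumptions, adjust to your system):
#   elf_class /usr/lib/libgfortran.so.1
#   elf_class /usr/lib64/libgfortran.so.1
```

If the file utility is installed, it reports the same information in a friendlier form.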

Thursday, June 23, 2011

Big-Ass Servers™ and the myths of clusters in bioinformatics

Spending $55k for a 512GB machine (Big-Ass Server™ or BAS™) can be a tough sell for a bioinformatics researcher to pitch to a department head.

Dell PowerEdge r900, available in orange and lemon-lime
Speaking as someone who keeps his copy of CLR safely stored in the basement, ready to help rebuild society after a nuclear holocaust, I am painfully aware of the importance of algorithm development in the history of computing, and the possibilities for parallel computing to make problems tractable.

Having recently spent 3 years in industry, however, I am now more inclined to just throw money at problems. In the case of hardware, I think this approach is more effective than clever programming for many of the current problems posed by NGS.

From an economic and productivity perspective, I believe most bioinformatics shops doing basic research would benefit more from having access to a BAS™ than a cluster. Here's why:
  • The development of multicore/multiprocessor machines and memory capacity has outpaced the speed of networks. NGS analyses tend to be memory-bound and IO-bound rather than CPU-bound, so relying on a cluster of smaller machines can quickly overwhelm a network.
  • NGS has expanded the set of high-performance applications from BLAST and protein structure prediction to dozens of different little analyses, using tools that change on a monthly basis or are homegrown to deal with special circumstances. There isn't the time or ability to write each of these for parallel architectures.
If those don't sound very convincing, here is my layman's guide to dealing with the myths you might encounter concerning NGS and clusters:

Myth: Google uses server farms. We should too.


Google has to focus on doing one thing very well: search.

Bioinformatics programmers have to explore a number of different questions for any given experiment. There is no time to develop a parallel solution for each of these questions, since many of them will lead to dead ends.

Many bioinformatic problems, de novo assembly being a prime example, are notoriously difficult to divide among several machines without being overwhelmed with messaging. Imagine trying to divide a jigsaw puzzle among friends sitting at several tables: you would spend more time talking about the pieces than fitting them together.

Myth: Our development setup should mimic our production setup


An experimental computing structure with a BAS™ allows researchers to freely explore big data without having to think about how to divide it efficiently. If an experiment is successful and there is a need to scale up to a clinical or industrial platform, that can happen later.

Myth: Clusters have been around a long time so there is a lot of shell-based infrastructure to distribute workflows


There are tools for queueing jobs, but those are often of little help in managing workflows that are written as parallel and serial steps - for example, waiting for steps to finish before merging results.
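
On a single large machine, the parallel-then-serial pattern described above needs nothing beyond the shell's own job control. The sketch below is only an illustration: count_chunk is a hypothetical stand-in for a real per-chunk analysis step, and run_workflow is a made-up name. It launches one background job per chunk, waits for every one of them, and only then runs the merge.

```shell
# Hypothetical per-chunk step: count the lines in one input chunk
# and write the result next to it as <chunk>.count.
count_chunk() {
  wc -l < "$1" | tr -d ' ' > "$1.count"
}

# Run count_chunk on every chunk in parallel, block until all of them
# have finished, then run the serial merge step (summing the counts).
run_workflow() {
  pids=""
  for chunk in "$@"; do
    count_chunk "$chunk" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || return 1   # stop if any parallel step failed
  done
  total=0
  for chunk in "$@"; do
    total=$((total + $(cat "$chunk.count")))
  done
  echo "$total"
}

# Usage (file names are placeholders):
#   run_workflow chunk1.txt chunk2.txt chunk3.txt
```

Doing the same across cluster nodes means expressing the "wait, then merge" dependency in whatever the scheduler provides, which is where much of the extra effort goes.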

Various programming languages have features to take advantage of clusters. For example, R has SNOW. But Rsamtools requires you to load BAM files into memory, so a BAS™ is not just preferable for NGS analysis with R, it's required.

Myth: The rise of cloud computing and Hadoop means that homegrown clusters are irrelevant, but it also means we don't need a BAS™


The popularity of cloud computing in bioinformatics is also driven by the newfound ability to rent time on a BAS™. The main problem with cloud computing is the bottleneck posed by transferring gigabytes of data to the cloud.
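
To get a feel for that bottleneck, here is a back-of-the-envelope sketch. The function name transfer_hours and the example figures (500 GB of data, a sustained 100 Mbit/s uplink) are assumptions chosen for illustration, not measurements.

```shell
# Estimate the hours needed to move a dataset over a network link.
# $1 = dataset size in gigabytes (decimal), $2 = sustained speed in megabits/s.
transfer_hours() {
  awk -v gb="$1" -v mbps="$2" \
    'BEGIN { printf "%.1f\n", gb * 8 * 1000 / mbps / 3600 }'
}

# Assumed example: 500 GB over a 100 Mbit/s link.
transfer_hours 500 100   # prints 11.1
```

Under those assumptions the upload alone takes about eleven hours before any analysis can start.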

Myth: Crossbow and Myrna are based on Hadoop, we can develop similar tools


Ben Langmead, Cole Trapnell, and Michael Schatz, alums of Steven Salzberg's group at UMD, have developed NGS solutions using the Hadoop MapReduce framework.
  • Crossbow is a Hadoop-based implementation of Bowtie.
  • Myrna is an RNA-Seq pipeline.
  • Contrail is a de novo short read assembler.
These are difficult programs to develop, and these examples are also somewhat limited experimental proofs of concept or are married to components that may be undesirable for certain analyses. The Bowtie stack (Bowtie, Tophat, Cufflinks), while revolutionary in its implementation of the Burrows-Wheeler transform, is itself built around the limitations of computers in the year 2008. For many it lacks the sensitivity to deal with, for example, 1000 Genomes data.

The dynamic scripting languages used by most bioinformatics programmers are not as well suited to Hadoop as Java. To imply we can all develop tools of similar sophistication is unrealistic. Many bioinformatics programs are not even threaded, much less designed to work across several machines.

Myth: embarrassingly parallel problems imply a cluster is needed

 

A server with 4 quad-core processors is often adequate for handling these embarrassing problems. Dividing the work just tends to lead to further embarrassments.
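
As a rough sketch of what that looks like in practice, the snippet below fans one independent task per file out over a fixed number of worker processes on a single machine. The function name compress_all is made up, and gzip stands in for whatever per-sample job you actually run. The -P and -0 flags are extensions found in GNU and BSD xargs rather than strict POSIX.

```shell
# Compress every input file, running up to N gzip processes at once.
# $1 = number of workers; remaining arguments = files to compress.
compress_all() {
  workers=$1
  shift
  # NUL-delimit the file names so paths containing spaces survive.
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$workers" gzip -f
}

# Usage: one worker per core on a 4 x quad-core box (file names are placeholders):
#   compress_all 16 sample_*.fastq
```

No scheduler, shared filesystem, or message passing is involved; the kernel spreads the sixteen processes across the cores.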

 

Here is a particularly telling quote from Biohaskell developer Ketil Malde on Biostar:
In general, I think HPC are doing the wrong thing for bioinformatics. It's okay to spend six weeks to rewrite your meteorology program to take advantage of the latest supercomputer (all of which tend to be just a huge stack of small PCs these days) if the program is going to run continuously for the next three years. It is not okay to spend six weeks on a script that's going to run for a couple of days.

In short, I keep asking for a big PC with a bunch of the latest Intel or AMD core, and as much RAM as we can afford.

Myth: We don't have money for a BAS™ because we need a new cluster to handle things like BLAST


IBM System x3850 X5 expandable to 1536GB, mouse not included
Even the BLAST setup we think of as being the essence of parallel (a segmented genome index - every node gets a part of the genome) is often not the one that many institutions have settled on. Many rely on farming out queries to a cluster in which every node has the full genome index in memory.

Secondly, mpiBLAST appears to be better suited to dividing an index among older machines than among today's, which typically have >32GB RAM. Here is a telling FAQ entry:

I benchmarked mpiBLAST but I don't see super-linear speedup! Why?!

mpiBLAST only yields super-linear speedup when the database being searched is significantly larger than the core memory on an individual node. The super-linear speedup results published in the ClusterWorld 2003 paper describing mpiBLAST are measurements of mpiBLAST v0.9 searching a 1.2GB (compressed) database on a cluster where each node has 640MB of RAM. A single node search results in heavy disk I/O and a long search time.
http://www.mpiblast.org/Docs/FAQ#super-linear

Your comments on this topic are welcome!

Tuesday, March 15, 2011

RStudio: My thoughts


Let me get this out of the way: I just love RStudio.

Created by a team led by JJ Allaire, a name that should ring a bell if you were involved in web development during the Clinton administration, RStudio is an R IDE that is actually designed for R from the ground up. RStudio works on Linux, Mac, and Windows platforms, and can even run over the web.

While borrowing many of the best features from ESS, the Mac R-GUI, and maybe Anup Parikh's Red-R, RStudio provides solutions to several long-standing barriers that have hampered R code development. For instance, doing Sweave->tex->pdf (then viewing the pdf) in ESS was a frustrating, arthritic (M-n s M-n P) process that flummoxed even the greatest minds of our generation. RStudio has a handy button (Compile PDF) that brings you all the way from .Rnw to Acrobat. Although this command appears to run in its own session, leading to some unexpected behavior compared to running Sweave from the command line, the fact that this IDE is already geared for Sweave bodes well for future development.
This is fucking genius

The movement of commands back and forth between console and editor is another task that other editors made unnecessarily difficult. The old Mac R GUI console would not let you copy and paste a subset of the history, and ESS was always geared toward writing code in the editor and then executing lines, never toward writing code in the console and then committing it to the script. RStudio makes it easy to go in either direction. Control over multiple plots (solving both the overwritten X-Window and the annoying type=Cairo PNG problem on OS X) is a welcome relief.

RStudio offers very good autocompletion for such a relatively weird language - in addition to package methods it is aware of data frame columns and user-defined functions, for instance.

RStudio has already garnered a good number of suggestions. Here's what I would personally like to see:
  1. More support for LaTeX markup, including menu driven formatting options so users don't have to memorize stuff like \textbf{}
  2. More built-in aesthetic support for ggplot2, something where users are given a WYSIWYG interface for manipulating an existing plot, similar to Jeroen Ooms' ggplot2 web application
  3. A non-sudo Linux binary and a method of specifying different R and TeX installations kicking around a server without re-installing from source.
  4. Better control over the working directory (already reported and a likely future feature)
  5. A means of quickly seeing where source files are actually located without mouseover
  6. Integration with version control.
  7. Code cleanup and indenting