Tuesday, July 28, 2009

Beware! Groovy split and tokenize don't treat empty elements the same

Groovy's tokenize returns a List and ignores empty elements (the empty strings produced when a delimiter appears twice in succession). split keeps those elements and returns a String array, although, like Java's String.split, it still drops trailing empty strings unless you pass a negative limit such as split(',', -1). If you want to use List methods without losing your empty elements, use split and convert the array into a List in a separate step.

This might be important if you are parsing CSV files with empty cells.
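
As a quick illustration of the CSV case, here is a minimal sketch. The row contents are made up for the example, and splitting on a bare comma is a simplification that does not handle quoted fields.

```groovy
// Hypothetical CSV row: the second cell and the last cell are empty.
def row = 'id,,name,'

// tokenize drops every empty element, so the column positions are lost.
assert row.tokenize(',') == ['id', 'name']

// split keeps the empty cell in the middle but drops the trailing one.
assert row.split(',').toList() == ['id', '', 'name']

// A negative limit tells split to keep trailing empty strings as well.
assert row.split(',', -1).toList() == ['id', '', 'name', '']
```

If the number of columns matters, the split(',', -1) form is probably the one to reach for, since it is the only variant here that returns all four cells.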


import groovy.util.GroovyTestCase


class StringTests extends GroovyTestCase {

    protected void setUp() {
        super.setUp()
    }

    protected void tearDown() {
        super.tearDown()
    }

    void testSplitAndTokenize() {
        // tokenize drops the empty element between the two commas
        assertEquals(5, "This,,should,have,five,items".tokenize(',').size())
        // a single space is not empty, so it survives
        assertEquals(6, "This, ,should,have,six,items".tokenize(',').size())

        // split keeps both the space and the empty element
        assertEquals(6, "This, ,should,have,six,items".split(',').size())
        assertEquals(6, "This,,should,have,six,items".split(',').size())

        // convert the array to a List and re-evaluate
        def fieldArray = "This,,should,have,six,items".split(',')
        def fields = fieldArray.collect { it }
        assert fields instanceof java.util.List
        assertEquals(6, fields.size())
    }
}