Beware! Groovy split and tokenize don't treat empty elements the same
Groovy's tokenize, which returns a List, ignores empty elements (those produced when a delimiter appears twice in succession). split keeps such elements, except trailing ones, which it drops, and returns an array. If you want to use List functions without losing your empty elements, use split and convert the array into a List in a separate step.
This might be important if you are parsing CSV files with empty cells.
import groovy.util.GroovyTestCase

class StringTests extends GroovyTestCase {

    protected void setUp() {
        super.setUp()
    }

    protected void tearDown() {
        super.tearDown()
    }

    void testSplitAndTokenize() {
        // assertEquals takes (expected, actual)
        // tokenize drops the empty element between the two commas
        assertEquals(5, "This,,should,have,five,items".tokenize(',').size())
        // a single space is not empty, so tokenize keeps it
        assertEquals(6, "This, ,should,have,six,items".tokenize(',').size())
        // split keeps both the space and the empty element
        assertEquals(6, "This, ,should,have,six,items".split(',').size())
        assertEquals(6, "This,,should,have,six,items".split(',').size())

        // convert array to List and re-evaluate
        def fieldArray = "This,,should,have,six,items".split(',')
        def fields = fieldArray.collect { it }
        assert fields instanceof java.util.List
        assertEquals(6, fields.size())
    }
}
Thanks for the post!
Another tricky thing I noticed with split is that
assertEquals(",,,".split(',').size(),0)
assertEquals(",,,a".split(',').size(),4)
That is, if all the tokens are empty, split does not return an array of empty strings; it returns an empty array.
This caused me some headaches :)
Split will omit ending empty elements.
"a,b,c".split(",").size() == 3 //as expected
but
"a,b,".split(",").size() == 2 //not as expected
and
"a,,".split(",").size() == 1 //not as expected
However
",,c".split(",").size() == 3 //as expected
Similarly,
",, ".split(",").size() == 3 //as expected
and
"a,, ".split(",").size() == 3 //as expected
This gives rise to a hacky work-around:
(someString + " ").split(someDelimiter).size() == yourExpectedSize
By adding a space to the end of the string, we make sure the last element is never empty. In this case, split works as expected. If necessary, you can .trim() the last element of the array.
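As a rough sketch of that work-around, the helper below appends the space, splits, and then trims only the last element. The name splitKeepingTrailing and its parameters are made up for illustration and do not come from any library. The delimiter is passed straight to split, so it is interpreted as a regular expression.

```groovy
// Hypothetical helper illustrating the "append a space" work-around.
List<String> splitKeepingTrailing(String line, String delimiter) {
    // The appended space guarantees the last element is never empty,
    // so split cannot drop trailing empty elements.
    def fields = (line + " ").split(delimiter).toList()
    // Remove the sentinel again. Caveat: trim() also strips any whitespace
    // the last field originally contained.
    int last = fields.size() - 1
    fields[last] = fields[last].trim()
    return fields
}

assert splitKeepingTrailing("a,b,", ",") == ["a", "b", ""]
assert splitKeepingTrailing("a,,", ",") == ["a", "", ""]
assert splitKeepingTrailing("a,b,c", ",") == ["a", "b", "c"]
```

If whitespace in the last field matters, this approach is probably not a good fit, since trim() cannot tell the sentinel apart from spaces that were already there.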
This seems to work as well:
foo = "A,,C"
println foo.tokenize(",")
bar= foo.split(",").toList()
println "bar is " + bar
OUTPUT is:
[A, C]
bar is [A, , C]
Use split(",", -1) to keep trailing empty elements.
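A short sketch of what this suggestion does, using the two-argument String.split(regex, limit) that Groovy inherits from Java: a negative limit tells split not to discard trailing empty strings. The variable names are only for illustration.

```groovy
def row = "a,,"

def defaultSplit = row.split(",")       // trailing empty elements are dropped
def fullSplit = row.split(",", -1)      // negative limit keeps them

assert defaultSplit.size() == 1
assert fullSplit.size() == 3
assert fullSplit.toList() == ["a", "", ""]

// A string made only of delimiters no longer collapses to an empty array.
assert ",,,".split(",").size() == 0
assert ",,,".split(",", -1).size() == 4
```

Compared with appending a space, this leaves the field contents untouched, so no trimming is needed afterwards.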