From: Philip Nienhuis
Subject: regexp: how to split a cellstr array into substring arrays, each matching regular expressions
Date: Sun, 20 May 2012 22:00:53 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Having a cellstr array like this:
octave:178> ar = {'abcdefguvwxAny' ; 'acegxyzTrailing'; 'vxzJunk'}
ar =
{
[1,1] = abcdefguvwxAny
[2,1] = acegxyzTrailing
[3,1] = vxzJunk
}
how can I efficiently split it into two columns using regular
expressions like
'[abcdefg]' and '[uvwxyz]'
to obtain
{ 'abcdefg', 'uvwxAny'; 'aceg', 'xyzTrailing'; '', 'vxzJunk'} ?
IOW, I'd like to split the cellstr array at the location where
'[uvwxyz]' matches (even if not present, see far below).
The closest I get is:
## Invert pattern and use 'split' keyword
octave:179> ss = regexp (ar, '[^abcdefg]', 'split')
ss =
{
[1,1] =
{
[1,1] = abcdefg
[1,2] =
[1,3] =
[1,4] =
[1,5] =
[1,6] =
[1,7] =
[1,8] =
}
[2,1] =
{
[1,1] = aceg
[1,2] =
[1,3] =
[1,4] =
[1,5] =
[1,6] = a
[1,7] =
[1,8] =
[1,9] =
[1,10] = g
}
[3,1] =
{
[1,1] =
[1,2] =
[1,3] =
[1,4] =
[1,5] =
[1,6] =
[1,7] =
[1,8] =
}
}
octave:180> col1 = cellfun (@(x) x{1}, {ss{:}}, 'uni', false)
col1 =
{
[1,1] = abcdefg
[1,2] = aceg
[1,3] =
}
octave:181> col2 = regexp (ar, '[uvwxyz].*', 'match', 'once')
col2 =
{
[1,1] = uvwxAny
[2,1] = xyzTrailing
[3,1] = vxzJunk
}
## ...or the latter statement, perhaps more robust, as:
octave:182> tt = regexp (ar, '[uvwxyz].*', 'match')
tt =
{
[1,1] =
{
[1,1] = uvwxAny
}
[2,1] =
{
[1,1] = xyzTrailing
}
[3,1] =
{
[1,1] = vxzJunk
}
}
octave:183> col2 = cellfun (@(x) [x{:}], {tt{:}}, 'Uni', false)
col2 =
{
[1,1] = uvwxAny
[1,2] = xyzTrailing
[1,3] = vxzJunk
}
octave:184>
( cellfun() was invoked to be able to use repeated indexing; I couldn't
find another way to extract the first/last entries of ss and tt. )
I think my method isn't very robust.
So I hope there's a less convoluted and more reliable way.
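One candidate that avoids cellfun() altogether, given only as a sketch: it assumes that regexprep() accepts a cellstr as its first argument and that a row without any '[uvwxyz]' character should keep its whole string in the first column. The variable name "result" is made up for illustration.

```octave
ar = {'abcdefguvwxAny'; 'acegxyzTrailing'; 'vxzJunk'};

## Second column: everything from the first [uvwxyz] character onward
## (an empty string if the row has no such character)
col2 = regexp (ar, '[uvwxyz].*', 'match', 'once');

## First column: the same span deleted, so only the leading part remains
col1 = regexprep (ar, '[uvwxyz].*', '');

## Both are 3x1 cellstr arrays, so this concatenates to a 3x2 cell array
result = [col1, col2];
```

Because both columns come from the same pattern, the split point is the same in both, and no repeated indexing into nested cells is needed.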
BTW,
octave:184> ar = {'abcdefguvwxAny' ; 'acegxyzTrailing'; 'Junk'}
ar =
{
[1,1] = abcdefguvwxAny
[2,1] = acegxyzTrailing
[3,1] = Junk
}
octave:186> tt = regexp (ar, '[uvwxyz].*', 'match', 'once')
tt =
{
[1,1] = uvwxAny
[2,1] = xyzTrailing
[3,1] = unk
}
=> is this a bug? (swallowing the "J" from the last entry)
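A quick way to check whether the pattern itself accounts for this, sketched with made-up variable names and the string 'Junk' as echoed above: the character class '[uvwxyz]' holds lowercase letters only, so 'J' cannot start a match, while the 'u' right after it can.

```octave
s = 'Junk';

## 'J' is not a member of [uvwxyz], so no match can begin there
j_start = regexp ('J', '[uvwxyz]', 'start', 'once');
j_is_in_class = ! isempty (j_start);

## The first character of s that is in the class is the 'u' at index 2
first = regexp (s, '[uvwxyz]', 'start', 'once');

## The match '[uvwxyz].*' therefore begins at 'u' and runs to the end
tail = s(first:end);
```

If "tail" comes out as 'unk', the result above follows from the pattern rather than from a bug in regexp().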
Thanks,
Philip