regexp: how to split a cellstr array into substring arrays, each matching regular expressions

help-octave

From:	Philip Nienhuis
Subject:	regexp: how to split a cellstr array into substring arrays, each matching regular expressions
Date:	Sun, 20 May 2012 22:00:53 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Having a cellstr array like this:

octave:178> ar = {'abcdefguvwxAny' ; 'acegxyzTrailing'; 'vxzJunk'}
ar =
{
  [1,1] = abcdefguvwxAny
  [2,1] = acegxyzTrailing
  [3,1] = vxzJunk
}

how can I efficiently split it into two columns using regular expressions like

'[abcdefg]'  and  '[uvwxyz]'

to obtain

{ 'abcdefg', 'uvwxAny'; 'aceg', 'xyzTrailing'; '', 'vxzJunk' }  ?

IOW, I'd like to split each entry of the cellstr array at the location where '[uvwxyz]' first matches (even if the first part is not present; see further below).



The closest I get is:

## Invert pattern and use 'split' keyword
octave:179> ss = regexp (ar, '[^abcdefg]', 'split')
ss =
{
  [1,1] =
  {
    [1,1] = abcdefg
    [1,2] =
    [1,3] =
    [1,4] =
    [1,5] =
    [1,6] =
    [1,7] =
    [1,8] =
  }
  [2,1] =
  {
    [1,1] = aceg
    [1,2] =
    [1,3] =
    [1,4] =
    [1,5] =
    [1,6] = a
    [1,7] =
    [1,8] =
    [1,9] =
    [1,10] = g
  }
  [3,1] =
  {
    [1,1] =
    [1,2] =
    [1,3] =
    [1,4] =
    [1,5] =
    [1,6] =
    [1,7] =
    [1,8] =
  }
}
octave:180> col1 = cellfun (@(x) x{1}, {ss{:}}, 'uni', false)
col1 =
{
  [1,1] = abcdefg
  [1,2] = aceg
  [1,3] =
}
octave:181> col2 = regexp (ar, '[uvwxyz].*', 'match', 'once')
col2 =
{
  [1,1] = uvwxAny
  [2,1] = xyzTrailing
  [3,1] = vxzJunk
}

## ...or the latter statement, perhaps more robust, as:
octave:182> tt = regexp (ar, '[uvwxyz].*', 'match')
tt =
{
  [1,1] =
  {
    [1,1] = uvwxAny
  }
  [2,1] =
  {
    [1,1] = xyzTrailing
  }
  [3,1] =
  {
    [1,1] = vxzJunk
  }
}
octave:183> col2 = cellfun (@(x) [x{:}], {tt{:}}, 'Uni', false)
col2 =
{
  [1,1] = uvwxAny
  [1,2] = xyzTrailing
  [1,3] = vxzJunk
}
octave:184>

( cellfun() was invoked to be able to use repeated indexing; I couldn't find another way to extract the first/last entries of ss and tt. )

I think my method isn't very robust.
So I hope there's a less convoluted and more reliable way.
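
For comparison, a minimal sketch of one alternative that avoids the 'split' output altogether: look up the index of the first character in '[uvwxyz]' and cut each string at that position. The names split_idx, k and result are made up for this illustration, and treating "no match" as "the whole string goes into the first column" is an assumption about the desired behaviour, not something settled above.

```octave
ar = {'abcdefguvwxAny'; 'acegxyzTrailing'; 'vxzJunk'};

## Index of the first character in [uvwxyz]; if there is none,
## fall back to one past the end of the string (assumed behaviour).
split_idx = @(s) min ([regexp(s, '[uvwxyz]', 'start', 'once'), numel(s) + 1]);
k = cellfun (split_idx, ar);

## Cut each string just before that index.
col1 = cellfun (@(s, i) s(1:i-1), ar, num2cell (k), 'UniformOutput', false);
col2 = cellfun (@(s, i) s(i:end), ar, num2cell (k), 'UniformOutput', false);
result = [col1, col2];
```

With the ar above, result should hold the rows 'abcdefg' / 'uvwxAny', 'aceg' / 'xyzTrailing' and '' / 'vxzJunk'. cellfun() is still needed, but only for plain indexing, not for digging into nested cells.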


BTW,
octave:184> ar = {'abcdefguvwxAny' ; 'acegxyzTrailing'; 'aJunk'}
ar =
{
  [1,1] = abcdefguvwxAny
  [2,1] = acegxyzTrailing
  [3,1] = aJunk
}
octave:186> tt = regexp (ar, '[uvwxyz].*', 'match', 'once')
tt =
{
  [1,1] = uvwxAny
  [2,1] = xyzTrailing
  [3,1] = unk
}

=> is this a bug? (swallowing the "J" from the last entry)
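
A quick check, offered as a sketch rather than a verdict: asking regexp for the start index alongside the match shows where the match begins. The variable names m and k are only for this illustration.

```octave
## 'J' is not in the class [uvwxyz], so the leftmost possible match
## starts at 'u'; '.*' then takes the rest of the string.
[m, k] = regexp ('aJunk', '[uvwxyz].*', 'match', 'start', 'once');
## m is 'unk' and k is 3
```

If that reading is right, the "J" is skipped rather than swallowed, which would make this expected behaviour and not a bug.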

Thanks,

Philip
