Re: regex-split for Guile
From: William James
Subject: Re: regex-split for Guile
Date: Mon, 14 Mar 2011 07:54:39 -0700 (PDT)
Neil Jerram wrote:
> Thanks for posting that! For fun/interest, here's an alternative
> implementation that occurred to me.
>
> Neil
Thanks for the feedback.
>
>
> (use-modules (ice-9 regex)
>              (ice-9 string-fun))
>
> (define (regex-split regex str . opts)
>   (let* ((unique-char #\@)
>          (unique-char-string (string unique-char)))
>     (let ((splits (separate-fields-discarding-char
>                    unique-char
>                    (regexp-substitute/global #f
>                                              regex
>                                              str
>                                              'pre
>                                              unique-char-string
>                                              0
>                                              unique-char-string
>                                              'post)
>                    list)))
This is an approach that I used some years ago in Awk.
ASCII code 1 is used as the unique character:
# Produces array of nonmatching and matching
# substrings. The size of the array will
# always be an odd number. The first and the
# last item will always be nonmatching.
function shatter( s, shards, regexp )
{ gsub( regexp, "\1&\1", s )
  return split( s, shards, "\1" )
}
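
For comparison, here is a rough Ruby sketch of the same sentinel-character idea. The name shatter is carried over from the Awk version; the SENTINEL constant and the assumption that the input never contains ASCII code 1 are mine. The -1 limit passed to split keeps trailing empty strings, so for non-empty input the result has an odd number of items, with nonmatching pieces at the even indices.

```ruby
# Sketch only: assumes the input string never contains the sentinel.
SENTINEL = "\x01"

# Returns an array alternating nonmatching and matching substrings.
def shatter(s, regexp)
  # Wrap every match in sentinels, then split on the sentinel.
  # The limit of -1 stops split from dropping trailing empty strings.
  s.gsub(regexp) { |m| SENTINEL + m + SENTINEL }.split(SENTINEL, -1)
end

p shatter("a1b22c", /\d+/)
p shatter(",foo,", /,/)
```

Taking every other item, starting from the first, gives the nonmatching pieces only.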
>       (cond ((memq 'keep opts)
>              splits)
>             (else
>              (let ((non-matches (map (lambda (i)
>                                        (list-ref splits (* i 2)))
>                                      (iota (floor (/ (1+ (length splits))
>                                                      2))))))
>                (if (memq 'trim opts)
>                    (filter (lambda (s)
>                              (not (zero? (string-length s))))
>                            non-matches)
>                    non-matches)))))))
The way that I want 'trim to work is to remove just the
leading and trailing empty strings. In Ruby, trailing
null strings are removed by default:
",foo,,,bar,".split( "," )
==>["", "foo", "", "", "bar"]
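
To make the 'trim behavior I have in mind concrete, here is a minimal Ruby sketch. The helper name trim_empty_ends is hypothetical, not part of any library. It drops empty strings only from the two ends of a list of fields and leaves interior empty strings alone. The split call uses a limit of -1 so that the trailing empty string is kept and there is something to trim on both ends.

```ruby
# Hypothetical helper: removes leading and trailing empty strings,
# keeping any empty strings that sit between non-empty fields.
def trim_empty_ends(fields)
  first = fields.index { |f| !f.empty? }
  return [] if first.nil?            # every field was empty
  last = fields.rindex { |f| !f.empty? }
  fields[first..last]
end

fields = ",foo,,,bar,".split(",", -1)
p fields                   # ["", "foo", "", "", "bar", ""]
p trim_empty_ends(fields)  # ["foo", "", "", "bar"]
```

This differs from filtering out every empty string, which would also collapse the two empty fields between "foo" and "bar".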