[Chicken-users] Codepoint indices for matched regexps (UTF-8)?


From: Henry Hu
Subject: [Chicken-users] Codepoint indices for matched regexps (UTF-8)?
Date: Fri, 15 Jun 2018 09:44:14 -0400

Hello world!

I am trying to use the irregex unit to match regular expressions in UTF-8 text. Is anyone familiar with a way to get codepoint indices rather than byte indices for a match?

For example:

(irregex-match-start-index (irregex-search (irregex "Č" 'utf8) "čččČččč"))

returns 6 when I want it to return 3, since there are 3 characters (6 bytes) before my match.

I tried (use utf8), but it is documented as not affecting irregex, and sure enough it doesn't. I also tried passing the 'utf8 option when compiling my regex, but that doesn't change the index returned by irregex-match-start-index.
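
To make the conversion concrete: the codepoint index is the number of bytes before the match that are not UTF-8 continuation bytes. Below is a minimal sketch of that idea, not an irregex feature. The names utf8-continuation-byte? and byte-index->codepoint-index are hypothetical helpers, and the sketch assumes the string is valid UTF-8 and that string-ref indexes bytes (as the core procedure does when the utf8 egg's string-ref has not replaced it).

```scheme
;; Hypothetical helpers, not part of irregex or the utf8 egg.
;; Assumption: STR is valid UTF-8 and string-ref indexes bytes.

;; UTF-8 continuation bytes are #x80..#xBF, i.e. of the form 10xxxxxx.
(define (utf8-continuation-byte? b)
  (= (quotient b 64) 2))

;; Count the bytes before BYTE-INDEX that start a character; that count
;; is the codepoint index corresponding to the byte index.
(define (byte-index->codepoint-index str byte-index)
  (let loop ((i 0) (count 0))
    (if (>= i byte-index)
        count
        (loop (+ i 1)
              (if (utf8-continuation-byte? (char->integer (string-ref str i)))
                  count
                  (+ count 1))))))
```

Under those assumptions, passing the subject string and the result of irregex-match-start-index to byte-index->codepoint-index should turn the 6 from the example above into 3, at the cost of an extra scan over the bytes before the match.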

Thank you for any ideas you might have!
