
[Octave-bug-tracker] [bug #35910] Incorrect regex matching of multi-byte UTF-8 characters


From: Mike Miller
Subject: [Octave-bug-tracker] [bug #35910] Incorrect regex matching of multi-byte UTF-8 characters
Date: Sun, 28 Jul 2019 17:01:04 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Follow-up Comment #13, bug #35910 (project octave):

Since this change, I'm seeing some strange regressions when running Octave in a
minimal container with regular expressions that contain UTF-8 characters (in
the doctest package).

Example regexerror.m:


c = regexp ('lorem ipsum', '^\s*(⇒|=>|⊣|-\|)', 'lineanchors');



$ octave regexerror.m
error: regexp: unrecognized character after (? or (?- at position 13 of expression
error: called from
    regexerror at line 1 column 3


I get different results depending on whether this script is run from the
command shell or in an interactive Octave session, and on whether the
environment contains LANG or LC_??? variables that specify a UTF-8 locale.
This suggests that there is something I could configure or install in my
environment to fix this, but I don't yet know what that is. Any ideas?
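As a rough illustration of why the locale might matter (an assumption, not a diagnosis of this bug): the pattern in regexerror.m contains two characters, ⇒ (U+21D2) and ⊣ (U+22A3), that each occupy three bytes in UTF-8, so the same bytes read under a single-byte encoding become a different, longer string. The sketch below uses Python only because its source decoding does not depend on the locale; Latin-1 stands in for "some non-UTF-8 encoding" and is not claimed to be what Octave actually uses.

```python
# Illustrative sketch only: assumes the locale-dependent behaviour comes from
# how the pattern's bytes are decoded; this does not reproduce Octave's code path.
pattern = r"^\s*(⇒|=>|⊣|-\|)"  # same pattern as in regexerror.m

raw = pattern.encode("utf-8")

print("code points:", len(pattern))  # 16
print("UTF-8 bytes:", len(raw))      # 20: '⇒' and '⊣' take 3 bytes each

# Decoding the same bytes with a single-byte codec (Latin-1, chosen only as an
# example of a non-UTF-8 encoding) yields a different 20-character string.
misread = raw.decode("latin-1")
print("as Latin-1:", len(misread), repr(misread))
```

A regex engine handed the misread string would see byte-sized fragments of ⇒ and ⊣ as separate characters, so character positions and alternatives no longer line up with what the author wrote.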

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?35910>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
