From: Mike Miller
Subject: [Octave-bug-tracker] [bug #35910] Incorrect regex matching of multi-byte UTF-8 characters
Date: Sun, 28 Jul 2019 19:30:28 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36

Follow-up Comment #14, bug #35910 (project octave):

Ok, I can now also reproduce this on my full development system, not just in a
container, but only with octave-cli.

So, to make this easier to reproduce, here is a minimal script, regexp_error.m:


c = regexp ('lorem ipsum', '^\s*(⇒|=>|⊣|-\|)')


And here are three runs. The script works without error in interactive Octave
when LANG names a UTF-8 locale, but fails when the locale variables are unset
or when the script is run in batch mode from the command line:


$ octave-cli-6.0.0 -q
>> regexp_error
c = [](1x0)
>> 

$ env -u LANG octave-cli-6.0.0 -q
>> regexp_error
error: regexp: unrecognized character after (? or (?- at position 13 of expression
error: called from
    regexp_error at line 1 column 3
>> 

$ octave-cli-6.0.0 -q regexp_error.m 
error: regexp: unrecognized character after (? or (?- at position 13 of expression
error: called from
    regexp_error at line 1 column 3


When octave-gui is used, these errors are not raised. So there may be some
locale initialization, performed by the Qt framework in the octave-gui
executable, that is missing from octave-cli.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?35910>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



