bug-gawk

Re: [bug-gawk] Does gawk character classes follow this?


From: arnold
Subject: Re: [bug-gawk] Does gawk character classes follow this?
Date: Fri, 15 Feb 2019 03:01:34 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Eli Zaretskii <address@hidden> wrote:

> But the included regex library comes from Gnulib, so if the Gnulib
> folks change their code, Gawk will follow suit, right?

Yes, but I don't expect this to change. I usually review the differences
before importing and make sure that 'make check' still passes.

> (does Gnulib even document these
> subtleties? AFAIK they just point to POSIX).

POSIX pretty much wants what I've described as happening.

> So the definition is locale-dependent,

Yes, but that's been the case since locales came into the picture; it's
not anything new.

> in addition to all the other problems.

What other problems?

> > On systems that understand locales, the C library returns true/false
> > for a given character / wide character based on the locale's settings.
>
> Right, which means, unless your locale's codeset is UTF-8, Gawk only
> supports characters that can be encoded by the locale's codeset.

That's always been true.  Gawk relies on the C library's multibyte to
wide char and wide char to multibyte routines, and the behavior of
all of those is also locale-dependent.
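
To illustrate the dependency described above, here is a minimal C sketch.
It is not gawk's actual code: the helper name first_char_is_alpha and its
return convention are made up for this example. It decodes the first
character of a byte string with mbrtowc() and classifies it with
iswalpha(), both of which consult the locale chosen by setlocale().

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the first character of `bytes` is alphabetic in the
 * named locale, 0 if it is not, and -1 if the locale is unavailable
 * or the bytes cannot be decoded as a character in that locale.
 */
int first_char_is_alpha(const char *locale_name, const char *bytes)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    assert(locale_name != NULL && bytes != NULL);

    /* Both the decoding and the classification depend on this call. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    memset(&state, 0, sizeof state);

    /* Multibyte to wide char: how many bytes form one character
       is decided by the locale's codeset. */
    n = mbrtowc(&wc, bytes, strlen(bytes), &state);
    if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        return -1;

    /* Wide char classification: the answer comes from the locale. */
    return iswalpha((wint_t) wc) ? 1 : 0;
}
```

Calling first_char_is_alpha("C", "a") returns 1 and
first_char_is_alpha("C", "1") returns 0. For bytes outside ASCII the
result depends on which locales are installed and on their codesets,
which is the point being made here.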

I get the feeling that there's something really bothering you, but
I don't understand what.

Can you clarify, please?

Thanks,

Arnold


