Re: [bug-gawk] Does gawk character classes follow this?


From: Eli Zaretskii
Subject: Re: [bug-gawk] Does gawk character classes follow this?
Date: Fri, 15 Feb 2019 11:36:16 +0200

> From: address@hidden
> Date: Fri, 15 Feb 2019 02:26:01 -0700
> Cc: address@hidden, address@hidden
> 
> Hi Eli.
> 
> Eli Zaretskii <address@hidden> wrote:
> 
> > > [:alnum:]         [a-zA-Z0-9]
> > > [:alpha:]         [a-zA-Z]
> > > [:ascii:]         [\x00-\x7F]
> > > [:cntrl:]         [\x00-\x1F\x7F]
> >
> > Doesn't the meaning of these character classes depend on the
> > implementation of the regex library with which Gawk was linked?
> 
> Yes and no. Gawk always links with the included regex and dfa routines,
> so there really isn't an option to use a different regex library.

But the included regex library comes from Gnulib, so if the Gnulib
folks change their code, Gawk will follow suit, right?  This means we
cannot reliably describe what each such named class means without
updating it every time we import a new version of the regex library,
something that might not be trivial (does Gnulib even document these
subtleties? AFAIK they just point to POSIX).

> That said, gawk's routines use the underlying C library ctype/wctype
> routines to check those classes.

So the definition is locale-dependent, in addition to all the other
problems.
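
To make the gap concrete, here is a rough sketch in Python (not Gawk or
C). It compares the ASCII-only ranges from the table quoted above with
a classifier that knows about more than ASCII. Python's str.isalpha()
is Unicode-based rather than locale-based, so it is only a stand-in for
what a locale-aware isalpha()/iswalpha() might report; the helper names
are made up for illustration.

```python
import re

# ASCII-only definitions, copied from the table quoted earlier in this thread.
ASCII_CLASSES = {
    "alnum": r"[a-zA-Z0-9]",
    "alpha": r"[a-zA-Z]",
    "ascii": r"[\x00-\x7F]",
    "cntrl": r"[\x00-\x1F\x7F]",
}


def in_ascii_class(name, ch):
    """Return True if the single character ch falls in the fixed ASCII range."""
    return re.fullmatch(ASCII_CLASSES[name], ch) is not None


def compare_alpha(ch):
    """Return (fixed ASCII answer, wider-than-ASCII answer) for 'is ch a letter?'.

    The second value uses str.isalpha(), which is an analogy for a
    locale-aware C library call, not the same thing.
    """
    return in_ascii_class("alpha", ch), ch.isalpha()


if __name__ == "__main__":
    for ch in ("a", "é", "7"):
        print(repr(ch), compare_alpha(ch))
```

For a plain ASCII letter the two answers agree; for a character such as
"é" they differ, which is the case a fixed table cannot describe.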

> On systems that understand locales, the C library returns true/false
> for a given character / wide character based on the locale's settings.

Right, which means, unless your locale's codeset is UTF-8, Gawk only
supports characters that can be encoded by the locale's codeset.
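
As a rough illustration of the codeset limit (again a Python sketch,
not Gawk itself), the helper below asks whether a character can be
encoded in a given codeset at all. The Python codec names stand in for
locale codesets, and encodable() is a hypothetical helper; nothing here
calls setlocale() or the regex library.

```python
def encodable(ch, codeset):
    """Return True if ch can be represented in the named codeset."""
    try:
        ch.encode(codeset)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Cyrillic "я" exists in KOI8-R and UTF-8 but not in ISO-8859-1.
    for codeset in ("iso-8859-1", "koi8-r", "utf-8"):
        print(codeset, encodable("я", codeset))
```

A character that the codeset cannot encode never reaches the class
test in the first place, so no named class can match it in that locale.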
