bug-gnu-utils

Re: grep does not process non-ASCII characters correctly


From: Paul Eggert
Subject: Re: grep does not process non-ASCII characters correctly
Date: Tue, 8 May 2001 11:05:43 -0700 (PDT)

> From: Bruno Haible <address@hidden>
> Date: Tue, 8 May 2001 15:43:08 +0200 (CEST)

> The remaining problems in grep appear to be located in dfa.h and dfa.c.

Yes, those modules need to be made multibyte-aware.

A related problem is that POSIX.2 requires that the modules also
understand multicharacter collating sequences.  For example, in a
Danish locale where "aa" is a collating sequence, POSIX.2 requires
that [^[:alpha:]] match "aa".  This is one reason why I never
looked into fixing dfa (the other was lack of time).

This requirement has been relaxed in POSIX.1-200x draft 6, so (unless
it gets changed again before the standard comes out) dfa.c doesn't
need to worry about multicharacter collating sequences matched by
bracket expressions if it doesn't want to.  But that may bring up
another issue: what if dfa and regex are incompatible with each other,
even though they both conform to POSIX?


