Re: major gawk bug
From: Aharon Robbins
Subject: Re: major gawk bug
Date: Tue, 8 Jun 2004 15:10:43 +0300
Greetings. Re this:
> Date: Tue, 8 Jun 2004 15:51:19 +0400
> From: Stanislav Ievlev <address@hidden>
> To: address@hidden
> Cc: address@hidden, address@hidden
> Subject: major gawk bug
>
> Hello friends!
>
> Why does gawk use setlocale() but have a hardcoded table
> (const char casetable[]) for case-independent regexp matching?
The hard-coded table predates, by many years, all the locale-related code
in gawk. No one ever noticed until now that it was an issue.
> This table is correct for the latin1 charset only, but incorrect for others,
> e.g. for KOI8-R (Russian).
>
> The KOI8-R encoding is fully compatible with 7-bit ASCII (so gawk compiles well),
> but has other symbols for codes greater than 128.
>
> So gawk supports only latin1, but ignores cp1251, koi8-r, koi8-u, etc.
>
> As I understand, it's not a problem to fill this table with
> locale-specific symbols at startup.
Code changes welcome. I have no idea how to do that in a manner that
is correct for all 8-bit ASCII-compatible locales. If you (or someone
else) wishes to contribute a patch, sometime soon would be a good time,
as I'm back in development mode, at least for the next little while.
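For illustration only, here is a minimal sketch of the approach suggested
above: filling a case-folding table from the current locale at startup
instead of hard coding it. It is not gawk's actual code. The names
casetable_sketch and init_casetable_sketch are hypothetical, and the sketch
assumes the table maps each byte to its lowercase counterpart and that
setlocale(LC_CTYPE, "") (or LC_ALL) has already been called. It relies only
on the standard tolower(), so it can only be right for single-byte locales;
in a multibyte locale such as UTF-8, bytes above 127 would typically come
back unchanged.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Hypothetical stand-in for a hard-coded case table:
 * one entry per possible unsigned char value. */
static unsigned char casetable_sketch[UCHAR_MAX + 1];

/*
 * Fill the table using the LC_CTYPE category of the current locale.
 * Assumes the caller has already called setlocale(); until then the
 * "C" locale is in effect and only ASCII letters are folded.
 */
static void init_casetable_sketch(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++) {
        /* tolower() is defined for every unsigned char value (and EOF);
         * it returns its argument unchanged when no lowercase form
         * exists in the current locale. */
        casetable_sketch[c] = (unsigned char) tolower(c);
    }
}
```

A caller would invoke init_casetable_sketch() once, right after its
setlocale() call, and again if the locale is ever changed at run time.
Whether a given system ships correct case mappings for locales such as
ru_RU.KOI8-R is up to that system's C library, so this would need testing
on each target platform.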
> With best regards
> Stanislav Ievlev
>
> ALT Linux Team.
Thanks,
Arnold