Re: gawk ignores case with LANG=en_US
From: Aharon Robbins
Subject: Re: gawk ignores case with LANG=en_US
Date: Wed, 13 May 2009 19:05:08 +0300
Hi.
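With locales, ranges don't mean what you think they mean. In a locale such
as en_US, [a-z] follows the locale's collating order, which can interleave
uppercase and lowercase letters, so "A" may fall inside the range. This is
discussed in the gawk documentation. Use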
/^[[:lower:]]/ { print }
to get what you want.
Thanks,
Arnold
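
For anyone wanting to try the suggestion, here is a minimal sketch that runs
the reporter's three-line input through the POSIX character class and, for
comparison, through the original range with the C locale forced. The variable
names and the fallback from gawk to awk are assumptions made for illustration
and do not come from the thread. Forcing LC_ALL=C is a variation on the
reporter's "LANG=" observation, not something the reply itself recommends.

```shell
# Hypothetical helper: prefer gawk, fall back to whatever awk is installed.
awk_bin=$(command -v gawk || command -v awk)

# Sample input from the report: a digit, a lowercase and an uppercase letter.
input=$(printf '1\na\nA\n')

# POSIX character class: matches lowercase letters in the active locale.
lower_out=$(printf '%s\n' "$input" | "$awk_bin" '/^[[:lower:]]/ { print }')

# In the C locale, [a-z] is the plain ASCII range a through z.
c_out=$(printf '%s\n' "$input" | LC_ALL=C "$awk_bin" '/^[a-z]/ { print }')

echo "character class: $lower_out"
echo "C-locale range:  $c_out"
```

Both commands should print only the line "a". The character class is the more
portable choice, since it does not depend on how the script's caller has set
the locale.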
> Subject: gawk ignores case with LANG=en_US
> From: Jim Keniston <address@hidden>
> To: address@hidden
>
> --- bug.awk ---
> /^[a-z]/ { print }
> --- input ---
> 1
> a
> A
> --- output_buggy ---
> a
> A
> --- output_expected ---
> a
> -----
> Repeat by:
> $ gawk -f bug.awk < input
>
> Assuming that environment variables LC_ALL and LC_CTYPE are
> undefined, if I run the above with the LANG environment variable
> set to "en_US.utf8" or "en_US", "A" matches "^[a-z]" and the
> output is as in output_buggy. Setting IGNORECASE=0 in the
> command line or the script doesn't help.
>
> If I do
> $ LANG= gawk -f bug.awk < input
> I get the expected output.
>
> gawk version: GNU Awk 3.1.5
> OS: RH Fedora 9 + Linux v2.6.29-rc8
>
> Jim Keniston
> IBM Linux Technology Center
> Beaverton, OR
>