bug-gnu-utils

RE: sub-problem with regex


From: Aharon Robbins
Subject: RE: sub-problem with regex
Date: Sun, 29 Sep 2002 18:46:17 +0300

I'm glad it's now working. I will try to add something about this
to the gawk doc, since this is becoming an increasingly frequent
question.

Arnold

> From: Alexander Stohr <address@hidden>
> To: Aharon Robbins <address@hidden>, Alexander Stohr
>        <address@hidden>,
>    address@hidden
> Date: Fri, 27 Sep 2002 15:31:25 +0200
> Subject: RE: sub-problem with regex
>
> the suggested change fixed the problem.
>
> here are a few words on my learning curve:
>
> as seen in the manpage, LC_ALL=C subdivides
> into several other LC_* settings.
> I have to assume that it's the LC_CTYPE
> submember that is the one in question.
>
> hmm, on a RH7.3 system where it's working:
>   LANG=en_US.iso885915
>   LC_ALL=<unset>
>
> on the RH beta system where it fails:
>   LANG=en_US.UTF-8
>   LC_ALL=<unset>
>
> so it was the UTF-8 that broke my awk code.
> just curious that [[:upper:]] was different from [A-Z].
> (and it broke midnight commander graphical chars
> on all telnet consoles too!)
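
The difference noted above can be checked with a small sketch like the following. It is only an illustration under stated assumptions: the sample text, the `strip_with` helper, and the `AWK` variable are hypothetical and not part of the original report, and the presence of an `en_US.UTF-8` locale on the test machine is assumed. The range form is probably governed by the locale's collation order (LC_COLLATE) rather than by LC_CTYPE alone, and newer gawk releases may no longer show any difference.

```shell
# Hypothetical sample input; any mixed-case text will do.
sample='Hello World'

# Which awk to run; set AWK=gawk to match the thread (assumption: plain
# awk is used when AWK is unset).
AWK="${AWK:-awk}"

# Delete everything matching a bracket expression under a given locale.
# $1 = locale name, $2 = bracket expression (hypothetical helper).
strip_with() {
    printf '%s\n' "$sample" |
        LC_ALL="$1" "$AWK" -v re="$2" '{ gsub(re, ""); print }'
}

strip_with C '[A-Z]'          # C locale: the range covers only A..Z
strip_with C '[[:upper:]]'    # character class: uppercase letters only

# In a dictionary-order locale, older gawk releases could treat [A-Z]
# as covering most lowercase letters too. No expected output is given
# here because it depends on the awk version and the installed locales.
strip_with en_US.UTF-8 '[A-Z]'
```

If the last command prints something different from the first one, the range expression is being interpreted through the locale, and `[[:upper:]]` is the safer way to say "uppercase letter".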
>
> on the other hand, i tried setting IGNORECASE and
> saw effects on the results on the first machine but could
> not affect the problematic behaviour on the second.
>
> so i would suggest adding a hint to the docs for both
> settings, each pointing to the other, because it seems the LANG
> setting prevents the other from changing the behaviour.
> two possible causes => two things to look at
> => two or more possible solutions.
>
> Note:
> as only a minor part-time awk user, i have had
> no success in dumping the script variables into
> a file, despite doing some pretty nice cut and paste.
>
> anyways, the thing is working now.
>
> > -----Original Message-----
> > From: Aharon Robbins [mailto:address@hidden]
> > Sent: Friday, September 27, 2002 09:56
> > To: address@hidden; address@hidden
> > Subject: Re: sub-problem with regex
> > 
> > 
> > This is a problem with the locale you're using.  If you set
> > 
> >     export LANG=C
> > 
> > or
> > 
> >     export LC_ALL=C
> > 
> > before running gawk, it'll work the way you expect.  It's not 
> > a gawk bug.
> > 
> > Arnold
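
As a variation on the `export` advice above, the locale can also be overridden for a single command, so the rest of the session keeps its UTF-8 settings. This is a minimal sketch, not something taken from the thread: the `run_in_c_locale` helper and the sample input are hypothetical. It uses LC_ALL rather than LANG because LC_ALL takes precedence over the individual LC_* variables, which in turn take precedence over LANG; setting only LANG=C has no effect on a category that is already set through LC_ALL or its own LC_* variable.

```shell
# Run one command with the C locale without changing the calling shell.
# Hypothetical helper; the variable assignment applies only to the
# command that follows it on the same line.
run_in_c_locale() {
    LC_ALL=C "$@"
}

# Replace ASCII uppercase letters with underscores in a sample line.
printf 'abc DEF\n' | run_in_c_locale awk '{ gsub(/[A-Z]/, "_"); print }'
# prints "abc ___"
```

The same one-line prefix form (`LC_ALL=C gawk ...`) works directly on the command line when a helper function is not wanted.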
>



