bug-gnu-utils

Re: Gawk and non-ASCII characters


From: Eli Zaretskii
Subject: Re: Gawk and non-ASCII characters
Date: Sat, 16 Oct 2010 15:01:00 +0200

> Date: Sat, 16 Oct 2010 08:22:56 -0400
> From: Charles Kozierok <address@hidden>
> 
> I am grabbing HTML code from a site that has some non-ASCII codes in
> it. Specifically, the code is "C2 A0". This shows up in ANSI as a
> capital "A" with a circumflex on top followed by a space. In ASCII it
> becomes a regular "A" followed by a space.
> 
> I need to be able to properly identify these so I can get rid of them,

What exactly do you mean by "these"?  Do you mean the sequence "C2
A0", or do you want to identify each one of them individually?
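To illustrate why the distinction matters, here is a minimal Python sketch. Python is used only because it makes the raw bytes easy to show, and the sample strings are made up. It compares treating "C2 A0" as one two-byte unit with removing each of the two bytes wherever it appears:

```python
# Hypothetical raw bytes, as they might arrive from the fetched page.
raw = b"price:\xc2\xa010\xc2\xa0EUR"

# Treat C2 A0 as a single unit and replace the whole sequence with a space.
as_sequence = raw.replace(b"\xc2\xa0", b" ")

# Drop the bytes C2 and A0 individually, wherever either one occurs.
individually = bytes(b for b in raw if b not in (0xC2, 0xA0))

# Caveat: A0 also occurs inside other UTF-8 characters. "à" is C3 A0,
# so byte-by-byte removal damages unrelated text.
word = "voilà".encode("utf-8")                              # b"voil\xc3\xa0"
damaged = bytes(b for b in word if b not in (0xC2, 0xA0))   # b"voil\xc3", invalid UTF-8
unchanged = word.replace(b"\xc2\xa0", b" ")                 # sequence replace leaves "à" alone

print(as_sequence, individually, damaged, unchanged)
```

Replacing the whole sequence is the safer of the two. Stripping single bytes can silently corrupt other non-ASCII characters in the page.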

> but I can't figure out how to do it. The character doesn't seem to
> match any character codes within gawk, and I can't find any command
> line or option settings to either filter them out or have them be
> dealt with properly.

What is your locale?  (If this is on GNU/Linux, the `locale' command
will show that.)
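The locale matters because it decides how those two bytes are interpreted. Here is a minimal Python sketch. It assumes the page is UTF-8 encoded, which the original message does not confirm. It decodes the same bytes as UTF-8 and as Windows-1252, the encoding Windows tools often call "ANSI":

```python
raw = b"\xc2\xa0"

# As UTF-8, the two bytes form ONE character: U+00A0 NO-BREAK SPACE.
utf8 = raw.decode("utf-8")

# As Windows-1252 ("ANSI"), each byte is its own character:
# 0xC2 -> U+00C2 'Â' and 0xA0 -> U+00A0 NO-BREAK SPACE.
latin = raw.decode("cp1252")

print(repr(utf8), len(utf8))    # one character
print(repr(latin), len(latin))  # two characters
```

This explains the "Â followed by a space" seen in an "ANSI" viewer. By the same logic, a multibyte-aware gawk running in a UTF-8 locale should see a single character here, while under LC_ALL=C it should see two separate bytes.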


