bug-gnu-utils

Re: Gawk and non-ASCII characters


From: John Cowan
Subject: Re: Gawk and non-ASCII characters
Date: Sat, 16 Oct 2010 11:29:09 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

Charles Kozierok scripsit:

> I am grabbing HTML code from a site that has some non-ASCII codes in
> it. Specifically, the code is "C2 A0". This shows up in ANSI as a
> capital "A" with a circumflex on top followed by a space. In ASCII it
> becomes a regular "A" followed by a space.

What it is, is a non-breaking space (U+00A0) encoded in UTF-8.  Read as
Windows-1252 ("ANSI"), the C2 byte shows up as the A with circumflex.
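
As a quick check of that reading, here is a minimal Python sketch (Python
is my choice for illustration; it is not something the thread used).  It
decodes the two reported bytes once as UTF-8 and once as Windows-1252,
which I am assuming is what "ANSI" means on the poster's machine.

```python
import unicodedata

# The two bytes from the original report.
raw = bytes([0xC2, 0xA0])

# Decoded as UTF-8, the pair is a single character.
as_utf8 = raw.decode("utf-8")
utf8_names = [unicodedata.name(ch) for ch in as_utf8]

# Decoded as Windows-1252 (assumed to be the "ANSI" code page here),
# each byte becomes its own character.
as_cp1252 = raw.decode("cp1252")
cp1252_names = [unicodedata.name(ch) for ch in as_cp1252]

print(utf8_names)
print(cp1252_names)
```

The first list holds one name, NO-BREAK SPACE; the second holds LATIN
CAPITAL LETTER A WITH CIRCUMFLEX followed by NO-BREAK SPACE, which matches
what the poster sees on screen.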

> I need to be able to properly identify these so I can get rid of them,

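If you actually want to get rid of them, use "iconv -c -f UTF-8 -t ASCII";
with GNU iconv, -c drops characters that have no ASCII equivalent instead
of stopping at the first one.  Alternatively, leave them alone and switch
to working in UTF-8.  Notepad can handle it, and so can many third-party
editors.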
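
For anyone who would rather do the cleanup in a script, both options can
be sketched in Python.  This is an illustration under assumptions, not a
replacement for the iconv command: the function names and the sample HTML
fragment are made up, and the input is assumed to be valid UTF-8.

```python
NBSP = "\u00a0"  # the non-breaking space, C2 A0 in UTF-8

def replace_nbsp(data: bytes, replacement: str = " ") -> bytes:
    """Work in UTF-8: decode, swap each NBSP for `replacement`, re-encode."""
    text = data.decode("utf-8")
    return text.replace(NBSP, replacement).encode("utf-8")

def to_ascii_lossy(data: bytes) -> bytes:
    """Drop every character with no ASCII equivalent, NBSP included."""
    return data.decode("utf-8").encode("ascii", errors="ignore")

# Hypothetical fragment of fetched HTML containing the C2 A0 pair.
sample = b"<td>10\xc2\xa0kg</td>"

print(replace_nbsp(sample))      # b'<td>10 kg</td>'
print(replace_nbsp(sample, ""))  # b'<td>10kg</td>'
print(to_ascii_lossy(sample))    # b'<td>10kg</td>'
```

The first function touches only the non-breaking space and leaves any
other non-ASCII text intact; the second throws away everything outside
ASCII, so it is the blunter tool.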

-- 
What is the sound of Perl?  Is it not the       John Cowan
sound of a [Ww]all that people have stopped     address@hidden
banging their head against?  --Larry            http://www.ccil.org/~cowan


