Re: Single unrecognized character wrecks entire display
From: Peter Dyballa
Subject: Re: Single unrecognized character wrecks entire display
Date: Wed, 22 Aug 2012 17:18:36 +0200
On 22.08.2012 at 11:36, Alexandre Oberlin wrote:
> The problem is that it names a full list as bad characters, when only one.
> utf-8-mac cannot encode these: \351 \350 \351 \342 \351 \234 \350 \350
> \311 \240
> How can I spot the true non utf-8 without trying them all?
When you set read-quoted-char-radix to 8 you can search for these "characters"
in the text by:
C-s C-q 3 5 1 RET
Hopefully! I think the problem is that your converter makes mistakes (can't you
use something reliable like iconv or recode?). \240, or A0 in hex, exists only
as the partner of another byte: with C2 it forms NO-BREAK SPACE, with C3 it is
LATIN SMALL LETTER A WITH GRAVE, … Likewise \234, or 9C, forms LATIN CAPITAL
LETTER U WITH DIAERESIS with C3, etc. I think what GNU Emacs wants to tell you,
and what I did not understand the first time, is that some characters obviously
are not encoded correctly, so these "isolated" *bytes* are left over: they
don't fit into the regular 2-, 3- or even 4-byte sequences of the UTF-8
encoding, and of course none of them is an ASCII character encoded by one byte
(i.e., itself).
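
To spot the stray bytes without trying them one by one, something like the
following could be run on the file outside of Emacs. It is only a minimal
sketch: the function names and the sample bytes are made up for illustration,
and it relies on Python's strict UTF-8 decoder to report where decoding fails.

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return (offset, byte value) for every byte that is not part of valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice that was decoded
            start = pos + err.start
            end = max(pos + err.end, start + 1)
            bad.extend((i, data[i]) for i in range(start, end))
            pos = end  # continue after the offending bytes
    return bad


def as_octal(byte: int) -> str:
    """Format a byte the way Emacs displays it, e.g. 0xE9 as \\351."""
    return "\\" + format(byte, "o")


# Hypothetical sample: "déjà" with one correctly encoded é (C3 A9)
# and two isolated bytes (E9 and E0) that are not valid UTF-8 on their own.
sample = b"caf\xe9 d\xc3\xa9j\xe0"
for offset, byte in find_invalid_utf8(sample):
    print(offset, as_octal(byte), format(byte, "02X"))
```

For the sample above this reports offsets 3 and 9; the correctly encoded pair
C3 A9 in between is not flagged. For a real file, the bytes would be read with
open(path, "rb").read() instead of the literal.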
The utf-8-mac encoding in GNU Emacs is UTF-8 that uses ^M or CR as end of line
character (UNIX uses ^J or Line Feed).
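
If it is unclear which line-ending convention the file really uses, the bytes
can simply be counted. This is a rough sketch with a made-up helper name, not
the way Emacs itself detects end-of-line formats:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, lone CR and lone LF bytes in data."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                     # MS style
        "cr": data.count(b"\r") - crlf,   # old Mac style (^M)
        "lf": data.count(b"\n") - crlf,   # UNIX style (^J)
    }


# Hypothetical sample with CR-only line endings, as utf-8-mac expects.
print(count_line_endings(b"one\rtwo\rthree\r"))
```

A file for which only the "cr" count is non-zero matches what utf-8-mac
expects; a mix of counts suggests the converter or an editor mangled the line
endings.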
Can you give us some more details about the original source and the converter,
and how it works (command line options)? How do you open the file in GNU Emacs?
How does it behave when you launch GNU Emacs with -Q, i.e., with none of your
possibly faulty customisation? For example, on the command line:
env LC_CTYPE=UTF-8 LANG=fr_FR.UTF-8 emacs -Q &
or
env LC_CTYPE=UTF-8 LANG=fr_FR.UTF-8 /Applications/Emacs.app/Contents/MacOS/Emacs -Q &
GNU Emacs should then automatically switch to some UTF-8 encoding; whether it
uses Apple, UNIX or MS line endings should not matter much. If the input is
faulty, you should see searchable octal codes.
--
Greetings
Pete
A lot of us are working harder than we want, at things we don't like to do.
Why? ...In order to afford the sort of existence we don't care to live.
– Bradford Angier