Re: Email text that confuses charset recognition in emacs
From: Paul Eggert
Subject: Re: Email text that confuses charset recognition in emacs
Date: Tue, 16 Apr 2013 21:37:08 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 04/16/2013 09:27 AM, Giorgos Keramidas wrote:
> the attached email message confuses the charset
> detection machinery of Emacs, and it starts interpreting all text as
> Japanese text -- even though most of the contents of the file are plain
> us-ascii text.

Although the text is US-ASCII, it also contains a valid ISO-2022-7bit
coding sequence (the two things are not incompatible),
which Emacs is properly detecting and converting. The problem is that
the text later contains the invalid escape sequence
ESC LF > > SP ( B
This sequence was intended to switch out of a Japanese charset (the
immediately preceding text is valid ISO-2022-7bit Japanese). However, a
mailer that *thought* that the text was ASCII inserted LF > > SP after the
ESC and before the ( B, corrupting the ESC ( B. As a result, Emacs remains
in Japanese mode until the end of the input.
Perhaps when Emacs is decoding ISO-2022-7bit and sees an invalid
escape sequence, it should switch back to ASCII. That would have
fixed your problem, and wouldn't break the decoding of any valid
ISO-2022-7bit sequence.
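
To illustrate the difference between the two behaviors, here is a toy
decoder in Python. It is only a sketch, not how Emacs decodes anything:
the name decode_sketch and its reset_on_invalid_escape flag are made up
for this example, it recognizes only the ESC $ B and ESC ( B designations
(real ISO-2022-7bit has more), and it borrows Python's euc_jp codec to map
two-byte codes to characters.

```python
ESC = 0x1B
TO_ASCII = b"(B"      # ESC ( B : designate ASCII
TO_JISX0208 = b"$B"   # ESC $ B : designate JIS X 0208 (two-byte Japanese)


def decode_sketch(data: bytes, reset_on_invalid_escape: bool) -> str:
    """Toy decoder; hypothetical, handles only the two designations above."""
    out = []
    japanese = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            seq = data[i + 1:i + 3]
            if seq == TO_ASCII:
                japanese = False
                i += 3
            elif seq == TO_JISX0208:
                japanese = True
                i += 3
            else:
                # Unrecognized escape sequence: drop the ESC byte and,
                # under the proposed policy, fall back to ASCII.
                if reset_on_invalid_escape:
                    japanese = False
                i += 1
            continue
        nxt = data[i + 1] if i + 1 < len(data) else 0
        if japanese and 0x21 <= b <= 0x7E and 0x21 <= nxt <= 0x7E:
            # A JIS X 0208 pair with the high bits set is its EUC-JP form.
            pair = bytes([b | 0x80, nxt | 0x80])
            out.append(pair.decode("euc_jp", errors="replace"))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


# Valid Japanese text, then the corrupted ESC LF > > SP ( B, then ASCII.
corrupted = b"\x1b$BF|K\\\x1b\n>> (B plain ascii"

print(repr(decode_sketch(corrupted, reset_on_invalid_escape=False)))
print(repr(decode_sketch(corrupted, reset_on_invalid_escape=True)))
```

With the flag off, the trailing ASCII is consumed in pairs as if it were
Japanese, which is the symptom reported. With the flag on, the text after
the broken escape comes through unchanged, and valid input decodes the
same either way.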