Re: Possible UTF-8 CJK Regressions in Terminal Emulators

emacs-devel

From:	Kenichi Handa
Subject:	Re: Possible UTF-8 CJK Regressions in Terminal Emulators
Date:	Thu, 10 Jun 2004 09:20:33 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

> > As surrogate pairs were not handled well by the UTF-16 converter,
> > I've just fixed that too (not yet installed, I'm now adding
> > comments to the code).  Untranslatable characters are decoded
> > into UTF-8 form represented by a sequence of
> > eight-bit-graphic/control characters (the same way as UTF-8
> > decoding, thus we can use utf-8-post-read-conversion).  The
> > UTF-16 encoder encodes such a sequence back to the original
> > UTF-16 form.  So, now the UTF-16 support is at the same
> > level as UTF-8.
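
For readers unfamiliar with surrogate pairs, the arithmetic a UTF-16
converter has to perform can be sketched as below.  This is a minimal
Python illustration of the standard UTF-16 mapping, not the Emacs
converter code; the function names combine_surrogates and
split_to_surrogates are hypothetical.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_to_surrogates(code_point: int) -> tuple:
    """Split a supplementary-plane code point into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)
```

For example, U+1D11E corresponds to the pair D834 DD1E, and splitting
that code point gives the same pair back.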

> Does that mean that some sequences of eight-bit-graphic/control are not
> encoded into the corresponding raw bytes?

No.  But that happens only when we encode modified text
(i.e. eight-bit-graphic/control chars were added or
modified after we decoded the source).

> If so, that makes me a bit uneasy, since those special chars were
> introduced specifically to handle things like binary input or
> bad-byte-sequences and make sure that we at least preserve the raw bytes in
> those cases.

As long as we encode unmodified text that was generated by
decoding a source, we can preserve the byte sequence even if
the original source contains bad byte sequences (for the case
of UTF-8, I found a case that didn't work as expected and
fixed it).
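
The round-trip property described above (decode a source containing
bad byte sequences, encode the unmodified result, get the same bytes
back) can be illustrated by analogy.  The sketch below uses Python's
"surrogateescape" error handler, which serves a similar purpose to the
eight-bit-graphic/control characters but is a different mechanism; it
says nothing about how Emacs implements this, and the function name
roundtrip is hypothetical.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode bytes as UTF-8, keeping undecodable bytes, then re-encode."""
    # Bytes that are not valid UTF-8 become lone surrogates U+DC80..U+DCFF
    # instead of raising an error or being replaced.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding the unmodified text turns those escapes back into the
    # original raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


# Valid ASCII, two invalid bytes, and a truncated three-byte sequence.
raw = b"ok \xff\xfe \xe3\x81"
assert roundtrip(raw) == raw
```

The guarantee holds only for text that has not been edited between
decoding and encoding, which matches the caveat about modified text.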

---
Ken'ichi HANDA
address@hidden

Current Thread

Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/07
- Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Miles Bader, 2004/06/07
  - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/07
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Dave Love, 2004/06/08
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/09
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Stefan Monnier, 2004/06/09
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa <=
- Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Dave Love, 2004/06/08
  - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/09
- Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/11
  - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Juanma Barranquero, 2004/06/12
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/13
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Juanma Barranquero, 2004/06/13
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Andreas Schwab, 2004/06/13
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Kenichi Handa, 2004/06/13
    - Re: Possible UTF-8 CJK Regressions in Terminal Emulators, Luc Teirlinck, 2004/06/13
