Re: decode-coding-string on invalid UTF-8 string isn't rejected
From: Kenichi Handa
Subject: Re: decode-coding-string on invalid UTF-8 string isn't rejected
Date: Wed, 12 Mar 2003 09:51:19 +0900 (JST)
In article <address@hidden>, Simon Josefsson <address@hidden> writes:
> I'm trying to use decode-coding-string to "guess" charsets, and
> noticed this:
> (decode-coding-string "r\xe4k" 'latin-1)
> => "räk"
> (decode-coding-string "r\xe4k" 'utf-8)
> => "r"
> Wouldn't it be more appropriate if it returned nil (like
> `decode-char') or "rk"?
I've just fixed it to return "r\xe4k"; that is, invalid
8-bit bytes are now decoded into eight-bit-control or
eight-bit-graphic characters, as in the other cases.
Please try the latest CVS HEAD.
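To make the before/after contrast concrete, here is a minimal sketch of the two calls from the original report, with the second result being the fixed behavior described above (the byte \xe4 is valid Latin-1 but an invalid UTF-8 sequence):

```elisp
;; Valid Latin-1: the byte 0xE4 decodes to a-umlaut.
(decode-coding-string "r\xe4k" 'latin-1)
;; => "räk"

;; Invalid UTF-8: after the fix, the stray byte is kept as an
;; eight-bit character instead of being silently dropped.
(decode-coding-string "r\xe4k" 'utf-8)
;; => "r\xe4k"
```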
> Perhaps I'm looking in the wrong place though. Is there an Elisp
> function that takes a unibyte string and decodes it using whatever the
> default (process) coding system priorities may be? I.e., for me that
> runs in a UTF-8 locale, first try decoding as utf-8, if it fails,
> continue with Latin-1, etc.
(decode-coding-string UNIBYTE_STRING 'undecided) should do
what you want.
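A short sketch of that approach (the actual detection order is not fixed here; it follows the priority list of the user's language environment, which can be adjusted as shown):

```elisp
;; `undecided' asks Emacs to detect the coding system using the
;; current coding-system priority list.
(decode-coding-string "r\xe4k" 'undecided)

;; To make UTF-8 the first candidate in detection, raise its
;; priority beforehand:
(prefer-coding-system 'utf-8)
```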
---
Ken'ichi HANDA
address@hidden