Re: What exactly is chinese-big5?
From: Kenichi Handa
Subject: Re: What exactly is chinese-big5?
Date: Fri, 18 Apr 2008 20:28:08 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In article <address@hidden>, Eli Zaretskii <address@hidden> writes:
> > In Emacs 22, you can read the written file by utf-8 and
> > search for U+FFFD.
> Is U+FFFD the _only_ character that will be produced for any codepoint
> that is unassigned in the Big5 code space? That is, if I search for
> U+FFFD, will I find _all_ the places where the original file had
> something not belonging to Big5?
Not exactly. U+FFFD is the only character that will be
produced for "any character that can't be unified with
Unicode". Which Big5 characters can be unified with Unicode
is defined in subst-big5.el in Emacs 22 (I don't know which
Big5 version Dave used to make that file) and in
etc/charsets/BIG5.map in Emacs 23. So, if the dialect of
Big5 is different from what is defined in those files, it's
possible that some character which the file's creator
considers to be Big5 is decoded into U+FFFD.
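
For anyone who wants to do the U+FFFD search outside Emacs, here
is a minimal Python sketch. It assumes the file has already been
written out as UTF-8 and read back into a string; the function
name find_replacement_chars is made up for this example and is
not part of Emacs or of any library.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_replacement_chars(text):
    """Return 1-based (line, column) pairs for every U+FFFD in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == REPLACEMENT:
                hits.append((line_no, col_no))
    return hits


if __name__ == "__main__":
    sample = "ab\ufffdc\nx\ufffd"
    for line_no, col_no in find_replacement_chars(sample):
        print(f"U+FFFD at line {line_no}, column {col_no}")
```

An empty result only says that every character could be mapped by
whichever Big5 table did the decoding. It does not say that the
table matches the Big5 dialect the file was written in.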
> Also, assuming that I find one or more invalid characters, is there
> some encoding other than chinese-big5 that I should try, which could
> explain those problematic characters, besides those I mentioned in my
> original message? This file came from Chinese speaking people, so
> there's little doubt it should include only strings that can be read
> by Chinese speakers. Therefore, I wonder how come it does not
> translate cleanly into Unicode. (I cannot ask the people who produced
> the file about these issues, since they seem to be pretty ignorant
> about that: they claimed the file was in UTF-8...)
That file may be GBK, whose code space is similar to but
wider than that of Big5. However, GBK is supported only in
Emacs 23.
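
One rough way to compare candidate encodings outside Emacs is to
decode the raw bytes with each one and count the bytes that could
not be mapped. The sketch below uses Python's built-in codecs;
the candidate list and the function names are my own choices for
illustration, and Python's tables are not the same as
subst-big5.el or etc/charsets/BIG5.map, so the counts may differ
from what Emacs reports.

```python
# Candidate codec names; all of these ship with CPython.
CANDIDATES = ["big5", "cp950", "big5hkscs", "gbk", "gb18030"]


def count_undecodable(data, encoding):
    """Count U+FFFD characters produced when decoding data leniently."""
    return data.decode(encoding, errors="replace").count("\ufffd")


def rank_encodings(data, candidates=CANDIDATES):
    """Return (error_count, encoding) pairs, fewest errors first."""
    return sorted((count_undecodable(data, enc), enc) for enc in candidates)


if __name__ == "__main__":
    # Bytes 0x81 0x40 are a valid GBK character but lie outside
    # the lead-byte range of Python's plain "big5" codec.
    for errors, enc in rank_encodings(b"\x81\x40"):
        print(f"{enc}: {errors} undecodable")
```

Because the two code spaces overlap so much, a count of zero is
not proof: GBK bytes often decode as valid but meaningless Big5
text. A Chinese reader still has to look at the result.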
---
Kenichi Handa
address@hidden