bug-gnu-emacs

Re: coding-system perfectionism locks user out


From: Lee Sau Dan
Subject: Re: coding-system perfectionism locks user out
Date: 04 Feb 2002 14:24:27 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

>>>>> "Dan" == Dan Jacobson <jidanni@deadspam.com> writes:

    Dan> I do
    Dan> $ lynx -dump http://www.geocities.com/Tokyo/Pagoda/3847/sapienti/hagfa99b.htm > hagfa99b.txt
    Dan> $ emacs -q hagfa99b.txt
    Dan> I go into options>mule
    Dan> and the choice to set coding system is blanked out... what's
    Dan> worse, its keystroke isn't even mentioned in the menu.

What?  I tried what you did.  No problem, except that I have to tell
Emacs that this file is in BIG5 encoding.  You can do that with C-x
RET c chinese-big5 RET C-x C-f hagfa99b.txt RET.  I then see the
expected traditional Chinese characters.


    Dan> Anyway, at this point the user would just see his data
    Dan> garbled, with no pointers on what to do next.

Computers are not very clever.   They can't reliably tell English from
French.



    Dan> [I then did M-! cat hagfa99b.txt and can then at least see
    Dan> what I'm supposed to see [big5 chinese], but then don't
    Dan> expect that I can save and then see the file again
    Dan> correctly.]

Something is wrong with your coding-system settings.  Have you run M-x
set-language-environment?  Apparently, your process-coding-system is
correct, because the M-! output is decoded using
process-coding-system's value.

When reading a file, Emacs can only make a guess about the coding
system.  Since the file you gave starts with a short section of
ASCII-only text, and over half of the file contents are ASCII-only,
what would you expect the smartest multi-lingual editor to do?
Remember what Knuth says: Computers are good at following
instructions, but not reading your mind.  You're more intelligent than
the computer, and hence you know it's Big5-encoded.  The computer is
stupid and fails to discover this.  So, it needs your help: C-x RET c
chinese-big5 RET.
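
To make the "tell it explicitly" point concrete outside Emacs, here is
a minimal Python sketch.  It is only an analogy for C-x RET c, not
what Emacs does internally; the helper name read_text and the sample
content are my own assumptions for illustration.

```python
import os
import tempfile


def read_text(path: str, encoding: str) -> str:
    """Decode the whole file with the encoding the caller names; no guessing."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Write a small Big5 sample to a temporary file (illustrative content only).
original = "yong1 央  yong1 氧\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(original.encode("big5"))
    sample_path = tmp.name

# The same bytes, read with two different explicit encodings.
as_big5 = read_text(sample_path, "big5")       # what the user knows is right
as_latin1 = read_text(sample_path, "latin-1")  # decodes too, but garbled
os.remove(sample_path)
```

Both reads succeed; only the one where the human supplied the right
encoding gives back the original text.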



    Dan> This brings up the point: if the file is 99% big5, then why
    Dan> not allow me to still handle it as 100% big5 if I
    Dan> want...

But the  file you gave  was not 99%  big5.  It's less than  50%.  Over
half of it is ASCII.  (Well... yes, ASCII is a subset of big5, but I'm
talking about  big5-only characters here.)   I think Emacs  is correct
here not to conclude  that the file is in big5.  It  could be in other
encodings as well.
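
A rough Python sketch of why a guess is risky: the same byte string
can decode without error under more than one encoding.  The candidate
list and the function name plausible_encodings are assumptions chosen
for illustration; Emacs's real detection logic is different and more
elaborate.

```python
def plausible_encodings(data: bytes,
                        candidates=("ascii", "big5", "latin-1", "utf-8")) -> list[str]:
    """Return the candidate encodings under which data decodes without error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding is ruled out by the bytes themselves
        accepted.append(name)
    return accepted


# A short mixed ASCII/Big5 sample, similar in shape to the lines in the file.
sample = "yong1 央  yong1 氧".encode("big5")
accepted = plausible_encodings(sample)
print(accepted)
```

More than one candidate survives, so the bytes alone do not settle
which encoding was meant.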

In your file, most of the lines look like:

        yong1 央  yong1 氧  yong1 養  yong1 癢  yong1 盎  

in which 25 bytes are ASCII and 10 bytes are BIG5-specific.  I'm
ignoring whitespace here.  10 out of (10+25) is just 28.6%.  This is
a typical line from your file.  So, the actual figure should be around
this, and  a rough upper  bound is, IMO,  30%.  That's still  far from
50%.  How come you claim it's 99% big5?  "big5-specific", I mean.
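
The count above can be reproduced with a few lines of Python.  This is
just a sketch of the arithmetic: the function name big5_share is
hypothetical, and it assumes every non-ASCII character in the line
encodes to two bytes in Big5.

```python
def big5_share(text: str) -> tuple[int, int, float]:
    """Count ASCII bytes and multi-byte Big5 bytes, ignoring whitespace."""
    ascii_bytes = 0
    big5_bytes = 0
    for ch in text:
        if ch.isspace():
            continue  # whitespace is left out of the count
        encoded = ch.encode("big5")
        if len(encoded) == 1:
            ascii_bytes += 1
        else:
            big5_bytes += len(encoded)
    return ascii_bytes, big5_bytes, big5_bytes / (ascii_bytes + big5_bytes)


line = "yong1 央  yong1 氧  yong1 養  yong1 癢  yong1 盎"
ascii_count, big5_count, ratio = big5_share(line)
print(ascii_count, big5_count, f"{ratio:.1%}")  # prints: 25 10 28.6%
```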


    Dan> Why can't emacs be told "I live in big5 land.  

Have you already run M-x set-language-environment?


    Dan> Sometimes I
    Dan> have a giant file with one or two chars in it that cause
    Dan> emacs to doubt that it is a big5 file.

If you're sure it is a big5-encoded file, use C-x RET c ...


    Dan> but I can't easily
    Dan> because you think you are smarter than me and wont show it to
    Dan> me in big5 mode, no matter what buttons i press".

No, Emacs thinks it's more stupid than you.  So, instead of making a
wild guess that a file containing only 30% big5-only bytes is a
big5-encoded file, it behaves conservatively.  And since Emacs knows
it's stupid, it allows you to override its stupid decision: C-x RET c
chinese-big5 RET C-x C-f hagfa99b.txt RET.


-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee


