emacs-devel

Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes


From: Stephen J. Turnbull
Subject: Re: a few MULE criticisms, cemacs, & current emacs segfaults by changes in GNU ld.
Date: Thu, 15 May 2003 16:03:51 +0900
User-agent: Gnus/5.1001 (Gnus v5.10.1) XEmacs/21.5 (carrot, linux)

>>>>> "Hin-Tak" == Hin-Tak Leung <address@hidden> writes:

    Hin-Tak> I am writing this in the hope that MULE will satisfy my
    Hin-Tak> editing needs one day.

Mule documentation has historically been pretty minimal outside of
Japanese usage.  The features you need are probably there, or need
only a bit of Lisp wrapping.  They can be hard to find if you don't
read the code.

    Hin-Tak> I am native Chinese, and can also do a small amount of
    Hin-Tak> Japanese, so my experience is probably quite
    Hin-Tak> representative.

Japanese is my daily language (although English is my mother tongue),
and I can do a small amount of Korean and Spanish.  My experience has
been the opposite: I really don't want to go anywhere without Mule.

    Hin-Tak> (1) Associations: the ability to let the user choose the
    Hin-Tak> next possible associated characters. In English, "Search"
    Hin-Tak> is often followed by "engine" "for" or "through". In
    Hin-Tak> Chinese, when I type "Leung" (not in common sentence
    Hin-Tak> vocabulary, but it is a common surname), it is almost
    Hin-Tak> certain that I would follow with the rest of my name.

Similar facilities, which not only allow you to define associations,
but will automatically learn them as you type, have been available in
Mule (at least for Japanese) for ten years or so, with third-party
input methods such as Wnn and Canna.  Wnn also supports Chinese.

    Hin-Tak> This would most certainly require extending MULE with the
    Hin-Tak> ability of loading dictionaries of commonly used phrases
    Hin-Tak> in various languages.

I think it is preferable to delegate this to the system input methods,
which already have such dictionaries and autolearning.

A feature which you imply, and which is implemented in Mac kotoeri (to
my great annoyance, as it makes a lot of bad choices), is
auto-completion.  That is, the input you have typed is completed with
words that typically follow it but that you have not yet typed.  This
could be done, I think, but (at least for Japanese romakana input) it
seems hard to do it conveniently.  YMMV, but if you want that feature,
to my knowledge it is not implemented in any Mule input method now
available.  (Some commercial methods are accessible via X Input
Methods.)

    Hin-Tak> (2) Hints: quite similar to (1), e.g. sometimes I can't
    Hin-Tak> quite remember the code for "Leung", but vaguely know it
    Hin-Tak> is "e*f" in ChangJie. (it is actually 'eif'). On just
    Hin-Tak> about any other systems (MacOS's CJK extensions, etc),
    Hin-Tak> the full list is displayed and it narrows down as the
    Hin-Tak> user types, so the user can select the correct one
    Hin-Tak> visually if he can't remember the exact code. On Cxterm,
    Hin-Tak> one can do 'e?f' or 'e??f' to obtain a list of matches.

Hints should be easy to do.  "Narrowing the list as the user types"
would be harder, because it has to take account of the case where
there is no GUI available.  Even with GUI, I find the (often long)
tables are normally annoying and obtrusive.  I'd really prefer that
they were available on request.  More complexity ....  It might
require a special data structure to be efficient, too.  (Otherwise
you'd end up with something like repeatedly matching a regexp against
the candidate strings.)  But it should be doable.
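
To make the trade-off concrete, here is a minimal sketch in Python (not
Mule code).  The code table is hypothetical: only "eif" comes from this
thread, and the other codes and all candidate names are placeholders.
Wildcard hints are matched by scanning every code, which is the
"repeated matching" cost mentioned above; narrowing by prefix can use a
sorted list and a binary search instead.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase

# Hypothetical code table: input code -> candidate.  Only "eif" is taken
# from the thread; the remaining entries are placeholders for illustration.
TABLE = {
    "eif": "<Leung>",
    "eaf": "<placeholder-1>",
    "eiif": "<placeholder-2>",
    "eig": "<placeholder-3>",
    "hqi": "<placeholder-4>",
}
CODES = sorted(TABLE)


def match_hint(pattern):
    """Cxterm-style hint: '?' is exactly one key, '*' is any run of keys.

    This scans every code, so the cost grows with the table size.
    """
    return [code for code in CODES if fnmatchcase(code, pattern)]


def narrow(prefix):
    """Return the codes starting with prefix, using the sorted order.

    Binary search finds the first possible match; the loop stops at the
    first code that no longer shares the prefix.
    """
    start = bisect_left(CODES, prefix)
    matches = []
    for code in CODES[start:]:
        if not code.startswith(prefix):
            break
        matches.append(code)
    return matches
```

Calling narrow() again after each keystroke gives the shrinking list;
whether to display it unasked or only on request is a separate UI
decision.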

    Hin-Tak> (4) The inability to process part of a file in one
    Hin-Tak> encoding and save it as a binary stream: This might be
    Hin-Tak> possible in MULE, but I can't work out how

Use `encode-coding-region' and `decode-coding-region'.  They are not
currently defined as interactive commands.  This has always annoyed
me, too, but not enough to change it.  (The facility would need a
bunch of wrapping to make it user-friendly; simply adding an
interactive declaration isn't good enough.)
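
For readers who don't know those functions, the idea can be sketched
outside Emacs.  The following Python is only an analogy, not the Mule
API: the helper names, the marker lines, and the sample text are all
made up for illustration.  One region of a byte buffer is decoded for
editing and then encoded back, and the bytes outside the region are
never touched.

```python
def decode_region(buf: bytes, start: int, end: int, encoding: str) -> str:
    """Decode only buf[start:end]; the rest of the buffer is not examined."""
    return buf[start:end].decode(encoding)


def encode_region(buf: bytes, start: int, end: int, text: str,
                  encoding: str) -> bytes:
    """Splice text, encoded with the given encoding, over buf[start:end]."""
    return buf[:start] + text.encode(encoding) + buf[end:]


# Hypothetical file contents: a Big5 section between ASCII marker lines.
header = b"--- begin big5 ---\n"
footer = b"\n--- end big5 ---\n"
body = "中文".encode("big5")
raw = header + body + footer

# The region is given by byte offsets, as a region in a buffer would be.
start, end = len(header), len(header) + len(body)
text = decode_region(raw, start, end, "big5")
edited = encode_region(raw, start, end, text + "字", "big5")
```

The offsets here are computed from the known header length rather than
by searching the raw bytes for a marker, because the second byte of a
Big5 character can fall in the ASCII range.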

    Hin-Tak> - MULE seems to insist that I save or convert documents
    Hin-Tak> into its internal representation.

Saving in internal representation is severely discouraged; in the Mule
implementation I use it's not possible without using a special
debugging build.  Using the internal representation in the working
buffer is the only way that the file can be displayed as Chinese or
Japanese while editing.  Isn't it sufficient as long as (a) you can
read it as ordinary glyphs and (b) the file is saved in the format you
want?

    Hin-Tak> e.g. I have a file, part of which is in GB2312, part in
    Hin-Tak> JIS-euc, and part BIG5, separated by clear ASCII markers.

My, you do like to make life interesting for yourself!

    Hin-Tak> I would like to edit the different parts individually,
    Hin-Tak> and without breaking the others, and without converting
    Hin-Tak> to MULE's internal format or a common one like UTF-8. I
    Hin-Tak> don't think it is difficult to implement, but it is more
    Hin-Tak> like the MULE developers think they know my needs better
    Hin-Tak> than I do, and insist that I do things their way.

It would be very easy to implement for a single file.  It's extremely
difficult to do generally.  However, the Mule developers have already
done a comprehensive and robust implementation of exactly this
functionality, by implementing ISO 2022.  This is incompatible with
Big5, of course, but that was a deliberate design decision by the Big5
implementers.  Big5, like Shift JIS, is not interoperable with
anything else.  Mule developers can hardly be expected to spend effort
on trying to make deliberately unfriendly systems cooperate!
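
As an illustration of what ISO 2022 buys you, here is a small Python
sketch.  It uses Python's `iso2022_jp_2' codec as a stand-in, which is
an assumption for the example and not one of Mule's own coding systems;
the sample text is made up.  Japanese, Chinese, and Korean characters
travel in one 7-bit stream, with escape sequences announcing each
change of character set, and the stream decodes back to the original
text.

```python
# Sample text mixing Japanese kana, a simplified Chinese character, and
# Korean hangul, separated by ASCII spaces (made up for this sketch).
mixed = "かな 们 한"

# Encode to a single ISO 2022 byte stream.  Which character set the
# codec designates for each character is up to the codec.
stream = mixed.encode("iso2022_jp_2")

# Decoding needs no out-of-band hints: the escape sequences embedded in
# the stream say which character set is in effect at each point.
restored = stream.decode("iso2022_jp_2")

is_seven_bit = all(byte < 0x80 for byte in stream)
has_escapes = b"\x1b" in stream
```

There is no equivalent way to drop Big5 bytes into such a stream, since
Big5 does not follow the ISO 2022 framework.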

ISTR there is a facility for reading and writing files containing Big5
mixed with incompatible coding systems, but those files are only
readable by Mule because they use ISO 2022 private character sets to
represent the Big5 characters.  (Ie, Mule knows they are Big5, and if
you cut them out and save them separately, they can be saved as Big5.
But in the mixed format, ordinary Big5 applications will not see "a
bunch of garbage with sensible Big5 mixed in", they'll see "all
garbage".)

I think the bottom line is that most of the features you want are
fairly easily supported in Mule.  Some (an output format containing
portions in "true" Big5) you'd have to pay handsomely to get a self-
respecting programmer to implement.[1]  The rest are mostly already
available in Japanese, because they've been contributed by the
Japanese community (including the core Mule developers, of course).  I
would guess that the gaps in Chinese support are due to a lack of
effective interest from users, ie, contributions of code,
implementation suggestions, and data (eg, the conversion
dictionaries).  The Mule
developers have done well by Chinese as far as I can tell: two
separate built-in coding systems (GB2312 and Big5) with provision for
others (there are built-in charsets for CNS 11643), a dozen or more
input methods, etc.  But to refine those "raw materials" requires
input from users.


Footnotes: 
[1]  Ie, I sympathize with your needs, but I know that if I
implemented it, you'd be back to ask me for support on a regular
basis, because it's an inherently unstable concept.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.



