emacs-devel

Re: Bug 130397


From: Stefan Monnier
Subject: Re: Bug 130397
Date: Thu, 06 Jan 2005 12:33:11 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/21.3.50 (gnu/linux)

> Remember that the internationalization of ispell was done long before the
> MULE code was added to emacs.

Actually, it's this understanding that leads me to think that
CASECHARS, NOT-CASECHARS, OTHERCHARS, MANY-OTHERCHARS-P,
EXTENDED-CHARACTER-MODE, and CHARACTER-SET should be used after encoding
the word.

Before MULE, Emacs only worked with single-byte coding systems (things like
latin-1, but not iso-2022 or utf-8) and the exact same coding-system was
used by ispell, so ispell.el's CASECHARS, NOT-CASECHARS, OTHERCHARS,
MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET applied to
*encoded* text (i.e. text in latin-1 encoding, not in the internal encoding
used in Emacs MULE).

So, in order to simulate the pre-MULE behavior, it would seem to make sense
to first encode the text (into latin-1 or some such single-byte coding
system) and then use CASECHARS, NOT-CASECHARS, OTHERCHARS,
MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET.

Now, encoding the whole text can't realistically be done, so we need to first
recognize words, then encode them, and only then use those vars.
I.e. the word-recognition code shouldn't use CASECHARS, NOT-CASECHARS,
OTHERCHARS, MANY-OTHERCHARS-P, EXTENDED-CHARACTER-MODE, and CHARACTER-SET.
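
To make the ordering concrete, here is a rough sketch in Python (not Emacs
Lisp, and not taken from ispell.el). Everything in it is an assumption made
for illustration: the byte-level CASECHARS pattern is a made-up stand-in for
a latin-1 dictionary entry, the generic `\w+` rule stands in for whatever
word-recognition code would be used, and all function names are hypothetical.

```python
import re

# Hypothetical stand-in for a dictionary's CASECHARS, expressed over
# *encoded* (latin-1) bytes, as it would have been before MULE.
CASECHARS = rb"[A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\xff]"
CASECHARS_RUN = re.compile(rb"(?:" + CASECHARS + rb")+")


def find_words(text):
    """Step 1: recognize words with a generic rule that ignores CASECHARS."""
    return re.findall(r"\w+", text)


def encode_word(word, coding="latin-1"):
    """Step 2: encode one word; return None if the coding can't represent it."""
    try:
        return word.encode(coding)
    except UnicodeEncodeError:
        return None


def matches_casechars(encoded):
    """Step 3: apply the byte-level character class to the encoded word."""
    return CASECHARS_RUN.fullmatch(encoded) is not None


def checkable_words(text, coding="latin-1"):
    """Return the encoded words that the dictionary's CASECHARS accepts."""
    result = []
    for word in find_words(text):
        encoded = encode_word(word, coding)
        if encoded is not None and matches_casechars(encoded):
            result.append(encoded)
    return result
```

With this ordering, only individual words are ever encoded, and a word that
the single-byte coding system cannot represent is simply skipped instead of
being matched against a character class it was never meant for.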

> For instance, one of the major issues when MULE was implemented was the
> fact that multiple bytes passed to ispell may only count as a single
> byte or character on the display.

How/when can that happen?  Can you give an example?


        Stefan



