emacs-devel

Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"


From: Dave Love
Subject: Re: ".*utf\\(-?8\\)\\>" versus ".*[._]utf" versus "address@hidden>"
Date: 03 Feb 2002 18:20:26 +0000
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1.80

>>>>> Paul Eggert writes:

 > Because utf-8 should be the normal case.  In the normal case, the
 > encoding name should be delimited, to prevent incorrect matches
 > when one encoding name is a suffix of another.

I'm surprised false matches are any more likely with utf-8 than 8859
&c.  Is prefixing it with \< good enough?
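
To make the question concrete, here is a rough sketch of the difference
between an unanchored pattern and one delimited with \< and \>.  The
helper name, the pattern and the codeset "notutf-8" are made up for
illustration; this is not the pattern in the Emacs sources.

```elisp
;; Hypothetical helper, for illustration only.
(defun sketch-utf8-match (pattern locale)
  "Return the index where PATTERN matches LOCALE, or nil.
Uses the standard syntax table so that word boundaries do not
depend on the current buffer's mode."
  (with-syntax-table (standard-syntax-table)
    (let ((case-fold-search t))
      (string-match pattern locale))))

;; Unanchored: also matches when "utf-8" is only the tail of a longer name.
(sketch-utf8-match "utf-?8" "xx_XX.notutf-8")        ; matches

;; Anchored at a word start: "." is punctuation, so this still matches.
(sketch-utf8-match "\\<utf-?8\\>" "en_US.UTF-8")     ; matches

;; The made-up longer name is rejected, since "t" precedes "utf".
(sketch-utf8-match "\\<utf-?8\\>" "xx_XX.notutf-8")  ; no match
```

Note that \< and \> depend on the syntax table in effect, which is why
the sketch binds the standard one explicitly.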

 > I'm not sure I follow your point, but I'll try to answer.  The code in
 > question is using a heuristic to guess the coding system from the
 > locale name.  

It's actually guessing a complete language environment.

 > All other things being equal, it's better to keep the
 > heuristic simple and easy to explain.  The heuristic I was trying to
 > use is:

 >   Emacs looks at the codeset part of the locale name (e.g. the "UTF-8"
 >   in "address@hidden"), except that there is a special case for
 >   old-fashioned 8859-style locale names like "iso_8859_1".

On checking again, I'm not at all sure the current code DTRT.  For
instance (given that I've defined a Windows-1250 coding system and
language environment):

(set-locale-environment "cs_CZ.windows-1250")
  => nil
current-language-environment
  => "Czech"
(symbol-value (car coding-category-list))
  => iso-8859-2

I think what should happen in this case is that the codeset part of
the locale should override the language part.  At least it should set
the default coding system variables appropriately.  If there's a
corresponding language environment defined, probably its properties
should override those from the base environment without resetting
everything else.  (Obviously that requires implementing something
different from set-language-environment.)  In the trivial case that
only the codeset is given, set up any corresponding language
environment, otherwise use English with preferred coding system and
terminal coding system from the supplied codeset.
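
A minimal sketch of the coding-system part of that, assuming two
made-up lookup tables (sketch-codeset-alist and sketch-language-alist)
and made-up function names; none of this is existing Emacs code, and
it doesn't attempt the merging of language environment properties:

```elisp
;; Hypothetical tables, for illustration only; not Emacs's real data.
(defvar sketch-codeset-alist
  '(("windows-1250" . windows-1250)
    ("iso-8859-2" . iso-8859-2)
    ("utf-8" . utf-8))
  "Map a lower-cased codeset name to a coding system symbol.")

(defvar sketch-language-alist
  '(("cs" "Czech" . iso-8859-2)
    ("en" "English" . iso-8859-1))
  "Map a language code to (ENVIRONMENT . DEFAULT-CODING).")

(defun sketch-locale-parts (locale)
  "Split LOCALE into (LANGUAGE . CODESET); either part may be nil."
  (let* ((name (downcase locale))
         ;; Drop any "@modifier" suffix.
         (name (substring name 0 (string-match "@" name)))
         (dot (string-match "\\." name))
         (base (substring name 0 dot))
         (codeset (and dot (substring name (1+ dot)))))
    (cond (codeset
           (cons (substring base 0 (string-match "_" base)) codeset))
          ;; Trivial case: the locale is just a known codeset name.
          ((assoc base sketch-codeset-alist)
           (cons nil base))
          (t
           (cons (substring base 0 (string-match "_" base)) nil)))))

(defun sketch-guess-environment (locale)
  "Return (ENVIRONMENT . CODING) for LOCALE, letting the codeset win."
  (let* ((parts (sketch-locale-parts locale))
         (lang (cdr (assoc (car parts) sketch-language-alist)))
         (coding (cdr (assoc (cdr parts) sketch-codeset-alist))))
    (cons (or (car lang) "English")     ; fall back to English
          (or coding (cdr lang)))))     ; codeset overrides language default

(sketch-guess-environment "cs_CZ.windows-1250")
  ;; => ("Czech" . windows-1250)
(sketch-guess-environment "UTF-8")
  ;; => ("English" . utf-8)
```

With only the language given, e.g. "cs_CZ", the sketch falls back to
the language's default coding system.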

The language environment stuff could probably do with a bit of
re-thinking to fit better with locale processing and customization.

[I suspect that the territory (?) part of the locale should be used
for some things like calendar defaults, as well as the language part.
E.g. en_GB should default to UK holidays and British (Oxford) spelling
checking.]


