emacs-devel

Re: CSV parsing and other issues (Re: LC_NUMERIC)


From: Eli Zaretskii
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Tue, 08 Jun 2021 21:52:59 +0300

> Cc: emacs-devel@gnu.org
> From: Maxim Nikulin <manikulin@gmail.com>
> Date: Tue, 8 Jun 2021 23:35:51 +0700
> 
> On 08/06/2021 09:35, Eli Zaretskii wrote:
>  > From: Boruch Baum
>  >> No? If an Emacs user has two buffers in two separate languages, the
>  >> buffer-local settings aren't / won't be respected?
>  >
>  > First, language is different from locale.  And second, we don't even
>  > have a buffer-local notion of language yet.
> 
> Certainly locale is more precise than just language since it includes 
> region and other variants, moreover it can be granularly tuned (date, 
> numbers, sorting can be adjusted independently), but I still think that 
> all these properties can be sometimes broadly referred to as language.

No, they cannot, not in general.  A locale comes with a whole database
of different settings: language, encoding (a.k.a. "codeset"), formats
of date and time, names of days of the week and of the months, rules
for collation and capitalization, etc. etc.  You can easily find
several locales whose language is English, but some/many/all of the
other locale-dependent settings are different.  It isn't a coincidence
that a locale's name includes more than just the language part.

> Low level functions can accept explicit locale.

Which ones?  Most libc routines don't; they treat the locale as global
state.  And many libc's (with the prominent exception of glibc)
don't support changing the locale efficiently in the middle of a
program; they assume that the program's locale is set once at startup.
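To illustrate the point about global state, here is a minimal sketch in
standard C.  The helper name format_in_locale is hypothetical, not
anything in Emacs or libc.  snprintf has no locale parameter, so the
only portable way to format under a different locale is to switch the
process-wide setting and switch it back afterwards:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format VALUE with two decimals under the locale
   named LOCALE_NAME by temporarily switching the process-wide
   LC_NUMERIC setting.  Returns 0 on success, -1 if the locale is
   unavailable.  */
int
format_in_locale (const char *locale_name, double value,
                  char *buf, size_t buflen)
{
  /* setlocale returns a pointer to storage that a later call may
     overwrite, so copy the current name before changing anything.  */
  char saved[256];
  const char *current = setlocale (LC_NUMERIC, NULL);
  if (current == NULL || strlen (current) >= sizeof saved)
    return -1;
  strcpy (saved, current);

  if (setlocale (LC_NUMERIC, locale_name) == NULL)
    return -1;
  snprintf (buf, buflen, "%.2f", value);

  /* Restore the previous global state.  */
  setlocale (LC_NUMERIC, saved);
  return 0;
}
```

The switch is visible to the whole process while it lasts, and each
call pays for two setlocale calls, which is why doing this per stretch
of text does not look attractive.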

> Higher level API can obtain it implicitly from 
> buffer-local variables and global locale. For example the LOCALE 
> argument of `string-collate-lessp' is optional. I can even 
> anticipate that locale may sometimes be stored in text properties. A 
> random message from the recent "About multilingual documents" thread on 
> the emacs-orgmode mailing list:
> https://lists.gnu.org/archive/html/emacs-orgmode/2021-05/msg00252.html

That's mostly about input methods and org-export; I don't see how it's
relevant to what Boruch asked.

> At first basic functionality may be implemented. The problem is to 
> choose extensible API.

No, the problem is to have a design that would allow an efficient
implementation.  Given what the underlying libc does, it isn't easy.

And then we have conceptual problems.  For example, in a multilingual
editor such as Emacs, the notion of a "buffer language" doesn't always
make sense: you'd need to support portions of text that have
different language properties.  Imagine switching locales as Emacs
processes adjacent stretches of text, and other complications.  For
example, changing letter-case for a stretch of Turkish text is
supposed to work differently than for English or German text.  I'm all
ears for ideas how to design such "language support".  It definitely
isn't easy, so if you have ideas, please voice them!
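As a concrete instance of the Turkish case, here is a rough sketch in
C of upcasing a single code point with the language passed explicitly
instead of taken from the global locale.  The function name and the
two-letter language tags are assumptions made for illustration, and
only ASCII plus the dotted/dotless i are handled:

```c
#include <string.h>

/* Hypothetical helper: upcase code point CP according to LANG, an
   assumed two-letter language tag.  Only ASCII letters and the
   Turkish/Azerbaijani i are handled in this sketch.  */
unsigned int
upcase_for_language (unsigned int cp, const char *lang)
{
  /* In Turkish and Azerbaijani, small i upcases to U+0130 (capital I
     with dot above) rather than to plain I.  */
  if (cp == 'i' && (strcmp (lang, "tr") == 0 || strcmp (lang, "az") == 0))
    return 0x0130;

  /* U+0131 (dotless small i) upcases to plain I in every language.  */
  if (cp == 0x0131)
    return 'I';

  if (cp >= 'a' && cp <= 'z')
    return cp - 'a' + 'A';
  return cp;
}
```

With a signature like this the language can differ from one stretch of
text to the next, but the open question is where that argument would
come from for each stretch.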

> I just have realized that nl_langinfo(3) (and nl_langinfo_l(3) as well) 
> from libc accepts RADIXCHAR (decimal dot) and THOUSEP (group separator) 
> arguments. They are good candidates for `locale-info' extension.

We already use nl_langinfo in locale-info, so what exactly are you
suggesting here?  Adding more items?  You don't really expect Lisp
programs to format numbers such as 123,456 by hand after learning from
locale-info that the thousands separator is a comma, do you?
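For reference, this is roughly what formatting "by hand" would amount
to, sketched in C.  The helper names are hypothetical.  The first one
just asks nl_langinfo for the separator; the second groups digits in
threes, which is an assumption of this sketch, since a locale can
specify other group sizes (the grouping member of struct lconv):

```c
#include <langinfo.h>
#include <stdio.h>
#include <string.h>

/* Thousands separator of the current locale, as reported by libc.  */
const char *
locale_thousands_sep (void)
{
  return nl_langinfo (THOUSEP);
}

/* Hypothetical helper: write N into OUT with SEP inserted between
   groups of three digits.  Always groups by three and ignores the
   locale's own grouping rules.  Returns 0 on success, -1 if OUT is
   too small.  */
int
group_by_threes (unsigned long n, const char *sep, char *out, size_t outlen)
{
  char digits[32];
  int len = snprintf (digits, sizeof digits, "%lu", n);
  size_t seplen = strlen (sep);
  size_t needed = (size_t) len + ((size_t) (len - 1) / 3) * seplen + 1;
  if (needed > outlen)
    return -1;

  size_t pos = 0;
  for (int i = 0; i < len; i++)
    {
      /* Insert a separator before each group of three digits counted
         from the right, but never at the very start.  */
      if (i > 0 && (len - i) % 3 == 0)
        {
          memcpy (out + pos, sep, seplen);
          pos += seplen;
        }
      out[pos++] = digits[i];
    }
  out[pos] = '\0';
  return 0;
}
```

Calling group_by_threes (123456, ",", buf, sizeof buf) leaves "123,456"
in buf.  Every Lisp program that wanted grouped numbers would have to
repeat this logic, and would still be wrong for locales that don't
group by three.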

> Actually Qt links my example with other libraries from ICU. My point was 
> that since Emacs anyway (indirectly) links with this library, the 
> dependency may be not so heavy.

If you are suggesting that we introduce ICU as a dependency, we could
discuss the pros and cons.  It isn't a simple decision, because ICU
comes with a lot of baggage that we already have implemented in Emacs,
so whether we throw away what we have and use ICU instead, or just add
what we miss without depending on ICU, requires good thought and good
acquaintance with the ICU internals (to make sure it does what we want
in Emacs, and doesn't break existing features).

> My personal requirements for number 
> formatting were quite modest so far, I expect that other languages (CJK, 
> right-to-left scripts, etc.) may require quite special treatment, so 
> implementation in Emacs (and further maintenance) may require a lot of 
> work. At least API of ICU should be studied to get some inspiration what 
> features will be necessary for users from other regions.

I don't think the problem is the API.

> E.g. I was completely unaware that negative sign may be represented by 
> parenthesis

Really?  It's standard in financial applications.

> I expect enough surprises and unexpected "discoveries" during 
> implementation of better locale support. That is why I would consider 
> adapting some more or less established API for this purpose.

I don't think "consider" cuts it.  We already have a lot of stuff in
Emacs; what we don't have needs serious design and comparison of
available implementation options.  Emacs's needs are quite special and
unlike those of most other programs.


