Re: CSV parsing and other issues (Re: LC_NUMERIC)
From: Maxim Nikulin
Subject: Re: CSV parsing and other issues (Re: LC_NUMERIC)
Date: Mon, 14 Jun 2021 23:38:19 +0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

On 12/06/2021 01:04, Eli Zaretskii wrote:
>> From: Maxim Nikulin
>> Date: Fri, 11 Jun 2021 23:58:24 +0700
>>
>> On 10/06/2021 23:57, Eli Zaretskii wrote:
>>>> From: Maxim Nikulin
>>>> Date: Thu, 10 Jun 2021 23:28:59 +0700
>>>
>>> For processing CSV, if there's a need to know whether the
>>> locale uses the comma as a decimal separator, we could
>>> indeed extend locale-info. But such an extension is almost
>>> trivial and doesn't even touch on the significant problems
>>> in the rest of the discussion.
>>
>> You forgot `setlocale(LC_NUMERIC, "C")', didn't you?
>
> No, I didn't. Adding a call to setlocale to locale-info, even if we
> want to add an argument for the caller to control the locale, is
> trivial.

I would avoid such manipulations, and the reason is not the efficiency
of a particular implementation. The locale is not thread-local, so
changing it in a *getter* is a source of rare but really obscure, hardly
reproducible problems. I do not like output such as

1234.567890
1234,567890
1234.567890

from the following program, which changes the locale in a parallel thread:

#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define DELAY_NS 40000000

/* Prints the same number three times while main() switches the locale. */
void *other_thread(void *arg) {
    (void)arg;
    struct timespec delay = { 0, DELAY_NS / 2 };
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);   /* "C" locale is still active */
    delay.tv_nsec = DELAY_NS;
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);   /* main() has switched to "" */
    nanosleep(&delay, NULL);
    printf("%f\n", 1234.56789);   /* main() has switched back to "C" */
    return NULL;
}

int main(void) {
    setlocale(LC_NUMERIC, "C");
    pthread_t thread_id;
    pthread_create(&thread_id, NULL, &other_thread, NULL);
    struct timespec delay = { 0, DELAY_NS };
    nanosleep(&delay, NULL);
    /* Locale from the environment; the comma appears only if that
       locale uses a decimal comma. */
    setlocale(LC_NUMERIC, "");
    nanosleep(&delay, NULL);
    setlocale(LC_NUMERIC, "C");
    void *res;
    pthread_join(thread_id, &res);
    return 0;
}

Explicit locale objects decoupled from application-wide global
preferences are safer and more flexible.

>>> Here's a trivial example:
>>>
>>>   (insert (downcase (buffer-substring POS1 POS2)))
>>>
>>> Contrast with
>>>
>>>   (insert (downcase "FOO"))
>>
>> Either `set-text-properties' should be called on "FOO" before passing it
>> to `downcase'
>
> Which property will help here? We don't have such properties. They
> need to be designed and implemented.

Let's name it "locale". Its value is some object that represents either
a "solid" locale such as de_DE or a combined one such as
LC_NUMERIC=en_GB + LC_TIME=de_DE + default fr_FR. Data required for
particular operations may be loaded on demand.

>> or `locale-downcase' with LOCALE first argument should be
>> added.
>
> How would you implement locale-downcase? Are you familiar with how
> Emacs case tables work?

No, I am not familiar with Emacs internals dealing with case conversion.
As I already wrote, I do not even know how to handle Turkish properly.
For the scripts I am familiar with, a default table for normalization
and conversion is enough. I admit that conversion may sometimes depend
on the language, and the language cannot be determined from the code
point. In such cases I expect an additional override table that takes
priority over the default one.
> And even if we had locale-downcase, which locale would you
> pass to it in any given use case?
I already mentioned the chain of responsibility: an explicit value or
set of overrides passed by the user, a text property for a particular
span of characters, buffer-local variables, global environment
variables. A locale may be instantiated from its name, e.g. "it_IT".
Convenience functions to obtain the locale at point will likely be
useful as well. (Actually I have number parsing and formatting in mind
rather than case conversion.)