bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite
From: Linas Vepstas
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 9 Jan 2017 21:34:36 -0600
This short C program illustrates the issue. The locale, the output port, etc.
are all UTF-8. The bad results are no surprise: the code currently in git for
scm_puts etc. explicitly ignores the locale setting and always assumes
latin1; it's hard-coded in there.
--linas
#include <libguile.h>

/* Install the user's locale from inside Guile. */
void *wrap_eval(void* p)
{
  char *wtf = "(setlocale LC_ALL \"\")";
  SCM eval_str = scm_from_utf8_string(wtf);
  scm_eval_string(eval_str);
  return NULL;
}

/* Print the same UTF-8 C string twice: raw via scm_puts, then via
   scm_display after converting it with scm_from_utf8_string. */
void *wrap_puts(void* p)
{
  char *wtf = p;
  SCM port = scm_current_output_port ();
  scm_puts("the port-encoding is=", port);
  scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
  scm_puts("\nThe string to display is =", port);
  scm_puts (wtf, port);   /* this is the call that gets mangled */
  scm_puts("\nWas expecting to see this=", port);
  SCM str = scm_from_utf8_string(wtf);
  scm_display(str, port);
  scm_puts("\n\n", port);
  return NULL;
}

int main(int argc, char* argv[])
{
  scm_with_guile(wrap_eval, 0x0);
  char * wtf = "Ćićolina";
  scm_with_guile(wrap_puts, wtf);
  wtf = "Thủ Dầu Một";
  scm_with_guile(wrap_puts, wtf);
  wtf = "Småland";
  scm_with_guile(wrap_puts, wtf);
  wtf = "Hòa Phú Phú Tân";
  scm_with_guile(wrap_puts, wtf);
  wtf = "係 拉 丁 字 母";
  scm_with_guile(wrap_puts, wtf);
}
The output is always this:
the port-encoding is=UTF-8
The string to display is =Ćićolina
Was expecting to see this=Ćićolina
the port-encoding is=UTF-8
The string to display is =Thủ Dầu Một
Was expecting to see this=Thủ Dầu Một
the port-encoding is=UTF-8
The string to display is =Småland
Was expecting to see this=Småland
the port-encoding is=UTF-8
The string to display is =Hòa Phú Phú Tân
Was expecting to see this=Hòa Phú Phú Tân
the port-encoding is=UTF-8
Was expecting to see this=係 拉 丁 字 母 æ¯
What's cool is that all this stuff works in email!
--linas
On Mon, Jan 9, 2017 at 4:03 PM, Andy Wingo <address@hidden> wrote:
> On Sun 08 Jan 2017 19:16, Linas Vepstas <address@hidden> writes:
>
>> There appears to be a regression in guile-2.2 with utf8 handling
>> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>>
>> In guile-2.0, one could give these utf8-encoded strings, and these
>> would display just fine. In 2.2 they get mangled.
>
> Could it be this from NEWS:
>
> ** Better locale support in Guile scripts
>
> When Guile is invoked directly, either from the command line or via a
> hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
> locale via a call to `(setlocale LC_ALL "")'. For users with a unicode
> locale, this makes all ports unicode-capable by default, without the
> need to call `setlocale' in your program. This behavior may be
> controlled via the GUILE_INSTALL_LOCALE environment variable; see the
> manual for more.