guile-devel

Re: Using libunistring for string comparisons et al


From: Mark H Weaver
Subject: Re: Using libunistring for string comparisons et al
Date: Sat, 19 Mar 2011 10:06:51 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)

Andy Wingo <address@hidden> writes:
>> Ludovic, Andy and I discussed this on IRC, and came to the conclusion
>> that UTF-8 should be the encoding assumed by functions such as
>> scm_c_define, scm_c_define_gsubr, scm_c_define_gsubr_with_generic,
>> scm_c_export, scm_c_define_module, scm_c_resolve_module,
>> scm_c_use_module, etc.
>
> Can we step back a little and revisit this decision?
>
> Clearly, we need to specify the encoding for these procedures, and have
> it not be locale encoding.  However I don't think we would be breaking
> anyone's code if we simply restricted it to 7-bit ASCII.
>
> I am quite sensitive to the "justice" argument -- that we not restrict
> the names our users give to Scheme identifiers, or the characters they
> use in their strings.  But these values typically come from literals in
> C source code, which has no portable superset of ASCII.

Not everyone writes portable code.  Who here limits their code to the
R6RS and avoids all Guile-specific features?  Portability may be
something to strive for, but when compelling reasons dictate otherwise,
it's not unreasonable to limit your portability to better compilers like
gcc.

For those who don't speak English but wish to hack with Guile, being
able to write code in their own language is a compelling reason.

Anyway, one can only hope that some future C standard supports Unicode,
but if the folks who control those standards don't give a damn about
non-English speakers, that doesn't mean we should follow their example.

> Furthermore, such a default would not restrict our users at all -- they
> can always use the non-_c_ variants with a symbol explicitly constructed
> with (e.g.) scm_from_utf8_symbol.

We have those convenience functions for a reason.  You recently proposed
several more convenience functions, so apparently you prefer to save
keystrokes like the rest of us.  I'm sure our non-English-speaking
comrades feel the same way.

Let me ask you this: why would you oppose changing the scm_c_ functions
to use UTF-8 by default?  If you're comfortable with ASCII-only names,
then UTF-8 will work fine for you, since ASCII strings are unchanged in
UTF-8.
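
To illustrate the compatibility claim, here is a minimal sketch in plain
C.  It does not use libguile at all; utf8_decode and
ascii_identical_under_utf8 are hypothetical helper names invented for
this example, and the decoder is simplified (it does not reject overlong
forms or surrogates).  It only shows that decoding an ASCII-only name as
UTF-8 yields exactly the same characters, one per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s.  Stores the code point in *cp
   and returns the number of bytes consumed, or 0 on malformed input.
   Simplified sketch: overlong encodings and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);

    if (s[0] < 0x80) {              /* 0xxxxxxx: plain ASCII, one byte */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        /* Continuation bytes must look like 10xxxxxx.  A NUL terminator
           fails this test, so we never read past the end of the string. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Return 1 if decoding name as UTF-8 gives exactly one code point per
   byte, each equal to that byte (true for any 7-bit ASCII string);
   return 0 otherwise. */
int ascii_identical_under_utf8(const char *name)
{
    const unsigned char *p = (const unsigned char *) name;

    while (*p != 0) {
        uint32_t cp;
        size_t n = utf8_decode(p, &cp);
        if (n != 1 || cp != *p)
            return 0;
        p += n;
    }
    return 1;
}
```

For an ASCII name such as "scm_c_define" the check succeeds, so existing
callers would see no difference.  For "caf\xc3\xa9" it fails, because the
last two bytes decode to the single code point U+00E9, which is exactly
the case where the choice of encoding matters.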

> Finally, users are moving away from these functions anyway.  The thing
> to do now is to write Scheme, not C: and in Scheme we do the Right
> Thing.

If you write all your code in Scheme now, then you should care even less
about the scm_c_ functions.  So why oppose what you recently agreed to?


As a meta-comment: I've grown rather weary of fighting this battle
alone.  My hacking has completely stopped because of this argument.  To
those of you out there who care about this issue, please let your voices
be heard.  I know you're there, because a few of you stated your
opinions rather strongly on IRC.  If others don't join in soon, I'm
likely to give up on this, and be left with rather less enthusiasm
for Guile than when I started, I'm sorry to say.

      Mark
