Re: Unicode strings and symbols
From: Ludovic Courtès
Subject: Re: Unicode strings and symbols
Date: Mon, 10 Aug 2009 23:27:48 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
Hey,
Mike Gran <address@hidden> writes:
[...]
>> > +SCM_API void scm_charprint (scm_t_uint32 c, SCM port);
>>
>> This ought to be internal, no?
>
> Could be. A couple of the types are given their own print functions:
> scm_intprint and an scm_uintprint. Most types don't have their own
> print functions. Are int and uint given special treatment because of
> their radix term?
Dunno. Anyway, they're not really meant to be public either. Feel free
to make them internal as well, while you're at it. ;-)
>> > + (scm_t_wchar) (unsigned char) STRINGBUF_INLINE_CHARS (buf)[i];
>>
>> Is the double cast needed?
>
> Sort of. Unsigned char will successfully be implicitly cast to
> scm_t_wchar, so the scm_t_wchar term is just for clarity. The unsigned
> char term is definitely needed. Negative 8-bit chars are the upper half
> of the 8-bit charset (128 - 255). Casting them directly to scm_t_wchar
> may return 0xFFFFFF80 - 0xFFFFFFFF instead of 128-255. I don't have any
> problem removing the scm_t_wchar cast. Would you prefer that?
How about:

  #define STRINGBUF_INLINE_CHARS(buf) \
    ((unsigned char *) SCM_CELL_OBJECT_LOC ((buf), 1))

and changing the caller to:

  for (i = 0; i < len; i++)
    mem[i] = (scm_t_wchar) STRINGBUF_INLINE_CHARS (buf)[i];

?

That would make the intent clearer to me.
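
For the record, here is a small stand-alone sketch of the sign-extension issue discussed above. It is not Guile code: `wide_char` is only a stand-in for `scm_t_wchar' (assumed here to be a signed 32-bit type), and both function names are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for scm_t_wchar; assumed to be a signed 32-bit type.  */
typedef int32_t wide_char;

/* Widen bytes read through a signed char pointer.  Bytes in the range
   0x80-0xFF are negative as signed char, so they sign-extend.  */
void
widen_via_signed (const signed char *src, wide_char *dst, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (wide_char) src[i];
}

/* Widen bytes read through an unsigned char pointer.  Every byte maps
   to a value in 0-255, which is what a Latin-1 buffer needs.  */
void
widen_via_unsigned (const unsigned char *src, wide_char *dst, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (wide_char) src[i];
}
```

With the accessor itself yielding `unsigned char *', the caller needs only the one widening cast, as in the second function.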
> I put it in because that information needs to be available in the
> bytecode compiler. A slightly clearer name would probably be
> string-bytes-per-character, I suppose.
Agreed, let's take this name.
>> > +SCM_INTERNAL char *scm_to_stringn (SCM str, size_t *lenp,
>> > + const char *encoding,
>> > + enum iconv_ilseq_handler handler);
>>
>> I suppose this would eventually become public. What do you think?
>> Should we use a different type for HANDLER before that happens?
>
> The simplest thing would be to make some constants like
>
> scm_c_define ("STRING_ESCAPE", scm_from_int(iconveh_escape_sequence))
>
> Something similar is done in the scm_seek function's constants, such as
> SEEK_CUR.
It's a C API so Scheme-level constants don't matter.
I was wondering whether using `enum iconv_ilseq_handler' in the public
API would be a good idea because that means that public headers include
either the system's or GNU libiconv's <iconv.h> (or some libunistring
header), in which case `guile.pc' must include the right `-I' flag, etc.
This may slightly complicate compilation of Guile apps. Another
downside is that Guile's API would be bound to the values and semantics
of `iconv_ilseq_handler', and bound to iconv.
One possibility to avoid this would be to define our own type similar to
`iconv_ilseq_handler'.
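
Roughly, such a type could look like the sketch below. All names here are hypothetical, not a proposal for the final API, and `enum backend_ilseq_handler' is only a local stand-in for the library's `enum iconv_ilseq_handler' so that the example compiles without the libunistring or iconv headers.

```c
/* Local stand-in for the conversion library's handler enum.  In real
   code this would come from the library's header, which would only be
   included from the .c file and never from a public header.  */
enum backend_ilseq_handler
{
  backend_error = 10,
  backend_question_mark = 20,
  backend_escape_sequence = 30
};

/* Hypothetical public type: its values belong to the public API and do
   not depend on the backend's values.  */
typedef enum
{
  EXAMPLE_CONVERSION_ERROR,
  EXAMPLE_CONVERSION_QUESTION_MARK,
  EXAMPLE_CONVERSION_ESCAPE_SEQUENCE
} example_conversion_handler;

/* Internal helper mapping the public type onto the backend's type.
   Unknown values fall back to reporting an error.  */
enum backend_ilseq_handler
example_to_backend_handler (example_conversion_handler handler)
{
  switch (handler)
    {
    case EXAMPLE_CONVERSION_QUESTION_MARK:
      return backend_question_mark;
    case EXAMPLE_CONVERSION_ESCAPE_SEQUENCE:
      return backend_escape_sequence;
    case EXAMPLE_CONVERSION_ERROR:
    default:
      return backend_error;
    }
}
```

That way only the mapping function needs to change if the backend's enum changes or gets replaced.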
Thanks,
Ludo'.