Re: about strings, symbols and chars.
From: Gary Houston
Subject: Re: about strings, symbols and chars.
Date: 29 Nov 2000 18:10:35 -0000
> From: Dirk Herrmann <address@hidden>
> Date: Wed, 29 Nov 2000 12:27:23 +0100 (MET)
>
> On Tue, 28 Nov 2000, Dirk Herrmann wrote:
>
> > I'd like to get rid of the SCM_STRING_UCHARS macro and clean up the
> > handling of characters and strings with respect to signedness. In other
> > words, it should be clearly defined what kind of characters are to be
> > found in a scheme string object.
>
> Surprisingly, changing SCM_STRING_CHARS to always return an unsigned char*
> doesn't seem to have any effect at all. Not a single additional compiler
> warning message. Hmmm. Why is that? As far as I know, it is not
> specified whether a char is a signed or an unsigned value. Thus, a char*
> could potentially be a pointer to a signed char or an unsigned
> char. Assigning these pointer types to each other should at least cause a
> compiler warning, shouldn't it?
In C++ the compiler refuses to convert between char * and
unsigned char * without a cast. In C the two are also distinct pointer
types, but compilers tend to be lenient about mixing them, so it doesn't
usually matter what kind of char * you use. I think the only time it
makes any difference is when converting a char to an int.
If you have:
char a = -1;
unsigned char b = a;
a and b have the same bit pattern (on the usual two's-complement
machines), so they still represent the same character.
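
To make the char-to-int point concrete, here is a small sketch. It uses
signed char explicitly, since plain char may be signed or unsigned
depending on the platform, and it assumes 8-bit bytes and two's
complement. The function names are made up for illustration.

```c
#include <string.h>

/* Returns 1 if a signed char and the unsigned char converted from it
   are stored as the same byte, 0 otherwise. */
int same_byte(signed char a)
{
    unsigned char b = (unsigned char) a;
    return memcmp(&a, &b, 1) == 0;
}

/* The character widened to int through the signed type. */
int widen_signed(signed char a)
{
    return a;
}

/* The same character widened to int through the unsigned type. */
int widen_unsigned(signed char a)
{
    return (unsigned char) a;
}
```

Under those assumptions same_byte(-1) is true, yet widen_signed(-1) and
widen_unsigned(-1) give different ints (-1 and 255). That difference is
the only place where the signedness shows up.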
Two approaches that Guile could take are:
1) For the macros that give pointers to chars in an SCM object: provide
char * and unsigned char * versions. This is probably best for people
who have to deal with C++ and need the right pointer for whatever they
are going to do with the characters. I suppose for completeness there
should be signed char * versions too, but I guess they wouldn't be
used very often.
For functions, do what the C standard library does and use char *.
People linking Guile into other programs are not likely to find this
surprising.
Care is needed when converting char to int.
2) Use unsigned chars throughout. This may reduce char->int problems
but may be painful in C++ since you must cast pointers before you can
use strlen etc. In this case there probably should be a
SCM_STRING_UCHARS macro and no SCM_STRING_CHARS, to avoid confusion.
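
A rough sketch of how the two approaches look to calling code. The
struct and the DEMO_ macros below are hypothetical stand-ins, not
Guile's real string representation or its SCM_STRING_CHARS /
SCM_STRING_UCHARS definitions, and the string is assumed to be
NUL-terminated just to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object; NOT Guile's actual layout. */
struct demo_string {
    char *data;   /* NUL-terminated for this sketch */
};

/* Hypothetical accessors giving both pointer flavours. */
#define DEMO_STRING_CHARS(s)  ((s)->data)
#define DEMO_STRING_UCHARS(s) ((unsigned char *) (s)->data)

/* Approach 1 style: char * goes straight into strlen, but each
   character needs a cast before it is widened to int for isalpha. */
size_t demo_count_alpha_chars(const struct demo_string *s)
{
    const char *p = DEMO_STRING_CHARS(s);
    size_t len = strlen(p);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (isalpha((unsigned char) p[i]))
            n++;
    return n;
}

/* Approach 2 style: unsigned char widens to int safely, but the
   pointer needs a cast before it can be handed to strlen. */
size_t demo_count_alpha_uchars(const struct demo_string *s)
{
    const unsigned char *p = DEMO_STRING_UCHARS(s);
    size_t len = strlen((const char *) p);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (isalpha(p[i]))
            n++;
    return n;
}
```

Either way there is exactly one cast; the choice only moves it from the
character to the pointer.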
It seems to me that 1) is more or less what we are doing now, and it isn't
clear whether switching to 2) would be a good thing.