Re: [Gcl-devel] utf8 and emacs text/string multibyte representation


From: Matt Kaufmann
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Sat, 1 Nov 2014 10:18:27 -0500

I saw your question and was curious, so I looked into it a bit:

>> To your knowledge, is there any objection to defining alpha-char-p as
>> including code-char's >= 128?

I see that SBCL 1.2.2 is OK with that, for example:

* (code-char 232)

#\LATIN_SMALL_LETTER_E_WITH_GRAVE
* (alpha-char-p (code-char 232))

T
* 

In fact, that alpha-char-p call also returns T in (versions of)
Allegro CL, CCL, CLISP, CMU CL, LispWorks, and SBCL.
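
For anyone who wants to repeat this check in another Lisp, here is a
minimal sketch.  The function name high-alphabetic-codes and the default
limit of 256 are my own choices for illustration, not anything from the
standard, and the result is implementation-dependent, so no expected
output is shown.

```lisp
(defun high-alphabetic-codes (&optional (limit 256))
  "Return the character codes in [128, LIMIT) that ALPHA-CHAR-P accepts.
LIMIT is clamped to CHAR-CODE-LIMIT; codes with no character are skipped."
  (loop for code from 128 below (min limit char-code-limit)
        for char = (code-char code)   ; CODE-CHAR may return NIL
        when (and char (alpha-char-p char))
          collect code))

;; Does this implementation treat code 232 as alphabetic?
;; Returns a non-NIL list tail if so, NIL otherwise.
(member 232 (high-alphabetic-codes))
```

Only standard functions are used (code-char, alpha-char-p,
char-code-limit), so this should load unchanged in any conforming
implementation.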

Next, I checked the CL HyperSpec

http://www.lispworks.com/documentation/HyperSpec/Body/f_alpha_.htm#alpha-char-p

and found this for alpha-char-p:

  Returns true if character is an alphabetic[1] character; otherwise,
  returns false.

I followed the link to "alphabetic"

http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_a.htm#alphabetic

and found this as the first definition, which seems to justify the
above return value of T.

  adj. (of a character) being one of the standard characters A through
  Z or a through z, or being any implementation-defined character that
  has case, or being some other graphic character defined by the
  implementation to be alphabetic[1].

[By the way, ACL2 has this wrong!  So I'm glad you asked.  I'll fix
that....]

-- Matt
   From: Camm Maguire <address@hidden>
   Date: Sat, 01 Nov 2014 10:50:48 -0400
   Cc: Raymond Toy <address@hidden>, address@hidden

   Greetings!

   Carl Shapiro <address@hidden> writes:

   > On Fri, Oct 31, 2014 at 11:20 AM, Camm Maguire <address@hidden> wrote:
   >
   >     It really appears that unicode refers more to a glyph than anything
   >     else.  If we follow your suggestions, and leave characters 8-bit, aref
   >     random O(1) access, is there any utility to providing unicode functions
   >     #'glyph-length or some such in a common lisp implementation?
   >
   > Yes, a Common Lisp character is a UTF-8 code unit.  As such, (length "א")
   > would return 2 in GCL whereas it returns 1 in CMUCL.
   >
   > For iterating across strings in ways other than by UTF-8 code unit, you
   > will want to provide iterators for iterating by code point, by glyph,
   > and so forth.
   >
   > In theory, something like CL-UNICODE would provide that, but I think it's
   > really lacking in a number of important ways.  GCL being what it is, you
   > could link against ICU and use their functions to start with.
   >

   Thanks so much for these tips.  They certainly seem to illuminate the
   path forward.  I can't see how we could do better than ICU.
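
   To make the code unit versus code point distinction concrete, here is
   a minimal sketch of counting code points in a sequence of UTF-8 code
   units.  The name utf8-code-point-count is hypothetical, the input is
   assumed to be well-formed UTF-8 given as integers in 0-255, and this is
   plain Common Lisp rather than anything taken from ICU or CL-UNICODE.

   ```lisp
   (defun utf8-code-point-count (octets)
     "Count code points in OCTETS, a sequence of UTF-8 code units.
   Assumes well-formed input: every octet that is not a continuation
   octet (bit pattern 10xxxxxx) starts exactly one code point."
     (count-if-not (lambda (octet) (= (logand octet #xC0) #x80))
                   octets))

   ;; "א" is U+05D0, encoded in UTF-8 as the two octets #xD7 #x90.
   (length #(#xD7 #x90))                 ; => 2 code units
   (utf8-code-point-count #(#xD7 #x90))  ; => 1 code point
   ```

   With 8-bit characters, a string could be fed to this through
   (map 'vector #'char-code string).  Counting glyphs is a harder
   problem that this sketch does not attempt.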

   To your knowledge, is there any objection to defining alpha-char-p as
   including code-char's >= 128?

   Take care,
   -- 
   Camm Maguire                                     address@hidden
   ==========================================================================
   "The earth is but one country, and mankind its citizens."  --  Baha'u'llah
