
From: Raymond Toy
Subject: Re: [Gcl-devel] [Maxima-discuss] [Maxima-commits] [git] Maxima CAS branch, master, updated. branch-5_37-base-91-gd9bf6ff
Date: Sat, 10 Oct 2015 08:13:20 -0700
User-agent: Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b33 (darwin)

>>>>> "Camm" == Camm Maguire <address@hidden> writes:

    Camm> Greetings!
    Camm> Raymond Toy <address@hidden> writes:

    >> I, unfortunately, don't have great hope of seeing gcl with unicode any
    >> time soon because the plan for supporting unicode is really
    >> complicated. [1][2]
    >> --
    >> Ray
    >> [1] UTF-8 strings with 21-bit Lisp character.  I don't know how that's
    >> going to work reliably when you can index at random points in the
    >> string and also insert random characters into a utf-8 code
    >> sequence.
    >> [2] I suggested a really simple utf-16 with 16-bit chars to simplify
    >> the implementation and still cover 99-44/100% of the use cases.
    >> This is way easier to do with very minimal code changes.

    Camm> Perhaps I should weigh in here.  I do have a branch starting utf8
    Camm> unicode character support, but it will have to wait until post 2.6.13.

That's really great news!

    Camm> Emacs takes this strategy, so I know it's doable, and the performance is
    Camm> probably a net win as the gc overhead of the larger strings will
    Camm> outweigh the string access times, I'm guessing.  We also had a
    Camm> discussion on gcl-devel that the current approach of defining a
    Camm> character to be a byte, and relying on terminals etc. to do the
    Camm> translation, is legal, although not desirable as a permanent 

    Camm> I can outline the algorithm if there is interest, but essentially a
    Camm> simple one entry cache to cover the vast majority of cases of 
    Camm> access (utf8 can do this backwards as well) together with a log(N)
    Camm> special character counting from the beginning, cache, or end (making
    Camm> use of parallelism in long integers) for random access, appears quite
    Camm> serviceable.  This is not that complicated, and can be source inlined
    Camm> escaping out the most common case of no special bytes, which can be
    Camm> indicated by a flag in the header.
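
To make the outline above concrete, here is a minimal sketch of character indexing into a UTF-8 buffer with a one-entry cache and an ASCII-only header flag. It is not GCL's implementation: the class `Utf8String` and all of its members are hypothetical names, and it walks byte by byte from the nearest of the beginning, the cached position, or the end, rather than using the word-parallel counting of special bytes mentioned in the outline.

```python
class Utf8String:
    """Hypothetical sketch: UTF-8 backed string with a one-entry index cache."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.length = len(text)  # number of characters (code points)
        # Header flag: no multi-byte sequences, so index == byte offset.
        self.ascii_only = len(self.data) == self.length
        # One-entry cache: (character index, byte offset) of the last access.
        self.cache = (0, 0)

    @staticmethod
    def _is_continuation(byte):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (byte & 0xC0) == 0x80

    def _offset_of(self, index):
        # Start from whichever known position is closest to the target.
        anchors = [(0, 0), self.cache, (self.length, len(self.data))]
        char, off = min(anchors, key=lambda a: abs(a[0] - index))
        while char < index:  # walk forward one character at a time
            off += 1
            while off < len(self.data) and self._is_continuation(self.data[off]):
                off += 1
            char += 1
        while char > index:  # walk backward; lead bytes are recognizable
            off -= 1
            while self._is_continuation(self.data[off]):
                off -= 1
            char -= 1
        return off

    def char_at(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        if self.ascii_only:
            return chr(self.data[index])  # common case: plain O(1) access
        off = self._offset_of(index)
        self.cache = (index, off)  # sequential access now costs one short step
        end = off + 1
        while end < len(self.data) and self._is_continuation(self.data[end]):
            end += 1
        return self.data[off:end].decode("utf-8")
```

With this layout, a loop that reads characters in order (forward or backward) moves only one character from the cached position on each call, while strings with no multi-byte sequences never touch the cache at all.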

O(log(n)) access on strings certainly breaks people's assumptions about
O(1) array access.  Keeping the cache consistent seems error-prone,
but I suppose most strings aren't modified at all.  Strings are
probably composed of shorter strings and not modified in-place.
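
The consistency concern can be illustrated with a small sketch. The helper `replace_char` and the `cached` pair are hypothetical and only model the idea of a (character index, byte offset) cache; they are not taken from GCL. Replacing a one-byte character with a three-byte one shifts every later byte offset, so a cache entry recorded before the store points at the wrong byte afterwards.

```python
def replace_char(data: bytes, offset: int, new_char: str) -> bytes:
    """Replace the UTF-8 character starting at byte `offset` (hypothetical helper)."""
    end = offset + 1
    # Skip the continuation bytes (10xxxxxx) of the character being replaced.
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end += 1
    return data[:offset] + new_char.encode("utf-8") + data[end:]


data = "abcd".encode("utf-8")
cached = (3, 3)  # character 3 ('d') starts at byte offset 3

# Store EURO SIGN (3 bytes in UTF-8) over 'b' (1 byte).
data = replace_char(data, 1, "\u20ac")

new_offset = data.index(b"d")  # 'd' is still character 3, but now at byte 5
stale = cached[1] != new_offset  # True: the cache entry must be invalidated
```

Any in-place store that changes the encoded width therefore has to invalidate or adjust the cache, and possibly the ASCII-only flag as well.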

Best of luck! I'm looking forward to this.

    Camm> (BTW, I've also put in open-stream-p for you in 2.6.13pre.)


