Re: [Chicken-users] UTF-8 support in eggs

From:	Oleg Kolosov
Subject:	Re: [Chicken-users] UTF-8 support in eggs
Date:	Fri, 11 Jul 2014 02:20:16 +0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

On 07/09/14 09:00, Alex Shinn wrote:
> However, I don't think that's the real problem.  The issue as I
> understand is that although Chicken has both strings and
> bytevectors in the core, historically and for continued simplicity
> strings are abused as bytevectors in many cases. ...

And this is a pity. But it looks like there is some movement starting
to clean things up in the core. Perhaps this issue could be fixed in the
meantime.

> The clean way to handle this is to duplicate the useful string
> APIs for bytevectors.  This could be done without code duplication
> with the use of functors, though compiler assistance may be
> needed for efficiency (e.g. for inlined procedures).  Even without
> code duplication there would be an increase in the core library
> size, though we could probably move most utilities to external
> libraries (how often do you need regexps that operate on binary
> data?).

Considering the Chibi Scheme size numbers from your other mail, I would
hardly call this a huge price for the benefit received, even for my
specific embedded use cases.

> If we could (through functors or in a pinch duplication) bring
> the bytevector API up to speed with strings, then the next
> step is to identify all such abusers of strings and move them
> to bytevectors.

Looks like a good plan to me.

> The bigger issue from the performance perspective is existing
> idioms that use indexes, which can degrade to quadratic behavior
> in the worst case no matter how much you optimize (without hacks
> that slow down normal usage).  So people would have to learn to
> take substrings where appropriate to avoid the start/end parameters
> to all SRFI 13 functions, or we would need to deprecate SRFI 13
> in favor of a cursor-oriented API (planned for R7RS).

Do you have some examples of how to avoid performance degradation
without using string indexes? I've looked briefly into the Chibi string
module source; is this the way to go? How about more complex formatting,
like outputting numbers with padding? I guess these should be handled
with something like fmt (or chibi.show). What are the performance
characteristics of those? I guess again that this depends on a
"sufficiently smart compiler" to inline things; is that hard to
implement?

Chicken has inlining support and can generate some "inline files". But
apart from crashes, I was not able to get anything measurable from this.
Perhaps it works better for some specific code patterns.

We are currently using Chicken with utf8 for drawing a GUI in immediate
mode with some string trimming and padding. We managed to get decent
performance in most cases, but were forced to implement searching
and indexing in C. The Scheme version, while smaller and simpler, is 2+
times slower. It would be great to use Scheme everywhere.
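
To make the index-versus-cursor point quoted earlier concrete, here is a
minimal C sketch. It is only an illustration under stated assumptions:
it is not taken from Chicken, the utf8 egg, Chibi, or the C code
mentioned above, the names utf8_next, utf8_index, count_by_index and
count_by_cursor are hypothetical, and the input is assumed to be valid,
NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a cursor (a byte pointer) past one UTF-8 encoded character.
   Assumes valid UTF-8 and that *p is not the terminating NUL. */
const char *utf8_next(const char *p)
{
    assert(p != NULL);
    p++;                                        /* skip the lead byte */
    while (((unsigned char)*p & 0xC0) == 0x80)  /* continuation: 10xxxxxx */
        p++;
    return p;
}

/* Index-based access: locate character number i (0-based).
   Each call rescans from the start of the string, so it costs O(i).
   Returns a pointer to the NUL terminator if i is past the end. */
const char *utf8_index(const char *s, size_t i)
{
    assert(s != NULL);
    while (i > 0 && *s != '\0') {
        s = utf8_next(s);
        i--;
    }
    return s;
}

/* Walk the string by index: n lookups of O(n) each, O(n^2) overall. */
size_t count_by_index(const char *s)
{
    size_t n = 0;
    while (*utf8_index(s, n) != '\0')
        n++;
    return n;
}

/* Walk the string with a cursor: one pass, O(n) overall. */
size_t count_by_cursor(const char *s)
{
    size_t n = 0;
    const char *p;
    assert(s != NULL);
    for (p = s; *p != '\0'; p = utf8_next(p))
        n++;
    return n;
}
```

Both walkers return the same character count; the difference is that
the index-based loop restarts its scan on every step, while the cursor
keeps its byte position between steps. A cursor-oriented string API
would expose the second pattern directly.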

> So as you see the change is contagious.  We can update the core
> efficiently and easily, but then we have to fix the string abusers,
> and then we have to replace existing index-oriented APIs.

I personally think it is worth it.

-- 
Regards, Oleg
