bug-gnu-emacs

bug#31138: Native json slower than json.el


From: Eli Zaretskii
Subject: bug#31138: Native json slower than json.el
Date: Tue, 23 Apr 2019 17:58:35 +0300

> Cc: sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 23 Apr 2019 17:22:34 +0300
> 
> On 23.04.2019 15:15, Eli Zaretskii wrote:
> > I thought about this.  It could make sense to have a UTF-8 specific
> > function to encode and decode strings.  With encodings other than
> > UTF-8 it becomes trickier, and probably likewise with buffer text,
> > where we need to take the gap into account.
> 
> Doing that for buffer text as well might be helpful.

In what use cases would this be helpful?  Most cases of decoding text
in a buffer happen when we read text from files, where we already have
an internal optimization for plain ASCII files.  We could perhaps try
a similar optimization for UTF-8 instead of just ASCII.

Use cases where we read without decoding and then decode buffer
contents "by hand" are relatively rare, certainly when the stuff to
decode is so large that the performance gains will be tangible.

> So that's why I mentioned decode-coding-string (though 
> code_convert_string would be a better choice; or decode_coding_object?), 
> as opposed to creating a new specialized function.

code_convert_string also handles encoding, though.

> What I can understand from our testing, this kind of change improves 
> performance for all kinds of strings when the source encoding is 
> utf_8_unix. Even for large ones (despite you expecting otherwise).

I tested 10K strings, and the advantage there already becomes
relatively small.  10K characters may be a lot for strings, but it
isn't for buffers.  The optimization we use when insert-file-contents
decodes a file avoids the problem by inserting the leading ASCII part
directly and starting to decode from the first non-ASCII character.
With strings and with text already in the buffer this is not currently
possible, or at least not easily.
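
A minimal sketch of the ASCII-prefix idea described above, written in
Python purely for illustration.  It is not the code Emacs uses: the
function names are hypothetical, and slicing a bytes object stands in
for "inserting directly" into a buffer.  It relies only on the fact
that ASCII bytes are unchanged in UTF-8, and that in valid UTF-8 the
first byte >= 0x80 always starts a multibyte sequence, so splitting
there never cuts a character in half.

```python
def ascii_prefix_len(data: bytes) -> int:
    """Return the number of leading bytes that are plain ASCII (< 0x80)."""
    for i, byte in enumerate(data):
        if byte >= 0x80:
            return i
    return len(data)


def decode_utf8_with_ascii_fast_path(data: bytes) -> str:
    """Decode UTF-8 input, running the UTF-8 decoder only on the tail.

    Hypothetical helper: the ASCII prefix is converted with the trivial
    ASCII codec (a stand-in for copying it as-is), and real decoding
    starts at the first non-ASCII byte.
    """
    n = ascii_prefix_len(data)
    prefix = data[:n].decode("ascii")   # one byte == one character
    tail = data[n:].decode("utf-8")     # only this part needs real decoding
    return prefix + tail
```

For input that is entirely ASCII the tail is empty, which is the case
where the saving would be largest; the further into the text the first
non-ASCII byte appears, the more of the work is skipped.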

> Again, the patch, or several, shouldn't be particularly hard to write, 
> and we can try them out with different scenarios.

If someone wants to work on such patches, I'm sure they will be
welcome.  But we should have clear use cases and good test cases to
time them, IMO.




