
bug#31138: Native json slower than json.el


From: Dmitry Gutov
Subject: bug#31138: Native json slower than json.el
Date: Wed, 24 Apr 2019 18:55:45 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 23.04.2019 17:58, Eli Zaretskii wrote:

>> Doing that for buffer text as well might be helpful.

> In what use cases would this be helpful?  Most cases of decoding text
> in a buffer happen when we read text from files, where we already have
> an internal optimization for plain ASCII files.

HTTP response buffers? They can be as large as several megabytes, as we have established in this discussion.

> We could perhaps try
> a similar optimization for UTF-8 instead of just ASCII.

I think so.
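
To make that concrete, here is a rough standalone sketch of the kind of validity check such a utf-8 fast path would need (the function and its name are mine, for illustration; this is not code from the Emacs sources). The point is that Emacs's internal representation is a superset of utf-8, so well-formed utf-8 bytes could be used nearly as-is, much like the plain-ASCII case:

#include <stdbool.h>
#include <stddef.h>

/* Return true if the NBYTES bytes at P are well-formed UTF-8, so a
   fast path could hand them to the buffer/string without running
   them through the coding machinery.  Illustration only.  */
static bool
utf8_valid_p (const unsigned char *p, ptrdiff_t nbytes)
{
  const unsigned char *end = p + nbytes;
  while (p < end)
    {
      unsigned char c = *p;
      ptrdiff_t len;
      unsigned int min;

      if (c < 0x80)		/* ASCII: the already-optimized case.  */
	{
	  p++;
	  continue;
	}
      else if ((c & 0xE0) == 0xC0)
	len = 2, min = 0x80;
      else if ((c & 0xF0) == 0xE0)
	len = 3, min = 0x800;
      else if ((c & 0xF8) == 0xF0)
	len = 4, min = 0x10000;
      else
	return false;		/* Invalid lead byte.  */

      if (end - p < len)
	return false;		/* Truncated sequence.  */

      unsigned int cp = c & (0x7F >> len);
      for (ptrdiff_t i = 1; i < len; i++)
	{
	  if ((p[i] & 0xC0) != 0x80)
	    return false;	/* Bad continuation byte.  */
	  cp = (cp << 6) | (p[i] & 0x3F);
	}
      /* Reject overlong forms, surrogates and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
	return false;
      p += len;
    }
  return true;
}

(A real patch would live in the coding-system internals and have to deal with nocopy, multibyteness and so on; the sketch only shows the byte scan.)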

> Use cases where we read without decoding and then decode buffer
> contents "by hand" are relatively rare, certainly when the stuff to
> decode is so large that the performance gains will be tangible.

Maybe so. We could try and benchmark it, though.

>> So that's why I mentioned decode-coding-string (though
>> code_convert_string would be a better choice; or decode_coding_object?),
>> as opposed to creating a new specialized function.

> code_convert_string also handles encoding, though.

That's just one more comparison. We could also do that in decode_coding_object instead, but I'm not sure about the overhead of the intervening code.
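
Purely to illustrate what I mean by "one more comparison": a shortcut placed in code_convert_string would have to fire only when decoding, i.e. it would first test the encodep flag the function already receives. Hypothetical fragment (the helper name is made up, and this is not a proposed patch):

  if (!encodep				    /* the one extra comparison */
      && EQ (coding_system, Qutf_8_unix)
      && utf_8_needs_no_conversion_p (string))	/* hypothetical check */
    return string;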

>> From what I can understand from our testing, this kind of change improves
>> performance for all kinds of strings when the source encoding is
>> utf_8_unix. Even for large ones (despite you expecting otherwise).

> I tested 10K strings, and the advantage there already becomes
> relatively small.  10K characters may be a lot for strings, but it
> isn't for buffers.

That's probably true. I have tried a similar shortcut, removing the code_convert_string call in json_encode (which is called once from json-parse-string), and that did not measurably affect its performance.

But it did increase the performance of json-serialize, by more than 2x, on the same test data I've been using.

Like 3.8s to 1.6s for 10 iterations.

Can we do something like that? Removing the conversion altogether is probably not an option, but even using Fstring_as_unibyte instead led to a significant improvement (2.43s with this approach).

diff --git a/src/json.c b/src/json.c
index 7d6d531427..01682473ca 100644
--- a/src/json.c
+++ b/src/json.c
@@ -266,7 +266,7 @@ json_encode (Lisp_Object string)
 {
   /* FIXME: Raise an error if STRING is not a scalar value
      sequence.  */
-  return code_convert_string (string, Qutf_8_unix, Qt, true, true, true);
+  return string;
 }

 static AVOID
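
For reference, the intermediate variant mentioned above (the one that came in at about 2.43s) would amount to something like the following; this is a reconstruction of that experiment, not a patch that was posted:

static Lisp_Object
json_encode (Lisp_Object string)
{
  /* FIXME: Raise an error if STRING is not a scalar value
     sequence.  */
  /* Mark the string unibyte instead of running it through
     code_convert_string.  */
  return Fstring_as_unibyte (string);
}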
