
bug#31138: Native json slower than json.el


From: Eli Zaretskii
Subject: bug#31138: Native json slower than json.el
Date: Mon, 22 Apr 2019 18:36:13 +0300

> Cc: p.stephani2@gmail.com, sebastien@chapu.is, yyoncho@gmail.com,
>  31138@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 22 Apr 2019 18:02:35 +0300
> 
> > (Let me know if you need help in writing the code for the above 2
> > tests.  I think parse_str_as_multibyte should help a lot.)
> 
> I do.
> 
> At the very least: am I supposed to use parse_str_as_multibyte similarly 
> to how make_string does, or to write a function similar to 
> parse_str_as_multibyte? I can more or less follow its logic, but I don't 
> understand if any of its callees cannot cope with improper input.

Let's start with just ASCII strings, and then consider moving to valid
UTF-8 sequences.  I take it you can easily write a loop that ensures a
string is pure ASCII?
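
For reference, such a check is just a byte scan for the high bit.  A
minimal sketch in C (the helper name json_string_is_ascii is
illustrative, not something from the Emacs sources):

  /* Return true if all SIZE bytes of DATA are plain ASCII,
     i.e. have the high bit clear.  */
  static bool
  json_string_is_ascii (const char *data, ptrdiff_t size)
  {
    for (ptrdiff_t i = 0; i < size; i++)
      if ((unsigned char) data[i] & 0x80)
        return false;
    return true;
  }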

> > I guess we should also have some test case with non-ASCII characters,
> > if we will introduce these optimizations.
> 
> We already do in test/src/json-tests.el, as I previously mentioned.

No, I meant a test of performance.  If we begin by testing for plain
ASCII strings, then non-ASCII strings will take longer to convert.
The existing tests are too short to support measurement of the effect;
we need a larger JSON object with many non-ASCII strings.

> If we're talking about big changes and increases in complexity, sure, we 
> should weigh them. But if a simple change gives us even a 20-30% 
> improvement, why not take it? The reporter is not the only one who 
> parses JSON in Emacs.

Suit yourself, but I don't like investing hours in code just to hear
"your best is not good enough" from those who triggered the changes to
begin with.

> Speaking of bigger improvements... it seems that with the patch below, 
> and the fact that it passes the existing tests, we have at least 
> established that the contents of the C strings that libjansson returns 
> and our "decoded" strings are very often exactly the same. So most of 
> the time what code_convert_string does is not really conversion, but in 
> effect verification. I'm betting it's a frequent situation in other use 
> cases, too.

Of course.  Verification code almost always comes up empty-handed.  That
doesn't mean we can throw it away.

> So one optimization (more complex to implement, I'm sure) would be to 
> defer creating coding->dst_object inside decode_coding_object until 
> we're sure we need it (the source and destination bytes actually come 
> out different), and if we don't, return src_object in the end (I'm only 
> talking about the case when dst_object is Qt). That might improve 
> performance across the board, including during the encoding step. Or 
> might not, of course. What do you think?

I don't want to make changes that affect decoding everywhere, because
having raw bytes in other cases is a more frequent phenomenon.  Let's
just optimize JSON parsing, OK?
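
Concretely, a JSON-only fast path could keep the existing decoding
code and merely bypass it for pure-ASCII payloads.  A rough sketch,
reusing the hypothetical json_string_is_ascii helper from above (the
assumption being that ASCII bytes are already a valid multibyte
string, so no decoding is needed):

  static Lisp_Object
  json_make_string (const char *data, ptrdiff_t size)
  {
    /* Pure ASCII: valid multibyte text as-is; nchars == nbytes.  */
    if (json_string_is_ascii (data, size))
      return make_specified_string (data, size, size, true);
    /* Otherwise decode (and thereby validate) as UTF-8, as before.  */
    return code_convert_string (make_specified_string (data, -1, size, false),
                                Qutf_8_unix, Qt, false, true, true);
  }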

> 
> diff --git a/src/json.c b/src/json.c
> index 928825e034..2b0cc8a313 100644
> --- a/src/json.c
> +++ b/src/json.c
> @@ -225,8 +225,7 @@ json_has_suffix (const char *string, const char *suffix)
>   static Lisp_Object
>   json_make_string (const char *data, ptrdiff_t size)
>   {
> -  return code_convert_string (make_specified_string (data, -1, size, false),
> -                              Qutf_8_unix, Qt, false, true, true);
> +  return make_specified_string (data, -1, size, false);
                                                   ^^^^^
Should be 'true', right?
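(For context: the last argument of make_specified_string is the
multibyte flag, and libjansson hands back UTF-8 text, so the resulting
Lisp string should be multibyte.)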




