Re: I created a faster JSON parser

emacs-devel


From:	Herman, Géza
Subject:	Re: I created a faster JSON parser
Date:	Sat, 09 Mar 2024 12:08:54 +0100


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Herman, Géza <geza.herman@gmail.com>
>> Cc: Herman Géza <geza.herman@gmail.com>,
>>  emacs-devel@gnu.org
>> Date: Fri, 08 Mar 2024 21:22:13 +0100
>>
>> Yes, it seems that EMACS_UINT is good for my purpose, thanks for
>> the suggestion.
>
> Are you sure you need the unsigned variety?  If EMACS_INT fits the
> bill, then it is a better candidate, since unsigned arithmetic has
> its quirks.

Yes, I think it's better to use unsigned: read the sign, then parse the number as unsigned, and apply the sign at the end. If the number is parsed with its sign, every character needs an additional step (the sign has to be applied to each digit).

>> Also, I see that json-parse-string calls some utf8 encoding related
>> function before parsing, but json-parse-buffer doesn't (and it
>> doesn't do anything encoding related in the callback, it just
>> calls memcpy).
>
> This is a part I was never happy about.  But, as I say above, we can
> get to handling these rare cases later.

I think this is an additional benefit of my parser: this feature can be added to it more easily than to jansson. I'm even tempted to say that we could just remove the utf-8 checking from my code, and then Emacs's encoding method should work right out of the box.

Or we could say that utf-8 handling should stay as is. As far as I understand, if the JSON contains an invalid utf-8 sequence which is not invalid according to Emacs's character representation, then this problem won't be detected. So checking for utf-8 encoding errors shouldn't be the job of the JSON parser, but of the IO handling around it, which is in a position to know that the JSON stream itself must contain only valid utf-8.

Or, as the JSON specification explicitly says that the allowed character range is 0x20 .. 0x10ffff, the current solution is fine, because it is actually against the JSON rules to allow anything outside of this range.
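For illustration, a strict check of this kind could look roughly like the sketch below. It is not the code from the patch: decode_json_utf8_char is a hypothetical name, and rejecting overlong forms and surrogates is ordinary UTF-8 validation rather than something discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: decode one UTF-8
   sequence from BUF[0..LEN) and store its code point in *CP.  Return
   the number of bytes consumed, or 0 if the sequence is malformed or
   the code point lies outside 0x20 .. 0x10FFFF.  */
size_t
decode_json_utf8_char (const unsigned char *buf, size_t len, uint32_t *cp)
{
  assert (buf != NULL && cp != NULL);
  if (len == 0)
    return 0;

  size_t n;		/* sequence length implied by the lead byte */
  uint32_t c;		/* code point being assembled */
  uint32_t min;		/* smallest code point valid for this length */
  unsigned char b0 = buf[0];

  if (b0 < 0x80)
    { n = 1; c = b0; min = 0; }
  else if ((b0 & 0xE0) == 0xC0)
    { n = 2; c = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0)
    { n = 3; c = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0)
    { n = 4; c = b0 & 0x07; min = 0x10000; }
  else
    return 0;		/* stray continuation byte or invalid lead byte */

  if (len < n)
    return 0;		/* truncated sequence */
  for (size_t i = 1; i < n; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
	return 0;	/* not a continuation byte */
      c = (c << 6) | (buf[i] & 0x3F);
    }

  if (c < min)
    return 0;		/* overlong encoding */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;		/* surrogates are not valid in UTF-8 */
  if (c < 0x20 || c > 0x10FFFF)
    return 0;		/* outside the range quoted above */

  *cp = c;
  return n;
}
```

Accepting Emacs's extended character representation instead would amount to relaxing the final range checks, which is the kind of later extension discussed in this thread.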

> Once again, we can extend the parser for codepoints outside of the
> Unicode range later.  For now, it's okay to reject them with a
> suitable error.

OK, cool, I added Qjson_utf8_decode_error to indicate decoding errors.

How can we proceed further? This is the current state of the patch:
https://github.com/geza-herman/emacs/commit/ce5d990776a1ccdfd0b6d9c4d5e5e5df55245672.patch

I think I did everything that was asked for, except Po Lu's parenthesis-related comment, because I still don't know what to parenthesize and what not to. I saw a lot of "a + x * y" kind of expressions in the Emacs codebase without any parentheses. Are the exact rules documented somewhere?
