bug-gnu-emacs

bug#70007: [PATCH] native JSON encoder


From: Mattias Engdegård
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 19:57:24 +0100

Eli, thank you for your comments!

27 mars 2024 kl. 18.40 skrev Eli Zaretskii <eliz@gnu.org>:

>> +/* JSON encoding context */
> 
> This is not our comment style.

I'll go through the code and clean up all comments.

>> +typedef struct {
>> +  char *buf;
>> +  ptrdiff_t size;           /* number of bytes in buf */
>> +  ptrdiff_t capacity;               /* allocated size of buf */
>> +  ptrdiff_t chars_delta;        /* size - {number of Unicode chars in buf} */
> 
> When you say "Unicode chars", what do you mean? characters or bytes?
> If characters, then why do you need to qualify them with "Unicode"?

Characters. Will clarify.

>> +  Lisp_Object entries[];
>                        ^^
> Is this portable enough?

Something I'd like to know, too. We rely on C99 in many other aspects. Are 
there still compilers that are important to us but don't get this right?

10 years ago this was apparently an issue for IBM XL C 12.1, but modern 
versions are based on Clang. We could take our chances here; obviously we'll 
change it if someone complains but it seems unlikely. What do you think?

> I'd add an assertion here that BITS is not large enough to produce zero.

I'll deal with that in some way or another.

> This rejects unibyte non-ASCII strings, AFAU, in which case I suggest
> to think whether we really want that.  E.g., why is it wrong to encode
> a string to UTF-8, and then send it to JSON?

The way I see it, that would break the JSON abstraction: it transports strings of Unicode characters, not strings of bytes. A user who for some reason has a string of bytes that encode Unicode characters can just decode it, which proves to us that it really is text. It's not the JSON encoder's job to decode the user's strings.

(It would also be a pain to deal with and risks slowing down the string 
serialiser even if it's a case that never happens.)

> Is JSON unable to handle Inf and NaN?

That's right.

>> +  /* FIXME: should this be a unibyte or multibyte string?
>> +     Right now we make a multibyte string for test compatibility,
>> +     but we are really encoding so unibyte would make more sense.  */
> 
> I indeed think this should be a unibyte string, because otherwise
> writing it to a file or a process will/might encode it, which would be
> wrong.

I would prefer that, too, but used multibyte for compatibility with the old 
code and so that its tests pass.
It should probably be a separate change if we decide that unibyte is better 
here.

>> +  json_out_t jo = {
>> +    .maxdepth = 25,
> 
> Is this arbitrary, or is it what JSON expects?  If arbitrary, should
> it be customizable? should it be documented?

It's semi-arbitrary but reasonable: the JSON_checker at json.org uses a maximum depth of 20 by default, and many implementations use its test suite. RFC 8259 states that the maximum nesting depth is implementation-dependent.

It's hardly worth making this into a parameter for the user to adjust but I'll 
clarify the code.

>> +  /* FIXME: Do we really need to do all this work below to insert a string?
>> +     Is there no function already written?  I must be missing something.  */
> 
> There is no function.  All the insert_from_* functions in insdel.c do
> something similar.

Thank you for confirming that. Looks like we could use some abstraction then.
