bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#70007: [PATCH] native JSON encoder


From: Eli Zaretskii
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 19:40:59 +0200

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Wed, 27 Mar 2024 16:49:53 +0100
> Cc: 70007@debbugs.gnu.org
> 
> Here is an updated patch. It now ignores duplicated keys in objects 
> represented by alists and plists, just like the old encoder. (I didn't 
> include this in the first draft out of fear it would be slow and complicated, 
> but it turned out just to be complicated.)
> 
> The performance is still acceptable, which means at least 2x the speed of the 
> Jansson-based encoder.

Thanks.  A few initial comments and questions, based on a very cursory
reading.

> +/* JSON encoding context */

This is not our comment style.

> +typedef struct {
> +  char *buf;
> +  ptrdiff_t size;            /* number of bytes in buf */
> +  ptrdiff_t capacity;                /* allocated size of buf */
> +  ptrdiff_t chars_delta;        /* size - {number of Unicode chars in buf} */

When you say "Unicode chars", what do you mean? characters or bytes?
If characters, then why do you need to qualify them with "Unicode"?

> +struct symset_tbl
> +{
> +  /* Table used by the containing object if any, so that we can easily
> +     all tables if an error occurs.  */
> +  struct symset_tbl *up;
> +  /* Table of symbols (2**bits entries), Qunbound where unused.  */
> +  Lisp_Object entries[];
                        ^^
Is this portable enough?

> +static struct symset_tbl *
> +alloc_symset_table (int bits)
> +{
> +  struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << 
> bits));
> +  int size = 1 << bits;

I'd add an assertion here that BITS is not large enough to produce zero.

> +/* Enlarge the table used by a symset. */
                                        ^^
Two spaces there, please.

> +static NO_INLINE void
> +symset_expand (symset_t *ss)
> +{
> +  struct symset_tbl *old_table = ss->table;
> +  int oldbits = ss->bits;
> +  int oldsize = 1 << oldbits;

I'd add an assertion here about the magnitude of BITS.

> +  while (p < end)
> +    {
> +      unsigned char c = *p;
> +      if (json_plain_char[c])
> +     {
> +       json_out_byte (jo, c);
> +       p++;
> +     }
> +      else if (c > 0x7f)
> +     {
> +       if (STRING_MULTIBYTE (str))
> +         {
> +           int n;
> +           if (c <= 0xc1)
> +             string_not_unicode (str);
> +           if (c <= 0xdf)
> +             n = 2;
> +           else if (c <= 0xef)
> +             {
> +               int v = (((c & 0x0f) << 12)
> +                        + ((p[1] & 0x3f) << 6) + (p[2] & 0x3f));
> +               if (char_surrogate_p (v))
> +                 string_not_unicode (str);
> +               n = 3;
> +             }
> +           else if (c <= 0xf7)
> +             {
> +               int v = (((c & 0x07) << 18)
> +                        + ((p[1] & 0x3f) << 12)
> +                        + ((p[2] & 0x3f) << 6)
> +                        + (p[3] & 0x3f));
> +               if (v > MAX_UNICODE_CHAR)
> +                 string_not_unicode (str);
> +               n = 4;
> +             }
> +           else
> +             string_not_unicode (str);
> +           json_out_str (jo, (const char *)p, n);
> +           jo->chars_delta += n - 1;
> +           p += n;
> +         }
> +       else
> +         string_not_unicode (str);

This rejects unibyte non-ASCII strings, AFAU, in which case I suggest
to think whether we really want that.  E.g., why is it wrong to encode
a string to UTF-8, and then send it to JSON?

> +static void
> +json_out_float (json_out_t *jo, Lisp_Object f)
> +{
> +  double x = XFLOAT_DATA (f);
> +  if (isinf (x) || isnan (x))
> +    signal_error ("not a finite number", f);

Is JSON unable to handle Inf and NaN?

> +static Lisp_Object
> +json_out_string_result (json_out_t *jo)
> +{
> +  /* FIXME: should this be a unibyte or multibyte string?
> +     Right now we make a multibyte string for test compatibility,
> +     but we are really encoding so unibyte would make more sense.  */

I indeed think this should be a unibyte string, because otherwise
writing it to a file or a process will/might encode it, which would be
wrong.

> +  json_out_t jo = {
> +    .maxdepth = 25,

Is this arbitrary, or is it what JSON expects?  If arbitrary, should
it be customizable? should it be documented?

> +  /* FIXME: Do we really need to do all this work below to insert a string?
> +     Is there no function already written?  I must be missing something.  */

There is no function.  All the insert_from_* functions in insdel.c do
something similar.

Btw, shouldn't json-insert call treesit_record_change?  Yuan?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]