bug#70007: [PATCH] native JSON encoder

From: Eli Zaretskii
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 21:05:54 +0200

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Wed, 27 Mar 2024 19:57:24 +0100
> Cc: Yuan Fu <casouri@gmail.com>,
>  70007@debbugs.gnu.org
> Eli, thank you for your comments!

Thanks for working on this in the first place.

> > This rejects unibyte non-ASCII strings, AFAIU, in which case I suggest
> > to think whether we really want that.  E.g., why is it wrong to encode
> > a string to UTF-8, and then send it to JSON?
> The way I see it, that would break the JSON abstraction: it transports 
> strings of Unicode characters, not strings of bytes.

What's the difference?  AFAIU, JSON expects UTF-8 encoded strings, and
whether that is used as a sequence of bytes or a sequence of
characters is in the eyes of the beholder: the bytestream is the same,
only the interpretation changes.  So I'm not sure I understand how
this would break the abstraction.
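
To illustrate the "same bytestream" point outside Emacs, here is a minimal Python sketch.  It is not taken from the patch; the sample text and variable names are made up for the example, and the text deliberately contains no characters that JSON would need to escape.

```python
import json

# Illustrative sample text (an assumption, not from the thread).
text = "Grüße, 世界"

# View 1: serialize a string of characters, then encode the result as UTF-8.
as_characters = json.dumps(text, ensure_ascii=False).encode("utf-8")

# View 2: encode to UTF-8 first, then wrap the raw bytes in quotes.
as_bytes = b'"' + text.encode("utf-8") + b'"'

# Both views produce the same bytes on the wire.
same_stream = as_characters == as_bytes
```

Here `same_stream` is true: only the interpretation of the input differs, not the output.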

> A user who for some reason has a string of bytes that encode Unicode 
> characters can just decode it in order to prove it to us. It's not the JSON 
> encoder's job to decode the user's strings.

I didn't suggest decoding the input string, not at all.  I suggested
allowing unibyte strings and processing them just like you process
pure-ASCII strings, leaving it to the caller to make sure the string
has only valid UTF-8 sequences.  Forcing callers to decode such
strings is IMO too harsh and largely unjustified.

> (It would also be a pain to deal with and risks slowing down the string 
> serialiser even if it's a case that never happens.)

I don't understand why.  Once again, I'm just talking about passing
the bytes through as you do with ASCII characters.
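
As a rough sketch of what such a pass-through could look like, the Python below escapes only the ASCII bytes that JSON requires and copies every other byte unchanged, including bytes of 0x80 and above.  This is not the code from the patch: the function name `serialize_utf8_bytes` and the escape table are hypothetical, and the caller is assumed to supply valid UTF-8.

```python
import json

# Short escapes that JSON defines for specific ASCII bytes.
_SHORT_ESCAPES = {
    0x22: b'\\"', 0x5C: b"\\\\", 0x08: b"\\b", 0x0C: b"\\f",
    0x0A: b"\\n", 0x0D: b"\\r", 0x09: b"\\t",
}

def serialize_utf8_bytes(raw: bytes) -> bytes:
    """Serialize bytes as a JSON string without decoding them.

    Hypothetical helper: the input is assumed to be valid UTF-8 and is
    not checked here; that responsibility stays with the caller.
    """
    out = bytearray(b'"')
    for byte in raw:
        if byte in _SHORT_ESCAPES:
            out += _SHORT_ESCAPES[byte]
        elif byte < 0x20:
            # Remaining control characters use the \uXXXX form.
            out += b"\\u%04x" % byte
        else:
            # Plain ASCII and every byte >= 0x80 pass through as-is.
            out.append(byte)
    out += b'"'
    return bytes(out)

sample = 'naïve "q"\n\x01 日本'
matches = serialize_utf8_bytes(sample.encode("utf-8")) == \
    json.dumps(sample, ensure_ascii=False).encode("utf-8")
```

In this sketch the non-ASCII bytes take the same branch as ordinary ASCII, so the per-byte loop needs no extra case for them.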
