bug-gnu-emacs

bug#70007: [PATCH] native JSON encoder


From: Mattias Engdegård
Subject: bug#70007: [PATCH] native JSON encoder
Date: Wed, 27 Mar 2024 13:46:17 +0100

26 mars 2024 kl. 17.46 skrev Eli Zaretskii <eliz@gnu.org>:

>> - The old code incorrectly accepted strings with non-Unicode characters (raw 
>> bytes). There is no reason to do this; JSON is UTF-8 only.
> 
> Would it complicate the code not to reject raw bytes?  I'd like to
> avoid incompatibilities if it's practical.  Also, Emacs traditionally
> doesn't reject raw bytes, leaving that to the application or the user.

Actually, I may have misrepresented the behaviour of the old encoder. It doesn't
accept arbitrary raw bytes, only byte sequences that happen to form valid UTF-8.
That is quite strange, and I doubt it was ever intended; it is more likely just
a consequence of the implementation.

This means that it accepts an already encoded unibyte UTF-8 string:

  (json-serialize "\303\251") -> "\"é\""

which is doubly odd: it's supposed to be encoding, but it ends up decoding the
bytes instead.
Even worse, it accepts mixtures of encoded and decoded chars:

  (json-serialize "é\303\251") -> "\"éé\""

which is just bonkers.
So while we could try to replicate this 'interesting' behaviour it would 
definitely complicate the code and be of questionable use.
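
To make the distinction concrete, here is a minimal sketch (not part of the
patch) of how a caller could detect such strings up front, or decode
already-encoded bytes explicitly before serialising. Both function names are
hypothetical, and it assumes an Emacs where `json-serialize' is available.

```elisp
(require 'seq)

;; Hypothetical helper, for illustration only.
(defun my-json-clean-string-p (string)
  "Return non-nil if STRING holds only Unicode scalar values.
Raw bytes and unpaired surrogates make it return nil."
  (not (seq-some (lambda (c)
                   (or (<= #xd800 c #xdfff) ; surrogate code point
                       (> c #x10ffff)))     ; raw byte or other non-Unicode char
                 ;; Make bytes 128..255 of a unibyte string show up as
                 ;; raw-byte characters, which lie above #x10ffff.
                 (string-to-multibyte string))))

;; Hypothetical wrapper: a caller that holds already encoded UTF-8
;; decodes it explicitly instead of relying on the encoder to do so.
(defun my-json-serialize-utf-8-bytes (bytes)
  "Decode unibyte UTF-8 BYTES, then serialise the resulting string."
  (json-serialize (decode-coding-string bytes 'utf-8)))
```

Under these assumptions the predicate rejects both "\303\251" and
"é\303\251", since each contains raw bytes, while a plain "é" passes.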

The JSON spec is quite clear that it's UTF-8 only. The only useful deviation 
that I can think of would be to allow unpaired surrogates (WTF-8) to pass 
through for transmission of Windows file names, but that would be an extension 
-- the old encoder doesn't permit those.





