Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
Date: Mon, 13 Aug 2018 09:07:57 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Eric Blake <address@hidden> writes:
> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate. Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <address@hidden>
>> ---
>> qobject/json-parser.c | 16 +++++++++++++++-
>> tests/check-qjson.c | 3 +--
>> 2 files changed, 16 insertions(+), 3 deletions(-)
>>
>
>> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>> cp |= hex2decimal(*ptr);
>> }
>> + if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
>> + && ptr[1] == '\\' && ptr[2] == 'u') {
>> + ptr += 2;
>> + leading_surrogate = cp;
>> + goto hex;
>> + }
>> + if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
>> + cp &= 0x3FF;
>> + cp |= (leading_surrogate & 0x3FF) << 10;
>> + cp += 0x010000;
>> + }
>> +
>> if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>> parse_error(ctxt, token,
>> "\\u%.4s is not a valid Unicode character",
>
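The hunk above decodes a pair the standard UTF-16 way: a leading (high) surrogate in 0xD800..0xDBFF and a trailing (low) surrogate in 0xDC00..0xDFFF each carry 10 payload bits, which are concatenated and then offset by 0x10000. A minimal standalone sketch of just that arithmetic, assuming the two code units have already been parsed from their \uXXXX escapes; the helper names are hypothetical, not QEMU functions, and the parser's escape scanning is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU API): leading/high surrogate range. */
static bool is_leading_surrogate(uint32_t cu)
{
    return cu >= 0xD800 && cu <= 0xDBFF;
}

/* Hypothetical helper (not QEMU API): trailing/low surrogate range. */
static bool is_trailing_surrogate(uint32_t cu)
{
    return cu >= 0xDC00 && cu <= 0xDFFF;
}

/*
 * Combine a surrogate pair into one code point with the same steps
 * as the patch: keep the low 10 bits of each half, put the leading
 * half's bits above the trailing half's, then add 0x10000.
 */
static uint32_t combine_surrogates(uint32_t leading, uint32_t trailing)
{
    assert(is_leading_surrogate(leading));
    assert(is_trailing_surrogate(trailing));
    return (((leading & 0x3FF) << 10) | (trailing & 0x3FF)) + 0x10000;
}
```

For example, combine_surrogates(0xD83D, 0xDE00) yields 0x1F600, and the pair 0xDBFF, 0xDFFF yields 0x10FFFF, the highest code point.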
> Consider "\\udbff\\udfff": a valid surrogate pair (in terms of being
> in range), but one that decodes to U+10FFFF. Since is_valid_codepoint()
> (part of mod_utf8_encode()) rejects it because (codepoint & 0xfffe) ==
> 0xfffe, we end up printing this error message, but quoting only
> the second half of the surrogate pair. Is that okay?
It's not horrible, but I wouldn't call it okay. I'll try to improve it.
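Eric's example works out as (0x3FF << 10 | 0x3FF) + 0x10000 = 0x10FFFF, whose low 16 bits are 0xFFFF, so the (codepoint & 0xfffe) == 0xfffe test flags it as a noncharacter. One conceivable improvement is for the message to quote the whole 12-character escape rather than only the trailing \uXXXX. A minimal sketch under that assumption; both functions are hypothetical, and the noncharacter test reproduces only the single rule quoted above, not everything is_valid_codepoint() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the one rule quoted from
 * is_valid_codepoint(): the last two code points of every plane
 * (U+xxFFFE and U+xxFFFF) are noncharacters.
 */
static bool is_plane_end_noncharacter(uint32_t cp)
{
    return (cp & 0xFFFE) == 0xFFFE;
}

/*
 * Hypothetical error formatting that quotes the whole escape.
 * 'escape' points at the backslash of the first \uXXXX; 'len' is
 * 6 for a lone escape or 12 for a surrogate pair.  Returns what
 * snprintf() returns.
 */
static int format_bad_escape(char *buf, size_t size,
                             const char *escape, int len)
{
    assert(len == 6 || len == 12);
    return snprintf(buf, size, "%.*s is not a valid Unicode character",
                    len, escape);
}
```

With the pair from Eric's example, the message would then read "\udbff\udfff is not a valid Unicode character" instead of naming only \udfff.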
> Otherwise,
> Reviewed-by: Eric Blake <address@hidden>
Thanks!