grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 1/2] json: Add function to unescape JSON-encoded strings


From: Patrick Steinhardt
Subject: Re: [PATCH v5 1/2] json: Add function to unescape JSON-encoded strings
Date: Mon, 15 Aug 2022 17:44:19 +0200

On Tue, Jul 12, 2022 at 03:39:13PM +0200, Daniel Kiper wrote:
> On Mon, Jul 11, 2022 at 09:08:09AM -0400, Nicholas Vinson wrote:
> > On 7/11/22 06:44, Patrick Steinhardt wrote:
> > > JSON strings require certain characters to be encoded, either by using a
> > > single reverse solidus character "\" for a set of popular characters, or
> > > by using a Unicode representation of "\uXXXXX". The jsmn library doesn't
> > > handle unescaping for us, so we must implement this functionality for
> > > ourselves.
> > >
> > > Add a new function `grub_json_unescape ()` that takes a potentially
> > > escaped JSON string as input and returns a new unescaped string.
> > >
> > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > > ---
> > >   grub-core/lib/json/json.c | 118 ++++++++++++++++++++++++++++++++++++++
> > >   grub-core/lib/json/json.h |  12 ++++
> > >   2 files changed, 130 insertions(+)
> > >
> > > diff --git a/grub-core/lib/json/json.c b/grub-core/lib/json/json.c
> > > index 1c20c75ea..1eadd1ce9 100644
> > > --- a/grub-core/lib/json/json.c
> > > +++ b/grub-core/lib/json/json.c
> > > @@ -262,3 +262,121 @@ grub_json_getint64 (grub_int64_t *out, const 
> > > grub_json_t *parent, const char *ke
> > >     return GRUB_ERR_NONE;
> > >   }
> > > +
> > > +grub_err_t
> > > +grub_json_unescape (char **out, grub_size_t *outlen, const char *in, 
> > > grub_size_t inlen)
> > > +{
> > > +  grub_err_t ret = GRUB_ERR_NONE;
> > > +  grub_size_t inpos, resultpos;
> > > +  char *result;
> > > +
> > > +  if (out == NULL || outlen == NULL)
> > > +    return grub_error (GRUB_ERR_BAD_ARGUMENT, N_("output parameters are 
> > > not set"));
> > > +
> > > +  result = grub_calloc (1, inlen + 1);
> > > +  if (result == NULL)
> > > +    return GRUB_ERR_OUT_OF_MEMORY;
> > > +
> > > +  for (inpos = resultpos = 0; inpos < inlen; inpos++)
> > > +    {
> > > +      if (in[inpos] == '\\')
> > > + {
> > > +   inpos++;
> > > +   if (inpos >= inlen)
> > > +     {
> > > +       ret = grub_error (GRUB_ERR_BAD_ARGUMENT, N_("expected escaped 
> > > character"));
> > > +       goto err;
> > > +     }
> > > +
> > > +   switch (in[inpos])
> > > +     {
> > > +       case '"':
> > > +         result[resultpos++] = '"';
> > > +         break;
> > > +
> > > +       case '/':
> > > +         result[resultpos++] = '/';
> > > +         break;
> > > +
> > > +       case '\\':
> > > +         result[resultpos++] = '\\';
> > > +         break;
> > > +
> > > +       case 'b':
> > > +         result[resultpos++] = '\b';
> > > +         break;
> > > +
> > > +       case 'f':
> > > +         result[resultpos++] = '\f';
> > > +         break;
> > > +
> > > +       case 'r':
> > > +         result[resultpos++] = '\r';
> > > +         break;
> > > +
> > > +       case 'n':
> > > +         result[resultpos++] = '\n';
> > > +         break;
> > > +
> > > +       case 't':
> > > +         result[resultpos++] = '\t';
> > > +         break;
> > > +
> > > +       case 'u':
> > > +         {
> > > +           char values[4] = {0};
> > > +           unsigned i;
> > > +
> > > +           inpos++;
> > > +           if (inpos + ARRAY_SIZE(values) > inlen)
> > > +             {
> > > +               ret = grub_error (GRUB_ERR_BAD_ARGUMENT, N_("unicode 
> > > sequence too short"));
> > > +               goto err;
> > > +             }
> >
> > I recommend relaxing these errors a little bit. While technically
> > correct, I'm thinking it might serve GRUB better to be a bit more
> > forgiving.
> >
> > Perhaps something like:
> >
> > if the input is '\u????' where ? is not a hex digit, set the value to
> > 'u', and process ???? separately.
> >
> > if ???? is less than ARRAY_SIZE chars, left-pad ???? with 0s* until it's
> > the correct length then convert the value.
> >
> > (* I don't mean to literally pad. I mean to treat the short number as if
> > it was left-padded with 0s)
> >
> > > +
> > > +           for (i = 0; i < ARRAY_SIZE(values); i++)
> > > +             {
> > > +               char c = in[inpos++];
> > > +
> > > +               if (c >= '0' && c <= '9')
> > > +                 values[i] = c - '0';
> > > +               else if (c >= 'A' && c <= 'F')
> > > +                 values[i] = c - 'A' + 10;
> > > +               else if (c >= 'a' && c <= 'f')
> > > +                 values[i] = c - 'a' + 10;
> > > +               else
> > > +                 {
> >
> > With a similar argument here. Treat the short ???? as suggested above
> > and resume normal processing at the non-hex digit position.
> >
> > > +                   ret = grub_error (GRUB_ERR_BAD_ARGUMENT,
> > > +                                     N_("unicode sequence with invalid 
> > > character '%c'"), c);
> > > +                   goto err;
> > > +                 }
> > > +             }
> > > +
> > > +           if (values[0] != 0 || values[1] != 0)
> > > +             result[resultpos++] = values[0] << 4 | values[1];
> > > +           result[resultpos++] = values[2] << 4 | values[3];
> > > +
> > > +           /* Offset the increment that's coming in via the loop 
> > > increment. */
> > > +           inpos--;
> > > +
> > > +           break;
> > > +         }
> > > +
> > > +       default:
> > > +         ret = grub_error (GRUB_ERR_BAD_ARGUMENT, N_("unrecognized 
> > > escaped character '%c'"), in[inpos]);
> > > +         goto err;
> >
> > And the final prong of the suggestion would be here. Treat the improper
> > escape as if it had not been escaped at all or as '\\?' where '?' is the
> > escaped character.
> 
> Nicholas comments make sense for me.
> 
> Patrick?

They do make sense indeed. But it feels like we're now iterating more on
the way we decode JSON-strings, which is exactly the kind of reason why
I initially wanted to punt on that and just decode only those characters
that cryptsetup v2.0.0 would've written with an escape character back
then. And this really only is the backslash character.

I don't think it's sensible to make this decoding more complicated by
handling all these different cases when we know that there only was a
single version of cryptsetup that would only have written a single
character with optional escaping, and that this case was changed at a
later point in time to not do that anymore. So ultimately this all feels
rather theoretical at this point, so I lean towards being pragmatic and
making adjustments at a later point when we ever see that this is indeed
a real-world problem.

Patrick

> Daniel
> 
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]