bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#7357: csplit: memory exhausted when using stdout / pipe instead of a


From: Jim Meyering
Subject: bug#7357: csplit: memory exhausted when using stdout / pipe instead of a file
Date: Thu, 11 Nov 2010 10:24:18 +0100

Paul Eggert wrote:
> On 11/10/10 10:49, Eric Blake wrote:
>> Actually, the next version of POSIX will require EOVERFLOW if printf is
>> directed to print more than INT_MAX bytes:
>
> Thanks, I didn't know that.  But this issue is slightly different.
> For example, glibc (RHEL 5 x86-64) treats printf ("%4294967297d", 0)
> as if it were printf ("%1d", 0), and returns 1 without setting
> EOVERFLOW.  I expect that this bug is common, and our code should be
> robust when it happens.
>
> There are other printf problems in this area too, I'm afraid.  max_out
> does not check for size_t overflow and can mess up big-time when it
> happens.  And get_format_flags does not reject flag combinations that
> lead to undefined behavior.  There are other issues with size_t overflow.
> It's no doubt possible to get a buffer overflow on some platforms.
>
> I scouted through the code looking for undefined behavior with printf
> formats in csplit and fixed everything I found.  I propose the
> following further patch to address them.
>
>>From 2b1f077d355b56b49c82f4f5223ee76a4f1b3396 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <address@hidden>
> Date: Wed, 10 Nov 2010 20:34:52 -0800
> Subject: [PATCH] csplit: do not rely on undefined behavior in printf formats
>
> * doc/coreutils.texi (csplit invocation): Say that %d and %i are
> aliases for %u.
> * src/csplit.c (FLAG_THOUSANDS, FLAG_ALTERNATIVE): New constants.
> (get_format_flags): Now take char const * and int * and return
> size_t.  It now stores info about the flags instead of merely
> scanning them.  Also, it handles '0' correctly.  Drop support for
> the undocumented '+' and ' ' flags since the value is unsigned.
> Add support for the (undocumented) "'" flag.  All uses changed.
> (get_format_width, get_format_prec): Remove.
> (check_format_conv_type): Renamed from get_format_conv_type, with
> a different signature.  It now converts the format to one that is
> compatible with unsigned int, and checks flags.  All uses changed.
> (max_out): Have snprintf compute the number of bytes needed rather
> than attempting to do it ourselves (which doesn't work portably
> with outlandish formats such as %4294967296d).
> (check_format_conv_type, main): Check for overflow in size
> calculations.  Don't assume size_t fits in unsigned int.
> * tests/misc/csplit: Check for proper handling of flags, with
> %0#6.3x.  Coreutils 8.6 mishandles this somewhat-weird example.
...

Thanks for the nice patch.
I will push it shortly.

Note however, that using glibc's snprintf results in
potentially large heap allocation, up to 2GiB:

  $ seq 1000 | valgrind src/csplit -b %2294967296d - '/./' '{*}'
  ==10741== Memcheck, a memory error detector
  ==10741== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
  ==10741== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
  ==10741== Command: src/csplit -b %2294967296d - /./ {*}
  ==10741==
  ==10741== Warning: silly arg (-1999999968) to malloc()
  src/csplit: memory exhausted

That happens here, where format is "%2294967296d":

  1292      int maxlen = snprintf (NULL, 0, format, UINT_MAX);
  1293      if (! (0 <= maxlen && maxlen <= SIZE_MAX))
  1294        xalloc_die ();

snprintf tries to allocate space in which to format the result

While testing this, I ran this:

    echo a|src/csplit -b %02147483647d - '/./' '{*}' > k 2> err

and it took a long time.  Understandable when you realize it is
generating a 4GiB-long diagnostic, all on one line:

    src/csplit: <2GiB of spaces>xx<2GiB of 0's>: File name too long

This was on a Fedora 14 x86_64 system.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]