
Re: [Lzip-bug] Version 0.1 of plzip released


From: Ersek, Laszlo
Subject: Re: [Lzip-bug] Version 0.1 of plzip released
Date: Thu, 10 Dec 2009 00:52:52 +0100 (CET)

On Wed, 9 Dec 2009, John Reiser wrote:

Version 0.1 of plzip uses a ./configure script to generate ./Makefile. The configure script could pre-compute the various sizeof(<type>) values that are involved in the overflow checks [...] Either way, any test which deals with size-based overflow, and whose results are pre-determined solely on the basis of the types involved, should take advantage of those facts.

I am not debating this wrt. plzip, since, as you say, it does use a ./configure script.


If constant values that result from analysis of types and type ranges are not acknowledged and/or not propagated, then any resulting compiler warnings are maintenance bugs. It is a bug that someone who compiles the software should have to consider, "Is the compiler warning valid? Should I ignore the compiler warning?"

I both agree and disagree with you. I agree that compiler warnings are worrisome and must either be fixed or judiciously suppressed. I have been looking for ways to selectively suppress common gcc warnings *at specific locations* in my code. Those warnings are technically correct, but I did think the flagged code through, and in some cases I won't change my code to dance around one specific compiler's over-cautiousness. Caution is good, but there are exceptions where the "uncommon" code is not the result of oversight on the programmer's part.

I have no problem basing real C conditionals on constants in my code, if the #if alternative appears messy. (I consider ./configure messy for the caliber of projects I do in my "free" time, but that's personal taste.)


The absolute minimum requirement is a section in a README or RELEASE-NOTES file which lists the compiler warnings and explains them. It is much better to avoid the warnings entirely.

So that bit-rot has a chance to creep into the documentation, too. Compiler warnings are not only compiler family dependent but also compiler version dependent. If memory serves, there were big -W changes between gcc-3 and gcc-4.

I don't program for a specific compiler. I program for SUSv2. However, this mailing list is not about my stuff, so wrt. plzip, you may be right.


Turning to the actual code main.cc [with line numbers of plzip-0.1]:
   1091   long max_worker = sysconf(_SC_THREAD_THREADS_MAX);
   1092   if (-1L == max_worker) {
   1093     max_worker = LONG_MAX;
   1094   }
   1095   if (UINT_MAX < (long unsigned)max_worker) {
   1096     max_worker = UINT_MAX;
   1097   }
   1098   if (UINT_MAX / blf < (unsigned)max_worker) {
   1099     max_worker = UINT_MAX / blf;
   1100   }
   1101   if ((size_t)-1 / sizeof(pthread_t) < (unsigned)max_worker) {
   1102     max_worker = (size_t)-1 / sizeof(pthread_t);
   1103   }

This part of code establishes an absolute upper limit on the number of
worker threads.  ...   The value of max_worker can
only decrease, starting with line 1095, and the safety of any
conditional itself is provided for by the earlier parts.

On the usual x86_64, which has 64-bit 'long', 32-bit 'unsigned', and
64-bit 'size_t', the test on line 1101 is always false, as warned
by gcc-4.4.1 and others.  The left-hand side "(size_t)-1 / sizeof(pthread_t)"
has the compile-time constant value 0x1ffffffffffffffful, and the maximum value
of any 'unsigned' on the right-hand side is 0xffffffffu.  The test never
can succeed, regardless of the runtime value of max_worker, and this fact
is known at compile time and at programming time.

Yes!


Therefore lines 1101-1103 should be removed by conditional compilation in the usual x86_64 environment. Not doing so is a maintenance bug because common compilers warn, and dealing with the warning wastes people time.

Common compilers may warn, and usefully so, and dealing with superfluous warnings definitely wastes people's time, but I will not complicate valid code to shut up a compiler. Furthermore, a quality compiler, such as gcc, will omit the code, so there's no runtime or size penalty. (Even if the code is left in place, that's no problem either.) If there'll be a *local* suppression #pragma or anything, I'll use it.

(Since we're already talking software maintenance or release management or some such, please realize that I don't code free software for the masses in the first place. I do try to help packagers and do my own distro-specific packaging work as nicely as I can; but my free software development is not another job for me, where I have to put up with user idiosyncrasies. I strive for maximum portability, and once I achieve it with what I consider cleanliness and simplicity, I refuse to muddle it up in order to work around "maintenance bugs", which have nothing to do with my valid code and everything to do with compiler peculiarities. Making those lines dependent on a preprocessor macro would require an external utility to define that macro "in some cases", and that utility would have to cover *all* cases allowed under SUSv2.)

I find it entirely possible that Antonio will heed your remarks, though.


I would welcome an explicit initialized constant such as
  size_t const SIZE_T_MAX = ~(size_t)0;
  size_t const SIZE_T_MAX(~(size_t)0);   // perhaps better for C++
and then use SIZE_T_MAX as appropriate instead of "(size_t)-1".
I have programmed for some one's-complement machines, where
-1 ==> 0xff...ffE  which is not the same as
~0 ==> 0xff...ffF .

(size_t)-1 is not defined as a bit pattern. From the C99 standard (I have no copy of C89 at hand, sorry):

    6.3.1.3 Signed and unsigned integers

    1 When a value with integer type is converted to another integer type
    other than _Bool, if the value can be represented by the new type, it is
    unchanged.

    2 Otherwise, if the new type is unsigned, the value is converted by
    repeatedly adding or subtracting one more than the maximum value that
    can be represented in the new type until the value is in the range of
    the new type.49)

    49) The rules describe arithmetic on the mathematical value, not the
    value of a given type of expression.

Suppose size_t, being an unsigned integral type, has range [0 .. SIZE_T_MAX]. (We don't know SIZE_T_MAX by value.) If we convert the int constant "-1" to size_t, the algorithm above yields -1 + 1 * ( SIZE_T_MAX + 1 ) = SIZE_T_MAX, which is in range.

If C89 has different rules for this, then I'm horribly wrong and offer my apology.


About the same is going on around plzip.cc:601.

No, this is worse.  Included below are explicit numerical counterexamples
which demonstrate bugs in programming logic.

The source line is
 601  SIZES(S2w_blk, plain, 2u * compr_lev[clidx].dict_size, split);

and the expanded macro begins ["gcc -E plzip.cc", pretty-printed]

  X1  do {
  X2    unsigned tmp; tmp = 2u * compr_lev[clidx].dict_size;
  X3    if ((size_t)-1 < tmp) {
  X4      show_error( "size_t overflow in sizeof_" "plain" "." ); fatal()
  X5    }

With 64-bit 'size_t' and with 32-bit 'unsigned', then the test on line X3
is essentially the same as discussed above.  Thus it should be removed
by conditional compilation in order to avoid a maintenance bug.

... Perhaps so in plzip; I repeat my above observations too.


On a machine with 32-bit 'size_t' and 32-bit 'unsigned' (such as the usual i686),
things are worse. The types of both operands to the comparison on line
X3 are effectively the same (namely: 'unsigned'), and it is impossible for
the runtime value of any variable, such as 'tmp', to exceed the maximum value
for that type.  This is known at compile time and at programming time,
so again conditional compilation should remove the test.
That's the first problem.

Ditto.


Here's the second problem.  If (0x80000000u <= compr_lev[clidx].dict_size)
then the computation of
  unsigned tmp = 2u * compr_lev[clidx].dict_size;
suffers an overflow.

Sure thing. That's why I verified all static const compr_lev[clidx].dict_size values before writing down the code in llzip. Antonio and I discussed this in private on 19-20 Nov 2009 (message identifiers for his reference: <address@hidden> <address@hidden> <address@hidden>). If dict_size can be selected more freely, this formula must definitely be revised.


The value is *undefined*,

Not undefined, just wrong :) Unsigned overflow is well-defined, see above.


Testing for size-based resource exhaustion must avoid arithmetic overflow.

I concur whole-heartedly.


One of the operands to the comparison should be a variable by itself; all the constants should be folded into the other operand,

This constant folding was considered cursorily, but after reviewing the expression step by step in the private mails mentioned above, with those static constant values in mind, the more verbose expression was retained. This holds for the multiplication / division below, too.


Notice that encapsulating this into a general macro often is tedious.

The SIZES macro is specifically not there to hide implementation details. It's #defined and #undefined tightly around the client code, and only serves to save me from typing up the same tedious code twice (computing struct and array sizes after range checking) and unavoidably introducing typos. I did play it through in my head with both call sites / arguments.


That is a good warning!  Avoid such macros; they tend to cause problems.

Function-like macros are dangerous; I agree.


Careful readers may suggest that it is impossible for
 (0x80000000u <= compr_lev[clidx].dict_size)
because of resource restrictions.

I would never do such a thing.


There are more problems.

 607  SIZES(W2m_blk, compr, (4u + 1u + 1u)
 608      + ((unsigned)split_arg.sizeof_plain * 9u + 7u) / 8u + (4u + 8u + 8u),
 609      work);

Again, testing for size-based resource exhaustion must avoid arithmetic overflow. In this case "((unsigned)split_arg.sizeof_plain * 9u + 7u) / 8u" is particularly suspect. If (.sizeof_plain > 0x1c71c71c) then the multiplication by 9u suffers an overflow,

We've checked this too, step by step. If you select the highest value of .dict_size in llzip-0.03, i.e. 32u * 1024u * 1024u, then split_arg.sizeof_plain ends up at 64 MiB (0x04000000). Then,


    Date: Thu, 19 Nov 2009 23:30:17 +0100 (CET)
    From: ERSEK Laszlo
    To: Antonio Diaz Diaz
    Subject: Re: Question: parallel lzip?

    [...] Here's the plan:

    1. I'll use the minilzip -1 .. -9 presets for dict sizes and match
    lengths.

    2. I'll choose double dict size for input block length (->
    sizeof_plain). This seems to be at most 2 * 32MB = 64MB then
    (binary MB).

    3. I'll choose the following expression for compressed buffer
    length (-> sizeof_compr) -- single member file:

    3.a. header

    3.b. 9 bits per byte, rounded up

    3.c. trailer

    as in

    sizeof_compr = (4u + 1u + 1u) + (sizeof_plain * 9u + 7u) / 8u
        + (4u + 8u + 8u)

    The actual source code form will be different, but that's the gist
    of it. At no point during the evaluation of this expression is an
    unsigned overflow possible (UINT_MAX is at least 2^32 - 1 in
    SUSv2). Substituting 64MB for sizeof_plain:

    (0x6u + 0x24000007u / 8u) + 0x14u == 0x480001Au == 75497498u


SIZES is not a generic, "exported" macro. It's "copy-paste by preprocessor", which I consider valid preprocessor usage.


One new problem is the "bare" constants such as in "* 9u + 7u) / 8u",
"(4u + 1u + 1u)", and "(4u + 8u + 8u)".  What is the source
of all those numerical values?

http://www.nongnu.org/lzip/manual/lzip_manual.html#File-Format

Please note that llzip was not a "final product". It was meant from the beginning to be forked / taken over by Antonio. That's why there are no nice defines for those values. If you've read the file format section of the lzip manual at least once -- which is not a stretch to assume, since you're reading the source of a program that creates lzip files -- then you'll immediately see the header-body-trailer decomposition in the formula. The constants weren't folded explicitly precisely in order to support that insight (even without symbolic names, yes).


In the definition of the SIZES macro itself:

 590          < arg ## _arg . sizeof_ ## arr - (size_t)1) { \

 595          + (arg ## _arg . sizeof_ ## arr - (size_t)1); \

the "(size_t)1" is really "sizeof(S2w_blk.plain)" or
"sizeof(W2m_blk.compr)".

Yes.


The commonality of the '1' in SIZES with the
'1' in "S2w_blk.plain[1]" or in "W2m_blk.compr[1]" has been lost.

Well, they are (C89-style) flexible array members, so relying on their fixed size of 1 byte seemed natural. (If they had any other size, then they would not be flexible array members, and then the whole discussed code section would make no sense. Indeed, before options -1 .. -9 were introduced in lbzip2 (from which llzip was forked), those array members had correctly #define-d sizes, and there existed no SIZES() macro at all.)

Using a parametric sizeof expression would communicate intent more clearly, truly, at the price of complicating the SIZES macro even more. Additionally, with the current subtraction, it is easier to see that the subtrahend, (size_t)1, cannot be greater than the minuend, "arg ## _arg . sizeof_ ## arr". (As outrageous as it might seem, I do trust myself to set something called "sizeof_*" to something positive, especially if it's set three lines above.)


The definition of the SIZES macro has created a bad situation for maintenance.

Positively. I suspect Antonio will replace it altogether, no later than when he introduces the "--dictionary-size=" parameter.

Please understand, though, that in my *hobbyist* free software activity, I don't code for maintenance by other people above all else. This may not be nice, but that's how it is. I don't need those *second* eight hours spent hacking to become another (but unpaid) *job*. The llzip-0.03 code is technically correct. Maintainability might not be stellar, but here we are, discussing it in relation to plzip; and if Antonio (or anybody) asked me for clarification on any part, I'd be more than happy to do my best explaining.


In this case, careful analysis of these warnings from gcc reveals true bugs,

At worst, communication bugs from one programmer to another, which in my activity are much less urgent than getting the damn code into existence at all, with all-nighters. I feel justified in prioritizing ease of development on my part over ease of maintenance / comprehension by others, when those two goals appear to conflict.


particularly the arithmetic overflows in testing for resource exhaustion.

llzip-0.03 continues to feature no overflows that I know of.

-o-

Thank you for taking the time to dissect plzip, and consequently, parts of llzip, and consequently, parts of lbzip2. Code review is boring and hard. I'm sure users will benefit if you keep doing it; it may render plzip (and, if you care and succeed in convincing me on some points, perhaps lbzip2) better -- or at least more easily maintainable -- software.

I'm sorry for any grammatical and/or logic errors I'm certain I've committed above; it's sorta late again.

Cheers,
lacos



