
Re: [Lzip-bug] Version 0.1 of plzip released


From: Ersek, Laszlo
Subject: Re: [Lzip-bug] Version 0.1 of plzip released
Date: Thu, 10 Dec 2009 00:52:52 +0100 (CET)

On Wed, 9 Dec 2009, John Reiser wrote:

Version 0.1 of plzip uses a ./configure script to generate ./Makefile. The configure script could pre-compute the various sizeof(<type>) values that are involved in the overflow checks [...] Either way, any test which deals with size-based overflow, and whose results are pre-determined solely on the basis of the types involved, should take advantage of those facts.

I am not debating this wrt. plzip, since, as you say, it does use a ./configure script.


If constant values that result from analysis of types and type ranges are not acknowledged and/or not propagated, then any resulting compiler warnings are maintenance bugs. It is a bug that someone who compiles the software should have to consider, "Is the compiler warning valid? Should I ignore the compiler warning?"

I both agree and disagree with you. I agree that compiler warnings are worrisome and must either be fixed or judiciously suppressed. I have been looking for ways to selectively suppress common gcc warnings *at specific locations* in my code. Those warnings are technically correct, but I did think the flagged code through, and in some cases I won't change my code to dance around one specific compiler's over-cautiousness. Caution is good, but there are exceptions where the "uncommon" code is not the result of oversight on the programmer's part.

I have no problem basing real C conditionals on constants in my code, if the #if alternative appears messy. (I consider ./configure messy for the caliber of projects I do in my "free" time, but that's personal taste.)


The absolute minimum requirement is a section in a README or RELEASE-NOTES file which lists the compiler warnings and explains them. It is much better to avoid the warnings entirely.

So that bit-rot has a chance to creep into the documentation, too. Compiler warnings are not only compiler family dependent but also compiler version dependent. If memory serves, there were big -W changes between gcc-3 and gcc-4.

I don't program for a specific compiler. I program for SUSv2. However, this mailing list is not about my stuff, so wrt. plzip, you may be right.


Turning to the actual code main.cc [with line numbers of plzip-0.1]:
   1091   long max_worker = sysconf(_SC_THREAD_THREADS_MAX);
   1092   if (-1L == max_worker) {
   1093     max_worker = LONG_MAX;
   1094   }
   1095   if (UINT_MAX < (long unsigned)max_worker) {
   1096     max_worker = UINT_MAX;
   1097   }
   1098   if (UINT_MAX / blf < (unsigned)max_worker) {
   1099     max_worker = UINT_MAX / blf;
   1100   }
   1101   if ((size_t)-1 / sizeof(pthread_t) < (unsigned)max_worker) {
   1102     max_worker = (size_t)-1 / sizeof(pthread_t);
   1103   }

This part of code establishes an absolute upper limit on the number of
worker threads.  ...   The value of max_worker can
only decrease, starting with line 1095, and the safety of any
conditional itself is provided for by the earlier parts.

On the usual x86_64, which has 64-bit 'long', 32-bit 'unsigned', and
64-bit 'size_t', the test on line 1101 is always false, as warned
by gcc-4.4.1 and others.  The left-hand side "(size_t)-1 / sizeof(pthread_t)"
has the compile-time constant value 0x1ffffffffffffffful, and the maximum value
of any 'unsigned' on the right-hand side is 0xffffffffu.  The test never
can succeed, regardless of the runtime value of max_worker, and this fact
is known at compile time and at programming time.

Yes!


Therefore lines 1101-1103 should be removed by conditional compilation in the usual x86_64 environment. Not doing so is a maintenance bug because common compilers warn, and dealing with the warning wastes people time.

Common compilers may warn, and usefully so, and dealing with superfluous warnings definitely wastes people's time, but I will not complicate valid code to shut up a compiler. Furthermore, a quality compiler, such as gcc, will omit the code, so there's no runtime or size penalty. (Even if the code is left in place, that's no problem either.) If there'll be a *local* suppression #pragma or anything, I'll use it.

(Since we're already talking software maintenance or release management or some such, please realize that I don't code free software for the masses in the first place. I do try to help packagers and do my own distro-specific packaging work as nicely as I can; but my free software development is not another job for me, where I have to put up with user idiosyncrasies. I strive for maximum portability, and once I achieve it with what I consider cleanliness and simplicity, I refuse to muddle it up in order to work around "maintenance bugs", which have nothing to do with my valid code and everything to do with compiler peculiarities. Making those lines dependent on a preprocessor macro would require an external utility to define that macro "in some cases", and that utility would have to cover *all* cases allowed under SUSv2.)

I find it entirely possible that Antonio will heed your remarks, though.


I would welcome an explicit initialized constant such as
  size_t const SIZE_T_MAX = ~(size_t)0;
  size_t const SIZE_T_MAX(~(size_t)0);   // perhaps better for C++
and then use SIZE_T_MAX as appropriate instead of "(size_t)-1".
I have programmed for some one's-complement machines, where
-1 ==> 0xff...ffE  which is not the same as
~0 ==> 0xff...ffF .

(size_t)-1 is not defined as a bit pattern. From the C99 standard (I have no copy of C89 at hand, sorry):

    6.3.1.3 Signed and unsigned integers

    1 When a value with integer type is converted to another integer type
    other than _Bool, if the value can be represented by the new type, it is
    unchanged.

    2 Otherwise, if the new type is unsigned, the value is converted by
    repeatedly adding or subtracting one more than the maximum value that
    can be represented in the new type until the value is in the range of
    the new type.49)

    49) The rules describe arithmetic on the mathematical value, not the
    value of a given type of expression.

Suppose size_t, being an unsigned integral type, has range [0 .. SIZE_T_MAX]. (We don't know SIZE_T_MAX by value.) If we convert the int constant "-1" to size_t, the algorithm above yields -1 + 1 * ( SIZE_T_MAX + 1 ) = SIZE_T_MAX, which is in range.

If C89 has different rules for this, then I'm horribly wrong and offer my apology.


About the same is going on around plzip.cc:601.

No, this is worse.  Included below are explicit numerical counterexamples
which demonstrate bugs in programming logic.

The source line is
 601  SIZES(S2w_blk, plain, 2u * compr_lev[clidx].dict_size, split);

and the expanded macro begins ["gcc -E plzip.cc", pretty-printed]

  X1  do {
  X2    unsigned tmp; tmp = 2u * compr_lev[clidx].dict_size;
  X3    if ((size_t)-1 < tmp) {
  X4      show_error( "size_t overflow in sizeof_" "plain" "." ); fatal()
  X5    }

With 64-bit 'size_t' and with 32-bit 'unsigned', then the test on line X3
is essentially the same as discussed above.  Thus it should be removed
by conditional compilation in order to avoid a maintenance bug.

... Perhaps so in plzip; I repeat my above observations too.


On a machine with 32-bit 'size_t' and 32-bit 'unsigned' (such as the usual i686),
things are worse. The types of both operands to the comparison on line
X3 are effectively the same (namely: 'unsigned'), and it is impossible for
the runtime value of any variable, such as 'tmp', to exceed the maximum value
for that type.  This is known at compile time and at programming time,
so again conditional compilation should remove the test.
That's the first problem.

Ditto.


Here's the second problem.  If (0x80000000u <= compr_lev[clidx].dict_size)
then the computation of
  unsigned tmp = 2u * compr_lev[clidx].dict_size;
suffers an overflow.

Sure thing. That's why I verified all static const compr_lev[clidx].dict_size values before writing down the code in llzip. Antonio and I discussed this in private on 19-20 Nov 2009 (message identifiers for his reference: <address@hidden> <address@hidden> <address@hidden>). If dict_size can be selected more freely, this formula must definitely be revised.


The value is *undefined*,

Not undefined, just wrong :) Unsigned overflow is well-defined, see above.


Testing for size-based resource exhaustion must avoid arithmetic overflow.

I concur whole-heartedly.


One of the operands to the comparison should be a variable by itself; all the constants should be folded into the other operand,

This constant folding was considered cursorily, but after reviewing the expression step by step in the private mails mentioned above, with those static constant values in mind, the more verbose expression was retained. This holds for the multiplication / division below, too.


Notice that encapsulating this into a general macro often is tedious.

The SIZES macro is specifically not there to hide implementation details. It's #defined and #undefined tightly around the client code, and only serves to save me from typing up the same tedious code twice (computing struct and array sizes after range checking) and unavoidably introducing typos. I did play it through in my head with both call sites / arguments.


That is a good warning!  Avoid such macros; they tend to cause problems.

Function-like macros are dangerous; I agree.


Careful readers may suggest that it is impossible for
 (0x80000000u <= compr_lev[clidx].dict_size)
because of resource restrictions.

I would never do such a thing.


There are more problems.

 607  SIZES(W2m_blk, compr, (4u + 1u + 1u)
 608      + ((unsigned)split_arg.sizeof_plain * 9u + 7u) / 8u + (4u + 8u + 8u),
 609      work);

Again, testing for size-based resource exhaustion must avoid arithmetic overflow. In this case "((unsigned)split_arg.sizeof_plain * 9u + 7u) / 8u" is particularly suspect. If (.sizeof_plain > 0x1c71c71c) then the multiplication by 9u suffers an overflow,

We've checked this too, step by step. If you select the highest value of .dict_size in llzip-0.03, i.e. 32u * 1024u * 1024u, then split_arg.sizeof_plain ends up at 64 MiB (0x04000000). Then,


    Date: Thu, 19 Nov 2009 23:30:17 +0100 (CET)
    From: ERSEK Laszlo
    To: Antonio Diaz Diaz
    Subject: Re: Question: parallel lzip?

    [...] Here's the plan:

    1. I'll use the minilzip -1 .. -9 presets for dict sizes and match
    lengths.

    2. I'll choose double dict size for input block length (->
    sizeof_plain). This seems to be at most 2 * 32MB = 64MB then
    (binary MB).

    3. I'll choose the following expression for compressed buffer
    length (-> sizeof_compr) -- single member file:

    3.a. header

    3.b. 9 bits per byte, rounded up

    3.c. trailer

    as in

    sizeof_compr = (4u + 1u + 1u) + (sizeof_plain * 9u + 7u) / 8u
        + (4u + 8u + 8u)

    The actual source code form will be different, but that's the gist
    of it. At no point during the evaluation of this expression is an
    unsigned overflow possible (UINT_MAX is at least 2^32 - 1 in
    SUSv2). Substituting 64MB for sizeof_plain:

    (0x6u + 0x24000007u / 8u) + 0x14u == 0x480001Au == 75497498u


SIZES is not a generic, "exported" macro. It's "copy-paste by preprocessor", which I consider valid preprocessor usage.


One new problem is the "bare" constants such as in "* 9u + 7u) / 8u",
"(4u + 1u + 1u)", and "(4u + 8u + 8u)".  What is the source
of all those numerical values?

http://www.nongnu.org/lzip/manual/lzip_manual.html#File-Format

Please note that llzip was not a "final product". It was meant from the beginning to be forked / taken over by Antonio. That's why there are no nice defines for those values. If you've read the file format section of the lzip manual at least once -- which is not a stretch to assume, since you're reading the source of a program that creates lzip files -- then you'll immediately see the header-body-trailer decomposition in the formula. The constants weren't folded explicitly precisely in order to support that insight (even without symbolic names, yes).


In the definition of the SIZES macro itself:

 590          < arg ## _arg . sizeof_ ## arr - (size_t)1) { \

 595          + (arg ## _arg . sizeof_ ## arr - (size_t)1); \

the "(size_t)1" is really "sizeof(S2w_blk.plain)" or
"sizeof(W2m_blk.compr)".

Yes.


The commonality of the '1' in SIZES with the
'1' in "S2w_blk.plain[1]" or in "W2m_blk.compr[1]" has been lost.

Well, they are (C89-style) flexible array members, so relying on their fixed size of 1 byte seemed natural. (If they had any other size, then they would not be flexible array members, and then the whole discussed code section would make no sense. Indeed, before options -1 .. -9 were introduced in lbzip2 (from which llzip was forked), those array members had correctly #define-d sizes, and there existed no SIZES() macro at all.)

Using a parametric sizeof expression would communicate intent more clearly, truly, at the price of complicating the SIZES macro even more. Additionally, with the current subtraction, it is easier to see that the subtrahend, (size_t)1, cannot be greater than the minuend, "arg ## _arg . sizeof_ ## arr". (As outrageous as it might seem, I do trust myself to set something called "sizeof_*" to something positive, especially if it's set three lines above.)


The definition of the SIZES macro has created a bad situation for maintenance.

Positively. I suspect Antonio will replace it altogether, no later than when he introduces the "--dictionary-size=" parameter.

Please understand, though, that in my *hobbyist* free software activity, I don't code for maintenance by other people above all else. This may not be nice, but that's how it is. I don't need those *second* eight hours spent hacking to become another (but unpaid) *job*. The llzip-0.03 code is technically correct. Maintainability might not be stellar, but here we are, discussing it in relation to plzip; and if Antonio (or anybody) asked me for clarification on any part, I'd be more than happy to do my best explaining.


In this case, careful analysis of these warnings from gcc reveals true bugs,

At worst, communication bugs from one programmer to another, which in my activity are much less urgent than getting the damn code into existence at all, with all-nighters. I feel justified in prioritizing ease of development on my part over ease of maintenance / comprehension by others, when those two goals appear to conflict.


particularly the arithmetic overflows in testing for resource exhaustion.

llzip-0.03 continues to feature no overflows that I know of.

-o-

Thank you for taking the time to dissect plzip, and consequently, parts of llzip, and consequently, parts of lbzip2. Code review is boring and hard. I'm sure users will benefit if you keep doing it; it may render plzip (and, if you care and succeed in convincing me on some points, perhaps lbzip2) better -- or at least more easily maintainable -- software.

I'm sorry for any grammatical and/or logic errors I'm certain I've committed above; it's sorta late again.

Cheers,
lacos



