autoconf-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

documentating the recent -fwrapv discussion


From: Paul Eggert
Subject: documentating the recent -fwrapv discussion
Date: Tue, 02 Jan 2007 15:08:06 -0800
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

I installed the following change to the Autoconf documentation, in an
attempt to document better the problems that we've uncovered recently
with signed integer overflow on C platforms.  This change doesn't
affect Autoconf's behavior, just the documentation.

2007-01-02  Paul Eggert  <address@hidden>

        * doc/autoconf.texi (Integer Overflow): Greatly expand and
        rewrite, taking notions from the recent discussion on the gcc and
        autoconf mailing lists; please see
        http://lists.gnu.org/archive/html/autoconf-patches/2006-12/msg00091.html
        and follow the many links.
        (Integer Overflow Basics, Signed Overflow Examples):
        (Optimization and Wraparound, Signed Overflow Advice):
        (Signed Integer Division): New sections.

--- doc/autoconf.texi   29 Dec 2006 06:44:42 -0000      1.1121
+++ doc/autoconf.texi   2 Jan 2007 23:06:03 -0000
@@ -183,7 +183,7 @@ a package for creating scripts to config
 templates and an M4 macro package.

 Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000,
-2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
+2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.

 @quotation
 Permission is granted to copy, distribute and/or modify this document
@@ -14949,15 +14949,296 @@ the programs work well enough in practic

 @node Integer Overflow
 @section Integer Overflow
address@hidden overflow, arithmetic
address@hidden integer overflow
address@hidden overflow, signed integer
address@hidden signed integer overflow
address@hidden wraparound arithmetic
+
+Many portable C programs assume that signed integer overflow wraps
+around reliably using two's complement arithmetic.  Yet the C standard
+says that program behavior is undefined on overflow, and in a few cases
+C programs do not work on some modern implementations because their
+overflows do not wrap around as their authors intended.  Conversely, in
+at least one common case related to overflow, the C standard requires
+behavior that is commonly not implemented.

-In C, signed integer overflow leads to undefined behavior.  However,
-many programs and Autoconf tests assume that signed integer overflow after
-addition, subtraction, or multiplication silently
-wraps around modulo a power of two, using two's complement arithmetic,
-so long as you cast the resulting value
-to an integer type or store it into an integer variable.  Such programs
-are portable to the vast majority of modern platforms.  However, signed
address@hidden
+* Integer Overflow Basics::      Why integer overflow is a problem
+* Signed Overflow Examples::     Examples of code assuming wraparound
+* Optimization and Wraparound::  Optimizations that break uses of wraparound
+* Signed Overflow Advice::       Practical advice for signed overflow issues
+* Signed Integer Division::      @code{INT_MIN / -1} and @code{INT_MIN % -1}
address@hidden menu
+
address@hidden Integer Overflow Basics
address@hidden Basics of Integer Overflow
address@hidden integer overflow
address@hidden overflow, signed integer
address@hidden signed integer overflow
address@hidden wraparound arithmetic
+
+In languages like C, unsigned integer overflow reliably wraps around
+modulo the word size.  This is guaranteed by the C standard and is
+portable in practice, unless you specify aggressive optimization options
+suitable only for special applications.
+
+In contrast, the C standard says that signed integer overflow leads to
+undefined behavior where a program can do anything, including dumping
+core or overrunning a buffer.  The misbehavior can even precede the
+overflow.  Such an overflow can occur during addition, subtraction,
+multiplication, division, and left shift.
+
+Despite this requirement of the standard, many C programs and Autoconf
+tests assume that signed integer overflow silently wraps around modulo a
+power of two, using two's complement arithmetic, so long as you cast the
+resulting value to a signed integer type or store it into a signed
+integer variable.  If you use conservative optimization flags, such
+programs are generally portable to the vast majority of modern
+platforms, with a few exceptions discussed later.
+
+For historical reasons the C standard also allows implementations with
+ones' complement or signed magnitude arithmetic, but it is safe to
+assume two's complement nowadays.
+
address@hidden Signed Overflow Examples
address@hidden Examples of Code Assuming Wraparound Overflow
address@hidden integer overflow
address@hidden overflow, signed integer
address@hidden signed integer overflow
address@hidden wraparound arithmetic
+
+There has long been a tension between what the C standard requires for
+signed integer overflow, and what C programs commonly assume.  The
+standard allows aggressive optimizations based on assumptions that
+overflow never occurs, but many practical C programs rely on overflow
+wrapping around.  These programs do not conform to the standard, but
+they commonly work in practice because compiler writers are
+understandably reluctant to implement optimizations that would break
+many programs, unless perhaps a user specifies aggressive optimization.
+
+The C Standard says that if a program has signed integer overflow its
+behavior is undefined, and the undefined behavior can even precede the
+overflow.  To take an extreme example:
+
address@hidden Inspired by Robert Dewar's example in
address@hidden <http://gcc.gnu.org/ml/gcc/2007-01/msg00038.html> (2007-01-01).
address@hidden
+if (password == expected_password)
+  allow_superuser_privileges ();
+else
+  printf ("%d password mismatches\n", counter++);
address@hidden example
+
address@hidden
+If @code{counter} is an @code{int} and a compiler can deduce that
address@hidden == INT_MAX} or that @code{counter} previously overflowed,
+the C standard allows the compiler to optimize away the password test
+and generate code that allows superuser privileges unconditionally.
+
+Despite this requirement by the standard, it has long been common for C
+code to assume wraparound arithmetic after signed overflow, and all
+known practical C implementations support some C idioms that assume
+wraparound signed arithmetic, even if the idioms does not conform
+strictly to the standard.  If your code looks like the following
+examples it will almost surely work with real-world compilers.
+
+Here is an example derived from the 7th Edition Unix implementation of
address@hidden (1979-01-10):
+
address@hidden
+char *p;
+int f, n;
address@hidden
+while (*p >= '0' && *p <= '9')
+  n = n * 10 + *p++ - '0';
+return (f ? -n : n);
address@hidden example
+
address@hidden
+Even if the input string is in range, on most modern machines this has
+signed overflow when computing the most negative integer (the @code{-n}
+overflows) or a value near an extreme integer (the first @code{+}
+overflows).
+
+Here is another example, taken from the 7th Edition implementation of
address@hidden (1979-01-10).  Here the programmer expects both
+multiplication and addition to wrap on overflow:
+
address@hidden
+static long int randx = 1;
address@hidden
+randx = randx * 1103515245 + 12345;
+return (randx >> 16) & 077777;
address@hidden example
+
+In the following example, derived from the @acronym{GNU} C Library 2.5
+implementation of @code{mktime} (2006-09-09), the code assumes
+wraparound arithmetic in @code{+} to detect signed overflow:
+
address@hidden
+time_t t, t1, t2;
+int sec_requested, sec_adjustment;
address@hidden
+t1 = t + sec_requested;
+t2 = t1 + sec_adjustment;
+if (((t1 < t) != (sec_requested < 0))
+    | ((t2 < t1) != (sec_adjustment < 0)))
+  return -1;
address@hidden example
+
+If your code looks like these examples, it is probably safe even though
+it does not strictly conform to the C standard.  This might lead one to
+believe that one can generally assume wraparound on overflow, but that
+is not always true, as can be seen in the next section.
+
address@hidden Optimization and Wraparound
address@hidden Optimizations That Break Wraparound Arithmetic
address@hidden loop induction
+
+Compilers sometimes generate code that is incompatible with wraparound
+integer arithmetic.  A simple example is an algebraic simplification: a
+compiler might translate @code{(i * 2000) / 1000} to @code{i * 2}
+because it assumes that @code{i * 2000} does not overflow.  The
+translation is not equivalent to the original when overflow occurs:
+e.g., in the typical case of 32-bit signed two's complement wraparound
address@hidden, if @code{i} has type @code{int} and value @code{1073742},
+the original expression returns @minus{}2147483 but the optimized
+version returns the mathematically correct value 2147484.
+
+More subtly, loop induction optimizations often exploit the undefined
+behavior of signed overflow.  Consider the following contrived function
address@hidden:
+
address@hidden
+int
+sumc (int lo, int hi)
address@hidden
+  int sum = 0;
+  int i;
+  for (i = lo; i <= hi; i++)
+    sum ^= i * 53;
+  return sum;
address@hidden
address@hidden example
+
address@hidden
+To avoid multiplying by 53 each time through the loop, an optimizing
+compiler might internally transform @code{sumc} to the equivalent of the
+following:
+
address@hidden
+int
+transformed_sumc (int lo, int hi)
address@hidden
+  int sum = 0;
+  int hic = hi * 53;
+  int ic;
+  for (ic = lo * 53; ic <= hic; ic += 53)
+    sum ^= ic;
+  return sum;
address@hidden
address@hidden example
+
address@hidden
+This transformation is allowed by the C standard, but it is invalid for
+wraparound arithmetic when @code{INT_MAX / 53 < hi}, because then the
+overflow in computing expressions like @code{hi * 53} can cause the
+expression @code{i <= hi} to yield a different value from the
+transformed expression @code{ic <= hic}.
+
+For this reason, compilers that use loop induction and similar
+techniques often do not support reliable wraparound arithmetic when a
+loop induction variable like @code{ic} is involved.  Since loop
+induction variables are generated by the compiler, and are not visible
+in the source code, it is not always trivial to say whether the problem
+affects your code.
+
+Hardly any code actually depends on wraparound arithmetic in cases like
+these, so in practice these loop induction optimizations are almost
+always useful.  However, edge cases in this area can cause problems.
+For example:
+
address@hidden
+int j;
+for (j = 1; 0 < j; j *= 2)
+  test (j);
address@hidden example
+
address@hidden
+Here, the loop attempts to iterate through all powers of 2 that
address@hidden can represent, but some test versions of @acronym{GCC}
+optimize away the comparison to zero and thus generate an infinite loop,
+under the argument that behavior is undefined on overflow.  As of this
+writing this optimization is not done by any production version of
address@hidden with @option{-O2}, but it might be performed by more
+aggressive @acronym{GCC} optimization options, or by other compilers.
+
address@hidden Signed Overflow Advice
address@hidden Practical Advice for Signed Overflow Issues
address@hidden integer overflow
address@hidden overflow, signed integer
address@hidden signed integer overflow
address@hidden wraparound arithmetic
+
+Ideally the safest approach is to avoid signed integer overflow
+entirely.  For example, instead of multiplying two signed integers, you
+can convert them to unsigned integers, multiply the unsigned values,
+then test whether the result is in signed range.
+
+Rewriting code in this way will be inconvenient, though, particularly if
+the signed values might be negative.  Also, it will probably hurt
+performance.  Using unsigned arithmetic to check for overflow is
+particularly painful to do portably and efficiently when dealing with an
+integer type like @code{uid_t} whose width and signedness vary from
+platform to platform.
+
+Furthermore, many C applications pervasively assume wraparound behavior
+and typically it is not easy to find and remove all these assumptions.
+Hence it is often useful to maintain nonstandard code that assumes
+wraparound on overflow, instead of rewriting the code.  The rest of this
+section attempts to give practical advice for this situation.
+
+If your code uses a signed loop index, make sure that the index cannot
+overflow, along with all signed expressions derived from the index.
+Here is a contrived example of problematic code with two instances of
+overflow.
+
address@hidden
+for (i = INT_MAX - 10; i <= INT_MAX; i++)
+  if (i + 1 < 0)
+    @{
+      report_overflow ();
+      break;
+    @}
address@hidden example
+
address@hidden
+Because of the two overflows, a compiler might optimize away or
+transform the two comparisons in a way that is incompatible with the
+wraparound assumption.
+
+If your code uses an expression like @code{(i * 2000) / 1000} and you
+actually want the multiplication to wrap around reliably, put the
+product into a temporary variable and divide that by 1000.  This
+inhibits the algebraic optimization on many platforms.
+
+If your code assumes wraparound behavior and you want to insulate it
+against any @acronym{GCC} optimizations that would fail to support that
+behavior, you should use @acronym{GCC}'s @option{-fwrapv} option, which
+causes signed overflow to wrap around reliably (except for division and
+remainder, as discussed in the next section).
+
+If you need to port to platforms where signed integer overflow does not
+reliably wrap around (e.g., due to hardware overflow checking, or to
+highly aggressive optimizations), you should consider using
address@hidden's @option{-ftrapv} option, which causes signed overflow to
+raise an exception.
+
address@hidden Signed Integer Division
address@hidden Signed Integer Division and Integer Overflow
address@hidden division, integer
+
+Overflow in signed
 integer division is not always harmless: for example, on CPUs of the
 i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal
 which by default terminates the program.  Worse, taking the remainder
@@ -14965,14 +15246,6 @@ of these two values typically yields the
 even though the C standard requires @code{INT_MIN % -1} to yield zero
 because the expression does not overflow.

address@hidden users might consider using the
address@hidden option if they are worried about porting their code to
-the rare platforms where signed integer overflow does not wrap around
-after addition, subtraction, or multiplication.
-
-Unsigned integer overflow reliably wraps around modulo the word size.
-This is guaranteed by the C standard and is portable in practice.
-
 @node Null Pointers
 @section Properties of Null Pointers
 @cindex null pointers




reply via email to

[Prev in Thread] Current Thread [Next in Thread]