[PATCH] Modernize discussion of integer overflow

autoconf-patches

From:	Paul Eggert
Subject:	[PATCH] Modernize discussion of integer overflow
Date:	Tue, 31 Aug 2021 16:15:51 -0700
* doc/autoconf.texi (Integer Overflow, Integer Overflow Basics)
(Signed Overflow Examples, Optimization and Wraparound):
Modernize discussion to take current compiler and Gnulib
technology into account.
---
 doc/autoconf.texi | 173 +++++++++++++++++++++++++---------------------
 1 file changed, 95 insertions(+), 78 deletions(-)
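
Note for reviewers: the revised advice in the patch below suggests
converting to double-width integers, multiplying the wider values, and
then testing whether the result fits the narrower range.  A minimal
sketch of that idea follows.  The helper name multiply_ok is
hypothetical (it is not part of the patch or of Gnulib), and the code
assumes that long long is at least twice as wide as int, which the
static assertion checks at compile time.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: long long is at least twice as wide as int, so the
   product of any two int values is representable in long long.  */
_Static_assert (sizeof (long long) >= 2 * sizeof (int),
                "long long must be at least twice as wide as int");

/* Hypothetical helper: set *product = a * b and return true if the
   product fits in int; otherwise return false and leave *product
   unchanged.  No signed overflow occurs in either case.  */
static bool
multiply_ok (int a, int b, int *product)
{
  long long wide = (long long) a * b;   /* computed in the wider type */
  if (wide < INT_MIN || INT_MAX < wide)
    return false;
  *product = (int) wide;                /* in range, so conversion is exact */
  return true;
}
```

Where no wider type is available this approach does not apply, which
is the case the patch text calls out as especially inconvenient.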

diff --git a/doc/autoconf.texi b/doc/autoconf.texi
index 590be1c4..34559414 100644
--- a/doc/autoconf.texi
+++ b/doc/autoconf.texi
@@ -21706,13 +21706,10 @@ the programs work well enough in practice.
 @cindex signed integer overflow
 @cindex wraparound arithmetic
 
-In practice many portable C programs assume that signed integer overflow wraps
-around reliably using two's complement arithmetic.  Yet the C standard
-says that program behavior is undefined on overflow, and in a few cases
-C programs do not work on some modern implementations because their
-overflows do not wrap around as their authors expected.  Conversely, in
-signed integer remainder, the C standard requires overflow
-behavior that is commonly not implemented.
+Although some traditional C programs assume that signed integer overflow
+wraps around reliably using two's complement arithmetic, the C standard
+says that program behavior is undefined on overflow, and these C
+programs may not work on many modern implementations.
 
 @menu
 * Integer Overflow Basics::     Why integer overflow is a problem
@@ -21729,7 +21726,8 @@ behavior that is commonly not implemented.
 @cindex signed integer overflow
 @cindex wraparound arithmetic
 
-In languages like C, unsigned integer overflow reliably wraps around;
+In languages like C, integer overflow wraps around for unsigned
+integer types that are at least as wide as @code{unsigned int};
 e.g., @code{UINT_MAX + 1} yields zero.
 This is guaranteed by the C standard and is
 portable in practice, unless you specify aggressive,
@@ -21740,15 +21738,18 @@ In contrast, the C standard says that signed integer overflow leads to
 undefined behavior where a program can do anything, including dumping
 core or overrunning a buffer.  The misbehavior can even precede the
 overflow.  Such an overflow can occur during addition, subtraction,
-multiplication, division, and left shift.
-
-Despite this requirement of the standard, many C programs and Autoconf
-tests assume that signed integer overflow silently wraps around modulo a
-power of two, using two's complement arithmetic, so long as you cast the
-resulting value to a signed integer type or store it into a signed
-integer variable.  If you use conservative optimization flags, such
-programs are generally portable to the vast majority of modern
-platforms, with a few exceptions discussed later.
+multiplication, division, and left shift.  It can even occur for
+unsigned types like @code{unsigned short int} that are narrower
+than @code{int}, as values of these types are widened to @code{int}
+before computation.
+
+Despite this requirement of the standard, some C programs assume that
+signed integer overflow silently wraps around modulo a power of two,
+using two's complement arithmetic, so long as you convert the resulting
+value to a signed integer type.  These programs can have problems,
+especially when optimization is enabled.  If you assume a GCC-like
+compiler, you can work around the problems by compiling with GCC's
+@code{-fwrapv} option; however, this is not portable.
 
 For historical reasons the C standard also allows implementations with
 ones' complement or signed magnitude arithmetic, but it is safe to
@@ -21756,9 +21757,9 @@ assume two's complement nowadays.
 
 Also, overflow can occur when converting an out-of-range value to a
 signed integer type.  Here a standard implementation must define what
-happens, but this might include raising an exception.  In practice all
-known implementations support silent wraparound in this case, so you need
-not worry about other possibilities.
+happens, and this can include raising an exception.  Although practical
+implementations typically wrap around silently in this case, a few
+debugging implementations trap instead.
 
 @node Signed Overflow Examples
 @subsection Examples of Code Assuming Wraparound Overflow
@@ -21767,14 +21768,15 @@ not worry about other possibilities.
 @cindex signed integer overflow
 @cindex wraparound arithmetic
 
-There has long been a tension between what the C standard requires for
-signed integer overflow, and what C programs commonly assume.  The
+There was long a tension between what the C standard requires for signed
+integer overflow, and what traditional C programs commonly assumed.  The
 standard allows aggressive optimizations based on assumptions that
-overflow never occurs, but many practical C programs rely on overflow
-wrapping around.  These programs do not conform to the standard, but
-they commonly work in practice because compiler writers are
-understandably reluctant to implement optimizations that would break
-many programs, unless perhaps a user specifies aggressive optimization.
+overflow never occurs, but traditionally many C programs relied on overflow
+wrapping around.  Although these programs did not conform to the standard,
+they formerly worked in practice because traditionally compilers did not
+optimize in a way that would break the programs.  Nowadays, though,
+compilers do perform these optimizations, so portable programs can no
+longer assume reliable wraparound on signed integer overflow.
 
 The C Standard says that if a program has signed integer overflow its
 behavior is undefined, and the undefined behavior can even precede the
@@ -21801,13 +21803,6 @@ Worse, if an earlier bug in the program lets the compiler deduce that
 the C standard allows the compiler to optimize away the password test
 and generate code that allows superuser privileges unconditionally.
 
-Despite this requirement by the standard, it has long been common for C
-code to assume wraparound arithmetic after signed overflow, and all
-known practical C implementations support some C idioms that assume
-wraparound signed arithmetic, even if the idioms do not conform
-strictly to the standard.  If your code looks like the following
-examples it will almost surely work with real-world compilers.
-
 Here is an example derived from the 7th Edition Unix implementation of
 @code{atoi} (1979-01-10):
 
@@ -21823,7 +21818,7 @@ return (f ? -n : n);
 @noindent
 Even if the input string is in range, on most modern machines this has
 signed overflow when computing the most negative integer (the @code{-n}
-overflows) or a value near an extreme integer (the first @code{+}
+overflows) or a value near an extreme integer (the @code{+}
 overflows).
 
 Here is another example, derived from the 7th Edition implementation of
@@ -21837,8 +21832,8 @@ randx = randx * 1103515245 + 12345;
 return (randx >> 16) & 077777;
 @end example
 
-In the following example, derived from the GNU C Library 2.5
-implementation of @code{mktime} (2006-09-09), the code assumes
+In the following example, derived from the GNU C Library 2.15
+implementation of @code{mktime} (2012-03-21), the code assumes
 wraparound arithmetic in @code{+} to detect signed overflow:
 
 @example
@@ -21852,10 +21847,12 @@ if (((t1 < t) != (sec_requested < 0))
   return -1;
 @end example
 
-If your code looks like these examples, it is probably safe even though
-it does not strictly conform to the C standard.  This might lead one to
-believe that one can generally assume wraparound on overflow, but that
-is not always true, as can be seen in the next section.
+Although some of these examples will likely behave as if signed integer
+overflow wraps around reliably, other examples are likely to misbehave
+when optimization is enabled.  All these examples should be avoided in
+portable code because signed integer overflow is not reliable on modern
+systems, and it's not worth worrying about which of these examples
+happen to work on most platforms and which do not.
 
 @node Optimization and Wraparound
 @subsection Optimizations That Break Wraparound Arithmetic
@@ -21880,8 +21877,7 @@ int
 sumc (int lo, int hi)
 @{
   int sum = 0;
-  int i;
-  for (i = lo; i <= hi; i++)
+  for (int i = lo; i <= hi; i++)
     sum ^= i * 53;
   return sum;
 @}
@@ -21898,8 +21894,7 @@ transformed_sumc (int lo, int hi)
 @{
   int sum = 0;
   int hic = hi * 53;
-  int ic;
-  for (ic = lo * 53; ic <= hic; ic += 53)
+  for (int ic = lo * 53; ic <= hic; ic += 53)
     sum ^= ic;
   return sum;
 @}
@@ -21925,8 +21920,7 @@ always useful.  However, edge cases in this area can cause problems.
 For example:
 
 @example
-int j;
-for (j = 1; 0 < j; j *= 2)
+for (int j = 1; 0 < j; j *= 2)
   test (j);
 @end example
 
@@ -21935,11 +21929,8 @@ Here, the loop attempts to iterate through all powers of 2 that
 @code{int} can represent, but the C standard allows a compiler to
 optimize away the comparison and generate an infinite loop,
 under the argument that behavior is undefined on overflow.  As of this
-writing this optimization is not done by any production version of
-GCC with @option{-O2}, but it might be performed by other
-compilers, or by more aggressive GCC optimization options,
-and the GCC developers have not decided whether it will
-continue to work with GCC and @option{-O2}.
+writing this optimization is done on some platforms by
+GCC with @option{-O2}, so this code is not portable in practice.
 
 @node Signed Overflow Advice
 @subsection Practical Advice for Signed Overflow Issues
@@ -21950,25 +21941,56 @@ continue to work with GCC and @option{-O2}.
 
 Ideally the safest approach is to avoid signed integer overflow
 entirely.  For example, instead of multiplying two signed integers, you
-can convert them to unsigned integers, multiply the unsigned values,
-then test whether the result is in signed range.
+can convert them to double-width integers, multiply the wider values,
+then test whether the result is in the narrower range.  Or you can use
+more-complicated code employing unsigned integers of the same width.
 
-Rewriting code in this way will be inconvenient, though, particularly if
-the signed values might be negative.  Also, it may hurt
-performance.  Using unsigned arithmetic to check for overflow is
+Rewriting code in this way will be inconvenient, though, especially if
+the signed values might be negative and no wider type is available.
+Using unsigned arithmetic to check for overflow is
 particularly painful to do portably and efficiently when dealing with an
 integer type like @code{uid_t} whose width and signedness vary from
-platform to platform.
+platform to platform.  Also, this approach may hurt performance.
 
-Furthermore, many C applications pervasively assume wraparound behavior
-and typically it is not easy to find and remove all these assumptions.
-Hence it is often useful to maintain nonstandard code that assumes
+Hence it is often useful to maintain code that needs
 wraparound on overflow, instead of rewriting the code.  The rest of this
 section attempts to give practical advice for this situation.
 
-If your code wants to detect signed integer overflow in @code{sum = a +
-b}, it is generally safe to use an expression like @code{(sum < a) != (b
-< 0)}.
+To detect integer overflow portably when attempting operations like
+@code{sum = a + b}, you can use the @code{intprops} module of Gnulib.
+@xref{Gnulib}.  For example:
+
+@example
+#include <intprops.h>
+...
+/* Set sum = a + b, diagnosing overflow.  */
+if (!INT_ADD_OK (a, b, &sum))
+  return "integer overflow detected";
+/* Now the code can use 'sum'.  */
+@end example
+
+To add two integers with overflow wrapping around reliably in the sum,
+you can use @code{INT_ADD_WRAPV (a, b, &sum)} instead:
+
+@example
+#include <intprops.h>
+...
+/* Set sum = a + b, with wraparound.  */
+if (INT_ADD_WRAPV (a, b, &sum))
+  /* 'sum' has just the low order bits.  */;
+else
+  /* 'sum' is the correct answer.  */;
+@end example
+
+The @code{intprops} module supports similar macros for other arithmetic
+operations, e.g., @code{INT_SUBTRACT_OK} and @code{INT_MULTIPLY_WRAPV}.
+If your code is intended to run only on GCC 7 or later, you can instead
+use the GNU C primitives @code{__builtin_add_overflow},
+@code{__builtin_sub_overflow}, and @code{__builtin_mul_overflow}.
+The @code{intprops} module uses these GCC 7 primitives if available,
+so that the cost of invoking these macros is typically just one machine
+instruction for the arithmetic and another instruction for the rare
+branch on overflow.
 
 If your code uses a signed loop index, make sure that the index cannot
 overflow, along with all signed expressions derived from the index.
@@ -21976,7 +21998,7 @@ Here is a contrived example of problematic code with two instances of
 overflow.
 
 @example
-for (i = INT_MAX - 10; i <= INT_MAX; i++)
+for (int i = INT_MAX - 10; i <= INT_MAX; i++)
   if (i + 1 < 0)
     @{
       report_overflow ();
@@ -21989,22 +22011,18 @@ Because of the two overflows, a compiler might optimize away or
 transform the two comparisons in a way that is incompatible with the
 wraparound assumption.
 
-If your code uses an expression like @code{(i * 2000) / 1000} and you
-actually want the multiplication to wrap around on overflow, use
-unsigned arithmetic
-to do it, e.g., @code{((int) (i * 2000u)) / 1000}.
-
-If your code assumes wraparound behavior and you want to insulate it
+If your code is intended to be compiled only by GCC and
+assumes wraparound behavior, and you want to insulate it
 against any GCC optimizations that would fail to support that
 behavior, you should use GCC's @option{-fwrapv} option, which
 causes signed overflow to wrap around reliably (except for division and
 remainder, as discussed in the next section).
 
-If you need to port to platforms where signed integer overflow does not
-reliably wrap around (e.g., due to hardware overflow checking, or to
-highly aggressive optimizations), you should consider debugging with
-GCC's @option{-ftrapv} option, which causes signed overflow to
-raise an exception.
+If you need to write portable code and therefore cannot assume that
+signed integer overflow wraps around reliably, you should consider
+debugging with a GCC option that causes signed overflow to raise an
+exception.  These options include @option{-fsanitize=undefined} and
+@option{-ftrapv}.
 
 @node Signed Integer Division
 @subsection Signed Integer Division and Integer Overflow
@@ -22015,8 +22033,7 @@ integer division is not always harmless: for example, on CPUs of the
 i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal
 which by default terminates the program.  Worse, taking the remainder
 of these two values typically yields the same signal on these CPUs,
-even though the C standard requires @code{INT_MIN % -1} to yield zero
-because the expression does not overflow.
+behavior that the C standard allows.
 
 @node Preprocessor Arithmetic
 @section Preprocessor Arithmetic
-- 
2.31.1
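
For readers who want to try the GNU C primitives mentioned in the
patch, here is a minimal sketch, assuming a GCC-compatible compiler
that provides __builtin_add_overflow.  The helper name checked_add is
hypothetical and is not part of Gnulib; it returns true when the sum is
exact, loosely mirroring the sense of INT_ADD_OK in the patch's example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: set *sum = a + b.  Return true if the sum is
   exact.  On overflow return false; *sum then holds the wrapped
   low-order bits that the builtin stores.
   Assumption: the compiler supports the GNU C builtin
   __builtin_add_overflow.  */
static bool
checked_add (int a, int b, int *sum)
{
  /* The builtin returns true on overflow, so invert it.  */
  return !__builtin_add_overflow (a, b, sum);
}
```

A caller would test the return value before using the sum, for example
"if (!checked_add (a, b, &sum)) return -1;".  Code that must also build
with compilers lacking the builtin would use the Gnulib macros instead,
as the patch recommends.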