m4-patches

Re: branch-1_4 8-bit clean translit


From: Eric Blake
Subject: Re: branch-1_4 8-bit clean translit
Date: Sat, 11 Nov 2006 06:58:46 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.8) Gecko/20061025 Thunderbird/1.5.0.8 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 11/11/2006 5:54 AM:
> 
> 2006-11-11  Eric Blake  <address@hidden>
> 
>       * src/builtin.c: Remove unnecessary casts.
>       (expand_ranges): Make 8-bit clean.

Ported to head as follows:

2006-11-11  Eric Blake  <address@hidden>

        * m4/macro.c (trace_format): Use canonical type name.
        * m4/output.c (m4_freeze_diversions): Likewise.
        * src/freeze.c (produce_module_dump, dump_symbol_CB)
        (produce_frozen_state): Likewise.
        * m4/m4private.h (to_uchar): Grab from branch.
        * m4/input.c (string_peek, string_read): Use it.
        * m4/utility.c (skip_space): Likewise.
        * src/main.c (main): Likewise.
        * doc/m4.texinfo (Translit): Remerge from branch.
        * tests/builtins.at (translit): Test 8-bit range.
        * modules/m4.c (m4_expand_ranges): Merge from branch.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFVdcW84KuGfSFAYARAuSRAJoDCm5zj5rMti1TzJVrCTFLZ19KiACfTppi
OzKurlD1d32dP0v6G0q85zs=
=jsLg
-----END PGP SIGNATURE-----
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.77
diff -u -p -r1.77 m4.texinfo
--- doc/m4.texinfo      8 Nov 2006 19:06:00 -0000       1.77
+++ doc/m4.texinfo      11 Nov 2006 13:57:48 -0000
@@ -4904,14 +4904,15 @@ translation pass is made, even if charac
 appear in @var{chars}.
 
 As a @acronym{GNU} extension, both @var{chars} and @var{replacement} can
-contain character-ranges,
-e.g., @samp{a-z} (meaning all lowercase letters) or @samp{0-9} (meaning
-all digits).  To include a dash @samp{-} in @var{chars} or
-@var{replacement}, place it first or last.
-
-It is not an error for the last character in the range to be `larger'
-than the first.  In that case, the range runs backwards, i.e.,
-@samp{9-0} means the string @samp{9876543210}.
+contain character-ranges, e.g., @samp{a-z} (meaning all lowercase
+letters) or @samp{0-9} (meaning all digits).  To include a dash @samp{-}
+in @var{chars} or @var{replacement}, place it first or last in the
+entire string, or as the last character of a range.  Back-to-back ranges
+can share a common endpoint.  It is not an error for the last character
+in the range to be `larger' than the first.  In that case, the range
+runs backwards, i.e., @samp{9-0} means the string @samp{9876543210}.
+The expansion of a range is dependent on the underlying encoding of
+characters, so using ranges is not always portable between machines.
 
 The macro @code{translit} is recognized only with parameters.
 @end deffn
@@ -4923,17 +4924,21 @@ translit(`GNUs not Unix', `a-z', `A-Z')
 @result{}GNUS NOT UNIX
 translit(`GNUs not Unix', `A-Z', `z-a')
 @result{}tmfs not fnix
+translit(`+,-12345', `+--1-5', `<;>a-c-a')
+@result{}<;>abcba
 translit(`abcdef', `aabdef', `bcged')
 @result{}bgced
 @end example
 
-The first example deletes all uppercase letters, the second converts
-lowercase to uppercase, and the third `mirrors' all uppercase letters,
-while converting them to lowercase.  The two first cases are by far the
-most common.  The final example shows that @samp{a} is mapped to
-@samp{b}, not @samp{c}; the resulting @samp{b} is not further remapped
-to @samp{g}; the @samp{d} and @samp{e} are swapped, and the @samp{f} is
-discarded.
+In the @sc{ascii} encoding, the first example deletes all uppercase
+letters, the second converts lowercase to uppercase, and the third
+`mirrors' all uppercase letters, while converting them to lowercase.
+The two first cases are by far the most common, even though they are not
+portable to @sc{ebcdic} or other encodings.  The fourth example shows a
+range ending in @samp{-}, as well as back-to-back ranges.  The final
+example shows that @samp{a} is mapped to @samp{b}, not @samp{c}; the
+resulting @samp{b} is not further remapped to @samp{g}; the @samp{d} and
+@samp{e} are swapped, and the @samp{f} is discarded.
 
 Omitting @var{chars} evokes a warning, but still produces output.
 
Index: m4/input.c
===================================================================
RCS file: /sources/m4/m4/m4/input.c,v
retrieving revision 1.56
diff -u -p -r1.56 input.c
--- m4/input.c  27 Oct 2006 17:03:51 -0000      1.56
+++ m4/input.c  11 Nov 2006 13:57:49 -0000
@@ -450,7 +450,7 @@ static struct input_funcs string_funcs =
 static int
 string_peek (m4_input_block *me)
 {
-  int ch = (unsigned char) *me->u.u_s.current;
+  int ch = to_uchar (*me->u.u_s.current);
 
   return (ch == '\0') ? CHAR_RETRY : ch;
 }
@@ -459,7 +459,7 @@ static int
 string_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
             bool retry M4_GNUC_UNUSED)
 {
-  int ch = (unsigned char) *me->u.u_s.current;
+  int ch = to_uchar (*me->u.u_s.current);
   if (ch == '\0')
     return CHAR_RETRY;
   me->u.u_s.current++;
Index: m4/m4private.h
===================================================================
RCS file: /sources/m4/m4/m4/m4private.h,v
retrieving revision 1.70
diff -u -p -r1.70 m4private.h
--- m4/m4private.h      31 Oct 2006 02:24:50 -0000      1.70
+++ m4/m4private.h      11 Nov 2006 13:57:49 -0000
@@ -352,6 +352,16 @@ struct m4__search_path_info {
 extern void m4__include_init (m4 *);
 
 
+/* Convert a possibly-signed character to an unsigned character.  This is
+   a bit safer than casting to unsigned char, since it catches some type
+   errors that the cast doesn't.  */
+#if HAVE_INLINE
+static inline unsigned char to_uchar (char ch) { return ch; }
+#else
+# define to_uchar(C) ((unsigned char) (C))
+#endif
+
+
 /* Debugging the memory allocator.  */
 
 #if WITH_DMALLOC
Index: m4/macro.c
===================================================================
RCS file: /sources/m4/m4/m4/macro.c,v
retrieving revision 1.62
diff -u -p -r1.62 macro.c
--- m4/macro.c  27 Oct 2006 17:03:51 -0000      1.62
+++ m4/macro.c  11 Nov 2006 13:57:49 -0000
@@ -580,7 +580,7 @@ trace_format (m4 *context, const char *f
            size_t z = va_arg (args, size_t);
            char nbuf[INT_BUFSIZE_BOUND (size_t)];
 
-           sprintf (nbuf, "%lu", (unsigned long) z);
+           sprintf (nbuf, "%lu", (unsigned long int) z);
            s = nbuf;
          }
          break;
Index: m4/output.c
===================================================================
RCS file: /sources/m4/m4/m4/output.c,v
retrieving revision 1.37
diff -u -p -r1.37 output.c
--- m4/output.c 8 Nov 2006 05:11:47 -0000       1.37
+++ m4/output.c 11 Nov 2006 13:57:49 -0000
@@ -765,11 +765,11 @@ m4_freeze_diversions (m4 *context, FILE 
                 fix frozen file format to support 64-bit
                 integers.  */
              if (file_stat.st_size < 0
-                 || file_stat.st_size != (unsigned long) file_stat.st_size)
+                 || file_stat.st_size != (unsigned long int) file_stat.st_size)
                m4_error (context, EXIT_FAILURE, errno,
                          _("diversion too large"));
              fprintf (file, "D%d,%lu", diversion->divnum,
-                      (unsigned long) file_stat.st_size);
+                      (unsigned long int) file_stat.st_size);
            }
 
          m4_insert_diversion_helper (context, diversion, node);
Index: m4/utility.c
===================================================================
RCS file: /sources/m4/m4/m4/utility.c,v
retrieving revision 1.54
diff -u -p -r1.54 utility.c
--- m4/utility.c        13 Oct 2006 16:46:47 -0000      1.54
+++ m4/utility.c        11 Nov 2006 13:57:49 -0000
@@ -62,7 +62,7 @@ m4_bad_argc (m4 *context, int argc, m4_s
 static const char *
 skip_space (m4 *context, const char *arg)
 {
-  while (m4_has_syntax (M4SYNTAX, (unsigned char) *arg, M4_SYNTAX_SPACE))
+  while (m4_has_syntax (M4SYNTAX, to_uchar (*arg), M4_SYNTAX_SPACE))
     arg++;
   return arg;
 }
Index: modules/m4.c
===================================================================
RCS file: /sources/m4/m4/modules/m4.c,v
retrieving revision 1.91
diff -u -p -r1.91 m4.c
--- modules/m4.c        7 Nov 2006 19:18:10 -0000       1.91
+++ modules/m4.c        11 Nov 2006 13:57:49 -0000
@@ -920,8 +920,8 @@ M4BUILTIN_HANDLER (substr)
 const char *
 m4_expand_ranges (const char *s, m4_obstack *obs)
 {
-  char from;
-  char to;
+  unsigned char from;
+  unsigned char to;
 
   assert (obstack_object_size (obs) == 0);
   for (from = '\0'; *s != '\0'; from = *s++)
Index: src/freeze.c
===================================================================
RCS file: /sources/m4/m4/src/freeze.c,v
retrieving revision 1.54
diff -u -p -r1.54 freeze.c
--- src/freeze.c        27 Oct 2006 17:03:51 -0000      1.54
+++ src/freeze.c        11 Nov 2006 13:57:49 -0000
@@ -142,7 +142,7 @@ produce_module_dump (FILE *file, lt_dlha
   if (handle)
     produce_module_dump (file, handle);
 
-  fprintf (file, "M%lu\n", (unsigned long) strlen (name));
+  fprintf (file, "M%lu\n", (unsigned long int) strlen (name));
   fputs (name, file);
   fputc ('\n', file);
 }
@@ -168,10 +168,10 @@ dump_symbol_CB (m4_symbol_table *symtab,
   if (m4_is_symbol_text (symbol))
     {
       fprintf (file, "T%lu,%lu",
-              (unsigned long) strlen (symbol_name),
-              (unsigned long) strlen (m4_get_symbol_text (symbol)));
+              (unsigned long int) strlen (symbol_name),
+              (unsigned long int) strlen (m4_get_symbol_text (symbol)));
       if (handle)
-       fprintf (file, ",%lu", (unsigned long) strlen (module_name));
+       fprintf (file, ",%lu", (unsigned long int) strlen (module_name));
       fputc ('\n', file);
 
       fputs (symbol_name, file);
@@ -189,12 +189,12 @@ dump_symbol_CB (m4_symbol_table *symtab,
        assert (!"INTERNAL ERROR: builtin not found in builtin table!");
 
       fprintf (file, "F%lu,%lu",
-              (unsigned long) strlen (symbol_name),
-              (unsigned long) strlen (bp->name));
+              (unsigned long int) strlen (symbol_name),
+              (unsigned long int) strlen (bp->name));
 
       if (handle)
        fprintf (file, ",%lu",
-                (unsigned long) strlen (module_name));
+                (unsigned long int) strlen (module_name));
       fputc ('\n', file);
 
       fputs (symbol_name, file);
@@ -241,8 +241,8 @@ produce_frozen_state (m4 *context, const
       || strcmp (m4_get_syntax_rquote (M4SYNTAX), DEF_RQUOTE))
     {
       fprintf (file, "Q%lu,%lu\n",
-              (unsigned long) context->syntax->lquote.length,
-              (unsigned long) context->syntax->rquote.length);
+              (unsigned long int) context->syntax->lquote.length,
+              (unsigned long int) context->syntax->rquote.length);
       fputs (context->syntax->lquote.string, file);
       fputs (context->syntax->rquote.string, file);
       fputc ('\n', file);
@@ -254,8 +254,8 @@ produce_frozen_state (m4 *context, const
       || strcmp (m4_get_syntax_ecomm (M4SYNTAX), DEF_ECOMM))
     {
       fprintf (file, "C%lu,%lu\n",
-              (unsigned long) context->syntax->bcomm.length,
-              (unsigned long) context->syntax->ecomm.length);
+              (unsigned long int) context->syntax->bcomm.length,
+              (unsigned long int) context->syntax->ecomm.length);
       fputs (context->syntax->bcomm.string, file);
       fputs (context->syntax->ecomm.string, file);
       fputc ('\n', file);
Index: src/main.c
===================================================================
RCS file: /sources/m4/m4/src/main.c,v
retrieving revision 1.101
diff -u -p -r1.101 main.c
--- src/main.c  8 Nov 2006 19:06:00 -0000       1.101
+++ src/main.c  11 Nov 2006 13:57:49 -0000
@@ -395,7 +395,7 @@ main (int argc, char *const *argv, char 
        /* In 1.4.x, -B<num> was a no-op option for compatibility with
           Solaris m4.  Warn if optarg is all numeric.  FIXME -
           silence this warning after 2.0.  */
-       if (isdigit ((unsigned char) *optarg))
+       if (isdigit (to_uchar (*optarg)))
          {
            char *end;
            errno = 0;
Index: tests/builtins.at
===================================================================
RCS file: /sources/m4/m4/tests/builtins.at,v
retrieving revision 1.32
diff -u -p -r1.32 builtins.at
--- tests/builtins.at   8 Nov 2006 04:26:53 -0000       1.32
+++ tests/builtins.at   11 Nov 2006 13:57:49 -0000
@@ -932,6 +932,12 @@ AT_DATA([[in]],
 AT_CHECK_M4([in], [0], [[c]m4_for([i],[1],[5000],[],[[d]])
 ])
 
+dnl This validates that ranges are built using unsigned chars.
+AT_DATA([in], [[translit(`«abc~', `~-»')
+]])
+AT_CHECK_M4([in], [0], [[abc
+]])
+
 AT_CLEANUP
 
 
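A note on the to_uchar change in m4/m4private.h above. What follows is a minimal sketch, assuming an ASCII-compatible character set, of why a byte is converted to unsigned char before it is handed to isdigit or used as a numeric value. Only to_uchar is copied from the patch; starts_with_digit and byte_value are hypothetical helpers made up for illustration and are not part of m4. As the comment in the patch says, the inline function form also lets the compiler check the argument type, which a bare cast does not.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Same shape as the inline to_uchar added to m4/m4private.h above.  */
static inline unsigned char to_uchar (char ch) { return ch; }

/* Hypothetical helper, not part of m4: report whether the first byte
   of STR is a digit.  Passing a plain char straight to isdigit is
   undefined when char is signed and the byte is 0x80 or above, so the
   byte goes through to_uchar first.  */
int
starts_with_digit (const char *str)
{
  assert (str != NULL);
  return isdigit (to_uchar (*str)) != 0;
}

/* Hypothetical helper: the value of a byte as a non-negative int,
   whatever the signedness of plain char on this platform.  */
int
byte_value (char ch)
{
  return to_uchar (ch);
}
```

On a platform where plain char is signed, (int) ch for the byte 0xAB is negative, while byte_value returns 171 everywhere. That is the property the string_peek, string_read, skip_space and main hunks rely on.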

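The range rules described in the doc/m4.texinfo hunk can be sketched in a few lines of C. This is not the code in modules/m4.c, which grows its result on an obstack; expand_ranges_sketch and put are hypothetical names, the fixed output buffer is an assumption made to keep the example self-contained, and the byte values in the comments assume ASCII. The point of the sketch is that the endpoints are held as unsigned char, so a range such as `~-»' in Latin-1 (0x7E through 0xBB) runs forwards instead of being treated as reversed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append C to OUT if there is room for it plus a trailing NUL.
   Return 1 on success, 0 if the buffer is full.  */
static int
put (char *out, size_t cap, size_t *n, unsigned char c)
{
  if (*n + 1 >= cap)
    return 0;
  out[(*n)++] = (char) c;
  return 1;
}

/* Hypothetical stand-in for m4_expand_ranges: expand ranges in S into
   OUT, which holds CAP bytes including the terminating NUL.  Return
   the expanded length, or (size_t) -1 if OUT is too small.  */
size_t
expand_ranges_sketch (const char *s, char *out, size_t cap)
{
  unsigned char from = '\0';   /* previous byte, '\0' at the start */
  size_t n = 0;
  int c;

  assert (s != NULL && out != NULL && cap > 0);
  for (; *s != '\0'; from = (unsigned char) *s++)
    {
      if (*s == '-' && from != '\0')
        {
          unsigned char to = (unsigned char) *++s;
          if (to == '\0')
            {
              /* A dash at the very end of the string is literal.  */
              if (!put (out, cap, &n, '-'))
                return (size_t) -1;
              break;
            }
          /* FROM was already emitted; emit the rest of the range,
             forwards or backwards.  TO becomes the next FROM, so
             back-to-back ranges share an endpoint.  */
          if (from <= to)
            {
              for (c = from + 1; c <= to; c++)
                if (!put (out, cap, &n, (unsigned char) c))
                  return (size_t) -1;
            }
          else
            {
              for (c = from - 1; c >= to; c--)
                if (!put (out, cap, &n, (unsigned char) c))
                  return (size_t) -1;
            }
        }
      else if (!put (out, cap, &n, (unsigned char) *s))
        return (size_t) -1;
    }
  out[n] = '\0';
  return n;
}
```

With a sufficiently large buffer, the arguments of the new documentation example, `+--1-5' and `<;>a-c-a', expand to `+,-12345' and `<;>abcba', and `9-0' expands to `9876543210'. The 62 bytes produced for "~-\xBB" include 0xAB, which is why the new test in tests/builtins.at expects `«' to be deleted along with `~'.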