m4-patches

branch-1_4 8-bit clean translit


From: Eric Blake
Subject: branch-1_4 8-bit clean translit
Date: Sat, 11 Nov 2006 05:54:26 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.8) Gecko/20061025 Thunderbird/1.5.0.8 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

$ cat <<\EOF | m4
translit(`«abc~', `~-»')
EOF
«

Oops - ranges that extended across the 0x7f-0x80 boundary misbehaved on
machines where char is signed.  Also, our testsuite assumes ASCII in the
translit tests, but so far no one has reported failures when porting to
EBCDIC platforms (where A-Z is more than just 26 letters), so I doubt it
is worth worrying about.
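
To make the failure mode concrete, here is a minimal C sketch (not the m4 source) of why the range endpoints compare the wrong way round when char is signed. The helper names in_range and as_signed are made up for illustration, and as_signed assumes a two's-complement reading of the byte. The values 0x7e, 0xab and 0xbb are the Latin-1 codes for `~', `«' and `»'.

```c
#include <stdbool.h>

/* True if C lies between FROM and TO inclusive.  The range may run in
   either direction, as translit ranges can.  */
static bool
in_range (int from, int to, int c)
{
  return from <= to ? (from <= c && c <= to) : (to <= c && c <= from);
}

/* Hypothetical helper: the value byte B would have when read through a
   signed 8-bit char (two's complement assumed).  */
static int
as_signed (unsigned char b)
{
  return b < 0x80 ? (int) b : (int) b - 0x100;
}
```

Read as signed values, `~-»' runs backwards from 126 to -69, so in_range reports that it covers `a' (97) but not `«' (-85), which matches the output above. Read as unsigned values, the same range runs forwards from 126 to 187 and covers `«' (171) but not `a'.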

2006-11-11  Eric Blake  <address@hidden>

        * src/builtin.c: Remove unnecessary casts.
        (expand_ranges): Make 8-bit clean.
        * doc/m4.texinfo (Translit): Add tests and wording.
        * NEWS: Document this fix.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFVcgC84KuGfSFAYARAqwsAKC16u8NpG48in0OOMQslWt66JxO9QCguVHT
wpE2vBw5R1xMMN431yt6WE4=
=fE0O
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.79
diff -u -p -r1.1.1.1.2.79 NEWS
--- NEWS        1 Nov 2006 13:44:53 -0000       1.1.1.1.2.79
+++ NEWS        11 Nov 2006 12:48:54 -0000
@@ -43,7 +43,8 @@ Version 1.4.8 - ?? ??? 2006, by ??  (CVS
 * The `changecom' and `changequote' macros now treat an empty second
   argument the same as if it were missing, rather than using the empty
   string and making it impossible to end a comment or quote.
-* The `translit' macro now operates in linear instead of quadratic time.
+* The `translit' macro now operates in linear instead of quadratic time,
+  and is now eight-bit clean.
 * The `-D', `-U', `-s', and `-t' command line options now take effect
   after any files encountered earlier on the command line, rather than up
   front, as is done in traditional implementations and required by POSIX.
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.99
diff -u -p -r1.1.1.1.2.99 m4.texinfo
--- doc/m4.texinfo      8 Nov 2006 05:08:26 -0000       1.1.1.1.2.99
+++ doc/m4.texinfo      11 Nov 2006 12:48:56 -0000
@@ -2828,9 +2828,9 @@ foo
 
 The quotation strings can safely contain eight-bit characters.
 @ignore
-Yuck.  I know of no clean way to render an 8-bit character in both info
-and dvi.  This example uses the `open-guillemot' and `close-guillemot'
-characters of the Latin-1 character set.
+@comment Yuck.  I know of no clean way to render an 8-bit character in
+@comment both info and dvi.  This example uses the `open-guillemot' and
+@comment `close-guillemot' characters of the Latin-1 character set.
 
 @example
 define(`a', `b')
@@ -3058,9 +3058,9 @@ changecom(`#', `')
 
 The comment strings can safely contain eight-bit characters.
 @ignore
-Yuck.  I know of no clean way to render an 8-bit character in both info
-and dvi.  This example uses the `open-guillemot' and `close-guillemot'
-characters of the Latin-1 character set.
+@comment Yuck.  I know of no clean way to render an 8-bit character in
+@comment both info and dvi.  This example uses the `open-guillemot' and
+@comment `close-guillemot' characters of the Latin-1 character set.
 
 @example
 define(`a', `b')
@@ -4134,14 +4134,15 @@ translation pass is made, even if charac
 appear in @var{chars}.
 
 As a @acronym{GNU} extension, both @var{chars} and @var{replacement} can
-contain character-ranges,
-e.g., @samp{a-z} (meaning all lowercase letters) or @samp{0-9} (meaning
-all digits).  To include a dash @samp{-} in @var{chars} or
-@var{replacement}, place it first or last.
-
-It is not an error for the last character in the range to be `larger'
-than the first.  In that case, the range runs backwards, i.e.,
-@samp{9-0} means the string @samp{9876543210}.
+contain character-ranges, e.g., @samp{a-z} (meaning all lowercase
+letters) or @samp{0-9} (meaning all digits).  To include a dash @samp{-}
+in @var{chars} or @var{replacement}, place it first or last in the
+entire string, or as the last character of a range.  Back-to-back ranges
+can share a common endpoint.  It is not an error for the last character
+in the range to be `larger' than the first.  In that case, the range
+runs backwards, i.e., @samp{9-0} means the string @samp{9876543210}.
+The expansion of a range is dependent on the underlying encoding of
+characters, so using ranges is not always portable between machines.
 
 The macro @code{translit} is recognized only with parameters.
 @end deffn
@@ -4153,17 +4154,31 @@ translit(`GNUs not Unix', `a-z', `A-Z')
 @result{}GNUS NOT UNIX
 translit(`GNUs not Unix', `A-Z', `z-a')
 @result{}tmfs not fnix
+translit(`+,-12345', `+--1-5', `<;>a-c-a')
+@result{}<;>abcba
 translit(`abcdef', `aabdef', `bcged')
 @result{}bgced
 @end example
 
-The first example deletes all uppercase letters, the second converts
-lowercase to uppercase, and the third `mirrors' all uppercase letters,
-while converting them to lowercase.  The two first cases are by far the
-most common.  The final example shows that @samp{a} is mapped to
-@samp{b}, not @samp{c}; the resulting @samp{b} is not further remapped
-to @samp{g}; the @samp{d} and @samp{e} are swapped, and the @samp{f} is
-discarded.
+In the @sc{ascii} encoding, the first example deletes all uppercase
+letters, the second converts lowercase to uppercase, and the third
+`mirrors' all uppercase letters, while converting them to lowercase.
+The two first cases are by far the most common, even though they are not
+portable to @sc{ebcdic} or other encodings.  The fourth example shows a
+range ending in @samp{-}, as well as back-to-back ranges.  The final
+example shows that @samp{a} is mapped to @samp{b}, not @samp{c}; the
+resulting @samp{b} is not further remapped to @samp{g}; the @samp{d} and
+@samp{e} are swapped, and the @samp{f} is discarded.
+
+@ignore
+@comment No need to fight 8-bit characters, as it is difficult to get
+@comment rendering right in both info and dvi.
+
+@example
+translit(`«abc~', `~-»')
+@result{}abc
+@end example
+@end ignore
 
 Omitting @var{chars} evokes a warning, but still produces output.
 
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.50
diff -u -p -r1.1.1.1.2.50 builtin.c
--- src/builtin.c       1 Nov 2006 22:29:08 -0000       1.1.1.1.2.50
+++ src/builtin.c       11 Nov 2006 12:48:56 -0000
@@ -359,12 +359,12 @@ numeric_arg (token_data *macro, const ch
 static char const digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
 
 static const char *
-ntoa (register eval_t value, int radix)
+ntoa (eval_t value, int radix)
 {
   bool negative;
   unsigned_eval_t uvalue;
   static char str[256];
-  register char *s = &str[sizeof str];
+  char *s = &str[sizeof str];
 
   *--s = '\0';
 
@@ -667,9 +667,9 @@ m4_dumpdef (struct obstack *obs, int arg
 
   /* Make table of symbols invisible to expand_macro ().  */
 
-  (void) obstack_finish (obs);
+  obstack_finish (obs);
 
-  qsort ((char *) data.base, data.size, sizeof (symbol *), dumpdef_cmp);
+  qsort (data.base, data.size, sizeof (symbol *), dumpdef_cmp);
 
   for (; data.size > 0; --data.size, data.base++)
     {
@@ -1645,14 +1645,14 @@ m4_substr (struct obstack *obs, int argc
 static const char *
 expand_ranges (const char *s, struct obstack *obs)
 {
-  char from;
-  char to;
+  unsigned char from;
+  unsigned char to;
 
-  for (from = '\0'; *s != '\0'; from = *s++)
+  for (from = '\0'; *s != '\0'; from = to_uchar (*s++))
     {
       if (*s == '-' && from != '\0')
        {
-         to = *++s;
+         to = to_uchar (*++s);
          if (to == '\0')
            {
              /* trailing dash */
@@ -1772,7 +1772,7 @@ static void
 substitute (struct obstack *obs, const char *victim, const char *repl,
            struct re_registers *regs)
 {
-  register unsigned int ch;
+  int ch;
 
   for (;;)
     {
@@ -2031,7 +2031,7 @@ void
 expand_user_macro (struct obstack *obs, symbol *sym,
                   int argc, token_data **argv)
 {
-  register const char *text;
+  const char *text;
   int i;
 
   for (text = SYMBOL_TEXT (sym); *text != '\0';)
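
The range rules spelled out in the doc/m4.texinfo hunk above (a dash is literal when first or last, back-to-back ranges share an endpoint, and a range may run backwards) can be sketched in C as follows. This is an illustration under stated assumptions, not the patched function: it writes into a caller-supplied buffer instead of an obstack, the names expand_ranges_sketch and append are hypothetical, and to_uchar is defined locally on the assumption that it simply converts a char to unsigned char.

```c
#include <stddef.h>
#include <string.h>

/* Assumed behavior of the to_uchar helper that the patch calls.  */
static unsigned char
to_uchar (char ch)
{
  return (unsigned char) ch;
}

/* Append C to OUT, always leaving room for a terminating NUL.
   Returns 0 on success, -1 if OUT is full.  */
static int
append (char *out, size_t cap, size_t *len, unsigned char c)
{
  if (*len + 1 >= cap)
    return -1;
  ((unsigned char *) out)[(*len)++] = c;
  return 0;
}

/* Expand ranges such as a-z in S into OUT (capacity CAP).
   Returns 0 on success, -1 if OUT is too small.  */
static int
expand_ranges_sketch (const char *s, char *out, size_t cap)
{
  size_t len = 0;
  unsigned char from = '\0';

  if (cap == 0)
    return -1;

  for (; *s != '\0'; from = to_uchar (*s++))
    {
      if (*s == '-' && from != '\0')
        {
          unsigned char to = to_uchar (*++s);

          if (to == '\0')
            {
              /* A trailing dash stands for itself.  */
              if (append (out, cap, &len, '-') != 0)
                return -1;
              break;
            }
          else if (from <= to)
            {
              /* FROM was already emitted, so start just after it.  */
              while (from < to)
                if (append (out, cap, &len, ++from) != 0)
                  return -1;
            }
          else
            {
              /* Backwards range: count down to TO.  */
              while (from > to)
                if (append (out, cap, &len, --from) != 0)
                  return -1;
            }
        }
      else if (append (out, cap, &len, to_uchar (*s)) != 0)
        return -1;
    }

  out[len] = '\0';
  return 0;
}
```

Because both endpoints are held as unsigned char, a range such as 0x7e through 0xbb is walked upwards across the 0x7f-0x80 boundary instead of being treated as a backwards range. On an ASCII machine, the strings from the documentation examples, `+--1-5' and `<;>a-c-a', expand to `+,-12345' and `<;>abcba'.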
