sed-devel

stop octet storm in --debug (UTF-8)


From: Hideo Haga
Subject: stop octet storm in --debug (UTF-8)
Date: Mon, 17 Jun 2019 06:15:11 +0900
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

Sample (from the online manual,
 5.9 Multibyte characters and Locale Considerations)

Please set LANG=en_US.UTF-8
(or LANG=ja_JP.UTF-8).
In UTF-8, \u03A3 (Σ) is encoded as the two bytes 0xCE 0xA3.
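
For reference, the two bytes 0xCE 0xA3 follow from the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx). The sketch below is illustration only; `utf8_encode_2byte` is a hypothetical helper, not part of sed.

```c
#include <assert.h>

/* Hypothetical helper (not sed code): encode a code point in the
   range U+0080..U+07FF as two UTF-8 bytes. */
void
utf8_encode_2byte (unsigned int cp, unsigned char out[2])
{
  assert (cp >= 0x80 && cp <= 0x7FF);
  out[0] = (unsigned char) (0xC0 | (cp >> 6));    /* 110xxxxx: top 5 bits */
  out[1] = (unsigned char) (0x80 | (cp & 0x3F));  /* 10xxxxxx: low 6 bits */
}
```

Calling it with 0x03A3 fills `out` with 0xCE and 0xA3, the two bytes that show up in the --debug output below.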

```
$ printf 'a\u03A3b' | sed 's/./X/g'
XXX
$ printf 'a\u03A3b' | sed --debug 's/./X/g'
SED PROGRAM:
  s/./X/g
INPUT:   'STDIN' line 1
PATTERN: a\o37777777716\o37777777643b
COMMAND: s/./X/g
MATCHED REGEX REGISTERS
  regex[0] = 0-1 'a'
PATTERN: XXX
END-OF-CYCLE:
XXX
```

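With --debug, every byte of a UTF-8 multibyte character is displayed as an 11-digit octal escape.

I would like to stop this octal storm.

A bigger ambition: Unicode users would rather see hex than octal,
because almost all Unicode tables are listed in hex.

Masking with 0xff is all that is needed to stop the storm; the patch below also switches the escape from octal to hex.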

```
diff --git a/sed/debug.c b/sed/debug.c
index 9ec37b6..4c40b97 100644
--- a/sed/debug.c
+++ b/sed/debug.c
@@ -66,7 +66,7 @@ debug_print_char (char c)
       break;

     default:
-      printf ("o%03o", (unsigned int) c);
+      printf ("x%02x", (unsigned int) c & 0xff);
     }
 }
```

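...but the l command does not produce an octal storm.
Why? The l command (the do_list function) reads c from the buffer as "unsigned char" and then casts it to int, so the value stays in 0..255. The debug_print_char function, however, receives c as a (signed) "char". A byte of 0x80 or above is therefore negative, the negative value is sign-extended when it is widened, and (unsigned int) c turns it into a huge 32-bit value before it reaches printf. That is why each escape has 11 octal digits.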
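
A minimal sketch of the three conversions, assuming a platform where `unsigned int` is 32 bits wide. The function names are hypothetical and are not sed code; `signed char` is spelled out so the result does not depend on whether plain `char` is signed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers for illustration; none of these exist in sed. */

/* Mimics the unpatched default case: a negative char becomes a huge
   unsigned value (UINT_MAX + 1 + c). */
unsigned int
widen_unmasked (signed char c)
{
  return (unsigned int) c;
}

/* Mimics the patched expression: only the low 8 bits survive. */
unsigned int
widen_masked (signed char c)
{
  return (unsigned int) c & 0xff;
}

/* Mimics reading the byte as unsigned char, as the l command is
   described as doing. */
unsigned int
widen_unsigned (unsigned char c)
{
  return (unsigned int) c;
}

/* Prints the three views of one byte as octal escapes. */
void
show_byte (unsigned char byte)
{
  /* Converting a value above 127 to signed char is
     implementation-defined; common compilers wrap modulo 256. */
  signed char sc = (signed char) byte;

  /* Masking and going through unsigned char give the same value. */
  assert (widen_masked (sc) == widen_unsigned (byte));

  printf ("unmasked \\o%03o  masked \\o%03o  unsigned \\o%03o\n",
          widen_unmasked (sc), widen_masked (sc), widen_unsigned (byte));
}
```

Under those assumptions, `show_byte (0xCE)` prints `\o37777777716` for the unmasked case and `\o316` for the other two, matching the first byte in the PATTERN line above.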


--
------------------------------
Hideo Haga<address@hidden>


