bug#24924: GNU pr only working with singlebyte 1-width characters
From: Stephane Chazelas
Subject: bug#24924: GNU pr only working with singlebyte 1-width characters
Date: Thu, 1 Dec 2016 08:49:39 +0000
User-agent: Mutt/1.5.21 (2010-09-15)
2016-12-01 07:04:05 +0000, Stephane Chazelas:
> 2016-11-30 18:37:05 -0800, Paul Eggert:
> [...]
> > In the meantime if you could submit a patch for the
> > documentation that should fix the immediate documentation
> > problem.
> [...]
>
> What about:
[...]
> +Please note that @command{pr} currently doesn't support multi-byte characters
> +or non-ASCII characters that have a null or double width. If such characters
> +occur in the input or column separators, column alignment may be off or lines
> +may exceed the page width. There is also no provision to support bidirectional
> +text.
[...]
Actually, it seems it can also truncate lines in the middle of
some characters, though this appears to be confined to multibyte
characters that include a byte with a value <= 127, like:
$ locale charmap
BIG5-HKSCS
$ printf '\ue9\ue9\ue9\n' | pr -w5 -t2 | hd
00000000 88 6d 88 6d 88 0a |.m.m..|
00000006
See how the third é (0x88 0x6d in BIG5-HKSCS) was truncated in
the middle: only its first byte, 0x88, was output.
It's as if it were treating all byte values >= 128 as having
zero width in multi-byte locales (and only in multi-byte
locales; this doesn't seem to occur in single-byte ones).
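To illustrate that guess, here is a toy model in Python. It is not pr's
actual code, only a sketch of the hypothesis. The function names are made
up for the illustration, and the column width of 2 cells is an assumption
(-w5 split into two columns with one cell taken by the separator).

```python
# Toy model of the hypothesis only; this is NOT pr's source code.
E_ACUTE = b"\x88\x6d"  # é in BIG5-HKSCS, as given in the message above


def high_bytes_zero_width(byte):
    """Hypothesised behaviour: bytes >= 128 take up no column cells."""
    return 0 if byte >= 128 else 1


def every_byte_one_cell(byte):
    """Comparison case: every byte takes up exactly one cell."""
    return 1


def truncate(line, cells, width_of):
    """Copy bytes until the next one would overflow `cells` cells."""
    out = bytearray()
    used = 0
    for byte in line:
        w = width_of(byte)
        if used + w > cells:
            break
        out.append(byte)
        used += w
    return bytes(out)


line = E_ACUTE * 3
# Assumed column width: 2 cells.
print(truncate(line, 2, high_bytes_zero_width).hex(" "))  # 88 6d 88 6d 88
print(truncate(line, 2, every_byte_one_cell).hex(" "))    # 88 6d
```

With the zero-width rule, the model lets the lead byte of the third é
through and stops before its 0x6d, which is the same byte sequence as in
the hd output above (minus the newline).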
So maybe:
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cc85f22..15088ce 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1838,6 +1838,13 @@ For single
column output no line truncation occurs by default. Use @option{-W} option to
truncate lines in that case.
+Please note that @command{pr} currently doesn't support multi-byte characters
+or non-ASCII characters that have a null or double width. If such characters
+occur in the input or column separators, column alignment may be off or lines
+may exceed the page width, or truncation may occur in the middle of some
+characters producing invalid text output. There is also no provision to support
+bidirectional text.
+
The following changes were made in version 1.22i and apply to later
versions of @command{pr}:
@c FIXME: this whole section here sounds very awkward to me. I
--
Stephane