bug#69369: wc -w ignores breaking space over UCHAR_MAX

bug-coreutils


From:	Pádraig Brady
Subject:	bug#69369: wc -w ignores breaking space over UCHAR_MAX
Date:	Sun, 25 Feb 2024 12:16:48 +0000
User-agent:	Mozilla Thunderbird

On 24/02/2024 20:44, Aearil via GNU coreutils Bug Reports wrote:

Hi,

wc -w doesn't seem to recognize whitespace characters with a codepoint
over UCHAR_MAX (255) as word separators. For example, using the
character EM SPACE U+2003:

$ printf "foo\u2003bar" | ./wc -w
1

I should get a word count of 2, but instead the space is ignored while
counting words. Meanwhile, wc v9.4 gives the correct answer:

$ printf "foo\u2003bar" | wc -w
2

It looks like the regression has been introduced by [f40c6b5] and
would be fixed by something like the following change:

diff --git a/src/wc.c b/src/wc.c
index f5a921534..9d456f8c0 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -528,7 +528,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                            if (width > 0)
                              linepos += width;
                          }
-                      in_word2 = !iswnbspace (wide_char);
+                      in_word2 = !iswspace (wide_char) && !iswnbspace (wide_char);
                      }

                    /* Count words by counting word starts, i.e., each


Nice one.
Great to catch this before release.
I've augmented your patch with a test,
and will push the attached later.

Marking this as done.

thanks!
Pádraig

wc-wide-space.patch
Description: Text Data
