Re: UTF8 above U+10FFFF treated inconsistently
From: arnold
Subject: Re: UTF8 above U+10FFFF treated inconsistently
Date: Wed, 29 Sep 2021 03:55:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10
Eli Zaretskii <eliz@gnu.org> wrote:
> > If this is indeed an underlying C library issue, then I shall reach
> > out to the gnu LIBC team instead. Thanks for your time.
>
> I think it _is_ a libc issue, because Gawk uses the library function
> 'mbrlen' to parse the string into characters.
Eli is exactly right. I stepped through the code with a debugger,
and mbrlen returned 4 for the invalid byte sequence. This was on
Ubuntu 18.04.
Note that if you tested with a Homebrew build, you're on a Mac and
therefore not using GLIBC.
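For anyone who wants to check their own C library, here is a minimal
sketch of the kind of probe I mean. The helper names decode4 and
probe_mbrlen are made up for illustration, and the bytes F4 90 80 80
(which decode to U+110000, the first value past the Unicode limit) are
an assumed test input, not necessarily the sequence from the original
report.

```c
#include <locale.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode a 4-byte UTF-8-style sequence by hand,
   deliberately with no upper range check.  Returns the scalar value,
   or UINT32_MAX if the bytes do not have the shape
   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. */
static uint32_t decode4(const unsigned char *s)
{
    if ((s[0] & 0xF8) != 0xF0)
        return UINT32_MAX;
    for (int i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)
            return UINT32_MAX;
    return ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         |  (uint32_t)(s[3] & 0x3F);
}

/* Hypothetical helper: ask the C library how many bytes make up the
   first multibyte character of s, starting from a fresh shift state.
   The answer depends on the current locale and on the library. */
static size_t probe_mbrlen(const char *s, size_t n)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    return mbrlen(s, n, &st);
}
```

To use it, call setlocale(LC_ALL, "") with a UTF-8 locale in the
environment, then pass the four bytes to probe_mbrlen. A result of
(size_t) -1 means the library rejected the sequence; a result of 4
means it accepted the bytes as one character, which is what I saw.
Comparing decode4's value against 0x10FFFF tells you whether the
input really is out of range.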
In short, there's nothing for me to do here.
Thanks,
Arnold