bug-gawk

Re: UTF8 above U+10FFFF treated inconsistently


From: arnold
Subject: Re: UTF8 above U+10FFFF treated inconsistently
Date: Wed, 29 Sep 2021 03:55:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Eli Zaretskii <eliz@gnu.org> wrote:

> > If this is indeed an underlying C library issue, then I shall reach
> > out to the gnu LIBC team instead. Thanks for your time.
>
> I think it _is_ a libc issue, because Gawk uses the library function
> 'mbrlen' to parse the string into characters.

Eli is exactly right. I stepped through the code with a debugger
and mbrlen returns 4 for the invalid byte sequence.  This is on
Ubuntu 18.04.
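
For anyone who wants to repeat the check outside gawk, here is a minimal
sketch of calling mbrlen directly. The helper name first_char_len and the
sample bytes are assumptions for illustration, not code from gawk, and the
message above does not show the exact bytes that were tested; F4 90 80 80
is simply the UTF-8-style encoding of U+110000, the first value past
U+10FFFF.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from gawk): ask the C library how many bytes
 * the first multibyte character in buf occupies in the current locale.
 * Like mbrlen itself, it returns (size_t)-1 for an invalid sequence and
 * (size_t)-2 for an incomplete one. */
size_t first_char_len(const char *buf, size_t n)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);   /* start from the initial conversion state */
    return mbrlen(buf, n, &st);
}

/* Assumed sample input: the four bytes that would encode U+110000. */
const char beyond_unicode[] = "\xF4\x90\x80\x80";
```

To use it, call setlocale(LC_ALL, "") with a UTF-8 locale in the
environment, then print first_char_len(beyond_unicode, 4). A library that
accepts the sequence reports 4, as described above; one that rejects it
reports (size_t)-1. The result depends on the C library in use, which is
why it can differ between systems.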

Note that if you tested using Homebrew you're on a Mac, and not
using GLIBC.

In short, there's nothing for me to do here.

Thanks,

Arnold


