bug-gnulib

Re: dealing with non-ASCII-safe encodings


From: Bruno Haible
Subject: Re: dealing with non-ASCII-safe encodings
Date: Sat, 06 Mar 2021 21:17:57 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-203-generic; KDE/5.18.0; x86_64; ; )

Paul Eggert wrote:
> However, my worry is that good support for non-ASCII-safe encodings like 
> Shift-JIS is hard to do, and that any such support we'd add to 
> Gnulib/coreutils/etc. would not only increase maintenance costs and 
> reduce runtime performance

Shift_JIS is not the only non-ASCII-safe encoding; GB18030, BIG5, BIG5-HKSCS,
and GBK are not ASCII-safe either. Of these, GB18030 is used as a locale
encoding in China. It is therefore important for programs to support these
locale encodings.

Gnulib has support for this:

  - It has replacement functions that operate correctly with these locale
    encodings:
      strstr, c_strstr -> mbsstr
      strchr -> mbschr
      strrchr -> mbsrchr
      strspn -> mbsspn
      strcspn -> mbscspn
      strpbrk -> mbspbrk
      strsep -> mbssep
      strtok_r -> mbstok_r

  - It has warnings (through _GL_WARN_ON_USE) for uses of the functions
    that are not OK for non-ASCII-safe encodings.

  - It has modules mbchar, mbiter, mbfile for iterating through the
    multibyte characters of a string or file, that work for all locale
    encodings.
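
To illustrate why a byte-oriented search such as strchr goes wrong in these
encodings, here is a minimal sketch. It is not Gnulib's implementation:
mbschr works with whatever the locale encoding is, whereas the helpers below
(sjis_is_lead_byte and sjis_strchr, both hypothetical names) hard-code
Shift_JIS, assuming lead bytes in 0x81..0x9F and 0xE0..0xFC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if B starts a two-byte Shift_JIS
   character.  Assumes lead bytes 0x81..0x9F and 0xE0..0xFC.  */
int
sjis_is_lead_byte (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical character-aware search, hard-coded to Shift_JIS.
   Return a pointer to the first occurrence of the ASCII character C
   that is a character of its own, not the trail byte of a two-byte
   character.  Return NULL if there is none.  */
const char *
sjis_strchr (const char *s, char c)
{
  assert (s != NULL);
  while (*s != '\0')
    {
      if (sjis_is_lead_byte ((unsigned char) *s))
        {
          if (s[1] == '\0')
            break;      /* truncated character at the end of the string */
          s += 2;       /* skip lead byte and trail byte together */
        }
      else
        {
          if (*s == c)
            return s;
          s++;
        }
    }
  return NULL;
}
```

For example, the Shift_JIS encoding of U+8868 is the byte pair 0x95 0x5C,
whose trail byte is the ASCII code of the backslash. For the string
"\x95\x5C" "dir\\file", strchr (s, '\\') stops at offset 1, in the middle of
that character, while the sketch above returns offset 5, the real backslash.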

Yes, using these safer functions does reduce performance.
I have shown in the past, through coreutils patches, how to accommodate
both a "fast path" and a "safe path" in the same binary.
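
One way such a choice between two paths could look is sketched below. This
is not taken from the coreutils patches. The names find_char_fast,
find_char_safe and choose_find_char are hypothetical, and the selection rule
is an assumption: the byte-oriented path is used only when the locale is
single-byte or its codeset is reported as "UTF-8", and everything else takes
the character-aware path, which uses the standard mbrlen.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

typedef const char *(*find_char_fn) (const char *s, char c);

/* Fast path: plain byte search.  Correct only if no multibyte
   character can contain the byte C.  */
const char *
find_char_fast (const char *s, char c)
{
  return strchr (s, c);
}

/* Safe path: step through S one multibyte character at a time, as
   defined by the current locale.  C must be a non-NUL ASCII character.  */
const char *
find_char_safe (const char *s, char c)
{
  mbstate_t state;
  size_t remaining = strlen (s);

  memset (&state, 0, sizeof state);
  while (remaining > 0)
    {
      size_t len = mbrlen (s, remaining, &state);
      if (len == (size_t) -1 || len == (size_t) -2 || len == 0)
        {
          /* Invalid or incomplete sequence: consume a single byte
             and reset the conversion state.  */
          len = 1;
          memset (&state, 0, sizeof state);
        }
      if (len == 1 && *s == c)
        return s;
      s += len;
      remaining -= len;
    }
  return NULL;
}

/* Assumed rule: in a single-byte locale every byte is a character,
   and UTF-8 never reuses ASCII bytes inside a multibyte character,
   so the fast path is correct there.  Anything else is treated
   conservatively.  */
find_char_fn
choose_find_char (size_t mb_cur_max, const char *codeset)
{
  if (mb_cur_max == 1 || strcmp (codeset, "UTF-8") == 0)
    return find_char_fast;
  return find_char_safe;
}
```

A program would make this choice once, after setlocale (LC_ALL, ""), passing
MB_CUR_MAX and nl_langinfo (CODESET), and then call through the returned
pointer. Note that codeset names differ between systems, so the exact string
comparison here is a simplification.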

Bruno



