bug-gnu-utils

sed cannot process non-ASCII characters correctly


From: Bruno Haible
Subject: sed cannot process non-ASCII characters correctly
Date: Tue, 8 May 2001 15:42:36 +0200 (CEST)

Hi,

sed 3.02 has severe problems with multibyte character encodings.
According to SUSv2, the LANG/LC_CTYPE/LC_ALL environment variables should
determine what sed treats as a character, but sed-3.02 ignores them.

A test script is appended below, to be executed in a UTF-8 locale (e.g.
the ko_KR.UTF-8 locale of glibc-2.2.2). The regexp engine in glibc-2.2.2 now
has full i18n support. The remaining problems in sed are:

1) sed doesn't call setlocale, and thus ignores the user's
LANG/LC_CTYPE/LC_ALL environment variables.

Here is a fix for it.

2001-05-05  Bruno Haible  <address@hidden>

        * configure.in: Test for setlocale.
        * sed/sed.c: Include locale.h.
        (main): Call setlocale.

*** sed-3.02/configure.in.bak   Sun Aug  2 02:38:33 1998
--- sed-3.02/configure.in       Sun May  6 01:28:48 2001
***************
*** 99,105 ****
  AC_FUNC_VPRINTF
  
  AC_REPLACE_FUNCS(memchr memcmp memmove strerror)
! AC_CHECK_FUNCS(isatty  bcopy bzero isascii memcpy)
  
  AC_ARG_PROGRAM
  AC_OUTPUT(Makefile djgpp/Makefile doc/Makefile dnl
--- 99,105 ----
  AC_FUNC_VPRINTF
  
  AC_REPLACE_FUNCS(memchr memcmp memmove strerror)
! AC_CHECK_FUNCS(isatty  bcopy bzero isascii memcpy setlocale)
  
  AC_ARG_PROGRAM
  AC_OUTPUT(Makefile djgpp/Makefile doc/Makefile dnl
*** sed-3.02/sed/sed.c.bak      Fri Jul  3 03:06:26 1998
--- sed-3.02/sed/sed.c  Sun May  6 01:27:48 2001
***************
*** 33,38 ****
--- 33,40 ----
  # include <stdlib.h>
  #endif
  
+ #include <locale.h>
+ 
  #ifdef HAVE_MMAP
  # ifdef HAVE_UNISTD_H
  #  include <unistd.h>
***************
*** 129,134 ****
--- 131,141 ----
    flagT bad_input;    /* If this variable is non-zero at exit, one or
                           more of the input files couldn't be opened. */
  
+ #ifdef HAVE_SETLOCALE
+   /* Set locale via LC_ALL.  */
+   setlocale (LC_ALL, "");
+ #endif
+ 
    POSIXLY_CORRECT = (getenv("POSIXLY_CORRECT") != NULL);
  #ifdef STUB_FROM_RX_LIBRARY_USAGE
    if (!rx_default_cache)
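
To illustrate why the setlocale call matters, here is a minimal sketch (not
part of the patch, and not taken from sed) that counts the characters of a
string the way the current LC_CTYPE locale sees them. The helper name
count_chars is hypothetical, and treating an invalid byte as one character
is my own assumption, chosen only to keep the example short.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters of S as the current
   LC_CTYPE locale sees them.  Bytes that do not form a valid
   character are counted one by one (an assumption of this sketch).  */
static size_t
count_chars (const char *s)
{
  mbstate_t state;
  size_t remaining;
  size_t count = 0;

  assert (s != NULL);
  remaining = strlen (s);
  memset (&state, 0, sizeof state);

  while (remaining > 0)
    {
      /* mbrtowc returns the number of bytes of the next character.  */
      size_t n = mbrtowc (NULL, s, remaining, &state);

      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        {
          /* Invalid or incomplete sequence: skip a single byte and
             start again from the initial conversion state.  */
          memset (&state, 0, sizeof state);
          n = 1;
        }
      s += n;
      remaining -= n;
      count++;
    }
  return count;
}
```

A program that never calls setlocale stays in the "C" locale, where the two
bytes of a UTF-8 encoded character such as U+00E4 are counted as two
characters. After setlocale (LC_ALL, "") in a UTF-8 locale, the same two
bytes should be counted as one character, which is the notion of character
the regexp engine needs.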


2) The autoconfiguration fails to recognize the regex in glibc and uses the
copy bundled with sed instead. The user has to configure with "--with-regex="
so that lib/regex.o is not built. This should be fixed so that glibc's regex
is used by default if the system has glibc 2.2.2 or newer.


Bruno


begin 644 sed-sample-run-good
M)"!E8VAO(,address@hidden"address@hidden@)W,O7"@N7"E<,2]<,2\G"L.D"address@hidden
C;R##I,address@hidden"address@hidden@)W,O6\.D72\O9R<*P[;#O`H`
`
end
begin 644 sed-sample-run-bad
M)"!E8VAO(,address@hidden"address@hidden@)W,O7"@N7"E<,2]<,2\G"L.DPZ0*)"!E
F8VAO(,.DP[;#O"!\('-E9"`M92`G<R];PZ1=+R]G)PK#O>^_O0H`
`
end


