more on failing test 'invalid-mb-seq-UMR.sh'

sed-devel

From:	Assaf Gordon
Subject:	more on failing test 'invalid-mb-seq-UMR.sh'
Date:	Sun, 5 Jun 2016 01:16:17 -0400

Hello,

The test 'invalid-mb-seq-UMR.sh' still fails on a few systems, even with the 
latest update [1].

The test uses valgrind to ensure that an invalid multibyte sequence does not 
cause uninitialized memory access, but it also validates the returned output 
(after the invalid multibyte sequence is processed).
The failure seems to be that the returned output does not match the expected 
result (so it is not a valgrind error or an invalid memory access).
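
To make the distinction concrete, here is a rough sketch (not the actual test 
script) that separates the two failure modes: the command exiting with an 
error versus the command succeeding but printing unexpected bytes. The helper 
name 'classify_result' and its messages are hypothetical, made up for 
illustration only.

```shell
# Hypothetical helper, not taken from sed's test suite.
# Usage: classify_result EXPECTED_HEX COMMAND [ARGS...]
# Runs COMMAND (stdin is passed through) and prints one of:
#   ok / command-failed / output-mismatch: ...
classify_result() {
  expected=$1
  shift
  tmp=$(mktemp) || return 3
  # Capture stdout in a file so the exit status of COMMAND is not lost
  # in a pipeline.
  "$@" > "$tmp"
  status=$?
  # Convert the captured bytes to a single line of space-separated hex.
  actual=$(od -An -tx1 < "$tmp" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
  rm -f "$tmp"
  if [ "$status" -ne 0 ]; then
    echo "command-failed"
  elif [ "$actual" != "$expected" ]; then
    echo "output-mismatch: got '$actual', expected '$expected'"
  else
    echo "ok"
  fi
}
```

Assuming valgrind is run with '--error-exitcode=1' (I have not checked that 
the test does it exactly this way), something like 
`echo a | classify_result '62 b2 43 0a' valgrind -q --error-exitcode=1 ./sed/sed 's/a/b\U\xb2c/'` 
would report "command-failed" for a memory error and "output-mismatch" for 
the kind of failure described here.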

At least on Mac OS X 10.10, it seems the locale 'ja_JP.eucJP' behaves 
differently from the other locales, and also differs from the same locale on 
Debian, when presented with invalid input.

The following demonstrates:

On Mac OS X (ja_JP.eucJP results differ):

    $ for l in $(locale -a | grep ja_JP) ; do
        echo "Locale: $l" ;
        echo a | LC_ALL="$l" ./sed/sed 's/a/b\U\xb2c/' | od -tx1co1 ;
    done

    Locale: ja_JP
    0000000  62  b2  43  0a
              b   ≤   C  \n
            142 262 103 012

    Locale: ja_JP.eucJP
    0000000  62  b2  e3  0a
              b   ≤   „  \n
            142 262 343 012

    Locale: ja_JP.SJIS
    0000000  62  b2  43  0a
              b   ≤   C  \n
            142 262 103 012

    Locale: ja_JP.UTF-8
    0000000  62  b2  43  0a
              b   ≤   C  \n
            142 262 103 012


On Debian 8.4, by contrast, all locales return the same result:

    $ for l in $(locale -a | grep ja_JP) ; do
        echo "Locale: $l" ;
        echo a | LC_ALL="$l" ./sed/sed 's/a/b\U\xb2c/' | od -tx1co1 ;
    done

    Locale: ja_JP
    0000000  62  b2  43  0a
              b 262   C  \n
            142 262 103 012

    Locale: ja_JP.eucjp
    0000000  62  b2  43  0a
              b 262   C  \n
            142 262 103 012

    Locale: ja_JP.ujis
    0000000  62  b2  43  0a
              b 262   C  \n
            142 262 103 012

    Locale: ja_JP.utf8
    0000000  62  b2  43  0a
              b 262   C  \n
            142 262 103 012
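
For anyone who wants to repeat the comparison on another system without 
reading the 'od' dumps by eye, here is a minimal sketch. The function names 
('hex_line', 'collect', 'outliers') are hypothetical, and the expected bytes 
'62 b2 43 0a' are simply the result seen in the matching locales above.

```shell
# Hypothetical helpers for comparing per-locale output; names are made up.

# Print stdin as space-separated hex bytes on a single line.
hex_line() {
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# collect SED_BINARY LOCALE...
# Runs the same substitution as in the transcripts under each locale and
# prints one "locale<TAB>hex bytes" line per locale.
collect() {
  sed_bin=$1
  shift
  for l in "$@"; do
    printf '%s\t%s\n' "$l" \
      "$(echo a | LC_ALL="$l" "$sed_bin" 's/a/b\U\xb2c/' | hex_line)"
  done
}

# outliers EXPECTED_HEX
# Reads collect's output on stdin and prints only the locales whose bytes
# differ from EXPECTED_HEX.
outliers() {
  awk -F '\t' -v want="$1" '$2 != want { print $1 ": " $2 }'
}
```

Usage would be along the lines of 
`collect ./sed/sed $(locale -a | grep ja_JP) | outliers '62 b2 43 0a'`. 
Going by the Mac OS X results above, I would expect that to list only 
ja_JP.eucJP, and to print nothing on Debian.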



I'm not sure where the problem is (I'm also not that familiar with EUC 
encodings), but I'll continue to investigate.

regards,
 - assaf


[1] http://git.savannah.gnu.org/cgit/sed.git/commit/?id=49a0f87d9bbc66038de74afb9c25a53cd89a4ec5
