Re: [coreutils] tr: case mapping anomaly


From: Eric Blake
Subject: Re: [coreutils] tr: case mapping anomaly
Date: Wed, 29 Sep 2010 08:01:04 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100921 Fedora/3.1.4-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.4

On 09/29/2010 06:40 AM, Pádraig Brady wrote:
+    # Ensure the size of the case classes are accounted
+    # for as a unit.
+    echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
+    tr '[:upper:]A-B' '[:lower:]0' > out || _fail=1
+    echo '00cdefghijklmnopqrstuvwxyz' > exp

Huh?  A and B are both in [:upper:]; when a character is listed more
than once in string1, it is only transliterated according to the first
listing.  I think this should be 'abc...' not '00c...' for the expected
results.

Does POSIX specify that?

Hmm - POSIX appears to be silent on this for both tr and m4's translit. (The two do differ when the second string is shorter: 'tr long short' is unspecified, and we extend the last byte of short to match, whereas m4's translit(data,long,short) is explicitly documented as deleting from data any byte of long that has no match in short.)
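
For reference, a minimal sketch of the 'tr long short' padding mentioned here. It assumes GNU tr; since POSIX leaves this case unspecified, other implementations may give different output. The function name pad_demo is made up for illustration.

```shell
# Sketch, assuming GNU tr: string2 ('xy') is shorter than string1 ('abcd'),
# so GNU tr extends string2 by repeating its last byte, as if it were 'xyyy'.
pad_demo() {
    printf 'abcd\n' | LC_ALL=C tr 'abcd' 'xy'
}

pad_demo    # with GNU tr this prints: xyyy
```

With m4's translit the unmatched bytes would be deleted from the data instead of being mapped to 'y'.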

That's not what we do, nor what I would expect.

$ echo 'A' | LANG=C tr 'AA' '01'
1

And Solaris' tr does this as well.

But m4 behaves in the way I specified:

$ echo 'translit(a,aa,01)' | m4
0
$ echo 'translit(a,aa,01)' | /usr/ccs/bin/m4
0
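
To make the difference concrete, here is a rough model of the two policies in plain POSIX shell. It is only a sketch of the observable behavior shown in the transcripts above, not how either tool is implemented; the function names map_last_wins and map_first_wins are hypothetical, and both assume string1 and string2 have equal length.

```shell
# Model of the tr result above: every (from, to) pair is applied in order,
# so a later listing of the same byte overwrites an earlier one.
map_last_wins() {
    ch=$1 s1=$2 s2=$3 result=$1
    while [ -n "$s1" ]; do
        from=${s1%"${s1#?}"}    # first byte of the remaining string1
        to=${s2%"${s2#?}"}      # first byte of the remaining string2
        if [ "$from" = "$ch" ]; then result=$to; fi
        s1=${s1#?} s2=${s2#?}
    done
    printf '%s\n' "$result"
}

# Model of the m4 result above: the first listing of a byte is the one used.
map_first_wins() {
    ch=$1 s1=$2 s2=$3 result=$1
    while [ -n "$s1" ]; do
        from=${s1%"${s1#?}"}
        to=${s2%"${s2#?}"}
        if [ "$from" = "$ch" ]; then result=$to; break; fi
        s1=${s1#?} s2=${s2#?}
    done
    printf '%s\n' "$result"
}

map_last_wins  A AA 01    # prints 1, like: echo 'A' | LANG=C tr 'AA' '01'
map_first_wins a aa 01    # prints 0, like: translit(a,aa,01)
```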

Time for me to ask the Austin Group for clarification, I suppose. But given existing practice, I guess the argument should be that tr is explicitly different from m4.

--
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


