[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#15450: SORT failing on some lines
From: |
Eric Blake |
Subject: |
bug#15450: SORT failing on some lines |
Date: |
Wed, 25 Sep 2013 13:41:52 -0600 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 |
tag 15450 -moreinfo
tag 15450 +notabug
thanks
On 09/25/2013 12:28 PM, address@hidden wrote:
>
> Hello Eric,
> Thank you kindly for your speedy reply.
> I should apologize for the lack of information included with my email.
> It was a hurried one.
Re-adding the list for closure, with permission.
>
> In fact your suggestions and link and a bit of tinkering have cured the
> problem. SORT works fine it seems. I should have had more faith.
> The problem was purely with Locale, which I read up on in the FAQ link
> you sent. I had looked at Locale previously but didn't seem to have any
> success with it. I had also been trying various options for SORT,
> including -i, -d and even the field separation. (-t'#' -k1,1) I didn't
> have any luck but I realized after reading through your reply that it
> was the combination of these things which hadn't come right.
>
> I'd just like to add here for anybody else who stumbles across this same
> problem, a description of the problem I was having in more detail (now
> solved)
>
> The text file was a 605MB list of title texts extracted from Wikipedia,
> separated by a #--# and followed by the 'long long' integer offsets of
> where the article appeared in the dump file. (XML)
> Example lines:
>
> Alps Electric#--#7701298893,12,24,364,394,420
> Alps Electric Co.#--#4280442890,12,28,339,3144,3170
> Alps Electric Corporation#--#9562165739,12,36,447,477,503
>
> My machine was set to en-GB locale, although I had switched this to
> en-US with same (wrong) results.
>
> It was necessary to set the locale to LC_ALL=C and also to instruct SORT
> only to look at the first field (up to the first #) using the -t'#' and
> -k1,1 switches as you mentioned.
> Obvious really, but the combination of the two is what caused my confusion.
>
> It is really worth reading up on Locale for anybody using SORT and other
> utilities as it can profoundly change the results of an operation.
> Even setting locale to en-US doesn't help, as I read in the FAQ you
> linked, because en-US quite drastically reduces sort possibilities
> (case, punctuation etc ignored)
>
> I'm sorry for the bother - but you put me on the right track.
> Many thanks for that.
Glad to hear it. As such, I've closed the bug in the tracker.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature