grep
From: Petr Pajas
Subject: grep
Date: Mon, 12 Jul 2004 12:27:50 +0200
User-agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (gnu/linux)
Hi folks,
I'm using grep to extract lines that start with '15' from a file
approx. 15MB in size. On a 3GHz Linux box it ran for about 1m30s. I
found that this was due to the UTF-8 locale. If I switch to an 8-bit
locale, it takes only a fraction of a second. Strangely, under the
UTF-8 locale it also takes only about 2s when I search for lines that
*contain* 15 rather than lines that begin with it.
$ grep --version
grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ LC_CTYPE=en_US.UTF-8 time grep '^15' u0057.lst >/dev/null
73.46user 0.19system 1:18.93elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (173major+61minor)pagefaults 0swaps
$ LC_CTYPE=en_US time grep '^15' u0057.lst >/dev/null
0.05user 0.02system 0:00.13elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (163major+37minor)pagefaults 0swaps
$ LC_CTYPE=en_US.UTF-8 time grep '15' u0057.lst >/dev/null
1.84user 0.01system 0:01.91elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (170major+53minor)pagefaults 0swaps
$ LC_CTYPE=en_US time grep '15' u0057.lst >/dev/null
0.07user 0.00system 0:00.13elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (160major+36minor)pagefaults 0swaps
These results make me believe there is something odd in the
implementation of either locale support or of '^'.
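A minimal sketch for reproducing the comparison on synthetic data, assuming a POSIX shell with `seq` and `mktemp` available. The sample file is a hypothetical stand-in for u0057.lst, not the original data, and `LC_ALL=C` is used instead of `LC_CTYPE=en_US` because `LC_ALL` takes precedence over the other locale variables and the C locale exists on every system:

```shell
#!/bin/sh
# Hypothetical sample data standing in for u0057.lst (not the original file).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# The numbers 1..20000, one per line, all plain ASCII.
seq 1 20000 > "$sample"

# Anchored search, with the single-byte C locale forced for this command only.
anchored=$(LC_ALL=C grep -c '^15' "$sample")

# Unanchored search in the same locale, for comparison.
unanchored=$(LC_ALL=C grep -c '15' "$sample")

echo "anchored=$anchored unanchored=$unanchored"
```

Prefixing each grep with `time`, and replacing `LC_ALL=C` with a UTF-8 locale installed on the system, should show whether the slowdown reproduces with a given grep version. Forcing the C locale is only safe when the pattern and the data are plain ASCII, as they are here.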
Thanks,
-- Petr