bug-grep

bug#70540: grep -c -r | grep -v ':0$'


From: Dennis Clarke
Subject: bug#70540: grep -c -r | grep -v ':0$'
Date: Wed, 24 Apr 2024 08:55:49 -0400
User-agent: Mozilla Thunderbird

On 4/23/24 14:32, Dale R. Worley wrote:
> At least once a week, and often several times a day ...

Dear Sir:

    This is a task I can certainly relate to.  Dragging through massive
storage servers with find and grep is a terrible way to get things done.

> I want to search a tree of files to list the files in a directory
> containing a pattern ...

    That is usually the easy part of the problem.

> along with the *numbers* of patterns in the files.

    That is not the easy part.

> Usually this is because I'm looking for a file that contains a number
> of instances of the pattern, from among which I will choose to copy
> something.

    Perhaps a specific example would be helpful. Do you mean to say that
you run "find" on a directory "./foo" and search for all filenames that
contain the case-sensitive pattern "BaR"? Then, within that result set
of filenames, you count the instances of the string "BaR" inside the
files that match? Are you searching only text files, or will there be
multilingual UTF-8 encoded files as well? What about matching binary
bit patterns?

> But often the total number of files to be examined is large, and the
> total number of matches in any file might also be large.

    Here the word "large" could mean tens of millions of files, or
perhaps even billions or trillions. I am not sure what "large" means
here, but certainly we are in the region of something possible with a
decent modern server.

> So "grep -r" is inconvenient, because it may return many more matches
> than I want to examine, and it can be hard to see what all the
> alternative files are among the large number of matches that can be
> returned from any one file.
>

    Without really understanding the problem you are trying to solve, I
have the sudden feeling that what you really want is a custom-written
bit of code that walks down the directory structure and then reads and
inspects each file whose name matches some pattern. Making changes to
grep for that purpose feels like making changes to a good working hammer
in order to produce a chainsaw.  However, I am not sure what you mean by
counting "instances of the pattern". I have to guess that you want any
filename with a pattern match AND twelve or fifty thousand instances of
that pattern within the contents of the file.
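
    For what it is worth, a rough sketch of that sort of walk can be
put together from stock find, grep -c and awk, without touching grep
itself. Everything named here is a placeholder of mine and not anything
from your message: the function name count_hits, the directory ./foo,
the pattern "BaR" and the minimum count. It also assumes a grep that
accepts -H (GNU grep does) and filenames without embedded newlines. Note
that grep -c counts matching lines, not individual occurrences.

```shell
#!/bin/sh
# count_hits DIR PATTERN [MIN]
# Hypothetical helper (the name is illustrative only): print "file:count"
# for every regular file under DIR in which at least MIN lines match
# PATTERN. MIN defaults to 1, which hides the ":0" entries.
count_hits() {
    dir=$1
    pattern=$2
    min=${3:-1}

    # find walks the tree; grep -c prints one count per file, and -H
    # forces the filename prefix even when only one file is passed.
    find "$dir" -type f -exec grep -c -H -e "$pattern" {} + |
        # The count is the last colon-separated field, so filenames
        # that themselves contain ':' are still handled.
        awk -F: -v min="$min" '$NF + 0 >= min + 0'
}
```

    Something like "count_hits ./foo BaR 50" would then list only the
files with fifty or more matching lines. With the default minimum of 1
the result should be much the same as the pipeline in the subject line,
grep -c -r piped into grep -v ':0$'.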


>
> What do people think?
>
> Dale

    I think I want to set up an experiment and test this problem.


--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken




