coreutils
From: Kaz Kylheku (Coreutils)
Subject: Re: Determination of file lists from selected folders without returning directory names
Date: Tue, 18 Jul 2017 07:37:47 -0700
User-agent: Roundcube Webmail/0.9.2

On 18.07.2017 01:17, SF Markus Elfring wrote:
> I imagine that there are more advanced possibilities to improve the
> software run time characteristics for this use case.

Well, if it has to be fast, perhaps don't write the code in the shell language.

> To which “shell” would you like to refer?

The "Shell Command Language", called by that name in POSIX, and to its
dialects.

Even an interpreted scripting language that can do string handling without resorting to fork()-based command substitution will beat the shell at many tasks.

> What do you think about additional approaches to reduce the forking of
> special processes?

I think: don't do text processing whose speed matters in a language where
you even have to think about the question "how do I reduce fork() occurrences
in string processing code", and in which you don't even know whether
some command involves a fork or is a built-in primitive.
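
A quick way to check (illustrative only; the exact wording of the output
varies by shell):

   type printf basename
   # e.g. "printf is a shell builtin"
   #      "basename is /usr/bin/basename"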

If you've resigned yourself to developing something in the shell, and that something
has to process many items of data, try not to write a shell loop for the
task, and try to avoid idioms which run a process for each item.
Rather, coordinate commands which do the heavy lifting.
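
For instance (a minimal sketch; wc and *.log are placeholders used purely
for illustration), the difference looks like this:

   # forks one wc process per file:
   for f in *.log; do wc -l "$f"; done

   # one wc process handles all the files (and prints a total line):
   wc -l -- *.log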

If I had to strip a large number of paths to their basenames, and it had
to be done in portable shell code, I would filter those names through sed:
one process invocation and some pipe inter-process I/O.
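
For example (a sketch, assuming the dir/*.txt layout used in the examples
below), a single sed process strips both the directory prefix and the suffix:

   # printf is normally a builtin, so there is no per-item fork; one sed
   # process does all the stripping (assumes names without embedded newlines)
   printf '%s\n' dir/*.txt | sed -e 's|.*/||' -e 's|\.txt$||'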

> I.e. we can use the basename function:
>
>   for name in dir/*txt; do
>     basename "$name"
>   done
>
> prints the basenames of the matching files, one per line.
>
> There is also the GNU variant available for such a command.
>
>    for X in $(basename --suffix=.txt dir/*txt); do my_work $X; done

What you're doing here is destroying the validity of these expanded
paths; the "my_work" command or function cannot access things
through these paths, unless it restores the "dir/" prefix,
which it has not been given as an input.

When you expand dir/*txt, each one of the expansions is a correct
relative path to an object. The stripped basenames aren't.
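
Concretely (with a hypothetical dir/report.txt):

   cat dir/report.txt    # works: a correct relative path
   cat report.txt        # fails, unless the current directory happens to
                         # contain a file by that name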

Whatever "my_work" is doing, if it involves accessing the files,
you're probably making its job more difficult.

> But how often can deleting extra data like prefixes (and suffixes)
> actually be avoided?

Pretty much all of the time.

> Can it occasionally be a bit more efficient to provide only the essential
> values at the beginning of an algorithm so that they will be extended
> on demand?

That sounds like a generic description of the whole body of "lazy"
or "late binding" techniques; but it's unclear how it is supposed
to apply here.

Maybe "my_work" could be given relative paths that resolve; if it needs
shortened names for some reason, let it compute them.
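
A minimal sketch of that idea, assuming "my_work" is a shell function
(the names here are only illustrative):

   my_work() {
     path=$1
     name=${path##*/}    # strip the directory prefix (no extra process)
     name=${name%.txt}   # strip the suffix
     # ... access the file via "$path"; use "$name" where the short
     # form is needed
   }

   for f in dir/*.txt; do my_work "$f"; done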

Or "my_work" could be given a quoted pattern:

   my_work '*.txt'

then it can expand it as needed, in whatever directory it wants.
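
For example (a sketch; the directory and the output are only illustrative),
"my_work" could run in a subshell so its directory change does not leak out:

   my_work() (
     cd dir || exit 1    # subshell body, so the cd stays local
     # $1 holds the pattern '*.txt'; leaving it unquoted lets it
     # expand here, in the directory my_work chose:
     for f in $1; do
       printf '%s\n' "$f"
     done
   )

   my_work '*.txt'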




