Is there a memory leak in find ?
From: Paul E Condon
Subject: Is there a memory leak in find ?
Date: Tue, 8 Apr 2014 13:38:57 -0600
User-agent: Mutt/1.5.21 (2010-09-15)
This is my first email to this list, so don't hesitate to correct me when I
violate the local etiquette.
I am attempting to use find in a way for which I can find little prior
art. I have a huge file structure on a 3 terabyte hard drive. I want
to generate a single file that lists all the plain files, directory
files, and symlinks in the whole structure. I don't care how long it
takes, but I do want it to terminate without error. When I run it, I
can monitor memory usage using Gnome-system-monitor and I can see the
memory usage slowly rising.
I am using Debian Wheezy on an oldish HP desktop tower with a Pentium
dual-core CPU. I have two of these 3-terabyte disks and several 500-gigabyte
disks, all with USB2 interfaces. My project is an attempt at a massive
at-home de-duplication of backup data structures that have accumulated
over time. Are there difficulties with coreutils on these big disks? My
program is a script written in bash. It runs for about an hour and quits
without any error message that I can find, at a point where it is deep in
a very deep tree traversal and, as I say, with memory usage growing.
Some tips on gathering debug data would be much appreciated.
The script follows:
<begin>
#!/bin/bash
this=bld-spcl
# for use in grand cleanup on disk, gfx2 starting 20140407,pec
#
#
###############################
. /root/bin/arx-declares
export arxiv="$PWD"
find glbl -warn -depth -mindepth 1 \( \
\( -type f -empty -printf "mdx f e hst etm ${fndnfmt}\n" \) -o \
\( -type f -printf "mdx f f hst etm ${fndnfmt}\n" \) -o \
\( -type d -printf "mdx f d hst etm ${fndnfmt}\n" \) -o \
\( -type l -printf "mdx f l hst etm ${fndnfmt} -> %l\n" \) \) |& \
sed 's|\.\([0-9]\)\{10\} glbl| glbl|' > "$arxiv/find.out_$('date' $TIME_STYLE)" 2>&1
exit
#####################################
# mnemonic field names for $fndnfmt:
#                     1   2   3   4   5   6   7   8
# mdx fqx dfl hst etm nod mod lnx usr grp siz mtm fqnm
#   1   2   3   4   5   6   7   8   9  10  11  12   13
##########################################
<eof>
Relevant declarations from arx-declares are:
export TIME_STYLE='+%Y%m%d_%H%M%S'
export fndnfmt="%i %M %n %U %G %s %TY%Tm%Td_%TH%TM%TS %p"
The directory, glbl, is the root of the file tree that I am trying to
search exhaustively. The sed step is there to remove the subsecond
decimal fraction from the timestamps in the output. It works correctly
a few million times before the crash. The fixed fields (mdx, hst, etm)
are placeholders for data that will be filled in by future processing
of this output file, if/when I get this script working.
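To show what the sed step is meant to do, here is a minimal sketch run on a single made-up line. The sample values (inode, owner, size, path) and the function name strip_subsec are hypothetical and only for illustration; the sed expression itself is the one from the script. It assumes the timestamp produced by %TS carries a ten-digit fraction and that the path starts with glbl.

```shell
#!/bin/bash
# strip_subsec: hypothetical wrapper around the sed expression from the script.
# It drops a '.' followed by exactly ten digits when that is followed by ' glbl'.
strip_subsec() {
    sed 's|\.\([0-9]\)\{10\} glbl| glbl|'
}

# Made-up line in the shape the -printf format would produce for a plain file.
sample='mdx f f hst etm 131 -rw-r--r-- 1 pec pec 42 20140408_133857.1234567890 glbl/a/b.txt'

printf '%s\n' "$sample" | strip_subsec
# prints: mdx f f hst etm 131 -rw-r--r-- 1 pec pec 42 20140408_133857 glbl/a/b.txt
```

A line whose timestamp has no ten-digit fraction passes through unchanged, so the filter is harmless on any stderr text that |& merges into the stream.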
Want more information? Ask, please.
--
Paul E Condon
address@hidden