Re: Is there a memory leak in find ?


From: Paul E Condon
Subject: Re: Is there a memory leak in find ?
Date: Thu, 10 Apr 2014 22:28:53 -0600
User-agent: Mutt/1.5.21 (2010-09-15)

On 20140408_1338-0600, Paul E Condon wrote:
> This is my first email to this list, so don't hesitate to correct me when I
> violate the local etiquette.
> 
> I am attempting to use find in a way for which I can find little prior
> art.  I have a huge file structure on a 3 terabyte hard drive. I want
> to generate a single file that lists all the plain files, directory
> files, and symlinks in the whole structure. I don't care how long it
> takes, but I do want it to terminate without error. When I run it, I
> can monitor memory usage using Gnome-system-monitor and I can see the
> memory usage slowly rising. 
> 
> I am using Debian Wheezy on an oldish HP desktop tower with Pentium
> dual core CPU. I have two of these 3 terabyte disks and several 500 gigabyte
> disks, all with USB2 interfaces. My project is an attempt at a massive
> at-home data de-duplication of backup data structures that have accumulated
> over time. Are there difficulties with coreutils on these big disks? My 
> program is a script written in bash. It runs for about an hour and quits
> without any error message that I can find, at a point where it is deep in
> a very deep tree traversal, and as I say, with memory usage growing. 

I've done more testing using several different trees as input data. In
all cases the script terminates well before it has completely walked
the tree, always when the output file has grown to about 2.8GB. The
last line is always a complete, well-formed output line with no sign of
stderr output.  The project that I'm working on is my own variation of
using rsync to do frequent snapshot backups along the lines discussed
by Mike Rubel in

http://www.mikerubel.org/computers/rsync_snapshots/

In my (growing) backup tree(s) there are many files with hundreds of
hard links. Find does nothing more than report the numerical value of
the hard link count, so I don't think the hard links are the source of
the problem.  But they are a reason that the problem needs to be
solved. But first ...  you must be able to duplicate the problem. On
my hardware the program crashes a little over an hour into the
run. This same hardware can run multiple days on other problems, so an
hour is tedious to wait, but I hope you will do it.
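
To illustrate the point about hard links, here is a minimal sketch,
assuming GNU find with -printf support. It builds a throwaway directory
(the names tmpdir, original, link1, and link2 are made up for this
example) and shows that %n only reports the link count stored in the
inode; find does no extra bookkeeping per link.

```shell
#!/bin/bash
# Sketch: show that find's %n only reports the hard link count.
# The temporary directory and file names are made up for this example.
tmpdir=$(mktemp -d)
echo data > "$tmpdir/original"
ln "$tmpdir/original" "$tmpdir/link1"
ln "$tmpdir/original" "$tmpdir/link2"

# %n = hard link count, %i = inode number, %p = path
find "$tmpdir" -type f -printf '%n %i %p\n'

# All three names share one inode, so every line reports the same count.
link_counts=$(find "$tmpdir" -type f -printf '%n\n' | sort -u)
echo "distinct link counts: $link_counts"

rm -rf "$tmpdir"
```

Each of the three names appears as its own output line with the same
inode number and the same link count.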

A revised version of the program follows. It can be copied and pasted
into a file named 'bld-spcl' somewhere in your PATH for root and then
run with PWD pointing to the directory above the root directory of the
test data.  (I'm giving these instructions more as a way of telling you
what I have done than with any thought that you need my help in
figuring out what to do with what follows.)

<bof>
#!/bin/bash
this=bld-spcl
# for use in grand cleanup on disk, gfx2 starting 20140407,pec
# 
# Use the old 'dfl' field to tag f (file), q (qum), and x (aux) file sources.
# Do nothing at this stage concerning file tags, i.e. .~_"$etm"_"$Nn"f_*~
# 
# This version should demonstrate the silent failure of find on any directory
# tree of sufficient size to make the running job run out of memory.
# Navigate a bash instance to the directory above the treeroot directory
# and give the name of the tree root as the first arg in the invocation of
# bld-spcl
###############################
#. /root/bin/arx-declares only the following are needed for this test case
export TIME_STYLE='+%Y%m%d_%H%M%S'
export fndnfmt="%i %M %n %U %G %s %TY%Tm%Td_%TH%TM%TS %p"

export arxiv="$PWD"
export treeroot=glbl
[ -n "$1" ] && treeroot="$1"

find "$treeroot" -warn -depth -mindepth 1 \
    \(  \
        \( -type d        -printf "mdx f d hst etm ${fndnfmt}\n"       \) \
    -o  \( -type l        -printf "mdx f l hst etm ${fndnfmt} -> %l\n" \) \
    -o  \( -type f -empty -printf "mdx f e hst etm ${fndnfmt}\n"       \) \
    -o  \( -type f        -printf "mdx f f hst etm ${fndnfmt}\n"       \) \
    \)  \
    > "$arxiv/find.out_$(date "$TIME_STYLE")" 2>&1
exit

#####################################
# mnemonic field names for $fndnfmt: 
#                           1   2   3   4   5   6   7   8
#      mdx fqx dfl hst etm nod mod lnx usr grp siz mtm fqnm
#       1   2   3   4   5   6   7   8   9  10  11   12  13
######################################
<eof>
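
Because the script above sends stderr into the same file as the listing
and does not print find's exit status, a silent stop is hard to
diagnose. The following is only a sketch of one way to separate the
two, assuming GNU find. The function name run_find and the variables
out, err, and demo are hypothetical and not part of bld-spcl, and the
format string is shortened for the example.

```shell
#!/bin/bash
# Sketch only: run find with stdout and stderr in separate files and
# record the exit status. run_find, out, and err are hypothetical names.
run_find() {
    local root="$1" out="$2" err="$3"
    find "$root" -depth -mindepth 1 -printf '%i %M %n %s %p\n' \
        > "$out" 2> "$err"
    local status=$?    # expanded before 'local' runs, so this is find's status
    echo "find exit status: $status; output size: $(wc -c < "$out") bytes"
    return "$status"
}

# Small self-contained demonstration on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/tree/sub"
touch "$demo/tree/sub/a" "$demo/tree/b"

run_find "$demo/tree" "$demo/find.out" "$demo/find.err"
demo_status=$?
demo_lines=$(wc -l < "$demo/find.out")
echo "lines written: $demo_lines"

rm -rf "$demo"
```

On a real run, a non-zero status or a non-empty error file would show
whether find itself reported a problem before the output stopped.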

Find my error, please.

-- 
Paul E Condon           
address@hidden
