help-octave

Re: Octave suddenly slow


From: John W. Eaton
Subject: Re: Octave suddenly slow
Date: Mon, 17 Nov 2008 20:30:41 -0500

On 17-Nov-2008, Rob Mahurin wrote:

| On Nov 17, 2008, at 3:32 PM, Scott A. McDermott wrote:
| > Michael Goffioul, kudos sir, you came the closest:
| >> Sometimes, this kind of problem is triggered by having a large
| >> number of m-files in the current directory. But it does not look
| >> to be the case for you.
| >
| > It turns out that a large number of /any/ files in the current
| > directory induces this problem.  Pretty amazing --- even an operation
| > that has
| > absolutely nothing to do with files (like "for x=1:100,x,end") sits
| > there for second after second before starting up, if there are a large
| > number of files in the current directory.  (My accidental "test case"
| > had just over 8000 data files.)
| 
| I can reproduce this bug using octave 3.0.2 on Mac OSX and Linux.
| 
| >> $ time echo "printf('hi\n')" | octave -q
| >> hi
| >> real    0m4.468s
| >>
| >> $ time seq 1e4 | xargs touch # make ten thousand files
| >> real    0m3.579s
| >>
| >> $ time echo "printf('hi\n')" | octave -q
| >> hi
| >> real    0m6.692s
| >>
| >> $ time seq 1e5 | xargs touch # make 100,000 files
| >> real    0m41.446s
| >>
| >> $ time echo "printf('hi\n')" | octave -q
| >> hi
| >> real    8m34.034s
| 
| 
| Adding 10^4 empty files to a directory makes octave open a couple  
| seconds slower; adding 10^5 empty files makes it take eight minutes.   
| I have heard of bugs in filesystems that cause this problem, but that  
| doesn't seem to be the case since "ls | wc" is fast in the shell.
| 
| The filesystem bug I remember (or maybe an ls bug? or both?) was an  
| inefficient sort in the code that lists the names of files in a  
| directory.  Poking through the code, I don't see anything like that.
| 
| Hmmmm ... dir_entry::read() seems to guess that it will live in a  
| small directory.  For the case I've outlined above, the attached  
| patch should reduce the number of calls to Array::resize() from ~1000  
| to ~10.  Not tested, though.

I checked in the following change instead.  It seems to improve
things considerably for me.  On my system with 3.0.1 your test above
takes about 40 seconds when run in a directory with about 54,000
files.  With the current sources and the patch, it takes less than 6
seconds.  There is still a penalty any time a directory with a large
number of files is added to the load path because Octave caches the
contents of all directories that are added to the load path.  I don't
see a way around that.
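
To illustrate why growing the array in fixed steps of 100 is costly for
large directories, here is a minimal standalone sketch. It is not the
Octave code: the names fixed_growth_cost, growth_cost and
collect_then_convert are hypothetical, std::vector stands in for
string_vector, and the cost model assumes that every resize copies all
elements stored so far.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical cost summary for the old fixed-increment strategy.
struct growth_cost
{
  std::size_t resizes;  // number of times the array was resized
  std::size_t copies;   // element copies, assuming each resize copies
                        // everything stored so far
};

// Model the old loop: whenever the array is full, grow it by grow_size.
growth_cost
fixed_growth_cost (std::size_t n, std::size_t grow_size)
{
  assert (grow_size > 0);

  growth_cost cost = { 0, 0 };
  std::size_t len = 0;

  for (std::size_t count = 0; count < n; count++)
    {
      if (count >= len)
        {
          cost.resizes++;
          cost.copies += count;  // assumed: existing elements are copied
          len += grow_size;
        }
    }

  return cost;
}

// Model the new approach: append to a list, then convert once at the end.
std::vector<std::string>
collect_then_convert (std::size_t n)
{
  std::list<std::string> names;

  for (std::size_t i = 0; i < n; i++)
    names.push_back (std::to_string (i));  // stand-in for dir_ent->d_name

  return std::vector<std::string> (names.begin (), names.end ());
}
```

Under that assumption, fixed_growth_cost (100000, 100) reports 1000
resizes and 49,950,000 element copies, so the work grows quadratically
with the number of files, whereas collect_then_convert touches each name
a constant number of times.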

jwe


# HG changeset patch
# User John W. Eaton <address@hidden>
# Date 1226970263 18000
# Node ID 545b9f62adcfc8abc2216b75afc3b1e585f6fc2e
# Parent  b93ac0586e4bca87c524fceda96600f4df29986c
dir-ops.cc (dir_entry::read): use std::list<std::string> to cache names before 
converting to string_vector

diff --git a/liboctave/ChangeLog b/liboctave/ChangeLog
--- a/liboctave/ChangeLog
+++ b/liboctave/ChangeLog
@@ -1,3 +1,8 @@
+2008-11-17  John W. Eaton  <address@hidden>
+
+       * dir-ops.cc (dir_entry::read): Use std::list<std::string> to
+       cache names before converting to string_vector.
+
 2008-11-14  David Bateman  <address@hidden>
 
        * Array2.h (Array2<T> Array2<T>::index): Correct use of
diff --git a/liboctave/dir-ops.cc b/liboctave/dir-ops.cc
--- a/liboctave/dir-ops.cc
+++ b/liboctave/dir-ops.cc
@@ -27,6 +27,9 @@
 #include <cerrno>
 #include <cstdlib>
 #include <cstring>
+
+#include <list>
+#include <string>
 
 #include "sysdir.h"
 
@@ -69,40 +72,26 @@
 string_vector
 dir_entry::read (void)
 {
-  static octave_idx_type grow_size = 100;
-
-  octave_idx_type len = 0;
-
-  string_vector dirlist;
+  string_vector retval;
 
   if (ok ())
     {
-      int count = 0;
+      std::list<std::string> dirlist;
 
       struct dirent *dir_ent;
 
       while ((dir_ent = readdir (static_cast<DIR *> (dir))))
        {
          if (dir_ent)
-           {
-             if (count >= len)
-               {
-                 len += grow_size;
-                 dirlist.resize (len);
-               }
-
-             dirlist[count] = dir_ent->d_name;
-
-             count++;
-           }
+           dirlist.push_back (dir_ent->d_name);
          else
            break;
        }
 
-      dirlist.resize (count);
+      retval = string_vector (dirlist);
     }
 
-  return dirlist;
+  return retval;
 }
 
 void
