bug-coreutils

From: Jim Meyering
Subject: FYI: rm now performs run-time check for buggy readdir
Date: Fri, 13 May 2005 12:15:56 +0200

I found a way to make GNU rm work around this type of readdir bug

  http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00102.html

without incurring any noticeable overhead on working systems.

The run-time cost is essentially zero for any directory removed by `rm -r'
that contains 200 or fewer entries.  For each one that contains 201 or
more, the sole cost (on systems with a POSIX-compliant readdir) is that
of calling rewinddir once after all entries have been processed, and
then using readdir to iterate over any remaining entries one more time.
In some pathological cases (e.g., a directory with N>200 entries, each of
which is unremovable), the patched rm makes N additional readdir calls.
But I can live with that.  Barring pathological cases, I did some
before/after timings on different types of file systems and found no
difference in performance.

The cost on a buggy system can be as high as an additional O(N^2)
readdir calls.  The worst case is when there are N >>> 200 entries in a
directory, and the first N/2 of them are not removable.  But even that
is not bad in practice, since the constant is smaller than 1/500 and
few directories have enough entries to make a noticeable difference.

============================

2005-05-13  Jim Meyering  <address@hidden>

        * Version 5.3.1.

        * NEWS: `rm -r' now removes all of the files it should, even on
        systems with a buggy readdir affecting file systems inaccessible
        at configure time.

        In some unusual circumstances `rm -r' would fail to remove --
        or even consider -- all entries in a directory with more than 254
        (SunOS) or 338 (Darwin) entries.  This could cause trouble even on
        other types of systems when using an affected file system via e.g.,
        NFS.  The underlying cause was a bug in readdir on those systems.
        Coreutils-5.2.1 and earlier used a configure-time test designed
        to detect precisely those problem systems, but it would detect
        the problem and enable remove.c's work-around code only when its
        configure-time test was run on a losing file system.  Obviously,
        it couldn't detect a problem if the offending file system wasn't
        tested or even mounted at coreutils configure time.  Now, rm itself
        performs a minimal-cost run-time test to detect the problem.

        * src/remove.c (CONSECUTIVE_READDIR_UNLINK_THRESHOLD): Define.
        (remove_cwd_entries):  When readdir returns NULL for a directory from
        which we've removed more than CONSECUTIVE_READDIR_UNLINK_THRESHOLD
        entries, call rewinddir and then resume the readdir/unlink loop.
        (UNLINK_CAN_UNLINK_DIRS): Rename from ROOT_CAN_UNLINK_DIRS.

m4/ChangeLog:

2005-05-13  Jim Meyering  <address@hidden>

        * readdir.m4 (GL_FUNC_READDIR): Remove, now that remove.c no
        longer needs it.
        * jm-macros.m4 (gl_MACROS): Don't require GL_FUNC_READDIR.

Index: src/remove.c
===================================================================
RCS file: /fetish/cu/src/remove.c,v
retrieving revision 1.120
retrieving revision 1.121
diff -u -p -u -r1.120 -r1.121
--- src/remove.c        28 Mar 2005 19:29:17 -0000      1.120
+++ src/remove.c        13 May 2005 07:39:50 -0000      1.121
@@ -46,16 +46,29 @@
 #define obstack_chunk_alloc malloc
 #define obstack_chunk_free free
 
-/* FIXME: if possible, use autoconf...  */
+/* If anyone knows of another system for which unlink can never
+   remove a directory, please report it to address@hidden
+   The code below is slightly more efficient if it *knows* that
+   unlink(2) cannot possibly unlink a directory.  */
 #ifdef __GLIBC__
-# define ROOT_CAN_UNLINK_DIRS 0
+# define UNLINK_CAN_UNLINK_DIRS 0  /* Good!  */
 #else
-# define ROOT_CAN_UNLINK_DIRS 1
+# define UNLINK_CAN_UNLINK_DIRS 1  /* Less efficient.  */
 #endif
 
-#ifndef HAVE_WORKING_READDIR
-# define HAVE_WORKING_READDIR 0
-#endif
+/* This is the maximum number of consecutive readdir/unlink calls that
+   can be made (with no intervening rewinddir or closedir/opendir)
+   before triggering a bug that makes readdir return NULL even though
+   some directory entries have not been processed.  The bug afflicts
+   SunOS's readdir when applied to ufs file systems and Darwin 6.5's
+   (and OSX v.10.3.8's) HFS+.  This maximum is conservative in that
+   demonstrating the problem seems to require a directory containing
+   at least 254 deletable entries (which doesn't count . and ..), so
+   we could conceivably increase the maximum value to 254.  */
+enum
+  {
+    CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 200
+  };
 
 enum Ternary
   {
@@ -718,15 +731,19 @@ remove_entry (Dirstack_state const *ds, 
 
   /* Why bother with the following #if/#else block?  Because on systems with
      an unlink function that *can* unlink directories, we must determine the
-     type of each entry before removing it.  Otherwise, we'd risk unlinking an
-     entire directory tree simply by unlinking a single directory;  then all
-     the storage associated with that hierarchy would not be freed until the
-     next reboot.  Not nice.  To avoid that, on such slightly losing systems, we
-     need to call lstat to determine the type of each entry, and that represents
-     extra overhead that -- it turns out -- we can avoid on GNU-libc-based
-     systems, since there, unlink will never remove a directory.  */
+     type of each entry before removing it.  Otherwise, we'd risk unlinking
+     an entire directory tree simply by unlinking a single directory;  then
+     all the storage associated with that hierarchy would not be freed until
+     the next reboot.  Not nice.  To avoid that, on such slightly losing
+     systems, we need to call lstat to determine the type of each entry,
+     and that represents extra overhead that -- it turns out -- we can
+     avoid on GNU-libc-based systems, since there, unlink will never remove
+     a directory.  Also, on systems where unlink may unlink directories,
+     we're forced to allow a race condition: we lstat a non-directory, then
+     go to unlink it, but in the mean time, a malicious someone has replaced
+     it with a directory.  */
 
-#if ROOT_CAN_UNLINK_DIRS
+#if UNLINK_CAN_UNLINK_DIRS
 
   /* If we don't already know whether FILENAME is a directory, find out now.
      Then, if it's a non-directory, we can use unlink on it.  */
@@ -780,7 +797,7 @@ remove_entry (Dirstack_state const *ds, 
     }
 
 
-#else /* ! ROOT_CAN_UNLINK_DIRS */
+#else /* ! UNLINK_CAN_UNLINK_DIRS */
 
   if (is_dir == T_YES && ! x->recursive)
     {
@@ -838,7 +855,7 @@ remove_cwd_entries (Dirstack_state *ds, 
   DIR *dirp = opendir (".");
   struct AD_ent *top = AD_stack_top (ds);
   enum RM_status status = top->status;
-  bool need_rewinddir = false;
+  size_t n_unlinked_since_opendir_or_last_rewind = 0;
 
   assert (VALID_STATUS (status));
   *subdir = NULL;
@@ -874,12 +891,14 @@ remove_cwd_entries (Dirstack_state *ds, 
              /* Arrange to give a diagnostic after exiting this loop.  */
              dirp = NULL;
            }
-         else if (need_rewinddir)
+         else if (CONSECUTIVE_READDIR_UNLINK_THRESHOLD
+                  < n_unlinked_since_opendir_or_last_rewind)
            {
-             /* On buggy systems, call rewinddir if we've called unlink
-                or rmdir since the opendir or a previous rewinddir.  */
+             /* Call rewinddir if we've called unlink or rmdir so many times
+                (since the opendir or the previous rewinddir) that this
+                NULL-return may be the symptom of a buggy readdir.  */
              rewinddir (dirp);
-             need_rewinddir = false;
+             n_unlinked_since_opendir_or_last_rewind = 0;
              continue;
            }
          break;
@@ -899,9 +918,11 @@ remove_cwd_entries (Dirstack_state *ds, 
       switch (tmp_status)
        {
        case RM_OK:
-         /* On buggy systems, record the fact that we've just
-            removed a directory entry.  */
-         need_rewinddir = ! HAVE_WORKING_READDIR;
+         /* Count how many files we've unlinked since the initial
+            opendir or the last rewinddir.  On buggy systems, if you
+            remove too many, readdir returns NULL even though there
+            remain unprocessed directory entries.  */
+         ++n_unlinked_since_opendir_or_last_rewind;
          break;
 
        case RM_ERROR:
@@ -929,7 +950,7 @@ remove_cwd_entries (Dirstack_state *ds, 
                /* It is much more common that we reach this point for an
                   inaccessible directory.  Hence the second diagnostic, below.
                   However it is also possible that F is a non-directory.
-                  That can happen when we use the `! ROOT_CAN_UNLINK_DIRS'
+                  That can happen when we use the `! UNLINK_CAN_UNLINK_DIRS'
                   block of code and when DO_UNLINK fails due to EPERM.
                   In that case, give a better diagnostic.  */
                if (errno == ENOTDIR)



