bug-coreutils

bug#12339: Gnu rm, changed only recently (4-5 years), and didn't follow


From: Linda Walsh
Subject: bug#12339: Gnu rm, changed only recently (4-5 years), and didn't follow letter of posix...(statement follows)
Date: Wed, 12 Sep 2012 15:51:51 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666

I hope to prove the subject convincingly in the following sections. If you can, read this in the original HTML, as I don't know how it will end up when converted to text. I tried to format it for readability (limited margins, monospace font) in case the text version comes out badly. I am writing this to appeal to those with an open mind; that is the best I can hope for.

I. Regarding *Compatibility with Default Behavior*. If you change a function or functionality that people use, then you have broken compatibility with past functionality. If the thing you change only generated an error in the past, fixing it doesn't break compatibility with past function. Example:

       >  mkdir aaa
       >  ln -s aaa bbb
       >  rm -r bbb         ## deletes symlink 'recursively' leaving
                           ## aaa untouched.
       >  ln -s aaa bbb
       >  rm -r bbb/        ## this would make it clear that we
                           ## are addressing a dir that bbb is
                           ## pointing to;  however...

       rm: cannot remove directory `bbb': Not a directory.

                            ## rm is confused, as ls shows:
       >  'ls' -gGld bbb/
       drwxrwxr-x 2 6 2012-09-11 09:48 bbb/

Rumor has it 'rm' may now include a fix for this so that it knows that "/" means 'address the directory at the end of this symlink'. Did this change default behavior? Yes. Did it change default functional behavior? No. Before, "rm -r bbb/" was not valid syntax and wasn't part of "default behavior". A new feature was created where before there was none. Implementing functionality where before there was none is not normally considered "changing default behavior", as there was no legal or functional prior behavior that someone would have been _relying_ on. To someone unfamiliar with "/" as the directory separator, this is no different from adding a "/" flag that, when appended to a word, tells rm to treat it as a directory.
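The example above can be replayed in a scratch directory. This is a minimal sketch, assuming a POSIX shell and `mktemp -d`; the variable names are made up for illustration. Only the first removal has a portable outcome. What `rm -r bbb/` does depends on the rm version, so the sketch records its exit status rather than assuming one.

```shell
# Scratch directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir aaa
ln -s aaa bbb
rm -r bbb                       # operand is the symlink itself
if [ -d aaa ] && [ ! -L bbb ]; then
    first="link removed, aaa kept"
else
    first="unexpected"
fi

ln -s aaa bbb
# Trailing slash: the result varies between rm versions, so only record the status.
rm -r bbb/ 2>/dev/null && slash_status=0 || slash_status=$?

echo "rm -r bbb : $first"
echo "rm -r bbb/: exit status $slash_status"

cd / && rm -rf "$work"
```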

II. Basenames vs. dirnames. What are they? Basenames are the final part of a name that has been chosen to name the entry located in "some dir". Let's look at some different "Cases" of sample pathnames (skipping the easy ones, given the audience):

Case A:   "/"     What is the dirname and what is the basename?

Answer: <null> and <null>.

That's why it is called the root -- it doesn't have a name entry somewhere as it is the top of the tree -- there is nothing above it to hold a name for it. It has to exist on every system, so by convention, it is called the 'root' directory.


 Case B: "ENTRY/"     What is the dirname and what is the basename?

Answer: Dirname="ENTRY", basename=<null>.  (Explanation seems unnecessary).


 Case C: "ENTRY"    Dir and Basenames?

Answer:  it depends on context and what it really is.

Whether or not 'ENTRY' is a basename or a dirname depends on whether or not 'ENTRY' is a directory.
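The three cases can be sketched as a purely textual split at the last slash, with a directory test for operands that contain no slash. This is only one reading of the definitions used here, and `classify` is a hypothetical helper, not anything rm provides. The POSIX `basename` and `dirname` utilities strip trailing slashes first, so they report different results for Cases A and B.

```shell
# Hypothetical helper: report dirname/basename for one operand.
classify() {
    p=$1
    case $p in
        */*)
            # Split at the last slash: text before it, text after it.
            printf 'dirname=<%s> basename=<%s>\n' "${p%/*}" "${p##*/}"
            ;;
        *)
            # No slash (Case C): the answer depends on what the entry is.
            if [ -d "$p" ]; then
                printf 'dirname=<%s> basename=<>\n' "$p"
            else
                printf 'dirname=<> basename=<%s>\n' "$p"
            fi
            ;;
    esac
}

classify "/"        # Case A
classify "ENTRY/"   # Case B
```

Empty angle brackets stand for the <null> answers above.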

Example. In its default mode rm removes only files. I.e. "rm a b c ENTRY d" -- rm expects all entries to be simple basenames. If it encounters a dirname, it issues an error and refuses to operate on it.

In its recursive mode, rm will accept basenames and dirnames. It will inspect each entry to determine if it is a file (basename) or a dir (dirname).
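The two modes can be compared side by side in a scratch directory. A minimal sketch, assuming a POSIX shell and `mktemp -d`; the variable names are illustrative only.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
: > a
mkdir ENTRY

# Default mode: the file is removed, the directory operand is diagnosed and kept.
rm a ENTRY 2>/dev/null && plain_status=0 || plain_status=$?
[ -e a ] && file_after_plain=kept || file_after_plain=gone
[ -d ENTRY ] && dir_after_plain=kept || dir_after_plain=gone

# Recursive mode: the same directory operand is accepted.
rm -r ENTRY && recursive_status=0 || recursive_status=$?
[ -d ENTRY ] && dir_after_recursive=kept || dir_after_recursive=gone

echo "rm a ENTRY : status=$plain_status file=$file_after_plain dir=$dir_after_plain"
echo "rm -r ENTRY: status=$recursive_status dir=$dir_after_recursive"

cd / && rm -rf "$work"
```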

As POSIX states:

If, on the other hand, you specify the "-r" switch to rm, it is able to remove directories, but it doesn't treat them the same as other files (because they still are not 'basenames' - they are directories). So the first difference that might be noted is that rm begins a depth-first traversal of the contents of that directory. Once the directory is empty and no longer contains any non-structural entries, it can be deleted using rmdir (which handles dismantling the internal structural components that unlink wouldn't handle). Only after the contents have been removed is it no longer a dirpath -- and it becomes the next active "operand" of the "rm" verb/action.
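The traversal order described above can be sketched in a few lines of shell. This is an illustration of the order of operations only, not how any real rm is implemented; `rm_r_sketch` is a hypothetical name, and the sketch assumes a POSIX shell.

```shell
# Hypothetical depth-first removal: contents first, the directory itself last.
rm_r_sketch() {
    # Symlinks and non-directories are unlinked directly, never followed.
    if [ -L "$1" ] || [ ! -d "$1" ]; then
        rm -f -- "$1"
        return
    fi
    # Directory: visit every entry except "." and ".." before the directory.
    for entry in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        # Glob patterns that match nothing stay literal; skip those.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            rm_r_sketch "$entry"
        fi
    done
    # Only now, with the directory empty, can it be removed in one operation.
    rmdir -- "$1"
}
```

A call such as `rm_r_sketch some/tree` empties `some/tree` from the bottom up and then removes it with rmdir.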

At that point, it either deletes it or not, depending on what is permitted by policy (usually security settings). There are 2 entries not usually covered by security policies, as they are not discretionary entries -- they are mandatory components of the OS, namely "." and "..". If "rm" encounters one of those at the point such an entry becomes the "entry to be operated on" (the operand), POSIX directs that rm shouldn't even try to delete those entries. On some systems this might succeed, if the underlying OS allows it, so in order to provide portable behavior, a compliant "rm" will not attempt to delete those entries.

POSIX states:

   If either of the files dot or dot-dot are specified as the basename
   portion of an operand (that is, the final pathname component), rm
   shall write a diagnostic message to standard error and do nothing
   more with such operands.

However, if "rm" encounters "." or ".." and the user has specified that they are to be treated as files (i.e. they do not specify -r), "rm" already refuses to delete those entries, as they are not normal files or basename components but structural components that facilitate addressing: ".." allows addressing the directory above the current dir, while "." allows one to specify an explicit starting point for an operation as the root of the current directory. The wording for the treatment of "." and ".." is consistent with the case in which they are addressed as "targets" rather than used as addresses.

For directories (with recursion), POSIX states:

       For each entry contained in file, other than dot or dot-dot,
       the four steps listed here (1 to 4) shall be taken with the
       entry as if it were a file operand. The rm utility shall not
       traverse directories by following symbolic links into other
       parts of the hierarchy, but shall remove the links themselves.
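The last sentence of that passage is easy to check. A minimal sketch, assuming a POSIX shell and `mktemp -d`; the directory and variable names are made up for illustration.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p tree/sub outside
: > tree/sub/file
: > outside/keep
ln -s ../outside tree/link      # link pointing out of the hierarchy

# The link inside the tree is removed; the directory it points to is not entered.
rm -r tree
[ -e tree ] && tree_state=present || tree_state=removed
[ -f outside/keep ] && target_state=intact || target_state=damaged

echo "tree: $tree_state, link target: $target_state"

cd / && rm -rf "$work"
```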

The first thing it does when operating on a directory is to enter the directory and remove the associated files. Only when a directory is empty can it be addressed as a file-type operand of the parent that can be removed with 1 operation (rmdir vs. unlink). If, however, the contents you have been removing were under ".", then on backing up and discovering "." as the operand, rm should "issue an error and have nothing more to do with that file". This is entirely consistent with the wording in POSIX. If "-f" had been specified, any diagnostic would also be suppressed.
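For comparison, the sketch below shows what I would expect from a current rm, which rejects "." before touching anything under it, followed by a portable `find` idiom that produces the effect argued for here: the contents go, "." stays. It assumes a POSIX shell, `mktemp -d`, and an rm that follows the newer behavior; the variable names are illustrative.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir sub
: > sub/file
: > .hidden

# Newer rm: "." is diagnosed up front and its contents are left in place.
rm -r . 2>/dev/null && rm_status=0 || rm_status=$?
[ -f sub/file ] && kept_after_rm=yes || kept_after_rm=no

# Act on every entry directly under "." but never on "." itself.
find . ! -name . -prune -exec rm -r -- {} +
remaining=$(ls -A | wc -l)
[ -d "$work" ] && dot_state=present || dot_state=removed

echo "rm -r . : status=$rm_status contents kept=$kept_after_rm"
echo "after find: entries left=$remaining, directory $dot_state"

cd / && rm -rf "$work"
```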

As POSIX is a computer portability standard, one would imagine that they know the difference between dirnames and basenames and that a dirname can only be treated as a basename when it has been emptied of files.

GNU's version of rm worked as I described prior to version 6. Version 5, in openSUSE 10.1 and earlier, was only *retired about 4 years ago* -- not "23 years ago" like some people would make up. This is a relatively new change that is causing problems, and I would like to see it reverted to the letter of computer science practice -- as it would still adhere to the letter of POSIX's requirements.



III. Please note that, just as in section I, we are [re]adding functionality where before there was none. Thus this is no more a compatibility issue than adding any other "new feature", as before only "dysfunction" existed. Adding new features in the place of a void is not something that can break previous compatibility, as it makes no sense to have dependencies on a void...






