bug-gnu-utils

tar --unlink-first and symlinked directories


From: James Bonfield
Subject: tar --unlink-first and symlinked directories
Date: Thu, 3 May 2001 12:27:18 +0100 (BST)

(Apologies if you get this twice - my mailer just bombed out.)

GNU tar (1.13) doesn't seem to honour the --unlink-first option when
extracting directories.

For example:


bash$ pwd
/tmp/ttt
bash$ ls -lR
total 16
drwxr-xr-x   2 pubseq   system      8192 May  3 10:51 d1
drwxr-xr-x   3 jkb      system      8192 May  3 10:51 d2

./d1:
total 0
-rw-r--r--   1 pubseq   system         0 May  3 10:51 aaa

./d2:
total 8
-rw-r--r--   1 jkb      system         0 May  3 10:49 a
drwxr-xr-x   2 jkb      system      8192 May  3 10:52 d1

./d2/d1:
total 1
-rw-r--r--   1 jkb      system         4 May  3 10:52 aaa

So, d1/aaa is owned by pubseq. d2/d1/aaa is owned by jkb.

As root, I'll back up /tmp/ttt to /tmp/ttt2:

bash# mkdir /tmp/ttt2
bash# (cd /tmp/ttt; $tar -cpf - .) | (cd /tmp/ttt2; $tar -xvvpf - --unlink-first)
drwxrwxrwx pubseq/system     0 2001-05-03 11:04 ./
drwxr-xr-x pubseq/system     0 2001-05-03 10:51 d1/
-rw-r--r-- pubseq/system     0 2001-05-03 10:51 d1/aaa
drwxr-xr-x jkb/system        0 2001-05-03 10:51 d2/
drwxr-xr-x jkb/system        0 2001-05-03 10:52 d2/d1/
-rw-r--r-- jkb/system        4 2001-05-03 10:52 d2/d1/aaa
-rw-r--r-- jkb/system        0 2001-05-03 10:49 d2/a
bash# 

That's all fine. If I look in /tmp/ttt2 then I see exactly what I expect - a
copy of /tmp/ttt.

Now comes the bug:

bash# cd /tmp/ttt2/d2
bash# ls
a   d1
bash# rm -rf d1
bash# ln -s ../d1 .
bash# ls -l
total 0
-rw-r--r--   1 jkb      system         0 May  3 10:49 a
lrwxrwxrwx   1 root     system         5 May  3 11:07 d1 -> ../d1


This is in the directory the backups are being written TO.

So I'll repeat the backup command:

bash# (cd /tmp/ttt; $tar -cpf - .) | (cd /tmp/ttt2; $tar -xvvpf - --unlink-first)
drwxrwxrwx pubseq/system     0 2001-05-03 11:04 ./
drwxr-xr-x pubseq/system     0 2001-05-03 10:51 d1/
-rw-r--r-- pubseq/system     0 2001-05-03 10:51 d1/aaa
drwxr-xr-x jkb/system        0 2001-05-03 10:51 d2/
drwxr-xr-x jkb/system        0 2001-05-03 10:52 d2/d1/
-rw-r--r-- jkb/system        4 2001-05-03 10:52 d2/d1/aaa
-rw-r--r-- jkb/system        0 2001-05-03 10:49 d2/a
bash# 

The output is just as before - we're copying d1/aaa and d2/d1/aaa. However:

bash# pwd
/tmp/ttt2/d2
bash# ls -l 
total 0
-rw-r--r--   1 jkb      system         0 May  3 10:49 a
lrwxrwxrwx   1 root     system         5 May  3 11:07 d1 -> ../d1

How come d1 is still a symlink? The original in /tmp/ttt/d2 is a directory
owned by jkb, and I haven't changed that. --unlink-first should have removed
the d1 link before recreating d1 as a directory. The same happens even if I
also specify --recursive-unlink.

Consequently the backup copy of /tmp/ttt2/d1/aaa has been overwritten through
the symlink, and now has jkb's ownership and contents:

bash# pwd
/tmp/ttt2/d2
bash# ls -la ../d1
total 17
drwxr-xr-x   2 pubseq   system      8192 May  3 10:51 .
drwxrwxrwx   4 pubseq   system      8192 May  3 11:04 ..
-rw-r--r--   1 jkb      system         4 May  3 10:52 aaa
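
A restored tree can be checked for this condition without reading ls output
by hand. The following is only a rough sketch, not part of the original
report: the function name find_symlinked_dirs is made up for illustration,
and it assumes file names contain no newlines.

```shell
# List paths that are real directories under SRC but symlinks under DST.
# Paths are printed relative to SRC (for example "./d2/d1").
# Hypothetical helper; assumes no newlines in file names.
find_symlinked_dirs() {
    src=$1
    dst=$2
    # find does not follow symlinks by default, so only real directories
    # in the source tree are listed.
    (cd "$src" && find . -type d -print) | while IFS= read -r rel; do
        # -L is true when the destination path itself is a symlink.
        if [ -L "$dst/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done
}
```

With the trees in the state shown above, `find_symlinked_dirs /tmp/ttt /tmp/ttt2`
should report ./d2/d1.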


This has serious security implications when GNU tar is used for backups.
Consider using GNU tar (with --listed-incremental, for example) to do
nightly backups to another disk. A user can create a symlink to /etc in their
directory. This is then backed up. The next day they remove their etc symlink
and create a directory called etc, and in it a new passwd file. That file is
then copied over the top of /etc/passwd. (I haven't tested this theory, but it
seems to follow reasonably from my own observations so far.)
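
The scenario can be walked through safely in a scratch directory. This is a
sketch under stated assumptions, not a test of GNU tar: naive_restore is a
hypothetical stand-in for any extraction that accepts whatever already exists
at a directory's path, and fake-etc stands in for /etc so that nothing real
is touched.

```shell
# Stand-in for a restore that does not unlink first (NOT GNU tar itself).
# It accepts whatever already exists at a directory's path, symlinks included.
naive_restore() {
    src=$1
    dst=$2
    (cd "$src" && find . -type d -print) | while IFS= read -r rel; do
        mkdir -p "$dst/$rel"        # no error if this is a symlink to a directory
    done
    (cd "$src" && find . -type f -print) | while IFS= read -r rel; do
        cp "$src/$rel" "$dst/$rel"  # follows any symlink in the destination path
    done
}

sandbox=$(mktemp -d)
mkdir -p "$sandbox/fake-etc" "$sandbox/home/user" "$sandbox/backup"
echo "root:original" > "$sandbox/fake-etc/passwd"

# Night 1: the user's "etc" was a symlink; create it in the backup directly,
# as the first backup run would have recorded it.
ln -s "$sandbox/fake-etc" "$sandbox/backup/etc"

# Night 2: the user has replaced the symlink with a real directory.
mkdir "$sandbox/home/user/etc"
echo "root:attacker" > "$sandbox/home/user/etc/passwd"
naive_restore "$sandbox/home/user" "$sandbox/backup"

# The file behind the symlink now holds the user's contents.
result=$(cat "$sandbox/fake-etc/passwd")
echo "$result"
rm -rf "$sandbox"
```

The point of the sketch is only that creating a directory "over" an existing
symlink silently succeeds, so every file extracted below it lands wherever
the link points.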

Here's my patch. I've done only minimal testing on it.

*** extract.c.orig      Thu May  3 12:03:07 2001
--- extract.c   Thu May  3 12:18:08 2001
***************
*** 841,846 ****
--- 841,849 ----
        while (name_length && CURRENT_FILE_NAME[name_length] == '/')
        CURRENT_FILE_NAME[name_length--] = '\0';
  
+       if (unlink_first_option)
+       remove_any_file (CURRENT_FILE_NAME, recursive_unlink_option);
+ 
        if (incremental_option)
        {
          /* Read the entry and delete files that aren't listed in the


Basically it just adds the unlink_first_option check to DIRTYPE extraction
too. This seems to work for me. Indeed, I cannot see the reason for having
a recursive_unlink_option if directory extraction never checks unlink-first.
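
The idea behind the patch can be illustrated outside tar with a few lines of
shell. This is a loose sketch only: mkdir_unlink_first is a hypothetical
helper, and unlike the patch (which calls remove_any_file whenever
--unlink-first is given) it leaves an existing real directory in place and
removes only things that are not directories.

```shell
# Create directory $1, first removing anything at that path that is not a
# real directory (a symlink, a plain file, ...).
# Hypothetical helper; it mirrors the idea of the patch, not tar's code.
mkdir_unlink_first() {
    path=$1
    # -L catches symlinks, including dangling ones and links to directories.
    if [ -L "$path" ] || { [ -e "$path" ] && [ ! -d "$path" ]; }; then
        rm -f -- "$path"    # removes the link itself, never its target
    fi
    mkdir -p -- "$path"
}
```

Applied to the example above, `mkdir_unlink_first /tmp/ttt2/d2/d1` would
replace the d1 symlink with a fresh directory and leave /tmp/ttt2/d1
untouched.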

James

--
James Bonfield (address@hidden)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/



