From: Mr. Clif
Subject: Re: cross-platform backup tool Same files from different source dir causes spurious diff files
Date: Wed, 9 Feb 2022 12:46:24 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
Howdy,
I've dug into this further and think I now know what's going on. Just
FYI, this is more than an academic question for me: I have several
VMs that I would like to take snapshots of, and this first one is by far
the smallest.
These VMs are LXC containers. When I started out a long time ago, I
would just create the filesystems manually and use some CLI tools to
install a fresh distro. Eventually the Linux kernel started supporting
namespaces to improve security, and the virtualization ecosystems
adopted them.
I'm not sure when it happened, since I only just noticed it (maybe it
was when I switched to letting Proxmox spin up new VMs), but the UIDs
and GIDs in the filesystems of the unprivileged containers have now all
been shifted up by 100000. This is why rdiff-backup updated all that
metadata.
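To confirm the shift on a given file, the arithmetic can be undone by hand. A minimal shell sketch, assuming the common Proxmox default where container IDs 0-65535 appear on the host as 100000-165535 (the 65536-ID range size is an assumption; check /etc/subuid and /etc/subgid on the host). `unshift_id` is a hypothetical helper, not part of any tool mentioned here:

```shell
#!/bin/sh
# Hypothetical helper: map a shifted host-side ID back to the container ID.
# Assumes IDs 0-65535 were shifted to 100000-165535 (check /etc/subuid, /etc/subgid).
ID_BASE=100000
ID_RANGE=65536

unshift_id() {
    _id=$1
    if [ "$_id" -ge "$ID_BASE" ] && [ "$_id" -lt $((ID_BASE + ID_RANGE)) ]; then
        echo $((_id - ID_BASE))
    else
        # Outside the shifted range: report the ID unchanged.
        echo "$_id"
    fi
}

# Example (hypothetical path): owner and group of a file as the container sees them.
# f=/var/lib/lxc/101/rootfs/etc/passwd
# echo "uid=$(unshift_id "$(stat -c %u "$f")") gid=$(unshift_id "$(stat -c %g "$f")")"
```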
This is not just a mapping in RAM; the shifted IDs are actually stored
in the filesystem image on disk. There are several ways of dealing with
this. Some tools will update the UIDs/GIDs for you when you reboot the
VM. Other tools act as a layer in a bind mount, mostly duplicating a
filesystem somewhere else and rewriting the UIDs/GIDs on the fly. Some
utilities, like rdiff-backup and rsync, can rewrite or map the UIDs/GIDs
as they copy. The last two approaches seem most attractive to me.
rsync has --usermap and --groupmap, and rdiff-backup has
--user-mapping-file and --group-mapping-file. Among filesystem mount
utilities there are shiftfs, idmapped mounts, and bindfs.
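For the rdiff-backup route, the mapping files for this shift can be generated rather than written by hand. A sketch, assuming the one-pair-per-line `source:destination` format (numeric IDs allowed) that the rdiff-backup man page describes for --user-mapping-file; `write_id_map` and the file paths are hypothetical, and the same 100000 base and 65536-ID range assumptions apply:

```shell
#!/bin/sh
# Sketch: write a mapping file with one "source:destination" pair per line,
# mapping shifted host IDs base..base+count-1 back to 0..count-1.
# Line format assumed from the rdiff-backup man page; numeric IDs assumed OK.
write_id_map() {
    base=$1   # first shifted ID, e.g. 100000
    count=$2  # number of IDs in the range, e.g. 65536
    out=$3    # file to write
    awk -v base="$base" -v count="$count" \
        'BEGIN { for (i = 0; i < count; i++) print base + i ":" i }' > "$out"
}

# Usage (hypothetical paths; classic rdiff-backup invocation form):
# write_id_map 100000 65536 /root/ct-uid.map
# write_id_map 100000 65536 /root/ct-gid.map
# rdiff-backup --user-mapping-file /root/ct-uid.map \
#     --group-mapping-file /root/ct-gid.map /path/to/rootfs /path/to/backup
```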
Shiftfs is deprecated in favor of idmapped mounts, though some of my
kernels don't support idmapped mounts yet. Bindfs is FUSE-based and so
might be slower, but it might be the only option that is really
workable for me at the moment, because it has the --uid-offset
and --gid-offset options. By the way, you can give negative offsets
too, which is a good thing. :-)
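A bindfs-based setup could look roughly like this: mount an unshifted view of the container's root filesystem, back up the view, then unmount it. This is a sketch only; the paths are hypothetical, it assumes bindfs accepts the negative offsets described above, and the `run` wrapper with DRY_RUN=1 just prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch: expose a container rootfs with container-style IDs via bindfs,
# so the backup tool sees UID 0 instead of 100000, and so on.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_unshifted() {
    src=$1; view=$2; base=$3
    run mkdir -p "$view"
    # Negative offsets subtract the shift from every UID/GID seen in the view.
    run bindfs "--uid-offset=-$base" "--gid-offset=-$base" "$src" "$view"
}

umount_unshifted() {
    run fusermount -u "$1"
}

# Usage (hypothetical paths):
# mount_unshifted /var/lib/lxc/101/rootfs /mnt/ct101 100000
# ... back up /mnt/ct101 ...
# umount_unshifted /mnt/ct101
```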
It would be great if rdiff-backup allowed offsets like this, or, even
better, let you specify a range such as
100000-165535:0-65535
or just the starting UID after the colon (e.g. 100000-165535:0).
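To make the proposed syntax concrete, here is a sketch of how such a range spec could be interpreted. It is purely illustrative: rdiff-backup does not accept this syntax, and `map_range` is a hypothetical function. IDs outside the source range pass through unchanged, and the destination's upper bound, if given, is assumed to match the source range length:

```shell
#!/bin/sh
# Hypothetical semantics for SRC_LO-SRC_HI:DST_LO[-DST_HI].
# Only DST_LO is used; DST_HI is assumed to match the source range length.
map_range() {
    spec=$1; n=$2
    src=${spec%%:*}          # e.g. "100000-165535"
    dst=${spec#*:}           # e.g. "0-65535" or just "0"
    src_lo=${src%-*}
    src_hi=${src#*-}
    dst_lo=${dst%%-*}        # starting destination ID
    if [ "$n" -ge "$src_lo" ] && [ "$n" -le "$src_hi" ]; then
        echo $((n - src_lo + dst_lo))
    else
        echo "$n"            # outside the range: unchanged
    fi
}

# Usage:
# map_range 100000-165535:0-65535 101000
# map_range 100000-165535:0 101000
```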
In the man page under USERS AND GROUPS, it says:
"If you specify both --preserve-numerical-ids and one of the mapping
options, the behavior is undefined."
I think it would be better to allow both, with the user-mapping-file
overriding the --preserve-numerical-ids behavior where they conflict,
since in my use case I never want user-name mapping.
What do you think? I appreciate the discussion and everyone's help.
Thanks,
Clif
On 2/8/22 6:03 PM, Robert Nichols wrote:
On 2/8/22 6:44 PM, Mr. Clif wrote:
ok cool, good info,
I was just digging into it again, and the date I switched to the
snapshot was recorded as Feb 1st. Here is a list of the
mirror_metadata files leading up to that:
-rw------- 1 root root 2.7M Jan 21 05:25 mirror_metadata.2022-01-21T05:20:05-09:00.snapshot.gz
-rw------- 1 root root 632 Jan 23 05:25 mirror_metadata.2022-01-22T05:20:26-09:00.diff.gz
-rw------- 1 root root 790 Jan 24 05:26 mirror_metadata.2022-01-23T05:20:04-09:00.diff.gz
-rw------- 1 root root 783 Jan 25 05:24 mirror_metadata.2022-01-24T05:20:33-09:00.diff.gz
-rw------- 1 root root 778 Jan 26 05:29 mirror_metadata.2022-01-25T05:19:31-09:00.diff.gz
-rw------- 1 root root 731 Jan 27 05:25 mirror_metadata.2022-01-26T05:23:21-09:00.diff.gz
-rw------- 1 root root 723 Jan 28 05:27 mirror_metadata.2022-01-27T05:20:37-09:00.diff.gz
-rw------- 1 root root 786 Jan 29 05:29 mirror_metadata.2022-01-28T05:21:17-09:00.diff.gz
-rw------- 1 root root 772 Jan 30 05:26 mirror_metadata.2022-01-29T05:23:55-09:00.diff.gz
-rw------- 1 root root 2.7M Jan 30 05:26 mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz
-rw------- 1 root root 725 Feb 1 05:26 mirror_metadata.2022-01-31T05:21:21-09:00.diff.gz
-rw------- 1 root root 2.6M Feb 3 15:33 mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz
-rw------- 1 root root 613 Feb 4 05:16 mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 5 05:17 mirror_metadata.2022-02-04T05:13:29-09:00.diff.gz
-rw------- 1 root root 852 Feb 6 05:55 mirror_metadata.2022-02-05T05:14:57-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 7 06:36 mirror_metadata.2022-02-06T05:52:59-09:00.diff.gz
-rw------- 1 root root 73K Feb 8 05:39 mirror_metadata.2022-02-07T06:33:04-09:00.diff.gz
-rw------- 1 root root 2.7M Feb 8 05:39 mirror_metadata.2022-02-08T05:33:08-09:00.snapshot.gz
You will see that the
mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz, with a modification
date of Feb 3rd, is about the same size as the previous snapshot file
from a couple of days before.
If you grep for the lines that match "^File", then I presume you get a
good count of the number of files that changed, or at least that were
recorded for some reason. Here are those stats:
find increments -name "*2022-02-01*" -exec ls -lh {} \; | wc
85287 767583 11064660
gzip -dc mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz | egrep "^File " | wc
89287 178574 4535737
gzip -dc mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | egrep "^File " | wc
85288 170576 4374253
Notice how the number of files with that date in the name (the first
wc output) is almost the same as the number of "File" entries in the
diff.gz file (the last wc output).
I also compared some of the entries in the snapshot file to the
diff.gz file, and never found any differences. Of course I only
checked a dozen or two.
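Checking more than a dozen or two entries can be automated by flattening each record onto one line and letting `comm` report records that appear in the diff but not, byte for byte, in the snapshot. A sketch, assuming each record is a `File ...` line followed by indented attribute lines (as the `^File ` counts above suggest); both function names are hypothetical:

```shell
#!/bin/sh
# Sketch: flatten each metadata record (a "File ..." line plus its indented
# attribute lines) onto one line so whole records can be compared.
flatten_metadata() {
    gzip -dc "$1" | awk '
        /^File / { if (rec != "") print rec; rec = $0; next }
                 { sub(/^[ \t]+/, ""); rec = rec "|" $0 }
        END      { if (rec != "") print rec }'
}

# Print records found in the diff ($2) that do not appear verbatim
# in the snapshot ($1).
changed_records() {
    snap=$(mktemp); diffs=$(mktemp)
    flatten_metadata "$1" | LC_ALL=C sort > "$snap"
    flatten_metadata "$2" | LC_ALL=C sort > "$diffs"
    LC_ALL=C comm -13 "$snap" "$diffs"
    rm -f "$snap" "$diffs"
}

# Usage:
# changed_records mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz \
#     mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | head
```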
I believe you are comparing the wrong files. Welcome to the confusing
world of reverse diffs. Everything works backward. That 2.6MB
mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz has the differences
that would be applied to a 2022-02-03T14:20:54-09:00 snapshot (i.e.,
the next _newer_ state) to construct a 2022-02-01 snapshot. The huge
perceived change occurred between the 2022-02-01 backup and the
2022-02-03 backup.
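Put another way, each diff turns the next newer state into its own date's state, so walking back from the 2022-02-08 snapshot to 2022-02-03 means applying the 02-07, 02-06, 02-05, 02-04 and 02-03 diffs in that order. A sketch that only lists that order (it does not apply anything), assuming all timestamps share one UTC offset so the file names sort chronologically; `diffs_to_apply` is hypothetical:

```shell
#!/bin/sh
# Sketch: list the reverse diffs in DIR to apply, newest first, to walk back
# from a snapshot at SNAP_TS to the state at TARGET_TS (inclusive).
# Assumes every timestamp uses the same UTC offset, so names sort by time.
diffs_to_apply() {
    dir=$1; snap_ts=$2; target_ts=$3
    ls "$dir" |
        sed -n 's/^mirror_metadata\.\(.*\)\.diff\.gz$/\1/p' |
        LC_ALL=C awk -v lo="$target_ts" -v hi="$snap_ts" '$0 >= lo && $0 < hi' |
        LC_ALL=C sort -r |
        sed 's/.*/mirror_metadata.&.diff.gz/'
}

# Usage, run inside the rdiff-backup-data directory:
# diffs_to_apply . 2022-02-08T05:33:08-09:00 2022-02-03T14:20:54-09:00
```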
I would first look at some of the entries in that
mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz file and see if some
of the same filenames appear in the huge 2022-02-01 diff. Hopefully
you can spot what metadata changed. If you can't find any matching
names in the 2022-02-03 diff, try the 2022-02-04 diff. As a last
resort, I can send you a rather large** awk script that you
can use to work back from the nearest future snapshot (currently
2022-02-08) to reconstruct a 2022-02-03 snapshot. Then you should
certainly be able to see what differences that 2022-02-01 diff is
applying.
** A bit over 3KB, somewhat more than I care to spew out to a mailing
list.
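The name comparison suggested above can be scripted: list the file names in each diff, keep those present in both, then print a file's record from each diff. A sketch under the same record-layout assumption (`File <name>` followed by indented attribute lines); the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: compare two metadata diffs by file name.
# Assumes each record starts with a "File <name>" line.
list_names() {
    gzip -dc "$1" | sed -n 's/^File //p' | LC_ALL=C sort -u
}

# Names present in both files.
common_names() {
    a=$(mktemp); b=$(mktemp)
    list_names "$1" > "$a"
    list_names "$2" > "$b"
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Print the full record for one exact name ($2) from metadata file $1.
# The name is passed via the environment so backslashes are not mangled.
show_record() {
    gzip -dc "$1" | NAME="File $2" awk '
        /^File / { inrec = ($0 == ENVIRON["NAME"]) }
        inrec'
}

# Usage (the name given to show_record is a placeholder):
# common_names mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz \
#     mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | head -n 5
# show_record mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz some/name
# show_record mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz some/name
```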
- cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/05
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/07
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Leland Best, 2022/02/07
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/08
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Robert Nichols, 2022/02/08
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/08
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Robert Nichols, 2022/02/08
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif <=
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, ewl+rdiffbackup, 2022/02/10
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/10
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Dominic Raferd, 2022/02/10
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/10
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/14
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Dominic Raferd, 2022/02/15
- Re: cross-platform backup tool Same files from different source dir causes spurious diff files, Mr. Clif, 2022/02/15