rdiff-backup-users


From: Mr. Clif
Subject: Re: cross-platform backup tool Same files from different source dir causes spurious diff files
Date: Tue, 8 Feb 2022 16:44:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0

OK, cool, good info.

I was just digging into it again, and the date I switched to the snapshot was recorded as Feb 1st. Here is a list of the mirror_metadata files from around that date:

-rw------- 1 root root 2.7M Jan 21 05:25 mirror_metadata.2022-01-21T05:20:05-09:00.snapshot.gz
-rw------- 1 root root  632 Jan 23 05:25 mirror_metadata.2022-01-22T05:20:26-09:00.diff.gz
-rw------- 1 root root  790 Jan 24 05:26 mirror_metadata.2022-01-23T05:20:04-09:00.diff.gz
-rw------- 1 root root  783 Jan 25 05:24 mirror_metadata.2022-01-24T05:20:33-09:00.diff.gz
-rw------- 1 root root  778 Jan 26 05:29 mirror_metadata.2022-01-25T05:19:31-09:00.diff.gz
-rw------- 1 root root  731 Jan 27 05:25 mirror_metadata.2022-01-26T05:23:21-09:00.diff.gz
-rw------- 1 root root  723 Jan 28 05:27 mirror_metadata.2022-01-27T05:20:37-09:00.diff.gz
-rw------- 1 root root  786 Jan 29 05:29 mirror_metadata.2022-01-28T05:21:17-09:00.diff.gz
-rw------- 1 root root  772 Jan 30 05:26 mirror_metadata.2022-01-29T05:23:55-09:00.diff.gz
-rw------- 1 root root 2.7M Jan 30 05:26 mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz
-rw------- 1 root root  725 Feb  1 05:26 mirror_metadata.2022-01-31T05:21:21-09:00.diff.gz
-rw------- 1 root root 2.6M Feb  3 15:33 mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz
-rw------- 1 root root  613 Feb  4 05:16 mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz
-rw------- 1 root root 1.7K Feb  5 05:17 mirror_metadata.2022-02-04T05:13:29-09:00.diff.gz
-rw------- 1 root root  852 Feb  6 05:55 mirror_metadata.2022-02-05T05:14:57-09:00.diff.gz
-rw------- 1 root root 1.7K Feb  7 06:36 mirror_metadata.2022-02-06T05:52:59-09:00.diff.gz
-rw------- 1 root root  73K Feb  8 05:39 mirror_metadata.2022-02-07T06:33:04-09:00.diff.gz
-rw------- 1 root root 2.7M Feb  8 05:39 mirror_metadata.2022-02-08T05:33:08-09:00.snapshot.gz

You can see that mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz, with a modification date of Feb 3rd, is about the same size (2.6M) as the snapshot file from a couple of days earlier (2.7M).

If you grep for the lines that match "^File", I presume you get a good count of the number of files that changed, or at least of the files that were recorded for some reason. Here are those stats:

find increments -name "*2022-02-01*" -exec ls -lh {} \; | wc
  85287  767583 11064660
gzip -dc mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz | egrep "^File " | wc
  89287  178574 4535737
gzip -dc mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | egrep "^File " | wc
  85288  170576 4374253

Notice that the number of files with that date in their names (the first wc output, 85287 lines) is almost the same as the number of files listed in the diff.gz file (the last wc output, 85288 lines).
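To check whether those two counts cover the same files, and not just similar numbers, a Python sketch like the one below could list the paths that show up in only one place. It assumes increment file names follow the pattern `<path>.<timestamp>.<suffix>` (for example `bin/bash.2022-02-01T05:20:43-09:00.diff.gz`) and that metadata records start with a `File <path>` line, as in the entries quoted below. The function names are hypothetical, and escaped or quoted file names are not decoded.

```python
import gzip
import os
import re
import sys


def increment_paths(increments_dir, timestamp):
    """Return source-relative paths that have an increment for `timestamp`.

    Assumes names like "<path>.<timestamp>.<suffix>", e.g.
    "bin/bash.2022-02-01T05:20:43-09:00.diff.gz" -> "bin/bash".
    """
    suffix = re.compile(re.escape("." + timestamp + ".") + r"[^/]+$")
    paths = set()
    for root, _dirs, files in os.walk(increments_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), increments_dir)
            match = suffix.search(rel)
            if match:
                # Strip ".<timestamp>.<suffix>" to recover the source path.
                paths.add(rel[:match.start()])
    return paths


def metadata_paths(metadata_file):
    """Return the paths named on "File <path>" lines of a gzipped metadata file."""
    with gzip.open(metadata_file, "rt", encoding="utf-8", errors="replace") as fh:
        return {line[5:].rstrip("\n") for line in fh if line.startswith("File ")}


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: script.py INCREMENTS_DIR TIMESTAMP METADATA_DIFF_GZ
    inc = increment_paths(sys.argv[1], sys.argv[2])
    meta = metadata_paths(sys.argv[3])
    print("only in increments:", sorted(inc - meta)[:20])
    print("only in metadata:  ", sorted(meta - inc)[:20])
```

Run from the rdiff-backup-data directory with `increments`, `2022-02-01T05:20:43-09:00` and the matching mirror_metadata diff, the two printed lists would show which paths (if any) account for the one-line gap between the counts.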

I also compared some of the entries in the snapshot file with those in the diff.gz file and never found any differences, although I only checked a dozen or two. Here are a few:

File bin/bash
  Type reg
  Size 1168776
  SHA1Digest 0533efae0065e72c1d833b9f7a678a20995bd5a6
  ModTime 1555560756
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
File bin/bunzip2
  Type reg
  Size 38984
  NumHardLinks 3
  Inode 131113
  DeviceLoc 64798
  SHA1Digest 6e86f2cf232ab7becc73013d2bc743f8668e2536
  ModTime 1562786272
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
File bin/bzcat
  Type reg
  Size 38984
  NumHardLinks 3
  Inode 131113
  DeviceLoc 64798
  ModTime 1562786272
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
[...]
File usr/share/mime/application/x-doom-wad.xml
  Type reg
  Size 1663
  SHA1Digest 11509c6e2a188657e2e53edfd9796d1b1efdddf6
  ModTime 1616737565
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-dvi.xml
  Type reg
  Size 3079
  SHA1Digest abfc865aba5c7cec58075b577463c92ec96c9ad8
  ModTime 1616737592
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-e-theme.xml
  Type reg
  Size 3230
  SHA1Digest 6d990accf51c683bbe4eb2f131380a7084e7bb04
  ModTime 1616737594
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-egon.xml
  Type reg
  Size 3414
  SHA1Digest 323298d7d9d47cd4c544074107e6c5612915aa3f
  ModTime 1616737567
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-executable.xml
  Type reg
  Size 2745
  SHA1Digest 53135426082c05de56084b5923ce7da105cba309
  ModTime 1616737570
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
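Checking a dozen or two entries by eye doesn't scale to 85,288 records, so a Python sketch along these lines could compare every record in a diff.gz file against another metadata file and tally which fields differ. It assumes the record layout shown above (a `File <path>` line followed by indented `Key value` lines); the function names are hypothetical, and escaped file names are left undecoded. Because both files are arguments, it can be pointed at the older snapshot or at a newer one such as mirror_metadata.2022-02-08T05:33:08-09:00.snapshot.gz.

```python
import gzip
import sys
from collections import Counter


def read_records(metadata_file):
    """Parse a gzipped mirror_metadata file into {path: {field: value}}.

    Assumes each record starts with "File <path>" and continues with
    indented "Key value" lines; escaped file names are left as-is.
    """
    records = {}
    current = None
    with gzip.open(metadata_file, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("File "):
                current = {}
                records[line[len("File "):]] = current
            elif current is not None and line.startswith(" "):
                key, _, value = line.strip().partition(" ")
                current[key] = value
    return records


def compare(diff_file, other_file):
    """Tally which fields differ between records in diff_file and other_file.

    Returns (field_counter, identical_count, missing_count), where
    missing_count counts diff records whose path is absent from other_file.
    """
    diff = read_records(diff_file)
    other = read_records(other_file)
    differing = Counter()
    identical = missing = 0
    for path, rec in diff.items():
        match = other.get(path)
        if match is None:
            missing += 1
            continue
        fields = {k for k in rec.keys() | match.keys() if rec.get(k) != match.get(k)}
        if fields:
            differing.update(fields)
        else:
            identical += 1
    return differing, identical, missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: script.py DIFF_GZ OTHER_METADATA_GZ
    differing, identical, missing = compare(sys.argv[1], sys.argv[2])
    print(f"identical: {identical}, not in other file: {missing}")
    for field, n in differing.most_common():
        print(f"{field:14} {n}")
```

Running it once against the Jan 30 snapshot and once against the Feb 8 snapshot would show whether the Feb 1 records differ from the older state, the newer one, or neither, and which fields are involved.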

Anyway, why do we have 85,288 files listed that apparently didn't change? Is there another part of the puzzle that I haven't looked at yet?

    Thanks for your help,
    Clif


On 2/8/22 9:40 AM, Robert Nichols wrote:
On 2/8/22 1:05 AM, Mr. Clif wrote:
Hey folks,

Thanks for the feedback. :-) More comments below...

On 2/7/22 8:25 PM, Robert Nichols wrote:
On 2/7/22 7:23 PM, Leland Best wrote:
Hi Clif,

On Mon, 2022-02-07 at 11:45 -0800, Mr. Clif wrote:
Hey Eric,

Any ideas on this? How do these diff files normally work?
[...]

I'm not an 'rdiff-backup' developer or anything, so all you experts out there,
correct me if I'm wrong, but ...

IIRC 'rdiff-backup' keeps inode info as part of the metadata for each file. When you mount a filesystem, Linux assigns "fake" inode numbers to avoid collisions between filesystems on different devices/partitions/etc. So if you change the mount point, every file could potentially get a new inode number and, consequently, have changed metadata. That results in 'rdiff-backup' creating a '*.diff*' file for every source file.

Device and inode metadata is kept only for files with multiple hard links. That's to keep track of which links reference the same file. That information is not needed for files with just a single hard link, and, unless something has changed in the latest release, that metadata is not kept. You can look in the mirror_metadata file (it's compressed ASCII) and see what fields are present for each file.
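One way to see which fields are present across a whole mirror_metadata file is to count how many records carry each field. This Python sketch assumes the record layout quoted earlier in this thread (`File <path>` followed by indented `Key value` lines); the function name is hypothetical. If device and inode data are kept only for hard-linked files, the Inode and DeviceLoc counts should roughly match the NumHardLinks count rather than the total number of records.

```python
import gzip
import sys
from collections import Counter


def field_counts(metadata_file):
    """Count records, and how many field lines of each kind, in a metadata file."""
    records = 0
    counts = Counter()
    with gzip.open(metadata_file, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("File "):
                records += 1
            elif line.startswith(" ") and line.strip():
                # Field lines look like "  Key value"; count the key only.
                counts[line.split()[0]] += 1
    return records, counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: script.py MIRROR_METADATA_GZ
    total, counts = field_counts(sys.argv[1])
    print("records:", total)
    for field, n in counts.most_common():
        print(f"{field:14} {n}")
```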

Cool, are these the diff.gz files? I tried ungzipping them, but the first "line" of data still seems to be binary. Is it encoded somehow?

No, I'm talking about the files named "mirror_metadata..." in the rdiff-backup-data directory itself. Those are gzipped ASCII files that hold the principal metadata for every file in the mirror. The most recent will have a name that ends in ".snapshot.gz". The one for the previous backup date will most likely have a name ending in ".diff.gz", but it is also a gzipped ASCII file; it contains the metadata for every file that was somehow different then from how it is in the latest backup. You can look at those files and see what was somehow "different".
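The relationship described above can be modeled with plain dictionaries: a diff keeps the older record for every path whose metadata differs from the newer state, and overlaying it onto the newer state gives the older state back. This Python sketch is only a model of that description, not rdiff-backup's own code; the function names are hypothetical, and using `None` to mean "did not exist then" is a stand-in for however rdiff-backup actually encodes that. In this model a diff record is the older state, so it will match an even older snapshot whenever nothing changed in between.

```python
def make_diff(older, newer):
    """Older records for every path whose metadata differs from `newer`.

    Paths that exist only in `newer` map to None (a stand-in for "absent then").
    """
    diff = {path: rec for path, rec in older.items() if newer.get(path) != rec}
    for path in newer.keys() - older.keys():
        diff[path] = None
    return diff


def restore(newer, diff):
    """Rebuild the older state by overlaying `diff` onto `newer`."""
    state = dict(newer)
    for path, rec in diff.items():
        if rec is None:
            state.pop(path, None)
        else:
            state[path] = rec
    return state


# Made-up example states, not taken from the backup discussed here.
older = {"a.txt": {"Size": "10"}, "b.txt": {"Size": "20"}}
newer = {"a.txt": {"Size": "10"}, "b.txt": {"Size": "25"}, "c.txt": {"Size": "5"}}
delta = make_diff(older, newer)
print(delta)  # {'b.txt': {'Size': '20'}, 'c.txt': None}
assert restore(newer, delta) == older
```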




