From: Mr. Clif
Subject: Re: cross-platform backup tool Same files from different source dir causes spurious diff files
Date: Tue, 8 Feb 2022 16:44:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
OK, cool, good info. I was just digging into it again, and the date I switched to the snapshot was recorded as Feb 1st. Here is a list of the mirror_metadata files leading up to that:
-rw------- 1 root root 2.7M Jan 21 05:25 mirror_metadata.2022-01-21T05:20:05-09:00.snapshot.gz
-rw------- 1 root root 632 Jan 23 05:25 mirror_metadata.2022-01-22T05:20:26-09:00.diff.gz
-rw------- 1 root root 790 Jan 24 05:26 mirror_metadata.2022-01-23T05:20:04-09:00.diff.gz
-rw------- 1 root root 783 Jan 25 05:24 mirror_metadata.2022-01-24T05:20:33-09:00.diff.gz
-rw------- 1 root root 778 Jan 26 05:29 mirror_metadata.2022-01-25T05:19:31-09:00.diff.gz
-rw------- 1 root root 731 Jan 27 05:25 mirror_metadata.2022-01-26T05:23:21-09:00.diff.gz
-rw------- 1 root root 723 Jan 28 05:27 mirror_metadata.2022-01-27T05:20:37-09:00.diff.gz
-rw------- 1 root root 786 Jan 29 05:29 mirror_metadata.2022-01-28T05:21:17-09:00.diff.gz
-rw------- 1 root root 772 Jan 30 05:26 mirror_metadata.2022-01-29T05:23:55-09:00.diff.gz
-rw------- 1 root root 2.7M Jan 30 05:26 mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz
-rw------- 1 root root 725 Feb 1 05:26 mirror_metadata.2022-01-31T05:21:21-09:00.diff.gz
-rw------- 1 root root 2.6M Feb 3 15:33 mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz
-rw------- 1 root root 613 Feb 4 05:16 mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 5 05:17 mirror_metadata.2022-02-04T05:13:29-09:00.diff.gz
-rw------- 1 root root 852 Feb 6 05:55 mirror_metadata.2022-02-05T05:14:57-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 7 06:36 mirror_metadata.2022-02-06T05:52:59-09:00.diff.gz
-rw------- 1 root root 73K Feb 8 05:39 mirror_metadata.2022-02-07T06:33:04-09:00.diff.gz
-rw------- 1 root root 2.7M Feb 8 05:39 mirror_metadata.2022-02-08T05:33:08-09:00.snapshot.gz
You will see that mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz, with the modified date of Feb 3rd, is about the same size (2.6M) as the snapshot file from a couple of days before (2.7M).
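A quick way to spot such a diff without reading the listing by eye is sketched below. It is only an illustration: the function name `oversized_diffs` and the 0.5 size ratio are my own assumptions, not anything rdiff-backup provides. It looks at the `mirror_metadata.*` files in a directory and reports every `.diff.gz` whose size is at least that fraction of the largest `.snapshot.gz`.

```python
import os

def oversized_diffs(metadata_dir, ratio=0.5):
    """Hypothetical helper: list mirror_metadata *.diff.gz files whose size
    is at least `ratio` times the largest *.snapshot.gz in the directory.
    The 0.5 default is an arbitrary assumption, not an rdiff-backup rule."""
    sizes = {
        entry.name: entry.stat().st_size
        for entry in os.scandir(metadata_dir)
        if entry.is_file() and entry.name.startswith("mirror_metadata.")
    }
    snapshots = [s for name, s in sizes.items() if name.endswith(".snapshot.gz")]
    if not snapshots:
        return []
    limit = ratio * max(snapshots)
    # Sorting by name also sorts by date while all timestamps share one UTC offset.
    return sorted(
        name for name, s in sizes.items()
        if name.endswith(".diff.gz") and s >= limit
    )
```

Run against the rdiff-backup-data directory, this should single out the Feb 1st diff in the listing above and skip the sub-kilobyte ones.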
If you grep for the lines that match "^File" then I presume you get a good count of the number of files that changed, or at least were recorded for some reason. Here are those stats:
find increments -name "*2022-02-01*" -exec ls -lh {} \; | wc
85287 767583 11064660

gzip -dc mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz | egrep "^File " | wc
89287 178574 4535737

gzip -dc mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | egrep "^File " | wc
85288 170576 4374253

Notice how the number of files with that date in the name (the first wc output) is almost the same as the number of files listed in the diff.gz file (the last wc output).
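For anyone who would rather script the count, here is a minimal Python sketch of the same idea as the `gzip -dc ... | egrep "^File "` pipelines. The function name `count_file_entries` is made up for illustration. It assumes, as the commands above do, that each recorded file contributes exactly one line beginning with `File `.

```python
import gzip

def count_file_entries(path):
    """Count lines that start with 'File ' in a gzip-compressed metadata file.
    Hypothetical helper; assumes one 'File <path>' line per recorded file."""
    # Read as bytes so unusual file names cannot cause decoding errors.
    with gzip.open(path, "rb") as fh:
        return sum(1 for line in fh if line.startswith(b"File "))
```

Calling it on the snapshot and on the diff should reproduce the first column of the two `wc` outputs.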
I also compared some of the entries in the snapshot file to the diff.gz file, and never found any differences. Of course I only checked a dozen or two. Here are a couple:
File bin/bash
  Type reg
  Size 1168776
  SHA1Digest 0533efae0065e72c1d833b9f7a678a20995bd5a6
  ModTime 1555560756
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
File bin/bunzip2
  Type reg
  Size 38984
  NumHardLinks 3
  Inode 131113
  DeviceLoc 64798
  SHA1Digest 6e86f2cf232ab7becc73013d2bc743f8668e2536
  ModTime 1562786272
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
File bin/bzcat
  Type reg
  Size 38984
  NumHardLinks 3
  Inode 131113
  DeviceLoc 64798
  ModTime 1562786272
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 493
[...]
File usr/share/mime/application/x-doom-wad.xml
  Type reg
  Size 1663
  SHA1Digest 11509c6e2a188657e2e53edfd9796d1b1efdddf6
  ModTime 1616737565
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-dvi.xml
  Type reg
  Size 3079
  SHA1Digest abfc865aba5c7cec58075b577463c92ec96c9ad8
  ModTime 1616737592
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-e-theme.xml
  Type reg
  Size 3230
  SHA1Digest 6d990accf51c683bbe4eb2f131380a7084e7bb04
  ModTime 1616737594
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-egon.xml
  Type reg
  Size 3414
  SHA1Digest 323298d7d9d47cd4c544074107e6c5612915aa3f
  ModTime 1616737567
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420
File usr/share/mime/application/x-executable.xml
  Type reg
  Size 2745
  SHA1Digest 53135426082c05de56084b5923ce7da105cba309
  ModTime 1616737570
  Uid 0
  Uname root
  Gid 0
  Gname root
  Permissions 420

Anyway, why do we have 85,288 files listed that apparently didn't change? Is there another part of the puzzle that I haven't looked at yet?
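Since a dozen or two spot checks cannot rule out a field that differs only on some files, the comparison could be automated along these lines. This is a rough sketch rather than an rdiff-backup tool: `parse_metadata` and `changed_fields` are hypothetical names, and the parser assumes only what the excerpts above show, namely that a record starts with a `File <path>` line and is followed by `Key value` lines. Any quoting or escaping of unusual path names is not handled.

```python
import gzip

def parse_metadata(path):
    """Return {file_path: {field: value}} from a gzip-ed metadata file.
    Assumes records of the form 'File <path>' followed by 'Key value' lines."""
    records = {}
    current = None
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for raw in fh:
            line = raw.rstrip("\n").lstrip()
            if not line:
                continue
            key, _, value = line.partition(" ")
            if key == "File":
                current = records.setdefault(value, {})
            elif current is not None:
                current[key] = value
    return records

def changed_fields(snapshot_path, diff_path):
    """For every path listed in both files, return the sorted names of the
    fields whose values differ (an empty list means no recorded difference)."""
    newer = parse_metadata(snapshot_path)
    older = parse_metadata(diff_path)
    result = {}
    for path, old in older.items():
        new = newer.get(path)
        if new is None:
            continue  # listed only in the diff; nothing to compare against
        result[path] = sorted(
            k for k in old.keys() | new.keys() if old.get(k) != new.get(k)
        )
    return result
```

Counting the paths that map to an empty list would show how many of the 85,288 entries really match field for field, and tallying the field names in the non-empty lists would show which field, if any, accounts for the rest.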
Thanks for your help,
Clif

On 2/8/22 9:40 AM, Robert Nichols wrote:
> On 2/8/22 1:05 AM, Mr. Clif wrote:
>> Hey folks, thanks for the feedback. :-) More comments below...
>>
>> On 2/7/22 8:25 PM, Robert Nichols wrote:
>>> On 2/7/22 7:23 PM, Leland Best wrote:
>>>> Hi Cliff,
>>>>
>>>> On Mon, 2022-02-07 at 11:45 -0800, Mr. Clif wrote:
>>>>> Hey Eric, any ideas on this? How do these diff files normally work?
>>>>
>>>> [...]
>>>>
>>>> I'm not an 'rdiff-backup' developer or anything so all you experts out there correct me if I'm wrong but ...
>>>>
>>>> IIRC 'rdiff-backup' keeps inode info as part of the metadata for each file. When you mount a filesystem Linux assigns "fake" inode numbers to avoid collisions between filesystems on different devices/partitions/etc. So if you change the mount point, every file could potentially get a new inode number and, consequently, have changed metadata. That results in 'rdiff-backup' creating a '*.diff*' file for every source file.
>>>
>>> Device and inode metadata is kept only for files with multiple hard links. That's to keep track of which links reference the same file. That information is not needed for files with just a single hard link, and unless something has changed in the latest release that metadata is not kept. You can look in the mirror_metadata file (it's compressed ASCII) and see what fields are present for each file.
>>
>> Cool, these are the diff.gz files? I tried ungzipping them but the first "line" of data still seems to be binary. Is it encoded somehow?
>
> No, I'm talking about the files named "mirror_metadata..." in the rdiff-backup-data directory itself. Those are gzip-ed ASCII files that hold the principal metadata for every file in the mirror. The most recent will have a name that ends in ".snapshot.gz". The one for the previous backup date will most likely have a name ending in ".diff.gz", but is also a gzip-ed ASCII file that contains the metadata for every file that was somehow different then than it is in the latest backup. You can look at those files and see what was somehow "different".