Re: change timestamps of backups?
From: griffin tucker
Subject: Re: change timestamps of backups?
Date: Thu, 22 Apr 2021 17:43:36 +1000
On Thu, 22 Apr 2021 at 17:38, Dominic Raferd <dominic@timedicer.co.uk> wrote:
>
>
> On 22/04/2021 08:31, griffin tucker wrote:
> > On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <dominic@timedicer.co.uk>
> > wrote:
> >>
> >> On 22/04/2021 08:07, Dominic Raferd wrote:
> >>> On 22/04/2021 08:01, griffin tucker wrote:
> >>>> I've tried using deduplication, but only get about 6gb savings per 30gb.
> >>>>
> >>>> I intend on using squashfs on top of rdiff-backup, btrfs is just being
> >>>> used temporarily.
> >>>>
> >>>> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
> >>>> <dominic@timedicer.co.uk> wrote:
> >>>>> On 22/04/2021 07:03, griffin tucker wrote:
> >>>>>> i have a collection of the last 5 monthly dumps of various wikis from
> >>>>>> dumps.wikimedia.org
> >>>>>>
> >>>>>> each dump has numbered directories in the format 20210501, 20210401,
> >>>>>> 20210301, etc.
> >>>>>>
> >>>>>> all the filenames in these directories remain the same with each
> >>>>>> wiki's dump, with the exception of enwiki
> >>>>>>
> >>>>>> other than enwiki, these range from about 30gb to about 370gb
> >>>>>> uncompressed with each successive dump
> >>>>>>
> >>>>>> enwiki, the main english wikipedia, has mostly the same named files,
> >>>>>> but has the pages-meta-history.xml file split up into various 1-55gb
> >>>>>> compressed files (mostly 1-2gb) making a total of about 700gb
> >>>>>> compressed (disregarding redundant files)
> >>>>>>
> >>>>>> i'm not sure how big enwiki is uncompressed, but could be close to
> >>>>>> 25tb. i haven't figured out how i could make rdiff-backup more
> >>>>>> efficient with these files, aside from a script to merge each
> >>>>>> metahistory file into a single huge >100gb file and then running
> >>>>>> rdiff-backup, and then splitting the file back into their separate
> >>>>>> files with an index after restoring
> >>>>>>
> >>>>>> i'm using btrfs zstd:15 to store the files uncompressed, however i
> >>>>>> don't have enough storage to store enwiki uncompressed, zstd
> >>>>>> compression just isn't that good, even at maximum - i've used xz
> >>>>>> compression which attains much better rates of compression for other
> >>>>>> wikis but that isn't exactly seamless (experiments with fuse failed)
> >>>>>>
> >>>>>> so, to save space, i thought i would use rdiff-backup so that it would
> >>>>>> only store the differences between dumps, and it works very well in
> >>>>>> initial tests, however, if i run the reverse incremental backups one
> >>>>>> after the other today, they would be dated today, rather than
> >>>>>> 20210501, 20210401, etc. which isn't informative
> >>>>>>
> >>>>>> if i could add a comment next to each datetime stamp, this would be
> >>>>>> useful, otherwise i'll have to keep a separate index, which isn't a
> >>>>>> huge problem, i just thought i'd ask if i could change the datetime
> >>>>>> stamps before i write such a script
> >>>>>>
> >>>>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <Eric@lavar.de> wrote:
> >>>>>>> Hi Griffin,
> >>>>>>>
> >>>>>>> On 22/04/2021 06:39, griffin tucker wrote:
> >>>>>>>> is there a way to change the timestamps of the backups?
> >>>>>>> no
> >>>>>>>
> >>>>>>>> or perhaps replace the timestamps with a unique name?
> >>>>>>> no
> >>>>>>>
> >>>>>>>> would this cause a faulty restore or a damaged backup?
> >>>>>>> yes, rdiff-backup makes a lot of date/time comparisons, so the
> >>>>>>> timestamp is meaningful.
> >>>>>>>
> >>>>>>> What are you trying to do?
> >>>>>>>
> >>>>>>> KR, Eric
> >>>>> Since you are already using btrfs, have you considered using
> >>>>> deduplication? Likely to work better if you store uncompressed.
> >>>>>
> >>> In your scenario I would expect deduplication to give big savings if
> >>> you store uncompressed. If not, YMMV. (I tried with rdiff-backup on
> >>> btrfs + deduplication a few years ago but found it all a bit scary and
> >>> retreated to ext4.)
> >> To clarify, I mean turning off compression within rdiff-backup, and
> >> instead using compression (+deduplication) at fs level.
> > well, i suppose i was using windows server's dedupe in that 6gb per
> > 30gb savings, maybe i should try again with btrfs' dedupe
> >
> > come to think of it, dedupe seems to be already enabled which would
> > explain <5 second copies for hundreds of gigabytes, but i can't get
> > the dedupe status when i run:
> >
> > btrfs dedupe status <mountpoint>
> >
> > with an error
> >
> > btrfs: unknown token 'dedupe'
> >
> > i'll investigate this further
> Another option is to use ZFS, Patrik wrote about it here:
> https://www.ikus-soft.com/en/blog/2020-07-22-configure-zfs-for-rdiff-backup/
i'm reluctant to use zfs because linus torvalds said not to
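
For the separate index mentioned earlier in the thread (mapping each backup's timestamp to the dump it came from), a minimal sketch in Python could look like the following. It does not call rdiff-backup at all: the backup timestamp is treated as an opaque string copied from whatever rdiff-backup reports for the increment, and the function names and the JSON index file are assumptions made for illustration, not part of rdiff-backup.

```python
import json
from datetime import datetime
from pathlib import Path


def parse_dump_label(label):
    """Parse a dump directory name such as '20210501' into a datetime."""
    return datetime.strptime(label, "%Y%m%d")


def backup_order(labels):
    """Return dump labels oldest first, i.e. the order to back them up in."""
    return sorted(labels, key=parse_dump_label)


def record_backup(index_path, backup_time, dump_label):
    """Note in a JSON sidecar file which dump a given backup run holds.

    backup_time is an opaque string (hypothetical here): whatever
    timestamp rdiff-backup shows for that increment.
    """
    parse_dump_label(dump_label)  # raises ValueError on a malformed label
    path = Path(index_path)
    index = json.loads(path.read_text()) if path.exists() else {}
    index[backup_time] = dump_label
    path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return index


def dump_for_backup(index_path, backup_time):
    """Look up the dump label recorded for a backup timestamp, or None."""
    index = json.loads(Path(index_path).read_text())
    return index.get(backup_time)
```

The idea would be to run the backups in the order given by backup_order(), call record_backup() after each run, and use dump_for_backup() to find which dump an increment corresponds to before restoring. Keeping the index outside the rdiff-backup-data directory leaves the repository's own timestamps untouched.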