Re: change timestamps of backups?
From: griffin tucker
Subject: Re: change timestamps of backups?
Date: Thu, 22 Apr 2021 17:31:53 +1000

On Thu, 22 Apr 2021 at 17:17, Dominic Raferd <dominic@timedicer.co.uk> wrote:
>
>
> On 22/04/2021 08:07, Dominic Raferd wrote:
> > On 22/04/2021 08:01, griffin tucker wrote:
> >> I've tried using deduplication, but only get about 6gb savings per 30gb.
> >>
> >> I intend on using squashfs on top of rdiff-backup, btrfs is just being
> >> used temporarily.
> >>
> >> On Thu, 22 Apr 2021 at 16:41, Dominic Raferd
> >> <dominic@timedicer.co.uk> wrote:
> >>> On 22/04/2021 07:03, griffin tucker wrote:
> >>>> i have a collection of the last 5 monthly dumps of various wikis from
> >>>> dumps.wikimedia.org
> >>>>
> >>>> each dump has numbered directories in the format 20210501, 20210401,
> >>>> 20210301, etc.
> >>>>
> >>>> all the filenames in these directories remain the same with each
> >>>> wiki's dump, with the exception of enwiki
> >>>>
> >>>> other than enwiki, these range from about 30gb to about 370gb
> >>>> uncompressed with each successive dump
> >>>>
> >>>> enwiki, the main english wikipedia, has mostly the same named files,
> >>>> but has the pages-meta-history.xml file split up into various 1-55gb
> >>>> compressed files (mostly 1-2gb) making a total of about 700gb
> >>>> compressed (disregarding redundant files)
> >>>>
> >>>> i'm not sure how big enwiki is uncompressed, but could be close to
> >>>> 25tb. i haven't figured out how i could make rdiff-backup more
> >>>> efficient with these files, aside from a script to merge each
> >>>> metahistory file into a single huge >100gb file and then running
> >>>> rdiff-backup, and then splitting the file back into their separate
> >>>> files with an index after restoring
> >>>>
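The merge-then-split idea above can be sketched in Python. This is a minimal sketch, assuming there is room on disk for the parts and the merged file at the same time, and that a JSON sidecar (a hypothetical `index.json`) is an acceptable place to keep each part's name, byte offset and size. `merge_parts` and `split_parts` are illustrative names, not anything provided by rdiff-backup.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path

CHUNK = 1 << 20  # copy in 1 MiB pieces so memory use stays flat


def merge_parts(parts: list[Path], merged: Path, index_file: Path) -> None:
    """Concatenate the parts into one file and record each part's name, offset and size."""
    index = []
    offset = 0
    with merged.open("wb") as out:
        for part in parts:
            size = part.stat().st_size
            with part.open("rb") as src:
                shutil.copyfileobj(src, out, CHUNK)
            index.append({"name": part.name, "offset": offset, "size": size})
            offset += size
    index_file.write_text(json.dumps(index, indent=2))


def split_parts(merged: Path, index_file: Path, out_dir: Path) -> None:
    """Recreate the original part files from a merged file and its index."""
    index = json.loads(index_file.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    with merged.open("rb") as src:
        for entry in index:
            src.seek(entry["offset"])
            remaining = entry["size"]
            with (out_dir / entry["name"]).open("wb") as dst:
                while remaining:
                    buf = src.read(min(CHUNK, remaining))
                    if not buf:
                        raise ValueError(f"merged file ends early inside {entry['name']}")
                    dst.write(buf)
                    remaining -= len(buf)
```

`merge_parts` would run before the rdiff-backup session and `split_parts` after a restore. Keeping the index file inside the backed-up tree means it is versioned together with the merged file it describes.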
> >>>> i'm using btrfs zstd:15 to store the files uncompressed, however i
> >>>> don't have enough storage to store enwiki uncompressed, zstd
> >>>> compression just isn't that good, even at maximum - i've used xz
> >>>> compression which attains much better rates of compression for other
> >>>> wikis but that isn't exactly seamless (experiments with fuse failed)
> >>>>
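To put a number on the xz side of the xz-versus-zstd comparison, a sample of an uncompressed dump can be compressed with Python's standard `lzma` module at preset 9 plus `PRESET_EXTREME`, roughly the settings of `xz -9e`. A sketch, assuming the first 64 MiB of a file (an arbitrary default) is representative of the whole:

```python
import lzma
from pathlib import Path


def xz_ratio(path: Path, sample_bytes: int = 64 * 1024 * 1024,
             preset: int = 9 | lzma.PRESET_EXTREME) -> float:
    """Return uncompressed/compressed size for the first `sample_bytes` of a file."""
    with path.open("rb") as f:
        data = f.read(sample_bytes)
    if not data:
        raise ValueError(f"{path} is empty")
    packed = lzma.compress(data, preset=preset)  # .xz container, like the xz CLI
    return len(data) / len(packed)
```

Preset 9 needs several hundred MiB of RAM to compress; a lower `preset` is cheaper if memory is tight. For the btrfs side, the separate `compsize` utility reports how much space zstd-compressed files actually occupy on disk.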
> >>>> so, to save space, i thought i would use rdiff-backup so that it would
> >>>> only store the differences between dumps, and it works very well in
> >>>> initial tests, however, if i run the reverse incremental backups one
> >>>> after the other today, they would be dated today, rather than
> >>>> 20210501, 20210401, etc. which isn't informative
> >>>>
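Since reverse increments keep the most recent session as the mirror, the dumps need to go in oldest first. A sketch that builds the command lines in that order, assuming every dump sits in a `YYYYMMDD` directory under one root and using the classic two-argument `rdiff-backup SOURCE DESTINATION` form (check the man page of the installed version):

```python
from __future__ import annotations

import re
from pathlib import Path

DUMP_DIR_RE = re.compile(r"\d{8}")  # dump directories are named YYYYMMDD, e.g. 20210501


def backup_commands(dump_root: Path, repo: Path) -> list[list[str]]:
    """Return one rdiff-backup command per dump directory, oldest dump first.

    With reverse increments the latest session becomes the mirror, so backing
    up in ascending date order leaves the newest dump as the mirror and the
    older dumps as increments.
    """
    names = sorted(
        p.name for p in dump_root.iterdir()
        if p.is_dir() and DUMP_DIR_RE.fullmatch(p.name)
    )
    # YYYYMMDD strings sort in date order, so a plain string sort is enough.
    return [["rdiff-backup", str(dump_root / name), str(repo)] for name in names]
```

Each command can then be run with `subprocess.run(cmd, check=True)`. This only fixes the order: every session is still stamped with the time it actually runs.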
> >>>> if i could add a comment next to each datetime stamp, this would be
> >>>> useful, otherwise i'll have to keep a separate index, which isn't a
> >>>> huge problem, i just thought i'd ask if i could change the datetime
> >>>> stamps before i write such a script
> >>>>
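The separate index could be as small as a JSON file mapping each session time to the dump it holds. A sketch, assuming the session time is obtained some other way (for instance from the output of `rdiff-backup --list-increments`) and passed in as a timezone-aware `datetime`; the file and function names are hypothetical:

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def _key(session_time: datetime) -> str:
    # Normalise to UTC so +1000 and UTC timestamps for the same instant match.
    return session_time.astimezone(timezone.utc).isoformat(timespec="seconds")


def record_session(index_file: Path, dump_name: str, session_time: datetime) -> None:
    """Remember that the backup session at `session_time` holds dump `dump_name`."""
    datetime.strptime(dump_name, "%Y%m%d")  # raises ValueError unless YYYYMMDD
    index = json.loads(index_file.read_text()) if index_file.exists() else {}
    index[_key(session_time)] = dump_name
    index_file.write_text(json.dumps(index, indent=2, sort_keys=True))


def dump_for_session(index_file: Path, session_time: datetime) -> str | None:
    """Return the dump name recorded for a session time, or None if unknown."""
    index = json.loads(index_file.read_text())
    return index.get(_key(session_time))
```

Storing the keys as UTC ISO-8601 strings keeps the file readable and makes lookups independent of the local time zone.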
> >>>> On Thu, 22 Apr 2021 at 15:19, Eric Lavarde <Eric@lavar.de> wrote:
> >>>>> Hi Griffin,
> >>>>>
> >>>>> On 22/04/2021 06:39, griffin tucker wrote:
> >>>>>> is there a way to change the timestamps of the backups?
> >>>>> no
> >>>>>
> >>>>>> or perhaps replace the timestamps with a unique name?
> >>>>> no
> >>>>>
> >>>>>> would this cause a faulty restore or a damaged backup?
> >>>>> yes, rdiff-backup makes a lot of date/time comparisons so the
> >>>>> timestamp is meaningful.
> >>>>>
> >>>>> What are you trying to do?
> >>>>>
> >>>>> KR, Eric
> >>> Since you are already using btrfs, have you considered using
> >>> deduplication? Likely to work better if you store uncompressed.
> >>>
> > In your scenario I would expect deduplication to give big savings if
> > you store uncompressed. If not, YMMV. (I tried with rdiff-backup on
> > btrfs + deduplication a few years ago but found it all a bit scary and
> > retreated to ext4.)
> To clarify, I mean turning off compression within rdiff-backup, and
> instead using compression (+deduplication) at fs level.
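One way to estimate whether block-level deduplication is worth trying on these dumps, without committing a filesystem to it, is to hash two successive dumps in fixed-size blocks and count the bytes of the newer one that already appear in the older one. A sketch, assuming 128 KiB aligned blocks (an arbitrary choice; btrfs and dedupe tools use their own extent and block sizes):

```python
import hashlib
from pathlib import Path

BLOCK = 128 * 1024  # assumed block size for the estimate


def block_hashes(path: Path, block: int = BLOCK) -> list:
    """Hash a file in fixed-size, aligned blocks (the last block may be shorter)."""
    hashes = []
    with path.open("rb") as f:
        while chunk := f.read(block):
            hashes.append(hashlib.sha256(chunk).digest())
    return hashes


def dedupe_savings(old: Path, new: Path, block: int = BLOCK) -> int:
    """Estimate bytes of `new` whose aligned blocks already exist somewhere in `old`."""
    seen = set(block_hashes(old, block))
    saved = 0
    with new.open("rb") as f:
        while chunk := f.read(block):
            if hashlib.sha256(chunk).digest() in seen:
                saved += len(chunk)
    return saved
```

Because blocks are compared at fixed offsets, a single byte inserted early in a file shifts every later block and defeats the match. That is one plausible reason for modest dedupe savings on XML dumps whose pages grow between months, whereas rdiff-backup's rolling-checksum deltas can still match shifted data.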

well, i suppose that 6gb per 30gb saving was with windows server's
dedupe, so maybe i should try again with btrfs' dedupe

come to think of it, dedupe seems to be already enabled, which would
explain <5 second copies of hundreds of gigabytes, but i can't get
the dedupe status. when i run:

    btrfs dedupe status <mountpoint>

it fails with the error:

    btrfs: unknown token 'dedupe'

i'll investigate this further
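The `unknown token 'dedupe'` error suggests the installed btrfs-progs has no `dedupe` subcommand. As far as I know, mainline btrfs-progs does not ship one, and deduplication on btrfs is normally done out of band by tools such as duperemove or bees. Very fast copies of large files on btrfs are also what reflink copies look like, so they do not by themselves show that dedupe is enabled. To see how much data is actually shared, `btrfs filesystem du -s --raw <path>` reports Total, Exclusive and Set shared byte counts per path. A sketch that parses that output, assuming a header line followed by rows of three integers and a path (check this against your btrfs-progs version):

```python
from typing import List, NamedTuple


class DuRow(NamedTuple):
    total: int
    exclusive: int
    set_shared: int
    filename: str


def parse_fs_du(output: str) -> List[DuRow]:
    """Parse `btrfs filesystem du --raw` text into rows (layout assumed, see above)."""
    rows = []
    for line in output.splitlines():
        fields = line.split(maxsplit=3)  # the path may contain spaces
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        total, exclusive, shared = (int(x) for x in fields[:3])
        rows.append(DuRow(total, exclusive, shared, fields[3]))
    return rows


def shared_fraction(row: DuRow) -> float:
    """Roughly, the share of a path's data held in extents also referenced elsewhere."""
    return 0.0 if row.total == 0 else (row.total - row.exclusive) / row.total
```

A high shared fraction across successive dump directories would indicate that reflinks or deduplication are already saving space there.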