Re: [PATCH] dd: add punchhole feature
From: Pádraig Brady
Subject: Re: [PATCH] dd: add punchhole feature
Date: Mon, 13 Feb 2017 19:34:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
On 13/02/17 13:32, Maxime de Roucy wrote:
> On Monday 06 February 2017 at 20:19 -0800, Pádraig Brady wrote:
>> On 03/02/17 04:58, address@hidden wrote:
>>> I sometimes face machines with a big log file that takes 90% of
>>> the partition space.
>>> If those logs are important I can't just remove them to free space
>>> and have to archive them (usually with gzip).
>>> But the log file plus its archive doesn't fit in the partition, so I
>>> can't just `gzip my.log`.
>>> In situations like these I usually do:
>>>
>>> $ gzip -c my.log | dd of=my.log conv=notrunc
>>> …
>>> X bytes (…) copied, …
>>> $ truncate -s X my.log
>>>
>>> But when my.log is opened by another process it's not recommended,
>>> as I would end up with my.log containing a gzip archive and new
>>> (uncompressed) logs at the end.
>>>
>>> I ended up developing: https://github.com/tchernomax/dump-deallocate
>>> A utility that outputs and deallocates (fallocate punch-hole) a
>>> file at the same time.
>>>
>>> I think it would be interesting to include this feature in dd so it
>>> would be possible to do:
>>>
>>> $ dd if=my.log conv=punchhole | gzip > my.log.gzip
>>
>> That's not a robust operation as if gzip fails for any reason
>> like disk full etc. some data will be lost.
>
> Indeed. I didn't think it was a problem as dd is a tool to use with
> care.
> I will add a warning in the man page.
>
>> So while punchhole functionality might be useful,
>> I'm not so sure about coupling it just with read()?
>> BTW there is already a punch_hole() function in copy.c
>> that should be reused if we were to add this.
>
> I will use this function.
>
>> The reason we haven't added just punchhole functionality to dd,
>> is because it's already available from fallocate(1).
>
> But fallocate can't output the data it erases.
>
>> It seems like a specialized tool to couple the following ops would be
>> required:
>>
>> while (read(chunk))
>>   compress
>>   write
>>   if (sync())
>>     collapse_range(chunk)
>>
>> Note I used collapse_range rather than punch_hole there
>> as that would probably simplify restarts for partial completions,
>> as only the unprocessed data would be left in the file.
>
> It would be the safest but it means compressing the file in dd.
> Which is not what this tool is for (AFAIK).
Right. I mentioned that would be the flow of a "specialized tool",
like the mooted `inplace` command, where the "compress" functionality
would be pluggable by specifying other commands.
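
To make that flow concrete, here is a rough Python sketch of it, not code
from the patch or from coreutils. The names compress_and_drain and
collapse_head are made up for illustration, gzip stands in for the
pluggable "compress" step, and the fallocate() call assumes 64-bit Linux
and a filesystem that supports FALLOC_FL_COLLAPSE_RANGE (ext4 and XFS
among them).

```python
import ctypes
import gzip
import os

FALLOC_FL_COLLAPSE_RANGE = 0x08  # value from <linux/falloc.h>


def collapse_head(fd, length):
    """Hypothetical helper: drop the first `length` bytes of fd."""
    if length >= os.fstat(fd).st_size:
        # A collapsed range may not reach EOF, so the final chunk is
        # removed with ftruncate. A writer appending between the fstat
        # and the ftruncate would lose data here.
        os.ftruncate(fd, 0)
        return
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumption: 64-bit off_t (64-bit Linux with glibc).
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    # length must be a multiple of the filesystem block size, else EINVAL.
    if libc.fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def compress_and_drain(src_path, dst_path, collapse=collapse_head,
                       chunk_size=1 << 20):
    """Compress src into dst chunk by chunk, shrinking src as it goes."""
    # "ab" so that a restart appends a new gzip member to earlier output.
    with open(src_path, "r+b") as src, open(dst_path, "ab") as raw:
        gz = gzip.GzipFile(fileobj=raw, mode="wb")
        while True:
            # Always read from offset 0: processed data has been removed.
            chunk = os.pread(src.fileno(), chunk_size, 0)
            if not chunk:
                break
            gz.write(chunk)
            gz.flush()              # push compressed bytes into raw
            os.fsync(raw.fileno())  # only discard input once output is on disk
            collapse(src.fileno(), len(chunk))
        gz.close()                  # writes the gzip trailer
        raw.flush()
        os.fsync(raw.fileno())
```

If this is interrupted, the source holds only the chunks not yet synced
to the output, which is the restart property mentioned above. The
collapse step is a parameter so it can be swapped for another
deallocation strategy or for a stub when testing.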
> Also I think using collapse_range isn't a good idea. It becomes
> difficult to handle when the input file is open for writing by another
> process.
Hmm maybe. I'm not sure how offsets would be handled.
For an O_APPEND log file there wouldn't be an issue.
For random access there wouldn't be a worse issue compared to
punching a hole in the data.
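
The O_APPEND point can be illustrated with a small Python experiment.
It uses truncate() as a portable stand-in for removing already-processed
data, so it says nothing about how collapse_range itself treats open
file offsets. The helper name contents_after_shrink is made up for
illustration.

```python
import os
import tempfile


def contents_after_shrink(extra_flags):
    """Write 10 bytes, shrink the file to 0 behind the writer, write 5 more."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    writer = os.open(path, os.O_WRONLY | extra_flags)
    try:
        os.write(writer, b"0123456789")
        os.truncate(path, 0)  # stand-in for discarding processed data
        os.write(writer, b"abcde")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(writer)
        os.unlink(path)


# With O_APPEND every write goes to the current end of file.
appended = contents_after_shrink(os.O_APPEND)
# Without it the writer keeps its old offset of 10, leaving a run of
# zero bytes in front of the new data.
positioned = contents_after_shrink(0)
```

Here appended holds just the five new bytes, while positioned holds ten
zero bytes followed by them, much like the hole left by punching.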
Anyway thanks for the patch.
I'm still slightly against merging as it's a guaranteed way
to lose data if you ctrl-c the command or whatever.
I'll let others weigh in at this point.

cheers,
Pádraig