Re: RFC: dd oflag=trunc to support in place filtering of files
From: Pádraig Brady
Subject: Re: RFC: dd oflag=trunc to support in place filtering of files
Date: Fri, 06 Jun 2014 12:21:51 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 06/06/2014 07:34 AM, Bernhard Voelker wrote:
> On 06/05/2014 03:27 PM, Pádraig Brady wrote:
>> The thought just occurred to me that this could be useful
>> to filter large files in place? For example:
>>
>> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc
>
> I guess you meant this:
>
> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc \
> of=file.big
>
right
>> That would assume that grep never outputs more than it reads,
>> and would issue a final truncate along the lines of:
>>
>> ftruncate(STDOUT_FILENO, lseek(STDOUT_FILENO, 0, SEEK_CUR));
>>
>> Useful enough to add?
>
> While it sounds very useful, it looks like a powerful
> way to shoot oneself in the foot, e.g. when the producer
> command aborts
>
> grep --unknown PAT file | dd ...
> grep: unrecognized option '--unknown'
>
> ... then dd probably wouldn't be able to detect
> the failure and truncate the file - so the original data would
> be lost.
Good point. Also, if there were an I/O error reading the file,
dd would truncate at that point and nuke any data after it.
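
For illustration, the proposed behaviour can be approximated with existing
tools. This is only a sketch under several assumptions, not a proposed patch:
filter_in_place is a hypothetical helper name, the filter must never emit more
bytes than it has consumed so far, the byte count is parsed from dd's summary
line, and the truncate utility from coreutils is available. The truncation is
skipped when the producer exits non-zero, which covers the
"grep --unknown" case but not a partial overwrite after a mid-stream read error.

```sh
# Hypothetical helper: filter_in_place FILE CMD [ARGS...]
# Assumption: CMD reads FILE on stdin and never outputs more than it has read.
filter_in_place() {
    file=$1; shift
    tmpdir=$(mktemp -d) || return 1

    # A POSIX pipeline reports only the exit status of its last command
    # (dd here), so the producer's status is saved to a side file.
    { "$@" < "$file"; echo "$?" > "$tmpdir/status"; } |
        LC_ALL=C dd bs=1M conv=notrunc of="$file" 2> "$tmpdir/dd.log"
    dd_status=$?
    prod_status=$(cat "$tmpdir/status")

    # dd's summary line begins with the byte count, e.g. "12 bytes ...".
    bytes=$(awk '$2 ~ /^bytes?$/ { print $1 }' "$tmpdir/dd.log")
    rm -rf "$tmpdir"

    if [ "$dd_status" -ne 0 ] || [ "$prod_status" -ne 0 ] || [ -z "$bytes" ]; then
        echo "filter_in_place: not truncating $file" >&2
        return 1
    fi

    # Emulates the final ftruncate() at the output offset.
    truncate -s "$bytes" "$file"
}
```

Usage would look like "filter_in_place file.big grep whatever". Note that grep
exits with status 1 when nothing matches, which this sketch treats as a failure
and so leaves the file at its original length.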
> Second, regarding the already mentioned restriction that the
> producer doesn't output more data than the original size of
> the input file, e.g.
>
> cat -n file | dd conv=notrunc of=file ...
>
> Is this really an issue? It (surprisingly!) already seems to
> work, even with "obs=1". And if it is, how could we detect this?
This may only be working due to readahead buffering in the kernel.
It is not general, though, and would eventually fail on a file large
enough that dd overwrites data cat has not yet read.
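
When the filter can emit more than it reads, as cat -n does, the conventional
workaround is to give up on true in-place operation and go through a temporary
file. The following is a minimal sketch of that pattern, not something proposed
in this thread; filter_via_temp is a hypothetical helper name, and the approach
needs enough free space for a second copy of the data.

```sh
# Hypothetical helper: filter_via_temp FILE CMD [ARGS...]
# CMD reads FILE on stdin; FILE is replaced only if CMD succeeds.
filter_via_temp() {
    file=$1; shift

    # Create the temporary file next to FILE so that the final mv is a
    # rename within one file system rather than a copy.
    tmp=$(mktemp "$file.XXXXXX") || return 1

    if "$@" < "$file" > "$tmp"; then
        mv -f -- "$tmp" "$file"
    else
        # Producer failed: discard partial output, keep the original.
        rm -f -- "$tmp"
        return 1
    fi
}
```

For example, "filter_via_temp file cat -n" numbers the lines safely. The
replacement is a new file created by mktemp, so the original's permissions,
ownership and hard links are not preserved by this sketch.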
> As a side note, "oflag=trunc" may not be enough to describe
> what it does ... it truncates the output file *after* the
> data copying. So what about something like "oflag=truncpost"?
Yes, better.
Given the I/O error handling issue above, I'm not sure this is a feasible option.
thanks,
Pádraig.