Re: inplace stream editing with tee --quiet --overwrite
From: Jim Meyering
Subject: Re: inplace stream editing with tee --quiet --overwrite
Date: 24 May 2001 07:56:45 +0200
User-agent: Gnus/5.090003 (Oort Gnus v0.03) Emacs/21.0.104

Roman Czyborra <address@hidden> wrote:
| Dear fellow GNUzees,
|
| how do you shorten (shuffle up) a huge logfile or mailbox
| without losing the latest appends or the inode hardlinks?
|
| There are several approaches to inplace editing that involve temporary
| files. "cp -p A B && tail B > A && rm B" reuses A's inode but
| requires enough disk space for two copies of the unshortened A and
| desires A.lock to prevent lost appends during the lengthy copy
| operation. The more appending "tail A > B && mv B A" approach leaves
| anybody attached to the old inode in the rain and requires you to copy
| the file permissions. Perl -i will do this for you but is even
| quicker with its rename and basically does a "mv A B && tail B > A"
| interspersing any incoming "date >> A" during the tail copy. Example:
|
| flushlines=`grep -n '^From ' $mbox | tail -$count | head -1 | cut -d: -f1`
| test "$flushlines" -gt 0 && perl -ne $flushlines'<$.||print' -i $mbox
|
| So why don't we stream-edit such large files in place? Unix files can
| be opened for simultaneous reading and writing with O_RDWR alias "r+"
| in fopen() <stdio.h> or "+<" in perlfunc open but unavailable in bash
| whose <> equals "w+" with O_TRUNC. Proof of concept:
|
| into='open(A,"+<".shift);while(<>){print A};truncate(A,tell A)'
| test "$flushlines" -gt 0 && tail -$flushlines $mbox | perl -e "$into" $mbox
|
| This process fills A from the beginning without touching the end until
| A is processed completely. There is only a minimal time slot between
| the read EOF and the written truncate susceptible to data losses.
| No extra inodes are put on disk nor temporary disk space needed.
| But I find both Perl and the one-liner too big for such a basic task
| and would prefer to abbreviate this into
|
| tail -$flushlines $mbox | tee -qo $mbox
|
| Why so? I found tee the simplest of all existing commands to redirect
| output into named files. I found that I often don't need the cat-like
| extra standard output produced by tee and just bear it because
| tee file is easier to type than tee >file or tee file >/dev/null and
| therefore I suggest a new option tee -q file that is quiet on stdout.
| Furthermore I found that tee -a $mbox appends instead of overwriting
| and plain tee $mbox truncates $mbox before it was read. Just like the
| sort file > file truncation dilemma is solved by sort file -o file I
| would love to get the nonsorting general-purpose GNU tee to overwrite
| with delayed truncation and therefore suggest the following patch
Thanks!
I like the new options.
Would you please make the following changes?
- remove the short option names -o and -q. They might conflict with short
  options in another version of tee or in some future standards spec.
  To use these new features, people will have to use the long options,
  e.g. --overwrite (or an unambiguous abbreviation such as --o).
  I.e., replace `'o'' in the long_options initializer list with
  `OVERWRITE_OPTION', where OVERWRITE_OPTION is defined like this:

    enum
    {
      OVERWRITE_OPTION = CHAR_MAX + 1,
      QUIET_OPTION
      ...
    };

- include diffs to doc/sh-utils.texi that describe the new options and
  give an example (I like the one above) showing how they're useful
- fail if --overwrite is used but no file is specified,
  and add a line under usage()'s Usage: to reflect this, i.e.:

    Usage: tee [OPTION]... [FILE]...
      or:  tee [OPTION]... --overwrite FILE...

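
To make the requests above concrete, here is a rough sketch of how long-only options and the missing-file check might fit together with getopt_long(). It is not code from tee.c. The `struct tee_flags` container and the `parse_tee_options` helper are hypothetical names invented for this example, and the option table lists only tee's existing --append and --ignore-interrupts plus the two proposed options.

```c
#include <assert.h>
#include <getopt.h>
#include <limits.h>
#include <stdio.h>

/* Values above CHAR_MAX cannot collide with any short option
   character, so these options are reachable only by long name.  */
enum
{
  OVERWRITE_OPTION = CHAR_MAX + 1,
  QUIET_OPTION
};

static const struct option long_options[] =
{
  {"append", no_argument, NULL, 'a'},
  {"ignore-interrupts", no_argument, NULL, 'i'},
  {"overwrite", no_argument, NULL, OVERWRITE_OPTION},
  {"quiet", no_argument, NULL, QUIET_OPTION},
  {NULL, 0, NULL, 0}
};

/* Hypothetical container for the parsed flags.  */
struct tee_flags
{
  int append;
  int ignore_interrupts;
  int overwrite;
  int quiet;
};

/* Hypothetical helper: returns the index of the first FILE operand,
   or -1 on a usage error.  */
int
parse_tee_options (int argc, char **argv, struct tee_flags *flags)
{
  int c;

  assert (flags != NULL);

  /* Only "ai" appear in the short option string; -o and -q do not.  */
  while ((c = getopt_long (argc, argv, "ai", long_options, NULL)) != -1)
    {
      switch (c)
        {
        case 'a':
          flags->append = 1;
          break;
        case 'i':
          flags->ignore_interrupts = 1;
          break;
        case OVERWRITE_OPTION:
          flags->overwrite = 1;
          break;
        case QUIET_OPTION:
          flags->quiet = 1;
          break;
        default:
          return -1;
        }
    }

  /* --overwrite makes no sense without at least one FILE.  */
  if (flags->overwrite && optind == argc)
    {
      fprintf (stderr,
               "Usage: %s [OPTION]... [FILE]...\n"
               "  or:  %s [OPTION]... --overwrite FILE...\n",
               argv[0], argv[0]);
      return -1;
    }
  return optind;
}
```

Because getopt_long() accepts any unambiguous prefix of a long option name, `--o` selects --overwrite for as long as no other long option starts with "o".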
Please make your changes relative to the latest test release
ftp://alpha.gnu.org/gnu/fetish/sh-utils-2.0.11.tar.gz
and send `--unidiff' style diffs -- again to address@hidden
Jim
| *** sh-utils-2.0/src/tee.c 1999-07-26 09:09:42+02 2.0
| --- sh-utils-2.0/src/tee.c 2001-05-17 03:56:12+02
| *************** static int append;
...