From: Roman Czyborra
Subject: inplace stream editing with tee --quiet --overwrite
Date: Thu, 17 May 2001 06:20:40 +0200 (MEST)

Dear fellow GNUzees,
how do you shorten (shuffle up) a huge logfile or mailbox
without losing the latest appends or the inode hardlinks?
There are several approaches to in-place editing that involve temporary
files. "cp -p A B && tail B > A && rm B" reuses A's inode but
requires enough disk space for two copies of the unshortened A and
needs an A.lock to prevent lost appends during the lengthy copy
operation. The more appealing "tail A > B && mv B A" approach leaves
anybody still attached to the old inode out in the rain and requires
you to copy the file permissions yourself. perl -i will do that for
you, but it renames even earlier: it basically does a
"mv A B && tail B > A", so any "date >> A" arriving during the tail
copy gets interspersed with the copied lines. Example:
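  # line number of the "From " line that starts the oldest message to keep
  flushlines=`grep -n '^From ' $mbox | tail -n $count | head -n 1 | cut -d: -f1`
  # skip every line before $flushlines, print the rest
  test "$flushlines" -gt 0 && perl -ne '$.<'$flushlines'||print' -i $mbox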
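The inode behaviour of the two temporary-file approaches can be checked with a small sketch. It assumes a POSIX sh with mktemp and awk available; the files A and B live in a throwaway directory, and the helper inode() is made up for this illustration.

```shell
# Throwaway working directory, removed when the script exits.
dir=$(mktemp -d) && trap 'rm -rf "$dir"' EXIT
A=$dir/A
B=$dir/B

# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

# Twelve lines of sample data; plain tail keeps the last ten.
fill() { printf '%s\n' a b c d e f g h i j k l > "$A"; }

# Approach 1: copy aside, then refill the original file.
fill
before_refill=$(inode "$A")
cp -p "$A" "$B" && tail "$B" > "$A" && rm "$B"
after_refill=$(inode "$A")

# Approach 2: write a new file and rename it over the original.
fill
before_rename=$(inode "$A")
tail "$A" > "$B" && mv "$B" "$A"
after_rename=$(inode "$A")

echo "refill: $before_refill -> $after_refill"
echo "rename: $before_rename -> $after_rename"
```

The refill variant should report the same inode twice, while the rename variant reports a new one, which is why processes holding the old file open no longer see the shortened data.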
So why don't we stream-edit such large files in place? Unix files can
be opened for simultaneous reading and writing with O_RDWR, alias "r+"
in fopen() from <stdio.h> or "+<" in Perl's open (see perlfunc). The
shell's <> redirection opens O_RDWR without O_TRUNC as well, but the
shell has no way to truncate the file at the final write position
afterwards. Proof of concept:
  into='open(A,"+<".shift);while(<>){print A};truncate(A,tell A)'
  test "$flushlines" -gt 0 && tail -n +$flushlines $mbox | perl -e "$into" $mbox
This process fills A from the beginning without touching the end until
A has been read completely. Data can be lost only in the minimal window
between the reader hitting EOF and the writer truncating the file.
No extra inode is created and no temporary disk space is needed.
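The same overwrite-then-truncate idea can be sketched without the Perl copy loop. This is only an illustration, assuming a shell whose <> redirection opens the file read-write without truncating it (as POSIX specifies) and a perl binary for the final truncate; the variable keep_from and the sample file are made up.

```shell
# Throwaway demo file, removed when the script exits.
dir=$(mktemp -d) && trap 'rm -rf "$dir"' EXIT
f=$dir/log
printf 'line %s\n' 1 2 3 4 5 > "$f"
inode_before=$(ls -i "$f" | awk '{print $1}')

keep_from=4   # hypothetical: number of the first line to keep

# 1<> opens $f for reading and writing WITHOUT truncating it, so tail
# can still read the old contents while cat overwrites the file from
# offset 0. The writer never gets ahead of the reader because it only
# writes what tail has already read. perl then cuts the file off at the
# final write position of the shared descriptor.
tail -n +"$keep_from" "$f" | {
    cat
    perl -e 'truncate(STDOUT, tell(STDOUT)) or die "truncate: $!"'
} 1<> "$f"

inode_after=$(ls -i "$f" | awk '{print $1}')
cat "$f"
```

The file keeps its inode and ends up holding only the lines from keep_from onward; nothing is written outside the file itself.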
But I find both Perl and the one-liner too big for such a basic task
and would prefer to abbreviate this into
  tail -n +$flushlines $mbox | tee -qo $mbox
Why so? I found tee the simplest of all existing commands for
redirecting output into named files. I often don't need the cat-like
extra standard output that tee produces and just put up with it because
"tee file" is easier to type than "tee >file" or "tee file >/dev/null".
I therefore suggest a new option, "tee -q file", that is quiet on stdout.
Furthermore, "tee -a $mbox" appends instead of overwriting, and plain
"tee $mbox" truncates $mbox before it has been read. Just as the
"sort file > file" truncation dilemma is solved by "sort file -o file",
I would love to get the non-sorting, general-purpose GNU tee to
overwrite with delayed truncation, and therefore suggest the following
patch:
*** sh-utils-2.0/src/tee.c 1999-07-26 09:09:42+02 2.0
--- sh-utils-2.0/src/tee.c 2001-05-17 03:56:12+02
*************** static int append;
*** 42,47 ****
--- 42,53 ----
/* If nonzero, ignore interrupts. */
static int ignore_interrupts;
+ /* If nonzero, overwrite without premature truncation */
+ static int overwrite;
+
+ /* If set to one, keep stdout quiet */
+ static int quiet;
+
/* The name that this program was run with. */
char *program_name;
*************** static struct option const long_options[
*** 49,54 ****
--- 55,62 ----
{
{"append", no_argument, NULL, 'a'},
{"ignore-interrupts", no_argument, NULL, 'i'},
+ {"overwrite", no_argument, NULL, 'o'},
+ {"quiet", no_argument, NULL, 'q'},
{GETOPT_HELP_OPTION_DECL},
{GETOPT_VERSION_OPTION_DECL},
{NULL, 0, NULL, 0}
*************** Copy standard input to each FILE, and al
*** 68,74 ****
\n\
-a, --append append to the given FILEs, do not overwrite\n\
-i, --ignore-interrupts ignore interrupt signals\n\
--help display this help and exit\n\
--version output version information and exit\n\
"));
puts (_("\nReport bugs to <address@hidden>."));
--- 76,84 ----
\n\
-a, --append append to the given FILEs, do not overwrite\n\
-i, --ignore-interrupts ignore interrupt signals\n\
+ -o, --overwrite overwrite without early truncation\n\
+ -q, --quiet do not copy to stdout\n\
--help display this help and exit\n\
--version output version information and exit\n\
"));
puts (_("\nReport bugs to <address@hidden>."));
*************** main (int argc, char **argv)
*** 90,96 ****
append = 0;
ignore_interrupts = 0;
! while ((optc = getopt_long (argc, argv, "ai", long_options, NULL)) != -1)
{
switch (optc)
{
--- 100,106 ----
append = 0;
ignore_interrupts = 0;
! while ((optc = getopt_long (argc, argv, "aiqo", long_options, NULL)) != -1)
{
switch (optc)
{
*************** main (int argc, char **argv)
*** 105,110 ****
--- 115,128 ----
ignore_interrupts = 1;
break;
+ case 'o':
+ overwrite = 1;
+ break;
+
+ case 'q':
+ quiet = 1;
+ break;
+
case_GETOPT_HELP_CHAR;
case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
*************** tee (int nfiles, const char **files)
*** 166,172 ****
char buffer[BUFSIZ];
int bytes_read, i;
int ret = 0;
! const char *mode_string = (append ? "a" : "w");
descriptors = (FILE **) xmalloc ((nfiles + 1) * sizeof (descriptors[0]));
--- 184,190 ----
char buffer[BUFSIZ];
int bytes_read, i;
int ret = 0;
! const char *mode_string = (overwrite ? "r+" : append ? "a" : "w");
descriptors = (FILE **) xmalloc ((nfiles + 1) * sizeof (descriptors[0]));
*************** tee (int nfiles, const char **files)
*** 206,213 ****
break;
/* Write to all NFILES + 1 descriptors.
! Standard output is the first one. */
! for (i = 0; i <= nfiles; i++)
{
if (descriptors[i] != NULL)
fwrite (buffer, bytes_read, 1, descriptors[i]);
--- 224,231 ----
break;
/* Write to all NFILES + 1 descriptors.
! Standard output is the first one unless --quiet. */
! for (i = quiet; i <= nfiles; i++)
{
if (descriptors[i] != NULL)
fwrite (buffer, bytes_read, 1, descriptors[i]);
*************** tee (int nfiles, const char **files)
*** 223,229 ****
/* Close the files, but not standard output. */
for (i = 1; i <= nfiles; i++)
if (descriptors[i] != NULL
! && (ferror (descriptors[i]) || fclose (descriptors[i]) == EOF))
{
error (0, errno, "%s", files[i]);
ret = 1;
--- 241,249 ----
/* Close the files, but not standard output. */
for (i = 1; i <= nfiles; i++)
if (descriptors[i] != NULL
! && (ferror (descriptors[i])
! || (overwrite && ftruncate (fileno (descriptors[i]), ftell (descriptors[i])) != 0)
! || fclose (descriptors[i]) == EOF))
{
error (0, errno, "%s", files[i]);
ret = 1;
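For comparison, the sort truncation dilemma that motivates --overwrite can be reproduced with a short sketch. It assumes a POSIX sh with mktemp; the sample file and the variable names are made up for illustration.

```shell
# Throwaway demo file, removed when the script exits.
dir=$(mktemp -d) && trap 'rm -rf "$dir"' EXIT
f=$dir/words

printf '%s\n' pear apple fig > "$f"
# The shell truncates $f before sort opens it, so sort sees no input.
sort "$f" > "$f"
if [ -s "$f" ]; then redirect_result=nonempty; else redirect_result=empty; fi

printf '%s\n' pear apple fig > "$f"
# With -o, sort reads all of its input before it opens the output file.
sort -o "$f" "$f"
sorted=$(cat "$f")

echo "after redirect: $redirect_result"
echo "after -o:"
echo "$sorted"
```

The proposed "tee -o" aims at the same delayed-truncation effect for arbitrary filters, without the sorting.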
Is this viable? Who's maintaining the sh-utils?
http://mail.gnu.org/pipermail/bug-sh-utils/ is more verbose than
http://www.gnu.org/software/shellutils/shellutils.html