bug-sh-utils

inplace stream editing with tee --quiet --overwrite


From: Roman Czyborra
Subject: inplace stream editing with tee --quiet --overwrite
Date: Thu, 17 May 2001 06:20:40 +0200 (MEST)

Dear fellow GNUzees,

How do you shorten (shuffle up) a huge logfile or mailbox
without losing the latest appends or its inode and hard links?

There are several approaches to in-place editing that involve temporary
files.  "cp -p A B && tail B > A && rm B" reuses A's inode but
requires enough disk space for two copies of the unshortened A and
needs a lock (A.lock) to prevent lost appends during the lengthy copy
operation.  The more appealing "tail A > B && mv B A" approach leaves
anybody attached to the old inode out in the rain and requires you to
copy the file permissions.  Perl -i will do this for you, but it
renames up front: it basically does a "mv A B && tail B > A", so any
"date >> A" arriving during the tail copy gets interspersed with the
copied data.  Example:

 # line number of the first "From " line among the last $count messages
 flushlines=`grep -n '^From ' "$mbox" | tail -n "$count" | head -n 1 | cut -d: -f1`
 test "$flushlines" -gt 0 && perl -i -ne "print if \$. >= $flushlines" "$mbox"
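
The inode behaviour of the two temporary-file approaches is easy to
check.  Here is a minimal C sketch, assuming a POSIX system; the
function names and the A/B file arguments are made up for illustration
and are not part of the patch further down.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Inode number of PATH, or 0 if it cannot be stat'ed. */
static ino_t inode_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_ino : 0;
}

/* Create or truncate PATH and fill it with TEXT, like "printf text > PATH". */
static int put_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == EOF ? -1 : 0;
}

/* Models "tail A > B && mv B A".
   Returns 1 if A ends up on a different inode, 0 if not, -1 on error. */
int replace_by_rename_changes_inode(const char *a, const char *b)
{
    ino_t before, after;
    assert(a != NULL && b != NULL);
    if (put_file(a, "old\n") != 0)
        return -1;
    before = inode_of(a);
    if (put_file(b, "new\n") != 0 || rename(b, a) != 0)
        return -1;
    after = inode_of(a);
    return before != after;
}

/* Models "tail B > A" (rewrite A through its existing name).
   Returns 1 if A keeps its inode, 0 if not, -1 on error. */
int rewrite_in_place_keeps_inode(const char *a)
{
    ino_t before, after;
    assert(a != NULL);
    if (put_file(a, "old\n") != 0)
        return -1;
    before = inode_of(a);
    if (put_file(a, "new\n") != 0)
        return -1;
    after = inode_of(a);
    return before == after;
}
```

Both functions should return 1 when A and B live on the same file
system: rename() hands A the inode of B, while reopening A for writing
keeps the inode that readers and hard links already refer to.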

So why don't we stream-edit such large files in place?  Unix files can
be opened for simultaneous reading and writing with O_RDWR, alias "r+"
in fopen() from <stdio.h> or "+<" in Perl's open (see perlfunc).  The
shell's <> redirection also opens O_RDWR without O_TRUNC, but the shell
itself cannot truncate the file afterwards.  Proof of concept:

 # open $mbox read-write, copy stdin over its start, then truncate behind it
 into='open(A,"+<".shift)||die $!;while(<>){print A};truncate(A,tell A)'
 test "$flushlines" -gt 0 && tail -n +"$flushlines" "$mbox" | perl -e "$into" "$mbox"

This process overwrites A from the beginning and does not touch its end
until all of A has been read.  Only the short window between reading
EOF and the truncate call is susceptible to data loss.
No extra inode is created and no temporary disk space is needed.
But I find both Perl and the one-liner too heavy for such a basic task
and would prefer to abbreviate this to

 tail -n +"$flushlines" "$mbox" | tee -qo "$mbox"

Why so?  I find tee the simplest of all existing commands for
redirecting output into named files.  I often don't need the cat-like
extra standard output produced by tee and just bear it because
"tee file" is easier to type than "tee >file" or "tee file >/dev/null",
and therefore I suggest a new option, "tee -q file", that is quiet on
stdout.  Furthermore, "tee -a $mbox" appends instead of overwriting,
and plain "tee $mbox" truncates $mbox before it has been read.  Just as
the "sort file > file" truncation dilemma is solved by
"sort file -o file", I would love the non-sorting, general-purpose GNU
tee to overwrite with delayed truncation, and therefore suggest the
following patch:

*** sh-utils-2.0/src/tee.c      1999-07-26 09:09:42+02  2.0
--- sh-utils-2.0/src/tee.c      2001-05-17 03:56:12+02
*************** static int append;
*** 42,47 ****
--- 42,53 ----
  /* If nonzero, ignore interrupts. */
  static int ignore_interrupts;
  
+ /* If nonzero, overwrite without premature truncation */
+ static int overwrite;
+ 
+ /* If set to one, keep stdout quiet */
+ static int quiet;
+ 
  /* The name that this program was run with. */
  char *program_name;
  
*************** static struct option const long_options[
*** 49,54 ****
--- 55,62 ----
  {
    {"append", no_argument, NULL, 'a'},
    {"ignore-interrupts", no_argument, NULL, 'i'},
+   {"overwrite", no_argument, NULL, 'o'},
+   {"quiet", no_argument, NULL, 'q'},
    {GETOPT_HELP_OPTION_DECL},
    {GETOPT_VERSION_OPTION_DECL},
    {NULL, 0, NULL, 0}
*************** Copy standard input to each FILE, and al
*** 68,74 ****
  \n\
    -a, --append              append to the given FILEs, do not overwrite\n\
    -i, --ignore-interrupts   ignore interrupt signals\n\
        --help                display this help and exit\n\
        --version             output version information and exit\n\
  "));
        puts (_("\nReport bugs to <address@hidden>."));
--- 76,84 ----
  \n\
    -a, --append              append to the given FILEs, do not overwrite\n\
    -i, --ignore-interrupts   ignore interrupt signals\n\
+   -o, --overwrite           overwrite without early truncation\n\
+   -q, --quiet               do not copy to stdout\n\
        --help                display this help and exit\n\
        --version             output version information and exit\n\
  "));
        puts (_("\nReport bugs to <address@hidden>."));
*************** main (int argc, char **argv)
*** 90,96 ****
    append = 0;
    ignore_interrupts = 0;
  
!   while ((optc = getopt_long (argc, argv, "ai", long_options, NULL)) != -1)
      {
        switch (optc)
        {
--- 100,106 ----
    append = 0;
    ignore_interrupts = 0;
  
!   while ((optc = getopt_long (argc, argv, "aiqo", long_options, NULL)) != -1)
      {
        switch (optc)
        {
*************** main (int argc, char **argv)
*** 105,110 ****
--- 115,128 ----
          ignore_interrupts = 1;
          break;
  
+       case 'o':
+         overwrite = 1;
+         break;
+ 
+       case 'q':
+         quiet = 1;
+         break;
+ 
        case_GETOPT_HELP_CHAR;
  
        case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
*************** tee (int nfiles, const char **files)
*** 166,172 ****
    char buffer[BUFSIZ];
    int bytes_read, i;
    int ret = 0;
!   const char *mode_string = (append ? "a" : "w");
  
    descriptors = (FILE **) xmalloc ((nfiles + 1) * sizeof (descriptors[0]));
  
--- 184,190 ----
    char buffer[BUFSIZ];
    int bytes_read, i;
    int ret = 0;
!   const char *mode_string = (overwrite ? "r+" : append ? "a" : "w");
  
    descriptors = (FILE **) xmalloc ((nfiles + 1) * sizeof (descriptors[0]));
  
*************** tee (int nfiles, const char **files)
*** 206,213 ****
        break;
  
        /* Write to all NFILES + 1 descriptors.
!        Standard output is the first one.  */
!       for (i = 0; i <= nfiles; i++)
        {
          if (descriptors[i] != NULL)
            fwrite (buffer, bytes_read, 1, descriptors[i]);
--- 224,231 ----
        break;
  
        /* Write to all NFILES + 1 descriptors.
!        Standard output is the first one unless --quiet.  */
!       for (i = quiet; i <= nfiles; i++)
        {
          if (descriptors[i] != NULL)
            fwrite (buffer, bytes_read, 1, descriptors[i]);
*************** tee (int nfiles, const char **files)
*** 223,229 ****
    /* Close the files, but not standard output.  */
    for (i = 1; i <= nfiles; i++)
      if (descriptors[i] != NULL
!       && (ferror (descriptors[i]) || fclose (descriptors[i]) == EOF))
        {
        error (0, errno, "%s", files[i]);
        ret = 1;
--- 241,249 ----
    /* Close the files, but not standard output.  */
    for (i = 1; i <= nfiles; i++)
      if (descriptors[i] != NULL
!       && (ferror (descriptors[i]) || (overwrite &&
!           ftruncate (fileno (descriptors[i]), ftell (descriptors[i])) != 0)
!           || fclose (descriptors[i]) == EOF))
        {
        error (0, errno, "%s", files[i]);
        ret = 1;
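
Stripped of tee's option handling, the core of the --overwrite idea is:
open with "r+", write from the start, then truncate at the final write
position.  Below is a minimal standalone C sketch of that sequence,
assuming POSIX ftruncate() and ftello().  overwrite_in_place() is a
hypothetical helper, not a function from sh-utils, and unlike the patch
it flushes the stdio buffer before truncating.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the start of PATH with TEXT and cut the file off right
   after it.  "r+" neither truncates up front nor creates a new inode.
   Returns 0 on success, -1 on error (including a missing PATH).  */
int overwrite_in_place(const char *path, const char *text)
{
    FILE *f;
    size_t len;
    int ok;

    assert(path != NULL && text != NULL);
    len = strlen(text);

    f = fopen(path, "r+");              /* read-write, no O_TRUNC */
    if (f == NULL)
        return -1;

    /* Write the new contents, then push them out to the descriptor. */
    ok = fwrite(text, 1, len, f) == len && fflush(f) == 0;

    if (ok) {
        off_t end = ftello(f);          /* offset just past the new data */
        /* Delayed truncation: drop whatever old data follows. */
        ok = end >= 0 && ftruncate(fileno(f), end) == 0;
    }

    if (fclose(f) == EOF)
        ok = 0;
    return ok ? 0 : -1;
}
```

Called as overwrite_in_place("mbox", "keep me\n") on an existing longer
file, this should leave exactly those 8 bytes behind in the same inode.
Because "r+" never creates files, a missing path is reported as an
error; the patched tee -o opens with the same mode, so it would
presumably refuse to create new files as well.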

Is this viable?  Who's maintaining the sh-utils?
http://mail.gnu.org/pipermail/bug-sh-utils/ is more verbose than
http://www.gnu.org/software/shellutils/shellutils.html



