From ff5c897bb898756d84c20282084609950d1a761c Mon Sep 17 00:00:00 2001
From: Assaf Gordon
Date: Wed, 11 Aug 2021 11:33:48 -0600
Subject: [PATCH] sed: allow '0rFILE' (insert FILE before the first line)

The 'r' command can be used with address zero, effectively prepending a
file to the beginning of the input file, e.g.:

    sed '0rA.TXT' B.TXT > C.TXT

is equivalent to:

    cat A.TXT B.TXT > C.TXT

With "sed -i", this allows safe in-place prepending of files.
A typical example would be adding a license header to multiple source
files:

    sed -i '0rLICENSE' *.c *.h
    find -iname '*.cpp' | xargs sed -i '0rLICENSE'

A current cumbersome alternative is:

    sed -i -e 'x;${p;x};1rA.TXT' -e '1d' B.TXT

* NEWS: Mention new feature.
* sed/sed.h (struct readcmd): New struct.
(struct sed_cmd): Use new struct instead of a char* for the filename.
* sed/compile.c (compile_program): Expand conditional detecting invalid
usage of "0" address to allow "0r"; Adjust '0r' to '1r' with
prepending (instead of appending).
* sed/execute.c (execute_program): 'r' command: support prepending.
* sed/debug.c (debug_print_function): Use the new 'struct readcmd'.
* testsuite/cmd-0r.sh: New test.
* testsuite/local.mk (TESTS): Add new test.
* doc/sed.texi (Zero Address): New section.
(Adding a header to multiple files): New example section.
---
 NEWS                |   5 ++
 doc/sed.texi        | 152 ++++++++++++++++++++++++++++++++++++++++++--
 sed/compile.c       |  19 +++++-
 sed/debug.c         |   2 +-
 sed/execute.c       |  13 +++-
 sed/sed.h           |   7 +-
 testsuite/cmd-0r.sh |  70 ++++++++++++++++++++
 testsuite/local.mk  |   1 +
 8 files changed, 258 insertions(+), 11 deletions(-)
 create mode 100755 testsuite/cmd-0r.sh

diff --git a/NEWS b/NEWS
index 45593af..e415427 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,11 @@ GNU sed NEWS -*- outline -*-
   using the R command to read an input line of length longer than 2GB
   can no longer trigger an out-of-bounds memory read.
 
+** New Features
+
+  The 'r' command now accepts address 0, allowing inserting a file before
+  the first line.
+
 
 * Noteworthy changes in release 4.8 (2020-01-14) [stable]
 
diff --git a/doc/sed.texi b/doc/sed.texi
index 16d0a9c..06ce697 100644
--- a/doc/sed.texi
+++ b/doc/sed.texi
@@ -1511,6 +1511,10 @@ standard input.
 As a GNU extension, this command accepts two addresses. The
 file will then be reread and inserted on each of the addressed lines.
 
+As a @value{SSED} extension, the @code{r} command accepts a zero address,
+inserting a file @emph{before} the first line of the input.
+@xref{Adding a header to multiple files}.
+
 @item w @var{filename}
 @findex w (write file) command
 @cindex Write to a file
@@ -2046,6 +2050,7 @@ $ ls -1
 * Numeric Addresses::        selecting lines by numbers
 * Regexp Addresses::         selecting lines by text matching
 * Range Addresses::          selecting a range of lines
+* Zero Address::             Using address @code{0}
 @end menu
 
 @node Addresses overview
@@ -2380,6 +2385,7 @@ $ seq 10 | sed -n '4,1p'
 4
 @end example
 
+@anchor{Zero Address Regex Range}
 @cindex Special addressing forms
 @cindex Range with start address of zero
 @cindex Zero, as range start address
@@ -2404,10 +2410,6 @@ the @code{1,/@var{regexp}/} form will match the beginning of its range and
 hence make the range span up to the @emph{second} occurrence of the
 regular expression.
 
-Note that this is the only place where the @code{0} address makes
-sense; there is no 0-th line and commands which are given the @code{0}
-address in any other way will give an error.
-
 The following examples demonstrate the difference between starting
 with address 1 and 0:
 
@@ -2452,6 +2454,22 @@ $ seq 10 | sed -n '6,~4p'
 
 
 
+@node Zero Address
+@section Zero Address
+@cindex Zero Address
+As a @value{SSED} extension, the @code{0} address can be used in two cases:
+@enumerate
+@item
+In regex range addresses, as @code{0,/@var{regexp}/} (@pxref{Zero Address Regex Range}).
+@item
+With the @code{r} command, inserting a file before the first line (@pxref{Adding a header to multiple files}).
+@end enumerate
+
+Note that these are the only places where the @code{0} address makes
+sense; commands which are given the @code{0} address in any
+other way will give an error.
+
+
 @node sed regular expressions
 @chapter Regular Expressions: selecting text
 
@@ -4130,6 +4148,7 @@ Some exotic examples:
 * Reverse chars of lines::
 * Text search across multiple lines::
 * Line length adjustment::
+* Adding a header to multiple files::
 
 Emulating standard utilities:
 * tac::                             Reverse lines of files
@@ -4802,6 +4821,131 @@ t was the age of foolishness,
 
 
 
+@node Adding a header to multiple files
+@section Adding a header to multiple files
+
+@value{SSED} can be used to safely modify multiple files at once.
+
+@exdent Add a single line to the beginning of source code files:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+sed -i '1i/* Copyright (C) FOO BAR */' *.c
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@exdent Adding a few lines is possible using @samp{\n} in the text:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+sed -i '1i/*\n * Copyright (C) FOO BAR\n * Created by Jane Doe\n */' *.c
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+To add multiple lines from another file, use @code{0rFILE}.
+A typical use case is adding a license notice header to all files:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+## Create the header file:
+$ cat<<'EOF'>LIC.TXT
+/*
+   Copyright (C) 1989-2021 FOO BAR
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; If not, see .
+*/
+EOF
+
+## Add the file at the beginning of all source code files:
+$ sed -i '0rLIC.TXT' *.cpp *.h
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+
+With script files (e.g. @file{.sh}, @file{.py}, @file{.pl} files)
+the license notice typically appears @emph{after} the first line (the
+'shebang' @samp{#!} line). The @code{1rFILE} command will add @file{FILE}
+@emph{after} the first line:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+## Create the header file:
+$ cat<<'EOF'>LIC.TXT
+##
+## Copyright (C) 1989-2021 FOO BAR
+##
+## This program is free software; you can redistribute it and/or modify
+## it under the terms of the GNU General Public License as published by
+## the Free Software Foundation; either version 3, or (at your option)
+## any later version.
+##
+## This program is distributed in the hope that it will be useful,
+## but WITHOUT ANY WARRANTY; without even the implied warranty of
+## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+## GNU General Public License for more details.
+##
+## You should have received a copy of the GNU General Public License
+## along with this program; If not, see .
+##
+##
+EOF
+
+## Add the file after the first line of all script files:
+$ sed -i '1rLIC.TXT' *.py *.sh
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+The above @command{sed} commands can be combined with @command{find}
+to locate files in all subdirectories, @command{xargs} to run additional
+commands on found files, and @command{grep} to filter out files that
+already contain a copyright notice:
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) \
+    | xargs grep -Li copyright \
+    | xargs -r sed -i '0rLIC.TXT'
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+@exdent Or a slightly safer version (handling files with spaces and newlines):
+
+@codequoteundirected on
+@codequotebacktick on
+@example
+find \( -iname '*.cpp' -o -iname '*.c' -o -iname '*.h' \) -print0 \
+    | xargs -0 grep -Z -Li copyright \
+    | xargs -0 -r sed -i '0rLIC.TXT'
+@end example
+@codequoteundirected off
+@codequotebacktick off
+
+Note: using the @code{0} address with the @code{r} command requires
+@value{SSED} version 4.9 or later. @xref{Zero Address}.
+
+
+
 @node tac
 @section Reverse Lines of Files
 
diff --git a/sed/compile.c b/sed/compile.c
index 6d12f74..1d7d8ee 100644
--- a/sed/compile.c
+++ b/sed/compile.c
@@ -1021,7 +1021,8 @@ compile_program (struct vector *vector)
 
           if ((cur_cmd->a1->addr_type == ADDR_IS_NUM
                && cur_cmd->a1->addr_number == 0)
-              && ((!cur_cmd->a2 || cur_cmd->a2->addr_type != ADDR_IS_REGEX)
+              && ((!cur_cmd->a2 && ch != 'r')
+                  || (cur_cmd->a2 && cur_cmd->a2->addr_type != ADDR_IS_REGEX)
                   || posixicity == POSIXLY_BASIC))
             bad_prog (_(INVALID_LINE_0));
         }
@@ -1196,7 +1197,21 @@ compile_program (struct vector *vector)
           b = read_filename ();
           if (strlen (get_buffer (b)) == 0)
             bad_prog (_(MISSING_FILENAME));
-          cur_cmd->x.fname = xstrdup (get_buffer (b));
+          cur_cmd->x.readcmd.fname = xstrdup (get_buffer (b));
+
+          /* Adjust '0rFILE' command to '1rFILE' in prepend mode */
+          if (cur_cmd->a1
+              && cur_cmd->a1->addr_type == ADDR_IS_NUM
+              && cur_cmd->a1->addr_number == 0
+              && !cur_cmd->a2)
+            {
+              cur_cmd->a1->addr_number = 1;
+              cur_cmd->x.readcmd.append = false;
+            }
+          else
+            {
+              cur_cmd->x.readcmd.append = true;
+            }
           free_buffer (b);
           break;
 
diff --git a/sed/debug.c b/sed/debug.c
index 0530496..99f8b87 100644
--- a/sed/debug.c
+++ b/sed/debug.c
@@ -363,7 +363,7 @@ debug_print_function (const struct vector *program, const struct sed_cmd *sc)
 
     case 'r':
       putchar (' ');
-      fputs (sc->x.fname, stdout);
+      fputs (sc->x.readcmd.fname, stdout);
       break;
 
     case 'R':
diff --git a/sed/execute.c b/sed/execute.c
index bae5735..defa376 100644
--- a/sed/execute.c
+++ b/sed/execute.c
@@ -1509,10 +1509,17 @@ execute_program (struct vector *vec, struct input *input)
           return cur_cmd->x.int_arg == -1 ? 0 : cur_cmd->x.int_arg;
 
         case 'r':
-          if (cur_cmd->x.fname)
+          if (cur_cmd->x.readcmd.fname)
             {
-              struct append_queue *aq = next_append_slot ();
-              aq->fname = cur_cmd->x.fname;
+              if (cur_cmd->x.readcmd.append)
+                {
+                  struct append_queue *aq = next_append_slot ();
+                  aq->fname = cur_cmd->x.readcmd.fname;
+                }
+              else
+                {
+                  print_file (cur_cmd->x.readcmd.fname, output_file.fp);
+                }
             }
           break;
 
diff --git a/sed/sed.h b/sed/sed.h
index 64bf17c..78117e7 100644
--- a/sed/sed.h
+++ b/sed/sed.h
@@ -57,6 +57,11 @@ struct regex {
   char re[1];
 };
 
+struct readcmd {
+  char *fname;
+  bool append; /* true: append (default); false: prepend (gnu extension) */
+};
+
 enum replacement_types {
   REPL_ASIS = 0,
   REPL_UPPERCASE = 1,
@@ -158,7 +163,7 @@ struct sed_cmd {
     countT jump_index;
 
     /* This is used for the r command. */
-    char *fname;
+    struct readcmd readcmd;
 
     /* This is used for the hairy s command. */
     struct subst *cmd_subst;
diff --git a/testsuite/cmd-0r.sh b/testsuite/cmd-0r.sh
new file mode 100755
index 0000000..bf48e6a
--- /dev/null
+++ b/testsuite/cmd-0r.sh
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Test '0rFILE' command
+
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+. "${srcdir=.}/testsuite/init.sh"; path_prepend_ ./sed
+print_ver_ sed
+
+cat <<\EOF >in1 || framework_failure_
+HELLO
+WORLD
+EOF
+
+cat <<\EOF >in2 || framework_failure_
+1
+2
+3
+EOF
+
+cat <<\EOF >exp1 || framework_failure_
+HELLO
+WORLD
+1
+2
+3
+EOF
+
+cat <<\EOF >exp2 || framework_failure_
+1
+HELLO
+WORLD
+2
+HELLO
+WORLD
+3
+EOF
+
+cat <<\EOF> exp-err-addr0 || framework_failure_
+sed: -e expression #1, char 4: invalid usage of line address 0
+EOF
+
+# Typical usage
+sed '0rin1' in2 >out1 || fail=1
+compare_ exp1 out1 || fail=1
+
+# Ensure no regression for '0,/REGEXP/r'
+sed '0,/2/rin1' in2 >out2 || fail=1
+compare_ exp2 out2 || fail=1
+
+# Ensure '0r' doesn't accept a numeric address range
+returns_ 1 sed '0,4rin1' in2 2>err3 || fail=1
+compare_ exp-err-addr0 err3 || fail=1
+
+# Test with -i
+sed -i '0rin1' in2 || fail=1
+compare_ exp1 in2 || fail=1
+
+Exit $fail
diff --git a/testsuite/local.mk b/testsuite/local.mk
index 8ffc505..7561c03 100644
--- a/testsuite/local.mk
+++ b/testsuite/local.mk
@@ -47,6 +47,7 @@ T = \
   testsuite/bug32271-1.sh \
   testsuite/bug32271-2.sh \
   testsuite/cmd-l.sh \
+  testsuite/cmd-0r.sh \
   testsuite/cmd-R.sh \
   testsuite/colon-with-no-label.sh \
   testsuite/comment-n.sh \
-- 
2.20.1