help-gawk


Re: How do I pass an arbitrary value to awk?


From: Emanuele Torre
Subject: Re: How do I pass an arbitrary value to awk?
Date: Thu, 20 Apr 2023 04:29:30 +0200
User-agent: Mutt/2.2.10 (2023-03-25)

On Thu, Apr 20, 2023 at 03:07:12AM +0300, Yuri Kanivetsky wrote:
> Hi,
> 
> As far as I can see variables passed via command line are preprocessed:
> 
> https://www.gnu.org/software/gawk/manual/html_node/Assignment-Options.html
> 
> and to pass an arbitrary value one needs to only escape backslashes:
> 
> v='...'
> v=`echo "$v" | sed -E 's/\\/\\\\/g'`
> awk PROGRAM "n=$v" INPUT
> 
> Right?
> 
> Regards,
> Yuri
> 

Yes, to pass arbitrary values to awk, you will need to escape
backslashes because they are special, but your solution is not great
because of two things:

 1) Use of echo:

   echo "$v"  may interpret its argument as options, so if "$v" is the
   string "-n", it will print nothing; if it is "-Eee" in bash, it will
   print an empty line. In general, you cannot expect a reliably portable
   result from  echo "$v"  when v starts with "-".

   This is already bad enough since you want to pass arbitrary data:
   strings like "-n" and "-Enenneeene" won't be printed at all.

   But echo is even worse than that: echo itself may (very likely)
   expand backslash sequences too. Not in all shells (most notably not in
   bash, ksh93, yash, and busybox), but in most shells echo expands
   escape sequences in its argument, so  echo 'hi\nb'  will actually
   print hi<newline>b<newline>, not hi<backslash>nb<newline>, and
   echo '\\'  will print <backslash><newline>, collapsing exactly the
   doubled backslash your sed is supposed to produce. Shells that expand
   those sequences include dash (Debian's default sh), zsh, pdksh and its
   forks (oksh, mksh, ...), and the bash 3.2 distributed with macOS,
   which has been patched to turn the  shopt -s xpg_echo  setting on by
   default.

   So echo is definitely not good;  printf '%s\n' "$v"  is a better
   solution if you ever need to do something like this.

 2) Use of ``:

   ``, just like $(), removes all trailing newline characters from the
   output, so if you have a string that is a<newline>b<newline><newline>
   and use  a=$(printf %s\\n "$string") , a will actually be just
   a<newline>b, since all the trailing newlines have been removed.

   SIDE NOTE:
   It is also worth mentioning that your code is not even valid. Inside
   a `` command substitution, \\ is a single backslash, so you are
   actually running  sed 's/\/\\/g' , which leaves the s command
   unterminated; that is not valid sed code, and sed will exit with an
   error.

   `` is sort of similar to a quoted string in sh: it contains a string
   that is evaluated, and it expands to the output generated by
   evaluating that string. Since that "string" is terminated by a `, the
   \` sequence is supported for when you want to nest `` substitutions;
   it expands to a single ` that does not terminate the string. And since
   \` exists, \\ is also supported so that you can write a literal
   backslash: \\ expands to a single backslash.

   The correct way to write your code using `` would be:

     v=`... | sed -E 's/\\\\/\\\\\\\\/g'`

   Or you could just use $(), which was introduced to replace the
   ancient and obsolete `` syntax because of the problems mentioned
   above. (Among other problems, nesting `` requires lots of
   backslashing, and since the content of `` is not code but a string
   that gets evaluated later in a subshell, the interpreter cannot
   detect syntax errors inside `` while parsing the script.)

   $() behaves roughly the same way `` does, but it is more like an
   "expression" than a "string": it contains code that is parsed by the
   interpreter, not characters that are evaluated later, so the
   interpreter can detect syntax errors in it, and you can write any
   valid code inside it without having to worry about extra
   backslashes. With $() instead of ``, your original sed code just
   works:

    v=$(... | sed -E 's/\\/\\\\/g')

   In any case, as mentioned, you cannot really use `` or $() for stuff
   like this, because both strip the trailing newlines of the value.
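
The points above can be checked with a short POSIX sh script. This is
only a sketch: the variable names are made up for illustration, the
backslash strings are arbitrary test values, and it uses printf
throughout so that it behaves the same in any POSIX shell.

```shell
# Point 1: printf '%s\n' prints its argument verbatim, even when it looks
# like an echo option or contains backslash sequences.
v='-n'
p1=$(printf '%s\n' "$v")
v='hi\nb'
p2=$(printf '%s\n' "$v")

# Point 2: command substitution removes every trailing newline.
string='a
b

'
a=$(printf '%s\n' "$string")     # a is now just "a<newline>b"

# Side note: inside backticks, \\ collapses to \ before the inner command
# is parsed; inside $() the text is left alone.
bt=`printf '%s' '\\\\'`          # inner command sees '\\'   (2 characters)
ps=$(printf '%s' '\\\\')         # inner command sees '\\\\' (4 characters)

# The corrected sed from above, written both ways, doubles each backslash.
v='a\b'
v1=`printf '%s\n' "$v" | sed -E 's/\\\\/\\\\\\\\/g'`
v2=$(printf '%s\n' "$v" | sed -E 's/\\/\\\\/g')

printf 'p1=%s p2=%s len(a)=%s len(bt)=%s len(ps)=%s v1=%s v2=%s\n' \
  "$p1" "$p2" "${#a}" "${#bt}" "${#ps}" "$v1" "$v2"
```

The two sed lines end up running the same sed program; only the amount
of shell-level backslashing differs.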

In bash and ksh93 (and some other shells), you can use the
"${v//\\/\\\\}" parameter expansion to double all backslashes.

So something like  awk PROGRAM "v=${v//\\/\\\\}" INPUT  will work...
at least if awk is GNU awk; it won't work with nawk if the value
contains a newline.
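
A minimal sketch of the parameter expansion approach, assuming bash
(the ${v//...} form is not POSIX sh) and a value that contains no
newlines, so the nawk quirk described below does not come into play.
The value and variable names are hypothetical.

```shell
# Requires bash or ksh93: ${name//pattern/replacement} is not POSIX sh.
v='C:\new\tab'                   # hypothetical value with backslashes
varesc=${v//\\/\\\\}             # double every backslash

# -v processes escape sequences, turning each \\ back into a single \,
# so awk sees the original value.
out=$(awk -v v="$varesc" 'BEGIN { printf "%s", v }')

printf 'escaped: %s\nin awk:  %s\n' "$varesc" "$out"
```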

The approach of using  -v v=ESCAPEDSTRING  or  awk PROGRAM v=ESCAPEDSTRING
is kind of hopeless, since different awk implementations have different
quirks for it. nawk's quirk is that a literal newline in the value is
always an error:

  $ var=$'a\\\nb\\nhi'
  $ printf _%s_\\n "$var"
  _a\
  b\nhi_
  $ varesc=${var//\\/\\\\}
  $ printf _%s_\\n "$varesc"
  _a\\
  b\\nhi_
  $ gawk -v v="$varesc" 'BEGIN{printf "_%s_\n", v}'
  _a\
  b\nhi_
  $ nawk -v v="$varesc" 'BEGIN{printf "_%s_\n", v}'
  nawk: newline in string a\\
  b\\nhi... at source line 1

Thankfully there is a very easy and portable way to pass strings to awk
without having to care about any of this: environment variables!

  $ var=$'a\\\nb\\nhi'
  $ printf _%s_\\n "$var"
  _a\
  b\nhi_
  $ v=$var awk 'BEGIN {printf "_%s_\n", ENVIRON["v"]}'
  _a\
  b\nhi_
  $ # alternatively, instead of  v=$var ... awk  you can export var, and
  $ # use ENVIRON["var"] in the awk script.
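
Here is a slightly longer sketch of the environment variable approach,
using the export variant mentioned in the comment above. It assumes a
POSIX sh and any POSIX awk; the value is a made-up test string, and it
deliberately has no trailing newline, because the $() used here to
capture awk's output would strip it.

```shell
var='a\b\nc'        # hypothetical value: backslashes, no trailing newline
export var

# ENVIRON hands awk the raw environment string, with no escape processing.
out_env=$(awk 'BEGIN { printf "%s", ENVIRON["var"] }')

# The one-command prefix form from the session above behaves the same way.
out_pfx=$(v=$var awk 'BEGIN { printf "%s", ENVIRON["v"] }')

# For contrast, -v runs the value through awk's escape processing,
# so \b and \n become a backspace and a newline.
out_v=$(awk -v v="$var" 'BEGIN { printf "%s", v }')

printf 'ENVIRON ok: %s\n' "$out_env"
```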

Environment variables are obviously the best option. :)
 emanuele6


