Re: How do I pass an arbitrary value to awk?
From: Emanuele Torre
Subject: Re: How do I pass an arbitrary value to awk?
Date: Thu, 20 Apr 2023 04:29:30 +0200
User-agent: Mutt/2.2.10 (2023-03-25)
On Thu, Apr 20, 2023 at 03:07:12AM +0300, Yuri Kanivetsky wrote:
> Hi,
>
> As far as I can see variables passed via command line are preprocessed:
>
> https://www.gnu.org/software/gawk/manual/html_node/Assignment-Options.html
>
> and to pass an arbitrary value one needs to only escape backslashes:
>
> v='...'
> v=`echo "$v" | sed -E 's/\\/\\\\/g'`
> awk PROGRAM "n=$v" INPUT
>
> Right?
>
> Regards,
> Yuri
>
Yes, to pass arbitrary values to awk you need to escape backslashes,
because they are special, but your solution is not great for two
reasons:
1) Use of echo:
echo "$v" may interpret its argument as options, so if "$v" is the
string "-n", it prints nothing; if it is "-Eee", bash prints an empty
line. In general, you cannot expect a reliably portable result from
echo "$v" when v starts with "-".
That is already bad enough, since you want to pass arbitrary data and
values like "-n" and "-Enenneeene" won't be printed at all.
But echo is even worse than that: echo itself may (very likely)
expand backslash sequences too. Not in every shell (most notably not
in bash, ksh93, yash, and busybox), but in most shells echo expands
sequences in its argument, so echo 'hi\nb' will actually print
hi<newline>b<newline>, not hi<backslash>nb<newline>, and echo '\\'
will print <backslash><newline>, not <backslash><backslash><newline>.
Shells that expand those sequences include dash (Debian's default sh),
zsh, pdksh and its forks (oksh, mksh, ...), and the bash 3.2
distributed with macOS, which has been patched to turn the
shopt -s xpg_echo setting on by default.
So echo is definitely not good; printf with %s is a better solution
if you ever need to do something like this.
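To illustrate the point about echo, here is a minimal sketch that
round-trips two strings echo would be likely to mangle. The values and
variable names are made up for the example.

```shell
# Hypothetical values that echo "$v" may mishandle.
v1='-n'
v2='hi\nb'

# printf with an explicit %s format never treats the data as options
# and does not expand backslash sequences found in its arguments.
out1=$(printf '%s\n' "$v1")
out2=$(printf '%s\n' "$v2")

printf '%s\n' "$out1" "$out2"
```

Both values survive here only because neither ends in a newline; see
point 2) for why that matters.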
2) Use of ``:
``, just like $(), removes all trailing newline characters from the
end of the output, so if you have a string that is
a<newline>b<newline><newline> and use a=$(printf %s\\n "$string"),
a will actually be just a<newline>b, since all the trailing newlines
have been removed.
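A small sketch of the trailing-newline problem. The sentinel trick in
the second half (appending a character and stripping it afterwards) is
a common workaround, not something from the original message, and the
variable names are only for illustration.

```shell
# A literal newline, so the example does not depend on $'...' support.
nl='
'
string="a${nl}b${nl}${nl}"

# Command substitution drops every trailing newline: a is a<newline>b.
a=$(printf '%s\n' "$string")

# Workaround (assumption, see above): end the output with a sentinel
# character so the newlines are no longer trailing, then strip it.
b=$(printf '%sx' "$string")
b=${b%x}

# Lengths in characters: 3 for a, 5 for b.
printf '%s\n' "${#a}" "${#b}"
```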
SIDE NOTE:
It is also worth mentioning that your code is not even valid. Inside
a `` command substitution, \\ is a single backslash.
So you are actually running sed 's/\/\\/g', which is not valid
sed code and will error.
`` is somewhat similar to a quoted string in sh: it contains a string
that is evaluated, and it expands to the output generated by
evaluating that string. Since it is a "string" terminated by a `, the
\` sequence is supported for nesting `` substitutions: it expands to
a single ` that does not terminate the string. And since there is \`,
\\ is also supported in case you want to write a literal backslash;
\\ expands to a single backslash.
The correct way to write your code using `` would be:
v=`... | sed -E 's/\\\\/\\\\\\\\/g'`
Or you could just use $(), which was introduced to replace the
ancient and obsolete `` syntax that has the problems mentioned above
(to mention other problems: nesting `` requires lots of backslashing,
and, since the content of `` is not code but a string that gets
evaluated in a separate process, the interpreter cannot detect syntax
errors inside ``).
$() behaves roughly the same way `` does, but it is more like an
"expression" than like a "string": it contains code that is parsed
by the interpreter, not characters that are evaluated later, so the
interpreter can detect syntax errors in the code, and you can write
any arbitrary valid code in it without having to care about
backslashes. With $() instead of ``, your original code just
works:
v=$(... | sed -E 's/\\/\\\\/g')
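For comparison, a sketch that runs both forms side by side on a
made-up value. It uses printf rather than echo for the reasons given
in 1), and a single-line value so that the trailing-newline issue from
2) does not come into play.

```shell
# Hypothetical input containing one backslash.
v='a\b'

# Inside ``, every \\ collapses to \ before sed sees the script,
# so each backslash has to be written twice.
old=`printf '%s\n' "$v" | sed -E 's/\\\\/\\\\\\\\/g'`

# Inside $(), the sed script is passed through exactly as written.
new=$(printf '%s\n' "$v" | sed -E 's/\\/\\\\/g')

# Both lines show the value with its backslash doubled.
printf '%s\n' "$old" "$new"
```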
In any case, as mentioned in 2), you cannot really use `` or $() for
stuff like this, since both strip trailing newlines.
In bash, and ksh93 (and some other shells), you can use the
"${v//\\/\\\\}" parameter expansion to double all backslashes.
So something like awk PROGRAM "v=${v//\\/\\\\}" INPUT will work...
at least if awk is GNU awk... it won't work with nawk.
The approach of using -v v=ESCAPEDSTRING or
awk PROGRAM v=ESCAPEDSTRING is kind of hopeless: various awk
interpreters have different quirks for it. nawk's quirk is that
a literal newline is always a syntax error:
$ var=$'a\\\nb\\nhi'
$ printf _%s_\\n "$var"
_a\
b\nhi_
$ varesc=${var//\\/\\\\}
$ printf _%s_\\n "$varesc"
_a\\
b\\nhi_
$ gawk -v v="$varesc" 'BEGIN{printf "_%s_\n", v}'
_a\
b\nhi_
$ nawk -v v="$varesc" 'BEGIN{printf "_%s_\n", v}'
nawk: newline in string a\\
b\\nhi... at source line 1
Thankfully there is a very easy and portable way to pass strings to awk
without having to care about any of this: environment variables!
$ var=$'a\\\nb\\nhi'
$ printf _%s_\\n "$var"
_a\
b\nhi_
$ v=$var awk 'BEGIN {printf "_%s_\n", ENVIRON["v"]}'
_a\
b\nhi_
$ # alternatively, instead of v=$var ... awk you can export var, and
$ # use ENVIRON["var"] in the awk script.
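As a script-style sketch of the same idea, the following compares the
two ways of passing a value. The value and the variable names are made
up for illustration; the only awk features used are -v and ENVIRON.

```shell
# Hypothetical value containing a backslash sequence.
val='a\tb'

# With -v, awk processes escape sequences in the assignment,
# so \t arrives as a tab character.
via_v=$(awk -v v="$val" 'BEGIN { print v }')

# Through the environment, the value arrives unmodified.
via_env=$(v=$val awk 'BEGIN { print ENVIRON["v"] }')

printf '%s\n' "$via_v" "$via_env"
```

Note that the $() here still strips trailing newlines from awk's
output; that only affects capturing the result in the shell, not what
awk receives.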
Environment variables are obviously the best option. :)
emanuele6