bug-gawk

feature: expose POSITION of parsed columns (fields) as a variable/functi


From: Vla D
Subject: feature: expose POSITION of parsed columns (fields) as a variable/function?
Date: Thu, 13 May 2021 03:42:29 -0400

Example 1: try to modify the first column and keep the remaining columns
intact - awk rebuilds $0 by re-joining all fields with OFS, which ruins the
original spacing of the remaining columns:

> $ echo -e " a  be   c  d  e" | awk '$1="A"'
> A be c d e

If awk had a function/variable telling me that the 2nd column ("be") starts
at the 5th character of $0, I could have done it easily, preserving all
spaces/tabs/etc (FPOS is the hypothetical array, faked here in BEGIN):

> $ echo -e " a  be   c  d  e" | awk '{print "A "substr($0,FPOS[2])}BEGIN{FPOS[2]=5}'
> A be   c  d  e
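
For what it's worth, here is a minimal sketch of computing that position by
hand, assuming the default FS (runs of spaces/tabs). The fpos() helper and the
fpos_demo wrapper are hypothetical names, not anything awk or gawk provides,
and the helper rescans $0 on every call, which is exactly the overhead a
built-in would avoid:

```shell
# fpos_demo: replace field 1 with "A" while keeping the original spacing.
# fpos() is a hypothetical helper, NOT an awk/gawk built-in; it assumes the
# default FS (runs of spaces/tabs, leading blanks ignored).
fpos_demo() {
  awk '
    # Return the 1-based offset in $0 where field n starts,
    # or 0 when the record has fewer than n fields.
    function fpos(n,    i, pos, rest) {
      pos = 1; rest = $0
      for (i = 1; i <= n; i++) {
        # skip the blanks in front of field i
        if (match(rest, /^[ \t]+/)) { pos += RLENGTH; rest = substr(rest, RLENGTH + 1) }
        if (i == n) return (rest == "" ? 0 : pos)
        # skip field i itself
        if (match(rest, /^[^ \t]+/)) { pos += RLENGTH; rest = substr(rest, RLENGTH + 1) }
        else return 0
      }
    }
    { print "A " substr($0, fpos(2)) }
  '
}

printf ' a  be   c  d  e\n' | fpos_demo
```

This only uses match(), substr() and RLENGTH, so it should not need gawk
specifically.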

Example 2: try to join data from columns 3-N by keys stored in columns 1
and 2 without losing formatting:

> $ echo -e '1 k1   g o d   i s   n o w   h e r e !\n2 k1   p e n   i s   b r o k e n     !'
> 1 k1   g o d   i s   n o w   h e r e !
> 2 k1   p e n   i s   b r o k e n     !
>
> $ echo -e '1 k1   g o d   i s   n o w   h e r e !\n2 k1   p e n   i s   b r o k e n     !' \
> | awk '{file=$1;key=$2;$1=$2="";data=$0;O[key]=O[key]data}END{print O["k1"]}'
>   g o d i s n o w h e r e !  p e n i s b r o k e n !

If awk had a function/variable telling me that the 3rd column ("g" on the
1st line or "p" on the 2nd line of input) starts at the 8th character:

> $ echo -e '1 k1   g o d   i s   n o w   h e r e !\n2 k1   p e n   i s   b r o k e n     !' \
> | awk '{data=substr($0,FPOS[3]);O[$2]=O[$2]" "data}END{print O["k1"]}BEGIN{FPOS[3]=8}'
>  g o d   i s   n o w   h e r e ! p e n   i s   b r o k e n     !
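
When the column number is fixed and small, one match() call can stand in for
the missing variable. A sketch, again assuming the default FS; join_by_key is
just an illustrative name, and the regex is hard-wired to skip exactly two
fields:

```shell
# join_by_key: append everything from field 3 onward to a per-key buffer,
# keeping the original spacing. Assumes the default FS (spaces/tabs).
join_by_key() {
  awk '
    # The regex consumes optional leading blanks, field 1, blanks, field 2
    # and the blanks after it, so the match ends right before field 3.
    match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/) {
      data = substr($0, RLENGTH + 1)
      O[$2] = O[$2] " " data
    }
    END { print O["k1"] }
  '
}

printf '1 k1   g o d   i s   n o w   h e r e !\n2 k1   p e n   i s   b r o k e n     !\n' | join_by_key
```

Records with fewer than three fields do not match the pattern and are simply
skipped.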

Calculating the position of that 8th character on the fly carries quite a
big computational overhead (see the workarounds and their performance):

https://unix.stackexchange.com/questions/649159/awk-how-can-i-tell-where-column-begins

The odd thing is that in order to access column $3, awk itself has to find
that exact position, so it KNOWS that it's the 8th character of $0; it just
doesn't tell us (doesn't expose this valuable information via any
variable/function) :(

Any chance of getting this feature? I can add more useful examples (all
about accessing a portion of $0 by column number WITHOUT ruining the
formatting of the input).

OffTopic1: if awk allowed treating strings as pointers-to-first-char, we
could just calculate `1 + $3 - $0` (one plus the memory address of the 8th
char minus the memory address of the 1st char) = 8, but this is from the
non-awk universe.

OffTopic2: if split($0,flds,FS,seps) could be made "lazy", e.g. doing the
actual parsing only up to a given field at the moment we use that field
(flds[3]), this might add enough performance to the workarounds for the
original issue, BUT this sounds waaay more complex to implement than just
exposing the desired value as a function....
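
For reference, the non-lazy version of that idea might look like the sketch
below: gawk's four-argument split() fills seps with the separators, so the
start of field n is one plus the lengths of everything in front of it.
fpos_seps is a hypothetical wrapper name, the default FS is assumed, and the
whole record is split even when only an early field is needed:

```shell
# fpos_seps N: for each input line, print the 1-based offset where field N
# starts (0 if the line has fewer than N fields).
# Needs gawk: the 4-argument split() is a gawk extension.
fpos_seps() {
  gawk -v n="$1" '
    {
      nf = split($0, flds, FS, seps)
      if (n > nf) { print 0; next }
      # With the default FS, seps[0] holds any leading whitespace and
      # seps[i] holds the separator that follows flds[i].
      pos = 1 + length(seps[0])
      for (i = 1; i < n; i++)
        pos += length(flds[i]) + length(seps[i])
      print pos
    }
  '
}
```

E.g. `printf ' a  be   c  d  e\n' | fpos_seps 2` prints 5, the same value the
FPOS[2] placeholder was set to in Example 1.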

