bug-gawk

Re: article about gawk best practices in data science and feature propos


From: Jean-Philippe Guérard
Subject: Re: article about gawk best practices in data science and feature proposal
Date: Thu, 11 Feb 2021 21:54:42 +0100

> Ivan Molineris <ivan.molineris@gmail.com> wrote:
> > Moreover, one of the biggest drawbacks of gawk in our field is the
> > fact that, indicating the columns of the input by numbers often
> > produces hard to read scripts.
> > For this reason in the wrapper I commonly use it is possible to
> > refer to columns not only by number, but also by name.
> >
> > For example, if a file is composed like this:
> >
> > chromosome     start        end
> >       chr1       241      53521
> >       chr1       363      43623
> >       chr2      5243     234562
> >
> > gawk '{l=$3-$2}'
> > can be also written as
> > gawk '{l=$end-$start}'

This could be done by rewriting the command line so that each header
name becomes a variable assignment (the names are read from the first
line of each file). The following small library does just that:

------------- process_headers.awk -------------
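@namespace "process_args"
# For every input file, insert a name=column assignment for each header
# field in front of it, so the main program can write $start, not $2.
BEGIN {
  n = split("", newargs)                 # start with an empty list
  for (i = 1; i < ARGC; i++) {
    file = ARGV[i]
    if (file !~ /=/) {                   # not already a var=value argument
      l = split("", headers)
      if ((getline line < file) > 0) {   # read only the header line
        l = split(line, headers)
      }
      close(file)                        # main loop rereads from the top
      for (j = 1; j <= l; j++) {
        n++
        newargs[n] = headers[j] "=" j    # e.g. start=2
      }
    }
    n++
    newargs[n] = file                    # keep the original argument
  }
  # Replace the argument list with the extended one.
  ARGC = length(newargs) + 1
  for (i = 1; i < ARGC; i++) {
    ARGV[i] = newargs[i]
  }
}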
--------------------------------------------------

Then, the needed variables would be automatically added to the command
line:

gawk -i process_headers.awk 'FNR > 1 { print $start }' test.txt test2.txt

The arguments will be rewritten from:

test.txt test2.txt

To something like:

chromosome=1 start=2 end=3 test.txt chromosome=1 start=2 end=3 test2.txt
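
The assignments are repeated in front of every file because awk only
processes a var=value argument when it reaches it in the argument list,
so each file gets the mapping read from its own header. A minimal
sketch of that behaviour, using plain awk (gawk behaves the same) and
two made-up one-line files, f1 and f2, whose columns are in a different
order (the file names and contents are just for illustration):

```sh
dir=$(mktemp -d)
printf 'chr1 241\n' > "$dir/f1"   # "start" is column 2 in this file
printf '363 chr1\n' > "$dir/f2"   # "start" is column 1 in this file

# Each assignment takes effect just before the file that follows it.
starts=$(awk '{ print $start }' start=2 "$dir/f1" start=1 "$dir/f2")
printf '%s\n' "$starts"

rm -rf "$dir"
```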

This might be limited by the number of columns you have, which might
overrun the maximum number of arguments (I have no idea what the limit
is). So it might not scale as far as you need.
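
If the argument limit turns out to be a problem, one possible
alternative (not what the library above does) is to build the
name-to-column map inside the program itself, from the first line of
each file. The sketch below assumes whitespace-separated columns and
uses an array I have called col (a hypothetical name); the price is
writing $(col["start"]) rather than $start. It only uses standard awk
features, so it is shown with plain awk:

```sh
# Sample file matching the example from the quoted message.
tmp=$(mktemp)
printf '%s\n' \
  'chromosome     start        end' \
  '      chr1       241      53521' \
  '      chr1       363      43623' \
  '      chr2      5243     234562' > "$tmp"

result=$(awk '
  FNR == 1 {                      # header line of each input file
    split("", col)                # reset the map for the new file
    for (i = 1; i <= NF; i++)
      col[$i] = i                 # header name -> column number
    next                          # do not treat the header as data
  }
  { print $(col["end"]) - $(col["start"]) }
' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Since the map lives in an array, the number of columns no longer
affects the command line at all.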

HTH.

-- 
Jean-Philippe Guérard
https://tigrerayé.org


