sed-devel

Re: avoid automatic newline when writing output


From: Assaf Gordon
Subject: Re: avoid automatic newline when writing output
Date: Wed, 4 Aug 2021 14:56:56 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0

Hello,

On 2021-07-28 12:27 p.m., Marco R. via wrote:
> Hello,
> I'm working with a log4j log file which carries HTTP requests (URI, body,
> headers, ...) along with the usual timestamp, thread id and so forth.
>
> I need to preprocess the JSON inside the body of such requests before applying
> other processing to the whole line, hence I need to somehow extract part of
> each line, work on it, put the result in place of the original and write the
> whole line to output.
>
> Or at least that's the way I thought of it:
>
> 1- read a line
> 2- copy the text from the start of the line to the beginning of the request body to output
> 3- extract the request body and send it to an external tool (jq)
> 4- copy the external tool's result to output
> 5- copy the remaining text, from the end of the request body to the end of the line, to output
> 6- process the next line
>
> By playing with the pattern and hold spaces I managed to process the same input
> line multiple times, effectively achieving 2, 3, 4 and 5. But I got stuck on
> writing to output, because 'p' (or rather the 'p' flag to the 's' command)
> always added a newline at the end of the text, preventing me from writing the
> partial result of each step to a single line of output.
>
> This is the first time I've used sed for more than a simple s/x/y job, so I may
> have missed something in the docs: is there a way to avoid the automatic
> newline?
>
> You may find more info (and how I ended up doing it) here:
> https://unix.stackexchange.com/questions/660332/how-to-avoid-sed-outputting-new-line-carriage-return


Certainly an interesting task, thanks for sharing your solution.

I can suggest a few other ideas:

First,
Embrace the newlines, don't try to remove them :)
Most Unix tools naturally operate on a "line" as a unit, so use that to your benefit.
If you already have the regex to extract the three parts of the line:
 1. before the "[{"
 2. inside the "[{ ... }]"
 3. after the "}]"

You can break each input line into a group of three lines and process only the second line of each group.

Example, here's the input:

    $ seq -f "foo[{%g}]bar" 5
    foo[{1}]bar
    foo[{2}]bar
    foo[{3}]bar
    foo[{4}]bar
    foo[{5}]bar

Now break it into groups of 3 lines:

  $ seq -f "foo[{%g}]bar" 5 | sed 's/\[{/\n/ ; s/}]/\n/'
  foo
  1
  bar
  foo
  2
  bar
  foo
  3
  bar
  foo
  4
  bar
  foo
  5
  bar
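The two 's' commands above each replace only the first match on the line (there is no 'g' flag). For illustration only, here is a rough Python sketch of the same per-line split; the helper name split_three is hypothetical:

```python
def split_three(line):
    # Equivalent of the two sed substitutions: each str.replace() call uses
    # count=1, just as 's' without the 'g' flag replaces only the first match.
    line = line.rstrip("\n")
    return line.replace("[{", "\n", 1).replace("}]", "\n", 1).split("\n")


if __name__ == "__main__":
    # Same input as: seq -f "foo[{%g}]bar" 5
    for n in range(1, 6):
        for part in split_three("foo[{%d}]bar" % n):
            print(part)
```

A line without the markers comes out as fewer than three parts, which would shift the triplet alignment used in the next steps; the sed pipeline behaves the same way, so it may be worth checking the input first.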

And it's very easy to handle only the second line of each triplet using
sed, awk and many other tools:

  $ seq -f "foo[{%g}]bar" 5 \
        | sed 's/\[{/\n/ ; s/}]/\n/' \
        | sed '2~3s/^/The magic number is /'
  foo
  The magic number is 1
  bar
  foo
  The magic number is 2
  bar
  foo
  The magic number is 3
  bar
  foo
  The magic number is 4
  bar
  foo
  The magic number is 5
  bar

or:

  $ seq -f "foo[{%g}]bar" 5 \
        | sed 's/\[{/\n/ ; s/}]/\n/' \
        | awk 'NR%3==2 { print $0 " is a number" ; next } 1;'

  foo
  1 is a number
  bar
  foo
  2 is a number
  bar
  foo
  3 is a number
  bar
  foo
  4 is a number
  bar
  foo
  5 is a number
  bar
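Both '2~3' (a GNU sed extension) and awk's 'NR%3==2' select lines 2, 5, 8, and so on. In case it helps to see the arithmetic spelled out, a small Python sketch of that address rule (the helper names are hypothetical):

```python
def step_address(first, step, n):
    # GNU sed 'first~step' matches line n (1-based) when n >= first and
    # (n - first) is a multiple of step. Assumes step > 0.
    return n >= first and (n - first) % step == 0


def prefix_middle(lines, prefix="The magic number is "):
    # Equivalent of: sed '2~3s/^/The magic number is /'
    return [prefix + line if step_address(2, 3, n) else line
            for n, line in enumerate(lines, start=1)]


if __name__ == "__main__":
    triplets = ["foo", "1", "bar", "foo", "2", "bar"]
    print("\n".join(prefix_middle(triplets)))
```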

Then every three lines can be combined back into one with 'sed' or 'paste':

  $ seq -f "foo[{%g}]bar" 5 \
     | sed 's/\[{/\n/ ; s/}]/\n/' \
     | sed '2~3s/^/The magic number is /' \
     | paste - - -
  foo   The magic number is 1   bar
  foo   The magic number is 2   bar
  foo   The magic number is 3   bar
  foo   The magic number is 4   bar
  foo   The magic number is 5   bar

or:

  $ seq -f "foo[{%g}]bar" 5 \
     | sed 's/\[{/\n/ ; s/}]/\n/' \
     | sed '2~3s/^/The magic number is /' \
     | sed 'N;N;s/\n/\t/g'
  foo   The magic number is 1   bar
  foo   The magic number is 2   bar
  foo   The magic number is 3   bar
  foo   The magic number is 4   bar
  foo   The magic number is 5   bar
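'paste - - -' reads three consecutive lines from the same standard input and joins them with tabs, while 'N;N;s/\n/\t/g' appends two more lines to the pattern space and turns the embedded newlines into tabs. The usual Python idiom for the same grouping zips one iterator with itself; this is a sketch, and join_triplets is a hypothetical name:

```python
def join_triplets(lines, sep="\t"):
    # zip() pulls from the same iterator three times per step, so each
    # tuple holds three consecutive lines, like 'paste - - -'.
    it = iter(lines)
    return [sep.join(group) for group in zip(it, it, it)]


if __name__ == "__main__":
    lines = ["foo", "The magic number is 1", "bar",
             "foo", "The magic number is 2", "bar"]
    for row in join_triplets(lines):
        print(row)
```

Note that zip() silently drops an incomplete final group, so this assumes the input really comes in complete triplets.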


============================

Second,
Every invocation of an external tool (like 'jq') costs some time, because a
new process has to be started; run once per input line, that overhead adds up.
As an alternative, you can write the three kinds of lines into three separate
files, run the tool only once on the second file (containing your JSON data), then merge them again:

First, split into three files:
  $ seq -f "foo[{%g}]bar" 5 \
       | sed 's/\[{/\n/ ; s/}]/\n/' \
       | sed -n -e '1~3wlines1' -e '2~3wlines2' -e '3~3wlines3'

  $ head lines*
  ==> lines1 <==
  foo
  foo
  foo
  foo
  foo

  ==> lines2 <==
  1
  2
  3
  4
  5

  ==> lines3 <==
  bar
  bar
  bar
  bar
  bar

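Now you can run 'jq' once on the file 'lines2'; it will process each line as a separate JSON value, which is much more efficient than one invocation per line. Use 'jq -c' (compact output) so that every result occupies exactly one line and the three files remain aligned for 'paste'.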

Lastly, join the three files into one:

  $ paste lines1 lines2 lines3
  foo   1       bar
  foo   2       bar
  foo   3       bar
  foo   4       bar
  foo   5       bar
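The whole split / run-once / join workflow can also be sketched in Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and a tiny Python one-liner that adds 4 to each number stands in for a single 'jq -c ".+4"' run, so the example needs nothing beyond the standard library:

```python
import subprocess
import sys


def split_columns(lines):
    # Like: sed -n -e '1~3wlines1' -e '2~3wlines2' -e '3~3wlines3'
    return lines[0::3], lines[1::3], lines[2::3]


def run_once(cmd, records):
    # Feed all records to ONE process, one per line, and read one result
    # per line back. In practice cmd might be ["jq", "-c", ".+4"]; -c keeps
    # each result on a single line so the columns stay aligned.
    result = subprocess.run(cmd, input="\n".join(records) + "\n",
                            capture_output=True, text=True, check=True)
    out = result.stdout.splitlines()
    if len(out) != len(records):
        raise ValueError("filter did not return exactly one line per record")
    return out


def paste_columns(*cols, sep="\t"):
    # Like: paste lines1 lines2 lines3
    return [sep.join(row) for row in zip(*cols)]


if __name__ == "__main__":
    lines = []
    for n in range(1, 6):
        lines += ["foo", str(n), "bar"]
    before, middle, after = split_columns(lines)
    # Hypothetical stand-in for jq: adds 4 to each number, one per line.
    adder = [sys.executable, "-c",
             "import sys\nfor l in sys.stdin: print(int(l) + 4)"]
    for row in paste_columns(before, run_once(adder, middle), after):
        print(row)
```

The line-count check is there because a filter that emits more or fewer lines than it received would silently misalign the pasted columns.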


=========================

Third,
In the good ol' days, this kind of task was on the borderline
between using the common text tools and writing a short Perl script, e.g.:

  $ seq -f "foo[{%g}]bar" 5 \
        | perl -F'/\[\{|\}\]/' -na \
               -e 'print "BEFORE: $F[0]  JSON: $F[1]    AFTER: $F[2]"'
  BEFORE: foo  JSON: 1    AFTER: bar
  BEFORE: foo  JSON: 2    AFTER: bar
  BEFORE: foo  JSON: 3    AFTER: bar
  BEFORE: foo  JSON: 4    AFTER: bar
  BEFORE: foo  JSON: 5    AFTER: bar
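For comparison, Python's re.split() accepts the same pattern that Perl's -F option receives. A minimal sketch (the name fields is hypothetical); note that a body which itself contains "[{" or "}]", such as nested arrays of objects, would produce more than three fields with either version:

```python
import re

# The same alternation Perl's -F option receives: split on "[{" or "}]".
SPLIT_RE = re.compile(r"\[\{|\}\]")


def fields(line):
    # With -n and no -l, Perl keeps the trailing newline in the last
    # field ($F[2]); here it is stripped before splitting instead.
    return SPLIT_RE.split(line.rstrip("\n"))


if __name__ == "__main__":
    for n in range(1, 6):
        before, body, after = fields("foo[{%d}]bar\n" % n)
        print("BEFORE: %s  JSON: %s    AFTER: %s" % (before, body, after))
```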


  $ seq -f "foo[{%g}]bar" 5 \
       | perl -F'/\[\{|\}\]/' -na \
              -e ' $a=`echo "$F[1]" | jq -e ".+4" ` ;
                   chomp $a ;
                   print $F[0],"\t",$a,"\t",$F[2] '
  foo     5       bar
  foo     6       bar
  foo     7       bar
  foo     8       bar
  foo     9       bar
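The same per-line call could be written in Python with subprocess.run(). This is a sketch under assumptions: filter_one is a hypothetical name, and a Python one-liner stands in for jq so the example runs without it. Passing the text on standard input, instead of through `echo "$F[1]"`, also sidesteps shell quoting, since the double quotes inside real JSON would otherwise be interpreted by the shell. Unlike Perl backticks, check=True raises an exception on a non-zero exit status (which 'jq -e' produces when the last output is false or null):

```python
import subprocess
import sys


def filter_one(cmd, text):
    # Rough equivalent of Perl's  $a = `echo "$text" | cmd`; chomp $a;
    # The text goes to the command's stdin directly (no shell involved),
    # and only one trailing newline is removed, as chomp does.
    out = subprocess.run(cmd, input=text + "\n", capture_output=True,
                         text=True, check=True).stdout
    return out[:-1] if out.endswith("\n") else out


if __name__ == "__main__":
    # Hypothetical stand-in for jq -e ".+4": reads a number, prints it + 4.
    adder = [sys.executable, "-c",
             "import sys; print(int(sys.stdin.read()) + 4)"]
    for n in range(1, 6):
        # One new process per input line, just like the Perl backticks.
        print("foo", filter_one(adder, str(n)), "bar", sep="\t")
```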


Sadly, Perl has fallen out of favor...

==============

Lastly, a small Python script can load the JSON,
and I'm certain you can manipulate the JSON in Python without even needing to execute 'jq':

  $ cat 1.py
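  import json
  import re
  import sys

  # Raw string avoids Python's invalid-escape warnings for \[ and \{
  regex = re.compile(r'^(.*)\[\{(.*)\}\](.*)$')

  for line in sys.stdin:
      m = regex.search(line)
      if m is None:
          # No "[{...}]" on this line: print it unchanged
          sys.stdout.write(line)
          continue

      # Do something with the JSON using python's JSON module,
      # then convert back to string.
      # Note: group(2) excludes the surrounding "[{" and "}]",
      # so real JSON data may need them added back first.
      j = json.loads(m.group(2))
      jstr = json.dumps(j)

      out = [ m.group(1), jstr, m.group(3) ]
      print("\t".join(out))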

  $ seq -f "foo[{%g}]bar" 5 | python 1.py
  foo     1       bar
  foo     2       bar
  foo     3       bar
  foo     4       bar
  foo     5       bar
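As a sketch of doing the JSON manipulation in Python itself, assume (hypothetically) that the body is a complete JSON array of objects such as [{"user":"m","password":"x"}]. Unlike 1.py, the regex below keeps the brackets, so the captured text is valid JSON on its own. The names rewrite_body and drop_password, and the sample log line, are made up for illustration:

```python
import json
import re

# Lazy first group stops at the first "[{"; the greedy body then extends to
# the last "}]" on the line, brackets included.
BODY_RE = re.compile(r"^(.*?)(\[\{.*\}\])(.*)$")


def rewrite_body(line, transform):
    line = line.rstrip("\n")
    m = BODY_RE.match(line)
    if m is None:
        return line  # no "[{...}]" body: pass the line through unchanged
    data = transform(json.loads(m.group(2)))
    # json.dumps() without indent= emits a single line, so the log line
    # keeps its one-record-per-line shape.
    body = json.dumps(data, separators=(",", ":"))
    return m.group(1) + body + m.group(3)


def drop_password(records):
    # Example transform: remove a "password" key from every object.
    for record in records:
        record.pop("password", None)
    return records


if __name__ == "__main__":
    sample = '2021-07-28 INFO body=[{"user":"m","password":"x"}] status=200'
    print(rewrite_body(sample, drop_password))
```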



=====

Hope this helps.
regards,
 - assaf


