Re: avoid automatic newline when writing output
From: Assaf Gordon
Subject: Re: avoid automatic newline when writing output
Date: Wed, 4 Aug 2021 14:56:56 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0
Hello,
On 2021-07-28 12:27 p.m., Marco R. wrote:
Hello,
I'm working with a log4j log file which carries HTTP Requests (URI, body,
headers, ...) among the usual timestamp, thread id and so forth.
I need to preprocess the JSON inside the body of such requests before applying
other processing to the whole line, hence I need to somehow extract part of
each line, work on it, put the result in place of the original, and write the
whole line to output.
Or at least that's how I thought of it:
1- read a line
2- copy the text from the start of the line to the beginning of the request body to output
3- extract the request body and send it to an external tool (jq)
4- copy the external tool's result to output
5- copy the remaining text, from the end of the request body to the end of the line, to output
6- process the next line
By playing with the pattern and hold spaces I managed to process the same input
line multiple times, effectively achieving 2, 3, 4 and 5. But I got stuck on
writing to output, because 'p' (or rather the 'p' flag of the 's' command)
always appends a newline to the text, preventing me from writing the partial
result of each step to a single output line.
This is the first time I've used sed for more than a simple s/x/y job, so I
may have missed something in the docs: is there a way to avoid the automatic
newline?
You may find more info (and how I ended up doing it) here:
https://unix.stackexchange.com/questions/660332/how-to-avoid-sed-outputting-new-line-carriage-return
Certainly an interesting task, thanks for sharing your solution.
I can suggest a few other ideas:
First,
Embrace the newlines, don't try to remove them :)
Most unix tools naturally operate on a "line" as a unit, so use that to
your benefit:
If you already have the regex to extract the three parts of the line:
1. before the "[{"
2. inside the "[{ ... }]"
3. after the "}]"
you can break each line into a group of three lines, and process only the
second line of each group.
For example, here's the input:
$ seq -f "foo[{%g}]bar" 5
foo[{1}]bar
foo[{2}]bar
foo[{3}]bar
foo[{4}]bar
foo[{5}]bar
Now break it into groups of 3 lines:
$ seq -f "foo[{%g}]bar" 5 | sed 's/\[{/\n/ ; s/}]/\n/'
foo
1
bar
foo
2
bar
foo
3
bar
foo
4
bar
foo
5
bar
And it's very easy to handle only the second line of each triplet using
sed, awk, and many other tools:
$ seq -f "foo[{%g}]bar" 5 \
| sed 's/\[{/\n/ ; s/}]/\n/' \
| sed '2~3s/^/The magic number is /'
foo
The magic number is 1
bar
foo
The magic number is 2
bar
foo
The magic number is 3
bar
foo
The magic number is 4
bar
foo
The magic number is 5
bar
or:
$ seq -f "foo[{%g}]bar" 5 \
| sed 's/\[{/\n/ ; s/}]/\n/' \
| awk 'NR%3==2 { print $0 " is a number" ; next } 1;'
foo
1 is a number
bar
foo
2 is a number
bar
foo
3 is a number
bar
foo
4 is a number
bar
foo
5 is a number
bar
Then combining every three lines can be done with 'sed' or 'paste':
$ seq -f "foo[{%g}]bar" 5 \
| sed 's/\[{/\n/ ; s/}]/\n/' \
| sed '2~3s/^/The magic number is /' \
| paste - - -
foo The magic number is 1 bar
foo The magic number is 2 bar
foo The magic number is 3 bar
foo The magic number is 4 bar
foo The magic number is 5 bar
$ seq -f "foo[{%g}]bar" 5 \
| sed 's/\[{/\n/ ; s/}]/\n/' \
| sed '2~3s/^/The magic number is /' \
| sed 'N;N;s/\n/\t/g'
foo The magic number is 1 bar
foo The magic number is 2 bar
foo The magic number is 3 bar
foo The magic number is 4 bar
foo The magic number is 5 bar
============================
Second,
When executing an external tool (like 'jq'), every invocation incurs some
startup overhead.
As an alternative, you can separate the three lines into three separate
files, operate only on the second file (containing your json data), then
merge them again:
First, split into three files:
$ seq -f "foo[{%g}]bar" 5 \
| sed 's/\[{/\n/ ; s/}]/\n/' \
| sed -n -e '1~3w lines1' -e '2~3w lines2' -e '3~3w lines3'
$ head lines*
==> lines1 <==
foo
foo
foo
foo
foo
==> lines2 <==
1
2
3
4
5
==> lines3 <==
bar
bar
bar
bar
bar
Now you can run 'jq' once on the file 'lines2', and it will process each
line as a JSON record, much more efficiently.
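Putting the split-process-merge idea together, a sketch only: the '. + 4'
jq filter is a placeholder for whatever real transformation is needed, and
the scratch files 'lines1'..'lines3' land in the current directory:

```shell
# Split into three files, run jq ONCE over all the extracted bodies,
# then merge.  The filter '. + 4' is only a stand-in transformation.
seq -f "foo[{%g}]bar" 5 \
  | sed 's/\[{/\n/ ; s/}]/\n/' \
  | sed -n -e '1~3w lines1' -e '2~3w lines2' -e '3~3w lines3'
jq -c '. + 4' lines2 > lines2.jq
paste lines1 lines2.jq lines3
```

A single jq process reads all five JSON values from 'lines2', instead of
five separate jq invocations, one per input line.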
Lastly, join the three files into one:
$ paste lines1 lines2 lines3
foo 1 bar
foo 2 bar
foo 3 bar
foo 4 bar
foo 5 bar
=========================
Third,
In the good ol' days, this kind of task was on the borderline
between using the common text programs and a short perl script, e.g.:
$ seq -f "foo[{%g}]bar" 5 \
| perl -F'/\[\{|\}\]/' -na \
-e 'print "BEFORE: $F[0] JSON: $F[1] AFTER: $F[2]"'
BEFORE: foo JSON: 1 AFTER: bar
BEFORE: foo JSON: 2 AFTER: bar
BEFORE: foo JSON: 3 AFTER: bar
BEFORE: foo JSON: 4 AFTER: bar
BEFORE: foo JSON: 5 AFTER: bar
$ seq -f "foo[{%g}]bar" 5 \
| perl -F'/\[\{|\}\]/' -na \
-e ' $a=`echo "$F[1]" | jq -e ".+4" ` ;
chomp $a ;
print $F[0],"\t",$a,"\t",$F[2] '
foo 5 bar
foo 6 bar
foo 7 bar
foo 8 bar
foo 9 bar
Sadly Perl has fallen out of favor...
==============
Lastly, a small python script can load the JSON,
and I'm certain you can manipulate the JSON in python without even
needing to execute 'jq':
$ cat 1.py
import json, re, sys

regex = re.compile(r'^(.*)\[\{(.*)\}\](.*)$')
for line in sys.stdin:
    m = regex.search(line)
    # Do something with the JSON using python's JSON module,
    # then convert back to string
    j = json.loads(m.group(2))
    jstr = json.dumps(j)
    out = [m.group(1), jstr, m.group(3)]
    print("\t".join(out))
$ seq -f "foo[{%g}]bar" 5 | python 1.py
foo 1 bar
foo 2 bar
foo 3 bar
foo 4 bar
foo 5 bar
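To make the script's placeholder step concrete, here is a variant of the
same idea (my own sketch, not from the original message): the embedded body
is a JSON object, and the hypothetical transformation adds a "seen" field
before re-embedding the result between the original '[' and ']':

```shell
# Sketch: parse each embedded JSON object, add a hypothetical "seen"
# field, and rebuild the line around the transformed object.
seq -f 'foo[{"n": %g}]bar' 3 | python3 -c '
import json, re, sys
regex = re.compile(r"^(.*)\[(\{.*\})\](.*)$")
for line in sys.stdin:
    m = regex.search(line)
    obj = json.loads(m.group(2))      # body between [ and ] is one object
    obj["seen"] = True                # placeholder transformation
    print(m.group(1) + "[" + json.dumps(obj) + "]" + m.group(3))
'
```

The object-shaped body and the "seen" key are assumptions for
illustration; the surrounding split/re-join logic is the part that
carries over to the real log lines.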
=====
Hope this helps.
regards,
- assaf