bug-sed

bug#49873: Replacing all \n with spaces doesn't work in GNU sed as expec


From: Assaf Gordon
Subject: bug#49873: Replacing all \n with spaces doesn't work in GNU sed as expected
Date: Wed, 4 Aug 2021 14:07:22 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0

tag 49873 notabug
close 49873
stop

Hello,

On 2021-08-04 4:27 a.m., AlvinSeville7cf wrote:
> Hello! I want to read entire file and then replace all *\n* with space.

For that I would recommend using 'tr' - it'll be much faster:

    tr '\n' ' ' < input > output
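
One detail worth knowing about the 'tr' approach: it converts every newline, including the final one, so the output ends with a space and has no trailing newline. A small sketch to illustrate (the printf input is just a stand-in for a real file, not something from the original report):

```shell
# Sample input stands in for a real file.
# $(...) strips trailing newlines but keeps the trailing space.
result=$(printf 'one\ntwo\nthree\n' | tr '\n' ' ')

# Brackets make the trailing space visible.
printf '[%s]\n' "$result"
```

If a final newline is wanted, it can be added back afterwards, e.g. with a plain "echo".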


> My sed script is (I know that it is not optimal but it demonstrates problem):
>
> |:a $! { N; ta } s/\n/ /g p |

The above script isn't valid as-is (perhaps line breaks were lost in the email?).

I'm going to assume you meant the following script, and used "sed -n":

   sed -n ':a $! { N; ta } ; s/\n/ /g ; p' < input > output

or with line breaks:

   sed -n ':a
           $! { N; ta }
           s/\n/ /g
           p' < input > output


> So why even with *g* flag *s* command replaces only first *\n* in pattern space? For instance I have the following file:

Your script is almost correct :)
I assume that with the "$!{N;ta}" command you meant to accumulate all
the lines in the pattern space, then replace all the newlines
and print the pattern space.

The only 'bug': "t" is a conditional jump. It branches only if an "s///"
has succeeded since the last input line was read (or since the last "t").
Right after "N" no substitution has happened yet, so "ta" does not jump:
the "s///" is executed on the two accumulated lines and they are printed
(with one newline replaced by a space). The next cycle then starts over
with the next line (the 3rd line in the input file), and the same thing
repeats.

Observe:

  $ seq 10 | sed -n ':a $! { N; ta } ; s/\n/ /g ; p'
  1 2
  3 4
  5 6
  7 8
  9 10

If you replace the "t" with a "b" command (b = always jump),
it behaves as you expected:

  $ seq 10 | sed -n ':a $! { N; ba } ; s/\n/ /g ; p'
  1 2 3 4 5 6 7 8 9 10

Note that even with this script, the last newline is preserved and
printed.
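
For completeness, a "t"-based loop can also be made to work if the substitution happens inside the loop, so that "t" has a successful "s///" to react to. The following is only a sketch of one possible variant (it is not the script from the original report); the script is split over several "-e" options so that the label and the closing brace each end at a line break:

```shell
# Hypothetical variant: substitute inside the loop so "ta" sees a
# successful s/// after every N and jumps back to label "a".
joined=$(seq 10 | sed -n -e ':a' -e '$!{N;s/\n/ /;ta' -e '}' -e 'p')

printf '%s\n' "$joined"
```

Here each "N" appends one line, "s///" replaces the newline it just added, and "ta" jumps back until the last line has been read.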

As a work-around, you can instruct "sed" to use NUL as the line separator
(the "-z" option), so that "\n" characters are treated like any other character:

   $ seq 10 | sed -z 's/\n/ /g'
   1 2 3 4 5 6 7 8 9 10

But this won't be as efficient as using 'tr'.
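
Note that with "-z" the final newline is replaced as well, so the output ends with a space instead of a newline. A rough sketch of the difference, and of one way to restore the final newline (the second "s" command is my own addition for illustration, and both commands assume GNU sed):

```shell
# Sample input stands in for a real file; -z requires GNU sed.
# The trailing "x" keeps $(...) from stripping a final newline.
plain=$(printf '1\n2\n3\n' | sed -z 's/\n/ /g'; printf x)
fixed=$(printf '1\n2\n3\n' | sed -z 's/\n/ /g; s/ $/\n/'; printf x)

printf '[%s]\n' "${plain%x}"   # ends with a space, no newline
printf '[%s]\n' "${fixed%x}"   # trailing space turned back into a newline
```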


> |It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, |
>
> The result of script execution is:
>
> |It was the best of times, it was the worst of times, it was the age of wisdom, it was |
>
> I use GNU sed 4.8. It seems to be a bug.

Without line breaks it's a bit hard to reproduce your case,
but I hope the explanation above was sufficient.

As such I'm closing this as "not a bug",
but discussion can continue by replying to this thread.

regards,
 - assaf








