From: GNU bug Tracking System
Subject: bug#42764: closed (csplit does not suppress the last match when not using {*})
Date: Sat, 08 Aug 2020 20:58:02 +0000
Your message dated Sat, 8 Aug 2020 21:56:48 +0100
with message-id <820d4ead-bed8-0ae6-90a3-5bcc3e056e5e@draigBrady.com>
and subject line Re: bug#42764: csplit does not suppress the last match when not using {*}
has caused the debbugs.gnu.org bug report #42764,
regarding csplit does not suppress the last match when not using {*}
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)

--
42764: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=42764
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: csplit does not suppress the last match when not using {*}
Date: Sat, 8 Aug 2020 09:12:51 +0000 (UTC)

Good day,
I am experiencing an odd behaviour in csplit which may actually be a
bug.
I am testing this against the code cloned from
https://github.com/coreutils/coreutils.git, on the commit described by
git as v8.32-52-gc0e5f8c59.
Suppose I have the following YAML file:
==> test.yaml <==
value1: 123
---
value2: 456
---
value3: 789
and I want to split it at '---' lines. First I would try the following:
csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
which outputs:
12
12
16
and creates the following files:
==> xx00 <==
value1: 123
==> xx01 <==
value2: 456
==> xx02 <==
---
value3: 789
The last portion still contains the '---', despite it being suppressed
from the second part.
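The sizes printed by csplit are the byte counts of each output piece, so the 16 above can be checked by hand. A minimal sketch, assuming every line is ASCII and ends with a single newline (the helper name piece_sizes is made up for this illustration):

```python
def piece_sizes(pieces):
    """Return the byte count of each piece, counting one newline per line."""
    return [sum(len(line.encode()) + 1 for line in piece) for piece in pieces]


# The three pieces exactly as they appear in xx00, xx01 and xx02 above.
observed = [["value1: 123"], ["value2: 456"], ["---", "value3: 789"]]

print(piece_sizes(observed))  # [12, 12, 16]
```

The extra 4 bytes in the last piece are the '---' line plus its newline.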
Now, if I try again with:
csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
I get:
12
12
12
and:
==> xx00 <==
value1: 123
==> xx01 <==
value2: 456
==> xx02 <==
value3: 789
where the last part does not contain the matched line, as expected.
While trying to figure out the problem, I noticed that match suppression
is done at the beginning of process_regexp. For a match-twice scenario
like the first one, the function is called twice, then the rest of the
file is simply dumped by split_file.
This means that the two calls to process_regexp will:
* suppress nothing for call #1 because nothing has been matched yet;
* suppress the first match in call #2.
Then, the rest of the file is dumped but no one actually suppressed the
second match, which appears in the last segment. When using asterisk
repetition, the file is instead dumped by process_regexp, which gets its
chance to suppress the matched line.
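The call sequence described above can be sketched as a toy model. This is not the csplit source: the function split_model, the suppress_at_end flag and the in-memory deque are assumptions made for illustration, and only the position of the suppression step relative to the search follows the description. Setting suppress_at_end=True models the alternative ordering, where the matched line is dropped at the end of the same call.

```python
import re
from collections import deque


def split_model(lines, pattern, calls, suppress_at_end):
    """Toy model of `csplit -z --suppress-matched FILE /pattern/ {calls-1}`."""
    buf = deque(lines)
    regex = re.compile(pattern)
    pieces = []
    pending_match = False  # True while buf[0] is a match left by the previous call

    for _ in range(calls):  # one iteration stands in for one process_regexp call
        if not suppress_at_end and pending_match:
            buf.popleft()  # old order: drop the *previous* call's match
            pending_match = False
        piece = []
        while buf and not regex.search(buf[0]):
            piece.append(buf.popleft())
        if not buf:
            raise ValueError("match not found")
        if suppress_at_end:
            buf.popleft()  # alternative order: drop this call's own match
        else:
            pending_match = True  # the match stays at the head of the buffer
        pieces.append(piece)

    pieces.append(list(buf))  # the rest of the file is dumped with no suppression
    return [p for p in pieces if p]  # -z: elide empty pieces


LINES = ["value1: 123", "---", "value2: 456", "---", "value3: 789"]

print(split_model(LINES, r"^---$", 2, suppress_at_end=False))
# [['value1: 123'], ['value2: 456'], ['---', 'value3: 789']]
print(split_model(LINES, r"^---$", 2, suppress_at_end=True))
# [['value1: 123'], ['value2: 456'], ['value3: 789']]
```

With suppression at the start, the second match is still at the head of the buffer when the loop ends, so it lands in the final piece.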
I came up with the attached patch, which simply moves match suppression
to the end of process_regexp. With this modification, the invocation:
csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
now produces:
12
12
12
and:
==> xx00 <==
value1: 123
==> xx01 <==
value2: 456
==> xx02 <==
value3: 789
which is what I would expect.

patch.patch
Description: Text Data
--- End Message ---
--- Begin Message ---
Subject: Re: bug#42764: csplit does not suppress the last match when not using {*}
Date: Sat, 8 Aug 2020 21:56:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Thunderbird/80.0

On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:
> Good day,
>
> I am experiencing an odd behaviour in csplit which may actually be a
> bug.
>
> I am testing this against the code cloned from
> https://github.com/coreutils/coreutils.git, on the commit described by
> git as v8.32-52-gc0e5f8c59.
>
> Suppose I have the following YAML file:
>
> ==> test.yaml <==
> value1: 123
> ---
> value2: 456
> ---
> value3: 789
>
> and I want to split it at '---' lines. First I would try the following:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> which outputs:
>
> 12
> 12
> 16
>
> and creates the following files:
>
> ==> xx00 <==
> value1: 123
> ==> xx01 <==
> value2: 456
> ==> xx02 <==
> ---
> value3: 789
>
> The last portion still contains the '---', despite it being suppressed
> from the second part.
>
> Now, if I try again with:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
>
> I get:
>
> 12
> 12
> 12
>
> and:
>
> ==> xx00 <==
> value1: 123
> ==> xx01 <==
> value2: 456
> ==> xx02 <==
> value3: 789
>
> where the last part does not contain the matched line, as expected.
>
> While trying to figure out the problem, I noticed that match suppression
> is done at the beginning of process_regexp. For a match-twice scenario
> like the first one, the function is called twice, then the rest of the
> file is simply dumped by split_file.
>
> This means that the two calls to process_regexp will:
>
> * suppress nothing for call #1 because nothing has been matched yet;
> * suppress the first match in call #2.
>
> Then, the rest of the file is dumped but no one actually suppressed the
> second match, which appears in the last segment. When using asterisk
> repetition, the file is instead dumped by process_regexp, which gets its
> chance to suppress the matched line.
>
> I came up with the attached patch, which simply moves match suppression
> to the end of process_regexp. With this modification, the invocation:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> now produces:
>
> 12
> 12
> 12
>
> and:
>
> ==> xx00 <==
> value1: 123
> ==> xx01 <==
> value2: 456
> ==> xx02 <==
> value3: 789
>
> which is what I would expect.

I agree with this analysis.
The usual manifestation would probably be when there was only a single match.
I.e., when not specifying a repetition count,
we were not suppressing the single match.

I'll apply the attached in your name later today
(which also adds a test).

Marking this as done.

thanks!
Pádraig

csplit--suppress-last.patch
Description: Text Data
--- End Message ---