sed-devel

unexpected match of s command regexp at ^


From: Christoph Anton Mitterer
Subject: unexpected match of s command regexp at ^
Date: Tue, 28 Sep 2021 03:20:51 +0200
User-agent: Evolution 3.38.3-1

Hey.

I don't quite understand why the following behaves as it does:

The general idea was that I have a string where multiple key=value
pairs or singleOptions are separated by "," and any number of
consecutive "," are allowed before/after such words.

What I wanted to do was check for unknown options, e.g. by doing
something like this in an s/// command:

for readability:
\(
        \(^\|,\+\)
        \(
                \(foo\|bar\|baz\)=[^,]*
                \|
                single
        \)
\)*


A: s/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*//

If that leaves behind nothing but zero or more ",", then only valid
options have been used.

I thought a better version of (A) would be:
B: s/^\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*,*$//

It's anchored at the very beginning (before the outer \( ... \)*) and
already removes any trailing "," up to the anchor at the end.


Thinking about (B), I played around a bit more with (A) and noticed the
following, which I cannot explain (and hope someone here knows why):

printf '%s\n' 't,,,,,,,,single,,,bar=value,foo=,,,,'  | \
sed 's/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*//'

(that's (A))

I'd have expected this to give me:
t,,,,

That is, the "t" at the beginning and the final ",,,,". However, it
doesn't; instead it gives:
t,,,,,,,,single,,,bar=value,foo=,,,,


It does though when I use:
C: s/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)//g
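
A minimal sketch of the unknown-option check built on form (C), assuming
GNU sed (the \| and \+ operators in basic regexps are GNU extensions).
The helper name check_opts and the variable rest are hypothetical, chosen
only for illustration:

```shell
# check_opts is a hypothetical helper, not part of the original post.
# It strips every known option together with its leading commas (form C)
# and then checks that nothing but commas is left over.
check_opts() {
    rest=$(printf '%s\n' "$1" |
        sed 's/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)//g')
    case $rest in
        *[!,]*) return 1 ;;   # something other than "," remained
        *)      return 0 ;;   # empty or commas only
    esac
}

check_opts ',,single,,,bar=value,foo=,,,,' && echo valid
check_opts 't,,,,,,,,single,,,bar=value,foo=,,,,' || echo invalid
```

The second call leaves "t,,,," in rest, so the stray "t" is what marks
the string as invalid.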




Eventually I tried:
A_: s/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*/_/
(that's (A), but replacing with "_")

printf '%s\n' 't,,,,,,,,single,,,bar=value,foo=,,,,'  | \
sed 's/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*/_/'

and I saw that this yields:
_t,,,,,,,,single,,,bar=value,foo=,,,,

That kind of explains why ",,,,,,,,single,,,bar=value,foo=" isn't
removed: the regexp matches at the beginning, and then the "t"
interrupts the * operator (which is where s///g is different in
behaviour, I assume).


But why on earth does it (A / A_) match in the beginning?



Interestingly, it "works" again when anchoring it at the end:
D: s/\(\(^\|,\+\)\(\(foo\|bar\|baz\)=[^,]*\|single\)\)*,*$//
which gives:
t
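
One possible reading, offered as a sketch and not as a confirmed answer:
a starred expression may match zero times, so the leftmost match of an
unanchored x* is the empty string at the very start of the line, while a
trailing $ rules that empty match out. The much smaller pattern x* below
is a stand-in for the starred group in (A_) and (D); the variable names
are made up for illustration:

```shell
# Stand-in input: "ab" plays the role of the unknown "t", "xxx" the
# role of the valid options.
line='abxxx'

# Unanchored: x* matches zero characters right at the start of the line,
# and without the g flag the s command stops after that first match.
unanchored=$(printf '%s\n' "$line" | sed 's/x*/_/')

# Anchored at the end: an empty match at the start cannot reach $, so
# the leftmost position where x*$ succeeds is just after "ab".
anchored=$(printf '%s\n' "$line" | sed 's/x*$/_/')

printf '%s\n' "$unanchored" "$anchored"
```

This prints "_abxxx" and then "ab_", the same shape as the results seen
with (A_) and (D).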



Thanks,
Chris.



