bug#25750: [sed] Matching square brackets


From: 林自均
Subject: bug#25750: [sed] Matching square brackets
Date: Tue, 28 Mar 2017 14:52:35 +0000

Hi Bob,

Thank you for the detailed explanation. That was so helpful.

Best,
John Lin

> Bob Proulx <address@hidden> wrote on Thu, 16 Feb 2017 at 5:17 PM:
>
> 林自均 wrote:
> > I want to remove the square brackets in a string:
> >
> > $ echo '[1,2,3]' | sed 's/\[//g' | sed 's/\]//g'
> > 1,2,3
> >
> > And it works.
>
> Yes.  But the above isn't strictly correct regular expression usage.
> Let's discuss it piece by piece.
>
>   echo '[1,2,3]' |
>
> Okay.  Good test pattern.
>
>   sed 's/\[//g' |
>
> Okay.  Since the [ would start a character class and you want it to
> match itself it needs to be escaped.
>
>   sed 's/\]//g'
>
> This is not strictly correct.  You have escaped the ] as \], but that
> is not needed.  The ] does nothing special in that context.  It ends
> a character class started by a [, but outside of one it is simply a
> normal character.  GNU sed happens to treat \] as a plain ], but
> escaping ordinary characters is a bad habit, because for some
> characters the backslash turns on special meaning.  In GNU sed, for
> example, \+ is the ERE-style "one or more" operator.  Your expression
> should be the following instead.
>
>   sed 's/]//g'
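
A minimal sketch of the difference described above, assuming GNU sed (the \+ behavior is a GNU extension to basic regular expressions, and other sed implementations may differ). The function names are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers for illustration; not part of the original thread.

# Remove every ] using the unescaped form recommended above.
strip_close_plain() {
    printf '%s\n' "$1" | sed 's/]//g'
}

# Same substitution with the unnecessary escape.
# GNU sed treats \] as a plain ].
strip_close_escaped() {
    printf '%s\n' "$1" | sed 's/\]//g'
}

strip_close_plain '[1,2,3]'      # prints [1,2,3
strip_close_escaped '[1,2,3]'    # prints [1,2,3 (GNU sed)

# Assumes GNU sed: the backslash gives + its "one or more" meaning.
printf '%s\n' 'aaa+' | sed 's/a\+/X/'    # GNU sed prints X+
# Without the backslash, + is an ordinary character in a basic regex.
printf '%s\n' 'aaa+' | sed 's/a+/X/'     # prints aaX
```

Using printf instead of echo keeps any backslashes in the input from being interpreted by the shell's echo builtin.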
>
> Those two could be combined into one sed command.
>
>   echo '[1,2,3]' | sed -e 's/\[//g' -e 's/]//g'
>     1,2,3
>
> Or by a combined string split by the ';' separator.
>
>   echo '[1,2,3]' | sed 's/\[//g;s/]//g'
>     1,2,3
>
> I tend to prefer the latter.  But either is fine.
>
> > However, when I want to do it in a single sed, it does not work:
> >
> > $ echo '[1,2,3]' | sed 's/[\[\]]//g'
> > [1,2,3]
>
> That is incorrect usage.  Do not escape characters inside of [...]
> character classes.  The above is behaving correctly.
>
> You are starting a character class to match any of the enclosed
> characters.  That is good.  But then it is broken by the escapes.
> Inside of a character class the backslash is not special, because
> the class turns off most special characters, so \ is just one more
> list item.  Your class therefore ends at the first ] and contains
> only \ and [, and the second ] is a literal that must follow it.
> The expression matches "\]" or "[]", neither of which occurs in
> your input.  That is the problem.
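
One way to see what the escaped expression really matches is to replace each match with a visible marker. This is only a sketch: the helper name mark_matches and the second test string are made up for illustration, and the behavior shown assumes a sed that treats \ as a literal inside brackets, as GNU sed does here.

```shell
# Hypothetical helper: apply the problematic expression, replacing
# each match with X so the matches are visible.
# The class [\[\] contains only \ and [; the trailing ] is a literal.
mark_matches() {
    printf '%s\n' "$1" | sed 's/[\[\]]/X/g'
}

mark_matches '[1,2,3]'     # prints [1,2,3] (no match, unchanged)
mark_matches 'a[]b\]c'     # prints aXbXc
```

The second call matches twice, once on "[]" and once on "\]", which are the only two-character sequences this expression can match.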
>
> Please review the documentation on regular expressions here:
>
>
> https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html#Character-Classes-and-Bracket-Expressions
>
>   Most meta-characters lose their special meaning inside bracket
>   expressions:
>
>   ']'  ends the bracket expression if it’s not the first list
>        item. So, if you want to make the ‘]’ character a list item,
>        you must put it first.
>
> Therefore you must start the character class, then immediately put in
> the ] to match itself literally.  It does not end the character class
> since an empty class wouldn't make sense.
>
>   [  -- start of the character class
>   ]  -- match a literal ]
>   [  -- match a literal [
>   ]  -- end the class
>
> Here is the working example:
>
>   echo '[1,2,3]' | sed 's/[][]//g'
>     1,2,3
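
The working expression can be wrapped in a small function, sketched here with hypothetical names. The second helper is an addition not discussed in the thread: it relies on the POSIX rule that in a negated class the ] is taken literally when it comes right after the ^.

```shell
# Hypothetical helper: delete every [ and ] from the argument.
# In [][] the first ] is a literal list item, the [ is a literal
# list item, and the final ] closes the class.
strip_brackets() {
    printf '%s\n' "$1" | sed 's/[][]//g'
}

# Hypothetical helper: delete everything except [ and ].
# In a negated class the ] goes right after the ^ to be literal.
keep_brackets() {
    printf '%s\n' "$1" | sed 's/[^][]//g'
}

strip_brackets '[1,2,3]'     # prints 1,2,3
keep_brackets '[1,2,3]'      # prints []
```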
>
> > I can manage to make it work by a weird regexp:
> >
> > $ echo '[1,2,3]' | sed 's/[]\[]//g'
> > 1,2,3
>
> That is also incorrect usage.  You have added an additional \ into the
> class.  You thought you were escaping the [ but since it is already
> inside of a bracket expression the \ was simply a normal character
> and matched itself.
>
>   echo '[1,2,3]\1\2\3'
>   [1,2,3]\1\2\3
>   echo '[1,2,3]\1\2\3' | sed 's/[]\[]//g'
>   1,2,3123
>   echo '[1,2,3]\1\2\3' | sed 's/[][]//g'
>   1,2,3\1\2\3
>
> As you can see, including the \ in the character class also removed
> the \ characters from the input.
>
> > Is that a bug? If it is, I would like to spend some time to fix it.
>
> It is not a bug.  It is incorrect usage.  I will close the ticket.
> But please let us know if this makes sense to you.  Feel free to
> continue the discussion.
>
> Bob
>
>

