Re: Unquoted special characters in regexps
From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Tue, 28 Feb 2006 11:27:01 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
> `]', like `-', is only special in the context of a character
> alternative, that is, if, before you type it, you are in a character
> alternative. By contrast, `[' and all other special characters
> (except `^') are only special outside that context.
You can talk about a context iff you are able to grammatically specify
it. In order to talk about the contents of a string you must be able to
determine the character sequences opening and closing strings. It would
be strange to say, for example, that the double-quote opening an Elisp
string is outside the context of the string and the double-quote that
closes it inside. It would be strange to say that the bracket opening a
character alternative is outside the context of the alternative and the
closing bracket inside.
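A few `string-match' calls make the context rules concrete. This is a sketch checked by hand against the documented Emacs regexp syntax, not output captured from a particular Emacs version. The `; =>' comments give the expected return values.

```elisp
;; Outside a character alternative, `]' and `-' are ordinary characters.
(string-match "a]" "xa]")     ; => 1
(string-match "a-b" "a-b")    ; => 0
;; Inside one, a `]' placed first (or right after `^') is taken
;; literally, and so is `['.
(string-match "[]a]" "]")     ; => 0    alternative is {], a}
(string-match "[^]a]" "]")    ; => nil  complement of {], a}
(string-match "[[]" "[")      ; => 0    alternative is {[}
```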
> All characters that are special outside character alternatives are
> never special if you precede them with a backslash. This is true even
> for `^'. This is why it is good to precede them with a backslash even
> if they are not special. That way, the reader can see that they are
> not special, without studying the regexp.
I agree. Let's try to read the following definition from `cc-fonts.el':
(defconst autodoc-font-lock-doc-comments
`(("@\\(\\w+{\\|\\[\\(address@hidden|@@\\)*\\]\\|address@hidden|$\\)"
...
It tells me that there are two character alternatives started by an
unquoted `[' and terminated by an unquoted `]'. It also tells me that
it's meant to match a bracketed expression as represented by `\\[' and
`\\]' - I quickly exclude the possibility that the backslashes preceding
any of these brackets are quoted backslashes in a character alternative.
And, finally, the expression tells me that the author was probably
uncertain about how to put a `]' inside a complemented character
alternative, hence (s)he quoted it with a single backslash. In any case,
I have no difficulty reading the expression, although I have no idea
what it means. You propose to write
(defconst autodoc-font-lock-doc-comments
`(("@\\(\\w+{\\|\\[\\(address@hidden|@@\\)*]\\|address@hidden|$\\)"
...
instead. In that case, when I look at the character sequence `*]', I
would have to consider the possibility that the `]' closes some character
alternative. Only after resolving that would I be able to say that the
`]' should indeed match a right bracket. And I would still have to
check whether the backslashes preceding the `\\[' are quoted backslashes
in a character set.
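The quoting rule and the two spellings discussed above can be checked with a few `string-match' calls. This is a sketch: the bracketed-run pattern is a hypothetical, simplified stand-in, not a reconstruction of the cc-fonts regexp, and the `; =>' values are hand-checked expectations for the documented Emacs regexp syntax rather than captured output.

```elisp
;; Outside a character alternative, a backslash makes a special
;; character match itself.
(string-match "a.c" "abc")     ; => 0    `.' matches any character
(string-match "a\\.c" "abc")   ; => nil  `\.' matches only a period
;; `^' is special only at the start of a regexp (or after \(, \(?: or \|),
;; but with a backslash it is literal everywhere.
(string-match "^x" "a^x")      ; => nil  anchored at the beginning
(string-match "\\^x" "a^x")    ; => 1
;; Hypothetical bracketed-run pattern, spelled two ways: closing `]'
;; quoted, and closing `]' left unquoted right after `*'.
(string-match "\\[[^]\n]*\\]" "@see [foo bar] x")   ; => 5
(string-match "\\[[^]\n]*]"   "@see [foo bar] x")   ; => 5
(match-string 0 "@see [foo bar] x")                  ; => "[foo bar]"
```

Both spellings match the same text; they differ only in how much work the reader must do to see that the final `]' lies outside any alternative.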
> First of all, there are (surprisingly) many occurrences of "\\]" in
> the Emacs source, where the `]' _is_ special and closes a character
> alternative that contains a backslash. Reportedly, quoting a `]' with a
> backslash _inside_ a character alternative works in some other regexp
> implementations such as AWK. So if I see "\\]" I have to worry about
> three possibilities: it might deliberately close a character
> alternative which includes a backslash, it might do so by accident because
> the author tried to quote a `]' inside a character alternative (and
> hence the regexp is buggy), or it might be a deliberately quoted `]'
> outside a character alternative.
The Emacs manual clearly states that the backslash is not special in a
character set. But I admit that users of other languages do have
problems when writing Elisp regexps. That's why a clear and unambiguous
definition of these concepts is important.
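The pitfall in the quoted paragraph can be reproduced directly. Because a backslash is ordinary inside a character alternative, the first `]' after it closes the alternative. The expected values below are hand-checked against the documented syntax, not captured output.

```elisp
;; The Lisp string "[a\\]" is the regexp [a\], which matches `a' or `\'.
(string-match "[a\\]" "\\")    ; => 0
;; "[a\\]]" is therefore the alternative [a\] followed by a literal `]'.
;; It does NOT match a lone `]', as an AWK-trained reader might expect.
(string-match "[a\\]]" "]")    ; => nil
(string-match "[a\\]]" "\\]")  ; => 0
```

To put a `]' into an alternative in Emacs, place it first, as in "[]a]".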
> If I see `]' without preceding "\\", I only have to worry about
> whether or not it closes a character alternative, and not about the
> third possibility of a bug.
When I try to read a regular expression I do not worry about the
possibility of a bug in the first place. I try to understand what the
author wanted to match.
> There are places in the Emacs code that quote a `]' outside a
> character alternative. Even if we decide that this is undesirable, I
> do not fancy finding and changing them all. But we could change the
> behavior of `regexp-quote' and `regexp-opt' which currently quote
> such `]'. That could be done with the following trivial patch, which
> I could install if that is what we decide to do:
Given the number of regular expressions users have created with these
functions and inserted manually into their code, that would indeed be
confusing.
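Whichever way `regexp-quote' and `regexp-opt' treat `]', their output is meant to match the input strings literally. The check below should hold whether or not a given Emacs version puts a backslash before `]'; the exact quoted strings are deliberately not shown, since they may differ between versions.

```elisp
;; `regexp-quote' output matches its argument literally.
(let ((s "a[1]*"))
  (string-match (regexp-quote s) (concat "x" s)))   ; => 1
;; `regexp-opt' output matches any of the listed strings literally.
(string-match (regexp-opt '("[x]" "y]")) "zy]")     ; => 1
```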