Re: [Emacs-orgmode] Question aboug Regexp

From: Carsten Dominik
Subject: Re: [Emacs-orgmode] Question aboug Regexp
Date: Tue, 23 May 2006 05:58:17 +0200

On May 23, 2006, at 5:02, Todd Neal wrote:

I am looking at why the following link does not work:

[[elisp: (+ 1 2 3)]]

I think that the problem lies with this regexp:

     1  (defconst org-link-re-with-space2

The regexp org-link-re-with-space2 requires that the first character after elisp: is not a space character. This was originally to make sure that the following would not be matched as a link:

I can explain you a feature in elisp: Parenthesis are everything.
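
A rough sketch of that behaviour, using the definitions quoted in this thread. The demo- names are made up for illustration, and the two-element list of link types is an assumption (the real org-link-types is longer):

```elisp
;; Sketch only: the demo- names are hypothetical, and the list of link
;; types is an assumption standing in for the real org-link-types.
(defconst demo-link-types '("elisp" "http"))
(defconst demo-non-link-chars "]\t\n\r<>")

(defconst demo-link-re-with-space2
  (concat
   "<?\\(" (mapconcat #'identity demo-link-types "\\|") "\\):"
   "\\([^" demo-non-link-chars " ]"
   "[^]\t\n\r]*"
   "[^" demo-non-link-chars " ]\\)>?"))

;; A space right after the colon: no match, so plain prose is left alone.
(string-match demo-link-re-with-space2
              "I can explain you a feature in elisp: Parenthesis are everything.")
;; => nil

;; The link from the report fails for the same reason.
(string-match demo-link-re-with-space2 "elisp: (+ 1 2 3)")
;; => nil

;; Without the leading space the link matches from position 0.
(string-match demo-link-re-with-space2 "elisp:(+ 1 2 3)")
;; => 0
```

So with this regexp the link would presumably have to be written without the space after the colon.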

This is not documented properly; thanks for reporting it.

In a regexp character class, the first character of the class is special and can be used to include characters in the class that are otherwise difficult to get into a class, for example the minus "-" or a closing square bracket. Since an empty character class [] or [^] would be meaningless, this is a special case, so that []] matches the closing bracket and [^]] matches everything except the closing bracket.
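
A few expressions that can be evaluated in *scratch* to check this; the test strings are arbitrary examples, nothing more:

```elisp
;; Each call returns the index of the first match, or nil if none.
(list
 ;; "[]]" is a set containing only the closing bracket.
 (string-match "[]]" "a]b")      ; => 1
 ;; "[^]]" matches any character except the closing bracket.
 (string-match "[^]]" "]]x")     ; => 2
 ;; "[]-]" contains both "]" (first) and "-" (last).
 (string-match "[]-]" "a-b")     ; => 1
 ;; No closing bracket in the string, so no match.
 (string-match "[]]" "abc"))     ; => nil
;; => (1 2 1 nil)
```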

- Carsten

From the Emacs manual, node "Regexps". I have marked the important parts with "!" in the first column:

`[ ... ]'
     is a "character set", which begins with `[' and is terminated by
     `]'.  In the simplest case, the characters between the two
     brackets are what this set can match.

     Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
     matches any string composed of just `a's and `d's (including the
     empty string), from which it follows that `c[ad]*r' matches `cr',
     `car', `cdr', `caddaar', etc.

     You can also include character ranges in a character set, by
     writing the starting and ending characters with a `-' between
     them.  Thus, `[a-z]' matches any lower-case ASCII letter.  Ranges
     may be intermixed freely with individual characters, as in
     `[a-z$%.]', which matches any lower-case ASCII letter or `$', `%'
     or period.

     Note that the usual regexp special characters are not special
     inside a character set.  A completely different set of special
     characters exists inside character sets: `]', `-' and `^'.

!     To include a `]' in a character set, you must make it the first
!     character.  For example, `[]a]' matches `]' or `a'.  To include a
!     `-', write `-' as the first or last character of the set, or put
!     it after a range.  Thus, `[]-]' matches both `]' and `-'.

     To include `^' in a set, put it anywhere but at the beginning of
     the set.  (At the beginning, it complements the set--see below.)

     When you use a range in case-insensitive search, you should write
     both ends of the range in upper case, or both in lower case, or
     both should be non-letters.  The behavior of a mixed-case range
     such as `A-z' is somewhat ill-defined, and it may change in future
     Emacs versions.

`[^ ... ]'
     `[^' begins a "complemented character set", which matches any
     character except the ones specified.  Thus, `[^a-z0-9A-Z]' matches
     all characters _except_ ASCII letters and digits.

!     `^' is not special in a character set unless it is the first
!     character.  The character following the `^' is treated as if it
!     were first (in other words, `-' and `]' are not special there).

     A complemented character set can match a newline, unless newline is
     mentioned as one of the characters not to match.  This is in
     contrast to the handling of regexps in programs such as `grep'.

     2    (concat
     3     "<?\\(" (mapconcat 'identity org-link-types "\\|") "\\):"
     4     "\\([^" org-non-link-chars " ]"
     5     "[^]\t\n\r]*"
     6     "[^" org-non-link-chars " ]\\)>?")
     7    "Matches a link with spaces, optional angular brackets around it.")

I am more used to PCRE so I may be incorrect, but is the "[^]" a typo?

Also we have the following definition:

(defconst org-non-link-chars "]\t\n\r<>")

Doesn't this make line 4 evaluate to:

"\\([^]\t\n\r<> ]"

or is the right-bracket escaped somehow?
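
For reference, a small sketch of what line 4 evaluates to, assuming the value of org-non-link-chars quoted above. The demo- names are hypothetical and exist only for this example:

```elisp
;; Assumption: same value as the org-non-link-chars definition quoted above.
(defconst demo-non-link-chars "]\t\n\r<>")

;; Line 4 of the defconst is plain string concatenation.
(defconst demo-line-4 (concat "\\([^" demo-non-link-chars " ]"))

(string= demo-line-4 "\\([^]\t\n\r<> ]")   ; => t

;; The set on its own, tried against single characters.  The "]" right
;; after "[^" is taken literally, so no escaping is needed.
(let ((set (concat "[^" demo-non-link-chars " ]")))
  (list (string-match set "]")     ; excluded => nil
        (string-match set " ")     ; excluded => nil
        (string-match set "(")))   ; allowed  => 0
;; => (nil nil 0)
```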


Carsten Dominik
Sterrenkundig Instituut "Anton Pannekoek"
Universiteit van Amsterdam
Kruislaan 403
NL-1098SJ Amsterdam
phone: +31 20 525 7477

