[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Stack overflow in regexp matcher
From: |
Dan Davison |
Subject: |
Re: Stack overflow in regexp matcher |
Date: |
Tue, 08 Feb 2011 22:58:15 +0000 |
User-agent: |
Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (darwin) |
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> The following fails with "Stack overflow in regexp matcher" in emacs 23
>> and 24:
>
>> (string-match
>> "^\\[.+\\]$"
>> (concat
>> "["
>> (mapconcat (lambda (i) "x") (number-sequence 1 33500) "")
>> "]"))
>
>> This surprised me; I assumed that the ^ and $ anchors, and the simple
>> ".+" requirement in the middle would result in a simple, efficient
>> regexp.
>
> The problem is that Emacs's regexp engine is not very clever.
>
> Basically, each time . succeeds we first recurse, trying to match the
> rest against ".*\\]$", and if that fails then we try to match
> "\\]$" instead. The fact that at each step we have this "fork" of two
> different possible ways to match, means that at each step we have to
> push something on the stack, so the longer the string against which we
> match, the deeper the stack we need.
>
> You can fix it by using a regexp that does not need this backtracking.
> E.g. "^\\[[^]\n]+\\]$". If you do that, then Emacs's regexp engine will
> notice that when [^]\n] matches, then "\\]$" can't match, so there's no
> point pushing something on the stack to try the second path when the
> first fails. I.e. it will do the whole match with a constant amount of
> stack space.
>
> This relates also to the recent discussion with Ilya in the thread
> titled "will we ever have zero width assertions in regexps?".
Hi Stefan,
Thanks for the explanation.
Dan
>
>
> Stefan