bug-gnu-emacs

bug#24020: [PATCH] Fix ‘[[:word:]]*\u2620’ failing to match ‘foo\u2620’


From: Eli Zaretskii
Subject: bug#24020: [PATCH] Fix ‘[[:word:]]*\u2620’ failing to match ‘foo\u2620’
Date: Mon, 18 Jul 2016 18:03:13 +0300

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Mon, 18 Jul 2016 16:04:44 +0200
> 
> mutually_exclusive_p did not check for the class bits of a charset
> opcode when comparing it with an exactn, which resulted in a situation
> where it thought a multibyte character could not match the character
> class.
> 
> This assumption caused incorrect optimisation of the regular expression
> and eventually failure of ‘[[:word:]]*\u2620’ to match ‘foo\u2620’.
> 
> The issue affects all multibyte word characters as well as other
> character classes which may match multibyte characters.
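
The failure mode described in the quoted commit message can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in src/regex.c: the struct `charset_model`, the constant `CLASS_WORD`, and the functions `is_word_char`, `charset_may_match`, `mutually_exclusive` and `make_word_charset` are all hypothetical names invented for illustration, and the syntax lookup is reduced to a single hard-coded character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class bit; the value is illustrative, not Emacs's. */
enum { CLASS_WORD = 1 << 0 };

struct range { uint32_t start, end; };

/* Simplified stand-in for a compiled charset opcode. */
struct charset_model {
  bool ascii_bitmap[128];      /* explicitly listed ASCII members */
  const struct range *ranges;  /* explicitly listed multibyte ranges */
  size_t n_ranges;
  int class_bits;              /* classes such as [:word:] */
};

/* Toy syntax lookup: assumes only U+2620 is a multibyte word
   character, as the bug report implies. */
bool is_word_char(uint32_t c)
{
  return c == 0x2620;
}

/* Can the charset match C?  With honor_class_bits false, this models
   the check that consults only the bitmap and the range table. */
bool charset_may_match(const struct charset_model *cs, uint32_t c,
                       bool honor_class_bits)
{
  if (c < 128)
    return cs->ascii_bitmap[c];
  for (size_t i = 0; i < cs->n_ranges; i++)
    if (cs->ranges[i].start <= c && c <= cs->ranges[i].end)
      return true;
  if (honor_class_bits && (cs->class_bits & CLASS_WORD) && is_word_char(c))
    return true;
  return false;
}

/* A loop over the charset may skip backtracking only when the charset
   can never match the literal that follows the loop. */
bool mutually_exclusive(const struct charset_model *cs, uint32_t next_literal,
                        bool honor_class_bits)
{
  return !charset_may_match(cs, next_literal, honor_class_bits);
}

/* Model of [[:word:]]: ASCII letters and digits in the bitmap, and the
   word class bit set for everything beyond ASCII. */
struct charset_model make_word_charset(void)
{
  struct charset_model cs = {0};
  for (int c = 'a'; c <= 'z'; c++) cs.ascii_bitmap[c] = true;
  for (int c = 'A'; c <= 'Z'; c++) cs.ascii_bitmap[c] = true;
  for (int c = '0'; c <= '9'; c++) cs.ascii_bitmap[c] = true;
  cs.class_bits = CLASS_WORD;
  return cs;
}
```

In this model, `mutually_exclusive(&cs, 0x2620, false)` reports that the loop and the following literal cannot overlap, so a matcher would be free to drop backtracking; the greedy loop then consumes the trailing U+2620 and the literal has nothing left to match. With the class bits honored, the same call reports that they can overlap, and backtracking has to be kept.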

Thanks.

Unfortunately, the above description is too terse for me to understand
the issue and the way you propose to fix it.  Could you please provide
more details, including what problems you saw in classes other than
[:word:]?

Note that some of the classes deliberately don't work on multibyte
characters, and are documented as such.  So if we are changing that,
there should be documentation changes and an entry in NEWS as well
(but I suggest not to make such changes too easily, not without
measuring the impact on performance, if any).

> * src/regex.c (executing_charset): A new function for executing the
> charset and charset_not opcodes.  It performs a check on the character
> taking into consideration existing bitmap, rang table and class bits.
                                             ^^^^
A typo.

> +#ifdef emacs
> +  else if (rtp)
> +    {
> +      int class_bits = CHARSET_RANGE_TABLE_BITS (p);
> +      re_wchar_t range_start, range_end;
> +
> +  /* Sort tests by the most commonly used classes with some adjustment to which
> +     tests are easiest to perform.  Frequencies of character class names as of
> +     2016-07-15:

Not sure what files you used for this.  Are those Emacs source files?

> diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
> new file mode 100644
> index 0000000..a2dd4f0
> --- /dev/null
> +++ b/test/src/regex-tests.el
> @@ -0,0 +1,75 @@
> +;;; buffer-tests.el --- tests for regex.c functions -*- lexical-binding: t -*-
       ^^^^^^^^^^^^^^^

Copy-paste error.

> +;;; buffer-tests.el ends here

And another one.




