From: Eli Zaretskii <eliz@gnu.org>
To: Michal Nazarewicz <mina86@mina86.com>
Cc: 24020@debbugs.gnu.org
Subject: bug#24020: [PATCH] Fix ‘[[:word:]]*\u2620’ failing to match ‘foo\u2620’
Date: Mon, 18 Jul 2016 18:03:13 +0300
Message-ID: <83r3ar0z0u.fsf@gnu.org>
In-Reply-To: <1468850684-17867-1-git-send-email-mina86@mina86.com> (message from Michal Nazarewicz on Mon, 18 Jul 2016 16:04:44 +0200)

> From: Michal Nazarewicz <mina86@mina86.com>
> Date: Mon, 18 Jul 2016 16:04:44 +0200
> 
> mutually_exclusive_p did not check the class bits of a charset
> opcode when comparing it with an exactn, which led it to conclude
> that a multibyte character could not match the character
> class.
> 
> This assumption caused incorrect optimisation of the regular expression
> and eventually failure of ‘[[:word:]]*\u2620’ to match ‘foo\u2620’.
> 
> The issue affects all multibyte word characters as well as other
> character classes which may match multibyte characters.

Thanks.

Unfortunately, the above description is too terse for me to understand
the issue and the way you propose to fix it.  Could you please provide
more details, including what problems you saw in classes other than
[:word:]?

Note that some of the classes deliberately don't work on multibyte
characters, and are documented as such.  So if we are changing that,
there should be documentation changes and an entry in NEWS as well
(but I suggest not to make such changes too easily, not without
measuring the impact on performance, if any).

> * src/regex.c (executing_charset): A new function for executing the
> charset and charset_not opcodes.  It performs check on the character
> taking into consideration existing bitmap, rang table and class bits.
                                             ^^^^
A typo.
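
For readers following the thread, the kind of check the ChangeLog entry
describes (bitmap, then range table, then class bits) can be sketched
roughly as below.  This is not the code from the patch: the struct, the
flag values, and every name ending in _sketch are hypothetical, and the
class predicates are toy stand-ins for the syntax-table lookups the real
regex.c would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class flags; regex.c uses its own bit layout.  */
enum { CLASS_WORD_SKETCH = 1 << 0, CLASS_MULTIBYTE_SKETCH = 1 << 1 };

struct char_range { uint32_t start, end; };

/* Hypothetical, simplified view of a compiled charset.  */
struct charset_sketch {
  const unsigned char *bitmap;      /* one bit per code point below bitmap_bits */
  size_t bitmap_bits;
  const struct char_range *ranges;  /* inclusive ranges */
  size_t n_ranges;
  int class_bits;
  bool negate;                      /* true for a charset_not-style set */
};

/* Toy class test: "multibyte" means non-ASCII, "word" means ASCII
   alphanumeric.  Real Emacs consults syntax tables instead.  */
static bool in_class_sketch(int class_bits, uint32_t c)
{
  if ((class_bits & CLASS_MULTIBYTE_SKETCH) && c >= 0x80)
    return true;
  if ((class_bits & CLASS_WORD_SKETCH)
      && ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
          || (c >= 'a' && c <= 'z')))
    return true;
  return false;
}

/* Return true if C matches the set, consulting the bitmap, the class
   bits and the range table in turn.  */
static bool charset_matches_sketch(const struct charset_sketch *cs, uint32_t c)
{
  assert(cs != NULL);
  bool hit = false;
  if (c < cs->bitmap_bits)
    hit = (cs->bitmap[c / 8] >> (c % 8)) & 1;
  if (!hit)
    hit = in_class_sketch(cs->class_bits, c);
  for (size_t i = 0; !hit && i < cs->n_ranges; i++)
    hit = cs->ranges[i].start <= c && c <= cs->ranges[i].end;
  return hit != cs->negate;
}
```

In this sketch, a set whose bitmap and range table are both empty can
still match a multibyte character through its class bits alone, so a
test that skips the class bits would wrongly treat the set and the
character as mutually exclusive.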

> +#ifdef emacs
> +  else if (rtp)
> +    {
> +      int class_bits = CHARSET_RANGE_TABLE_BITS (p);
> +      re_wchar_t range_start, range_end;
> +
> +  /* Sort tests by the most commonly used classes with some adjustment to which
> +     tests are easiest to perform.  Frequencies of character class names as of
> +     2016-07-15:

Not sure what files you used for this.  Are those Emacs source files?

> diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
> new file mode 100644
> index 0000000..a2dd4f0
> --- /dev/null
> +++ b/test/src/regex-tests.el
> @@ -0,0 +1,75 @@
> +;;; buffer-tests.el --- tests for regex.c functions -*- lexical-binding: t -*-
       ^^^^^^^^^^^^^^^

Copy-paste error.

> +;;; buffer-tests.el ends here

And another one.


