From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24020: [PATCH] Fix ‘[[:word:]]*\u2620’ failing to match ‘foo\u2620’
Date: Mon, 18 Jul 2016 18:03:13 +0300
Message-ID: <83r3ar0z0u.fsf@gnu.org>
References: <1468850684-17867-1-git-send-email-mina86@mina86.com>
In-reply-to: <1468850684-17867-1-git-send-email-mina86@mina86.com> (message from Michal Nazarewicz on Mon, 18 Jul 2016 16:04:44 +0200)
Reply-To: Eli Zaretskii
To: Michal Nazarewicz
Cc: 24020@debbugs.gnu.org
Xref: news.gmane.org gmane.emacs.bugs:121222

> From: Michal Nazarewicz
> Date: Mon, 18 Jul 2016 16:04:44 +0200
>
> mutually_exclusive_p did not check for the claass bits of an charset
> opcode when comparing it with an exactn which resulted in situation
> where it thought a multibyte character could not match the character
> class.
>
> This assumption caused incorrect optimisation of the regular expression
> and eventually failure of ‘[[:word:]]*\u2620’ to match ‘foo\u2620’.
>
> The issue affects all multibyte word characters as well as other
> character classes which may match multibyte characters.

Thanks.

Unfortunately, the above description is too terse for me to understand
the issue and the way you propose to fix it.  Could you please provide
more details, including what problems you saw in classes other than
[:word:]?

Note that some of the classes deliberately don't work on multibyte
characters, and are documented as such.  So if we are changing that,
there should be documentation changes and an entry in NEWS as well
(but I suggest not to make such changes too easily, not without
measuring the impact on performance, if any).

> * src/regex.c (executing_charset): A new function for executing the
> charset and charset_not opcodes.  It performs check on the character
> taking into consideration existing bitmap, rang table and class bits.
                                             ^^^^
A typo.

> +#ifdef emacs
> +  else if (rtp)
> +    {
> +      int class_bits = CHARSET_RANGE_TABLE_BITS (p);
> +      re_wchar_t range_start, range_end;
> +
> +      /* Sort tests by the most commonly used classes with some adjustment to which
> +         tests are easiest to perform.  Frequencies of character class names as of
> +         2016-07-15:

Not sure what files you used for this.  Are those Emacs source files?

> diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
> new file mode 100644
> index 0000000..a2dd4f0
> --- /dev/null
> +++ b/test/src/regex-tests.el
> @@ -0,0 +1,75 @@
> +;;; buffer-tests.el --- tests for regex.c functions  -*- lexical-binding: t -*-
       ^^^^^^^^^^^^^^^
Copy-paste error.

> +;;; buffer-tests.el ends here

And another one.
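
For readers following the quoted commit message, the failure mode it
describes can be illustrated with a toy model.  This is only a sketch
under stated assumptions, not the code in src/regex.c: the struct, the
TOY_BIT_WORD flag, and every function name below are hypothetical, and
the toy treats every non-ASCII character as a word character.  It shows
how a mutual-exclusion check that consults only the ASCII bitmap can
wrongly conclude that a charset and a multibyte literal never match the
same character, while a check that also consults the class bits does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical class-bit flag; not the real constant from src/regex.c. */
enum { TOY_BIT_WORD = 1 << 0 };

/* Toy stand-in for a compiled charset: an ASCII bitmap plus class bits. */
struct toy_charset {
  uint8_t bitmap[16]; /* one bit per ASCII code point 0..127 */
  int class_bits;     /* classes that may also match non-ASCII characters */
};

/* Assumption for this sketch: every non-ASCII character is a word char. */
static bool toy_is_word(uint32_t c) { return c >= 128; }

static void toy_charset_add_ascii(struct toy_charset *cs, uint32_t c) {
  assert(c < 128);
  cs->bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
}

static bool bitmap_has(const struct toy_charset *cs, uint32_t c) {
  return c < 128 && ((cs->bitmap[c / 8] >> (c % 8)) & 1u);
}

/* Builds a toy [[:word:]]: ASCII alphanumerics in the bitmap, and the
   word class bit set for everything outside ASCII. */
static struct toy_charset toy_word_charset(void) {
  struct toy_charset cs = { {0}, TOY_BIT_WORD };
  for (uint32_t c = 'a'; c <= 'z'; c++) toy_charset_add_ascii(&cs, c);
  for (uint32_t c = 'A'; c <= 'Z'; c++) toy_charset_add_ascii(&cs, c);
  for (uint32_t c = '0'; c <= '9'; c++) toy_charset_add_ascii(&cs, c);
  return cs;
}

/* Bitmap-only check: claims exclusivity for any character outside the
   bitmap, including multibyte ones the class bits would match. */
static bool exclusive_bitmap_only(const struct toy_charset *cs, uint32_t c) {
  return !bitmap_has(cs, c);
}

/* Check that also consults the class bits before claiming exclusivity. */
static bool exclusive_with_class_bits(const struct toy_charset *cs,
                                      uint32_t c) {
  if (bitmap_has(cs, c))
    return false;
  if ((cs->class_bits & TOY_BIT_WORD) && toy_is_word(c))
    return false;
  return true;
}
```

In this toy, the bitmap-only check reports the charset and U+2620 as
mutually exclusive, which is the kind of conclusion that would let an
optimizer drop backtracking for the `*' loop; the class-bit-aware check
reports that they can overlap.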