From: bojohan+mail@dd.chalmers.se (Johan Bockgård)
Newsgroups: gmane.emacs.devel
Subject: [BUG] Regexp compiler, problem with character classes
Date: Sat, 03 Jun 2006 03:14:15 +0200
NNTP-Posting-Date: Wed, 6 Sep 2006 17:42:50 +0000 (UTC)
To: emacs-devel@gnu.org
User-Agent: Gnus/5.110005 (No Gnus v0.5) Emacs/22.0.50
 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:59457

[I'm resending this because I think it's a serious bug. It makes
character classes totally unreliable.]

Character classes are translated to character alternatives during the
regexp compile phase. This is wrong, since the syntax table should be
taken into account during the actual matching. This may be non-trivial
to fix.

    (with-temp-buffer
      (list
       (progn
         (modify-syntax-entry ?a " ")
         (string-match "x[[:space:]]" "xa"))
       (progn
         (modify-syntax-entry ?a "w")
         (string-match "x[[:space:]]" "xa"))))
      => (0 0)

    0:  /exactn/1/x
    3:  /charset [\t\f a\302\200-\303\277]
    37: /succeed
    38: end of pattern.
    Compiling pattern: x[[:space:]]
    Compiled pattern:
    38 bytes used/174 bytes allocated.
    fastmap: x
    re_nsub: 0  regs_alloc: 0  can_be_null: 0  no_sub: 0  not_bol: 0  not_eol: 0  syntax: 340204
    0:  /exactn/1/x
    3:  /charset [\t\f a\302\200-\303\277]
    37: /succeed
    38: end of pattern.
    0:  /exactn/1/x
    3:  /charset [\t\f a\302\200-\303\277]
    37: /succeed
    38: end of pattern.

As an effect you get the behavior below, since the compiler takes no
care to set up the syntax in the first place:

1) emacs -Q

    (with-temp-buffer
      (string-match "x[[:space:]]" "x\n"))
      => nil

   (exit Emacs)

2) emacs -Q

    (with-temp-buffer
      (char-syntax ?\n)
      (string-match "x[[:space:]]" "x\n"))
      => 0

(Fchar_syntax does
gl_state.current_syntax_table = current_buffer->syntax_table;)

-- 
This is bad.
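
The difference between expanding a character class at compile time and
consulting the syntax table at match time can be shown with a toy model
in C. This is only a sketch under stated assumptions: `syntax_table`,
`eager_class`, and every function below are hypothetical names made up
for illustration, not the data structures or functions of Emacs's regexp
code. It models [[:space:]] as a 256-entry table and nothing more.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy syntax table: space[c] is true when byte c
   currently has whitespace syntax. */
typedef struct { bool space[256]; } syntax_table;

/* Hypothetical compiled form of [[:space:]] under the eager strategy:
   a fixed set of member bytes, frozen when the pattern is compiled. */
typedef struct { bool member[256]; } eager_class;

/* Eager strategy: copy the table's current contents into the pattern. */
eager_class compile_space_eager(const syntax_table *st) {
    eager_class c;
    memcpy(c.member, st->space, sizeof c.member);
    return c;
}

/* Matching against the frozen set ignores later syntax changes. */
bool match_space_eager(const eager_class *c, unsigned char ch) {
    return c->member[ch];
}

/* Lazy strategy: look the byte up in the table in effect when matching. */
bool match_space_lazy(const syntax_table *st, unsigned char ch) {
    return st->space[ch];
}

/* Returns 1 when the eager class still matches 'a' after 'a' stopped
   being whitespace, while the lazy check reflects the change. */
int eager_goes_stale(void) {
    syntax_table st;
    memset(&st, 0, sizeof st);
    st.space[' '] = true;
    st.space['a'] = true;   /* analogous to (modify-syntax-entry ?a " ") */
    eager_class compiled = compile_space_eager(&st);
    st.space['a'] = false;  /* analogous to (modify-syntax-entry ?a "w") */
    return match_space_eager(&compiled, 'a') && !match_space_lazy(&st, 'a');
}
```

In this model the eager pattern keeps matching 'a' after the table
changes, which is the same shape as the (0 0) result above. The lazy
variant costs a table lookup per character but never goes stale.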