From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 22 Feb 2019 16:45:22 +0000
Message-ID: <20190222164522.GB5411@ACM>
To: emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:233533

Hello, Emacs.

In function re_match_2_internal (in regex-emacs.c), in the handling of
case wordend: inside the large switch statement (approximately line 4800
in the file), there is some code, the look of which I don't like.

Primarily, there is an

    UPDATE_SYNTAX_TABLE (charpos);

before determining the syntax of the previous character, which seems OK.
Later on, before determining the syntax of the next character, we have:

    UPDATE_SYNTAX_TABLE_FORWARD (charpos);

. Between these two calls, charpos hasn't been changed.  Surely the
argument to the second occurrence should be (charpos + 1)?

Also, probably less importantly, there is

    GET_CHAR_AFTER (c2, d, dummy);

, whereas at the same place in the handler for case symend: we have
instead

    c2 = RE_STRING_CHAR (d, target_multibyte);

. Is the effect of these macros identical, or is one of them up to date,
and the other one really needs updating as well, for correct
functionality?

I came across these whilst investigating bug #34525.  Making the
indicated changes to regex-emacs.c sadly doesn't help solve the symptoms
of that bug.  :-(

-- 
Alan Mackenzie (Nuremberg, Germany).
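
To make the first concern concrete, here is a toy model in C. It is not the code from regex-emacs.c or syntax.h: every name in it (toy_state, toy_update, toy_update_forward, toy_syntax_after, TOY_BOUNDARY) is hypothetical, and it simply assumes that a "forward" update reloads a cached interval only when the position it is given lies past the end of that interval. Whether the real UPDATE_SYNTAX_TABLE_FORWARD behaves like this is part of what the question above is asking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_syntax { TOY_WORD, TOY_PUNCT };

/* Hypothetical layout: positions below TOY_BOUNDARY carry "word"
   syntax, positions at or above it carry "punctuation" syntax.  */
#define TOY_BOUNDARY 5

/* Hypothetical cache: SYNTAX is valid for positions in [START, END).  */
struct toy_state
{
  ptrdiff_t start;
  ptrdiff_t end;
  enum toy_syntax syntax;
};

/* Unconditionally load the interval that contains POS.  */
static void
toy_update (struct toy_state *st, ptrdiff_t pos)
{
  assert (pos >= 0);
  if (pos < TOY_BOUNDARY)
    {
      st->start = 0;
      st->end = TOY_BOUNDARY;
      st->syntax = TOY_WORD;
    }
  else
    {
      st->start = TOY_BOUNDARY;
      st->end = PTRDIFF_MAX;
      st->syntax = TOY_PUNCT;
    }
}

/* Assumed "forward" behaviour: reload only if POS is at or past the
   end of the cached interval.  */
static void
toy_update_forward (struct toy_state *st, ptrdiff_t pos)
{
  if (pos >= st->end)
    toy_update (st, pos);
}

/* Mimic the shape of the sequence described above: update at CHARPOS
   (the previous character), then do a forward update at
   CHARPOS + DELTA, and report the syntax the cache would give for the
   next character.  */
static enum toy_syntax
toy_syntax_after (ptrdiff_t charpos, ptrdiff_t delta)
{
  struct toy_state st;
  toy_update (&st, charpos);
  toy_update_forward (&st, charpos + delta);
  return st.syntax;
}
```

In this model, with the previous character at position 4 (the last "word" position), toy_syntax_after (4, 0) leaves the cache on the word interval and returns TOY_WORD, while toy_syntax_after (4, 1) crosses the boundary and returns TOY_PUNCT. Away from the boundary, for instance at position 2, both calls agree, which would explain why such a discrepancy could go unnoticed.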