From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 1 Mar 2019 16:38:25 +0000
Message-ID: <20190301163824.GF5674@ACM>
References: <20190222164522.GB5411@ACM> <20190225185656.GA3605@ACM>
 <20190301111018.GA5674@ACM> <83bm2uiu6x.fsf@gnu.org>
 <20190301141448.GC5674@ACM> <834l8mirj9.fsf@gnu.org>
 <20190301145856.GE5674@ACM> <83zhqeh8ds.fsf@gnu.org>
In-Reply-To: <83zhqeh8ds.fsf@gnu.org>
To: Eli Zaretskii
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

Hello, Eli.

On Fri, Mar 01, 2019 at 18:22:39 +0200, Eli Zaretskii wrote:
> > Date: Fri, 1 Mar 2019 14:58:56 +0000
> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> > From: Alan Mackenzie

> > > > Thanks, I didn't know that.  Maybe we should put an assert into the code,
> > > > like Stefan suggested.

> > > We could try.

> > How about this, as a first approximation?

> > [...]

> > +  /* Check BYTEPOS was at a character boundary.  */
> > +  eassert (best_below_byte == bytepos);

> Actually, what I had in mind was a simple

>   eassert (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos)));

> right at the beginning of buf_bytepos_to_charpos.  But maybe if you
> explain why you wanted a different assertion, I will change my mind.

I had forgotten that it was possible to determine that a UTF-8 byte was
at the start of a character.  I think your version is simpler and
better.  Let's use it!

But first, I'll commit the fix to bug #34525, including the problems
with "d - 1".  That will be one fewer spot where that new assertion will
get triggered.

-- Alan Mackenzie (Nuremberg, Germany).
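
For readers following the thread, here is a minimal stand-alone sketch of the kind of check being discussed. It is not taken from the Emacs sources: is_char_head and bytepos_to_charpos are hypothetical names, plain assert stands in for eassert, and the text is an ordinary byte array with 0-based offsets rather than an Emacs buffer. It relies only on the UTF-8 property that continuation bytes have the form 10xxxxxx, which as far as I know is the same bit test that CHAR_HEAD_P performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if BYTE can start a UTF-8 sequence, i.e. it is not a
   continuation byte of the form 10xxxxxx.  Hypothetical stand-in
   for the CHAR_HEAD_P check mentioned in the thread.  */
static bool
is_char_head (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical helper: return the number of characters that start
   before byte offset BYTEPOS (0-based) in the LEN bytes at TEXT.
   Asserts up front that BYTEPOS is at a character boundary; the end
   of the text counts as a boundary.  */
static size_t
bytepos_to_charpos (const unsigned char *text, size_t len, size_t bytepos)
{
  assert (bytepos <= len);
  assert (bytepos == len || is_char_head (text[bytepos]));

  size_t chars = 0;
  for (size_t i = 0; i < bytepos; i++)
    if (is_char_head (text[i]))
      chars++;
  return chars;
}
```

With the four-byte text "a\xC3\xA9z" (the letter a, a two-byte character, then z), offsets 0, 1, 3 and 4 pass the assertion, while offset 2 lands on the continuation byte 0xA9 and trips it. That is the situation the proposed eassert is meant to catch at the very start of the conversion instead of after the scan.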