From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Tue, 5 Mar 2019 10:51:50 +0000
Message-ID: <20190305105150.GA4850@ACM>
References: <834l8mirj9.fsf@gnu.org> <20190301145856.GE5674@ACM> <83zhqeh8ds.fsf@gnu.org> <20190301163824.GF5674@ACM> <20190301191607.GG5674@ACM> <83woligzmu.fsf@gnu.org> <20190302111640.GA21061@ACM> <83fts5h3lz.fsf@gnu.org> <20190302131801.GB21061@ACM> <83lg1ueel6.fsf@gnu.org>
In-Reply-To: <83lg1ueel6.fsf@gnu.org>
To: Eli Zaretskii
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

Hello, Eli.

On Mon, Mar 04, 2019 at 19:25:57 +0200, Eli Zaretskii wrote:
> > Date: Sat, 2 Mar 2019 13:18:01 +0000
> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> > From: Alan Mackenzie

> > Instead I moved the eassert to after the bit where it checks for unibyte
> > buffers, giving this:

> > diff --git a/src/marker.c b/src/marker.c
> > index b58051a8c2..0b2e1bf5c6 100644
> > --- a/src/marker.c
> > +++ b/src/marker.c
> > @@ -332,6 +332,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
> >    if (best_above == best_above_byte)
> >      return bytepos;
> >
> > +  /* Check bytepos is not in the middle of a character.  */
> > +  eassert (bytepos >= BUF_Z_BYTE (b)
> > +           || CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos)));
> > +
> >    best_below = BEG;
> >    best_below_byte = BEG_BYTE;

[ .... ]

> I was forced to disable this assertion for now: I bootstrapped today a
> clean checkout, and several jobs that run during the bootstrap
> triggered the assertion.  It turns out there's one legitimate use case
> when bytepos _can_ be in the middle of a multibyte sequence: when we
> convert a buffer from unibyte to multibyte.  There are comments to
> that effect in set_intervals_multibyte_1.

> I see 2 possible ways to handle this: (1) remove the assertion for
> good, or (2) change buf_bytepos_to_charpos to accept one more
> argument, telling it whether to make this check, and then modify all
> the callers except those in set_intervals_multibyte_1 to pass 'true'
> as that argument.

> Thoughts?

First of all, sorry I wasn't here yesterday to deal with it.

I don't think I like alternative (2) - it's ugly, and how much do we
really need this eassert anyway?  It's turned out not to be such a good
idea after all.

I would favour alternative (1), just removing the thing altogether.

-- 
Alan Mackenzie (Nuremberg, Germany).
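
For readers following the thread, here is a minimal standalone sketch of the condition the disabled eassert tested. It is not the Emacs code: char_head_p and bytepos_on_char_boundary are hypothetical names, the text is assumed to be a flat byte array with 0-based offsets (Emacs buffer positions are 1-based and go through BUF_FETCH_BYTE), and the byte test assumes the usual UTF-8-style convention that continuation bytes have the form 10xxxxxx.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if BYTE can start a character, i.e. it is
   not a continuation byte of the form 10xxxxxx.  */
static bool
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical stand-in for the assertion's condition: BYTEPOS is
   acceptable if it is at or past the end of TEXT, or if the byte at
   that offset starts a character.  Offsets are 0-based here, which is
   an assumption of this sketch, not the buffer convention.  */
static bool
bytepos_on_char_boundary (const unsigned char *text, size_t len,
                          size_t bytepos)
{
  return bytepos >= len || char_head_p (text[bytepos]);
}
```

With the bytes 'a', 0xC3, 0xA9, 'b', the check fails only at offset 2, the second byte of the two-byte sequence. A check of this shape can fail legitimately while a buffer is being converted from unibyte to multibyte, which is the case Eli describes.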