From: Eli Zaretskii
To: Alan Mackenzie
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Mon, 04 Mar 2019 19:25:57 +0200
Message-ID: <83lg1ueel6.fsf@gnu.org>
In-reply-to: <20190302131801.GB21061@ACM> (message from Alan Mackenzie on Sat, 2 Mar 2019 13:18:01 +0000)

> Date: Sat, 2 Mar 2019 13:18:01 +0000
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> From: Alan Mackenzie
>
> >   eassert (NILP (BVAR (b, enable_multibyte_characters))
> >            || bytepos >= BUF_Z_BYTE (b)
> >            || CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos)));
>
> > IOW, this test is irrelevant in unibyte buffers.
>
> Instead I moved the eassert to after the bit where it checks for unibyte
> buffers, giving this:
>
>
>
> diff --git a/src/marker.c b/src/marker.c
> index b58051a8c2..0b2e1bf5c6 100644
> --- a/src/marker.c
> +++ b/src/marker.c
> @@ -332,6 +332,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
>    if (best_above == best_above_byte)
>      return bytepos;
>
> +  /* Check bytepos is not in the middle of a character.  */
> +  eassert (bytepos >= BUF_Z_BYTE (b)
> +           || CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos)));
> +
>    best_below = BEG;
>    best_below_byte = BEG_BYTE;
>
>
> I now no longer see the failed easserts in make check.
>
> So I'll commit this sometime (real life is a bit urgent right now).

I was forced to disable this assertion for now: I bootstrapped today a
clean checkout, and several jobs that run during the bootstrap
triggered the assertion.  It turns out there's one legitimate use case
when bytepos _can_ be in the middle of a multibyte sequence: when we
convert a buffer from unibyte to multibyte.  There are comments to
that effect in set_intervals_multibyte_1.

I see 2 possible ways to handle this: (1) remove the assertion for
good, or (2) change buf_bytepos_to_charpos to accept one more
argument, telling it whether to make this check, and then modify all
the callers except those in set_intervals_multibyte_1 to pass 'true'
as that argument.

Thoughts?
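
For concreteness, option (2) could look roughly like the standalone
sketch below.  This is not Emacs code: toy_char_head_p and
toy_bytepos_to_charpos are hypothetical stand-ins for CHAR_HEAD_P and
buf_bytepos_to_charpos, the text is a plain UTF-8 byte array rather
than a struct buffer, positions are 0-based, and the standard assert
takes the place of eassert.  It only illustrates how an extra boolean
argument would let a caller opt out of the character-boundary check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for CHAR_HEAD_P: true unless BYTE is a UTF-8
   continuation byte (10xxxxxx).  Assumption for illustration only;
   Emacs's internal encoding differs in detail.  */
static bool
toy_char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical analogue of buf_bytepos_to_charpos with the extra
   argument from option (2).  TEXT holds NBYTES bytes of UTF-8 and
   BYTEPOS is a 0-based byte offset.  Return the number of characters
   that start before BYTEPOS.  If CHECK_HEAD is true, assert that
   BYTEPOS is not in the middle of a character.  */
static ptrdiff_t
toy_bytepos_to_charpos (const unsigned char *text, ptrdiff_t nbytes,
                        ptrdiff_t bytepos, bool check_head)
{
  assert (0 <= bytepos && bytepos <= nbytes);

  /* The check under discussion, now made only on request.  */
  if (check_head)
    assert (bytepos >= nbytes || toy_char_head_p (text[bytepos]));

  /* Count the character heads that lie before BYTEPOS.  */
  ptrdiff_t charpos = 0;
  for (ptrdiff_t i = 0; i < bytepos; i++)
    if (toy_char_head_p (text[i]))
      charpos++;
  return charpos;
}
```

In this sketch an ordinary caller would pass true, while a caller in
the position of set_intervals_multibyte_1, which may legitimately land
in the middle of a multibyte sequence, would pass false and still get
a character count back.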