From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 1 Mar 2019 14:58:56 +0000
Message-ID: <20190301145856.GE5674@ACM>
References: <20190222164522.GB5411@ACM> <20190225185656.GA3605@ACM>
 <20190301111018.GA5674@ACM> <83bm2uiu6x.fsf@gnu.org>
 <20190301141448.GC5674@ACM> <834l8mirj9.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
 logging-data="16703"; mail-complaints-to="usenet@blaine.gmane.org"
User-Agent: Mutt/1.10.1 (2018-07-13)
Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: Eli Zaretskii
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Mar 01 16:04:47 2019
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([209.51.188.17])
 by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89)
 (envelope-from ) id 1gzjiP-00044M-TY
 for ged-emacs-devel@m.gmane.org; Fri, 01 Mar 2019 16:04:46 +0100
Original-Received: from localhost ([127.0.0.1]:39158 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from ) id 1gzjiO-0004UH-VY
 for ged-emacs-devel@m.gmane.org; Fri, 01 Mar 2019 10:04:44 -0500
Original-Received: from eggs.gnu.org ([209.51.188.92]:45105)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from ) id 1gzjhi-0004Th-Pv
 for emacs-devel@gnu.org; Fri, 01 Mar 2019 10:04:03 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1gzjhh-0007rs-DA
 for emacs-devel@gnu.org; Fri, 01 Mar 2019 10:04:02 -0500
Original-Received: from colin.muc.de ([193.149.48.1]:19464 helo=mail.muc.de)
 by eggs.gnu.org with smtp (Exim 4.71)
 (envelope-from ) id 1gzjhh-0007qo-4L
 for emacs-devel@gnu.org; Fri, 01 Mar 2019 10:04:01 -0500
Original-Received: (qmail 98551 invoked by uid 3782); 1 Mar 2019 15:03:59 -0000
Original-Received: from acm.muc.de (p4FE15D75.dip0.t-ipconnect.de [79.225.93.117])
 by colin.muc.de (tmda-ofmipd) with ESMTP; Fri, 01 Mar 2019 16:03:57 +0100
Original-Received: (qmail 11078 invoked by uid 1000); 1 Mar 2019 14:58:56 -0000
Content-Disposition: inline
In-Reply-To: <834l8mirj9.fsf@gnu.org>
X-Delivery-Agent: TMDA/1.1.12 (Macallan)
X-Primary-Address: acm@muc.de
X-detected-operating-system: by eggs.gnu.org: FreeBSD 9.x [fuzzy]
X-Received-From: 193.149.48.1
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel"
Xref: news.gmane.org gmane.emacs.devel:233733
Archived-At:

Hello, Eli.

On Fri, Mar 01, 2019 at 16:43:38 +0200, Eli Zaretskii wrote:
> > Date: Fri, 1 Mar 2019 14:14:48 +0000
> > From: Alan Mackenzie
> > Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org

> > > buf_bytepos_to_charpos is not supposed to be called when the byte
> > > position is in the middle of a multibyte sequence.  We have the
> > > CHAR_HEAD_P, BYTES_BY_CHAR_HEAD, and related macros for that.

> > Thanks, I didn't know that.  Maybe we should put an assert into the code,
> > like Stefan suggested.

> We could try.

How about this, as a first approximation?



diff --git a/src/marker.c b/src/marker.c
index 36d6b10c74..9faeca49f4 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -311,7 +311,8 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
       }									\
 }
 
-/* Return the character position corresponding to BYTEPOS in B.  */
+/* Return the character position corresponding to BYTEPOS in B.
+   BYTEPOS must be at a character boundary.  */
 
 ptrdiff_t
 buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
@@ -370,6 +371,8 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
 	  best_below++;
 	  BUF_INC_POS (b, best_below_byte);
 	}
+      /* Check BYTEPOS was at a character boundary.  */
+      eassert (best_below_byte == bytepos);
 
       /* If this position is quite far from the nearest known position,
 	 cache the correspondence by creating a marker here.
@@ -397,6 +400,8 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
 	  best_above--;
 	  BUF_DEC_POS (b, best_above_byte);
 	}
+      /* Check BYTEPOS was at a character boundary.  */
+      eassert (best_above_byte == bytepos);
 
       /* If this position is quite far from the nearest known position,
 	 cache the correspondence by creating a marker here.


-- 
Alan Mackenzie (Nuremberg, Germany).
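
To illustrate the check the patch adds, here is a minimal standalone sketch.  It is not the Emacs code: it assumes plain UTF-8 in a flat byte array rather than a buffer with a gap and a marker cache, and the names is_char_head and bytepos_to_charpos are hypothetical stand-ins for CHAR_HEAD_P and buf_bytepos_to_charpos.  Where the patch uses eassert, the sketch returns -1, purely so that the failure case can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CHAR_HEAD_P: true unless BYTE is a
   UTF-8 continuation byte (bit pattern 10xxxxxx).  */
static int
is_char_head (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical, much simplified analogue of buf_bytepos_to_charpos.
   Count the characters in TEXT (LEN bytes long) that lie before
   BYTEPOS, stepping forward one whole character at a time.  Return
   -1 if BYTEPOS is out of range or not at a character boundary.  */
static ptrdiff_t
bytepos_to_charpos (const unsigned char *text, ptrdiff_t len,
                    ptrdiff_t bytepos)
{
  ptrdiff_t byte = 0, chars = 0;

  if (bytepos < 0 || bytepos > len)
    return -1;

  while (byte < bytepos)
    {
      /* Step over one whole character: its head byte plus any
         continuation bytes that follow it.  */
      byte++;
      while (byte < len && !is_char_head (text[byte]))
        byte++;
      chars++;
    }

  /* If we overshot, BYTEPOS was in the middle of a multibyte
     sequence.  This is the condition the patch asserts.  */
  return byte == bytepos ? chars : -1;
}
```

With the four bytes 'a', 0xC3, 0xA9, 'z' (that is, "a", "é", "z"), byte positions 0, 1, 3 and 4 map to character positions 0, 1, 2 and 3, while byte position 2 falls inside the two-byte sequence and is rejected.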