all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Stefan Monnier <monnier@iro.umontreal.ca>
To: emacs-devel@gnu.org
Cc: Alan Mackenzie <acm@muc.de>
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Mon, 25 Feb 2019 14:18:10 -0500	[thread overview]
Message-ID: <jwvtvgr1yc7.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: 20190225185656.GA3605@ACM

>> > Surely the argument to the second occurrence should be (charpos + 1)?
>> I believe it's instead the other one that needs to use "charpos - 1"
>> because the UPDATE_SYNTAX_TABLE is called just before reading the char
>> *before* charpos (see patch below).
> I don't think this is right.  offset is calculated from d, and then
> decremented, before calculating charpos.

Hmm... I think you're right (and the symend code does like you suggest).
This said, I find it odd that the code does:

	      ptrdiff_t offset = PTR_TO_OFFSET (d) - 1;
	      ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
	      UPDATE_SYNTAX_TABLE (charpos);

Supposedly `d` is a char* pointing to the beginning of a potentially
multibyte char, In that case `d - 1` will point "somewhere before the
end of the previous multibyte char" but not necessarily at
its beginning.  Maybe the patch below would be preferable to avoid
this situation?

Worse, in notwordbound we do:

		ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
		ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
		UPDATE_SYNTAX_TABLE (charpos);

which seems even more broken because `d` might point to the first byte
after the gap, so `d - 1` will point in the middle of the gap, so it's
simply an invalid argument to PTR_TO_OFFSET.  According to the
definition of PTR_TO_OFFSET and POINTER_TO_OFFSET, the result may
be the same as if we did the decrement after the fact, but it still
looks fishy.  WDYT?


        Stefan


diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index b667a43a37..b21cba0e46 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -4811,9 +4811,9 @@ re_match_2_internal (struct re_pattern_buffer *bufp, re_char *string1,
 	      int c1, c2;
 	      int s1, s2;
 	      int dummy;
-	      ptrdiff_t offset = PTR_TO_OFFSET (d) - 1;
+	      ptrdiff_t offset = PTR_TO_OFFSET (d);
 	      ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
-	      UPDATE_SYNTAX_TABLE (charpos);
+	      UPDATE_SYNTAX_TABLE (charpos - 1);
 	      GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
 	      s1 = SYNTAX (c1);
 




  reply	other threads:[~2019-02-25 19:18 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-22 16:45 Questionable code in handling of wordend in the regexp engine in regex-emacs.c Alan Mackenzie
2019-02-23 23:15 ` Stefan Monnier
2019-02-25 18:56   ` Alan Mackenzie
2019-02-25 19:18     ` Stefan Monnier [this message]
2019-03-01 11:10       ` Alan Mackenzie
2019-03-01 13:41         ` Stefan Monnier
2019-03-01 13:46         ` Eli Zaretskii
2019-03-01 14:14           ` Alan Mackenzie
2019-03-01 14:43             ` Eli Zaretskii
2019-03-01 14:58               ` Alan Mackenzie
2019-03-01 16:22                 ` Eli Zaretskii
2019-03-01 16:38                   ` Alan Mackenzie
2019-03-01 19:16                     ` Alan Mackenzie
2019-03-01 19:31                       ` Eli Zaretskii
2019-03-02 11:16                         ` Alan Mackenzie
2019-03-02 12:18                           ` Eli Zaretskii
2019-03-02 13:18                             ` Alan Mackenzie
2019-03-02 13:37                               ` Eli Zaretskii
2019-03-04 17:25                               ` Eli Zaretskii
2019-03-05 10:51                                 ` Alan Mackenzie
2019-03-05 16:26                                   ` Eli Zaretskii
2019-03-02 12:21                           ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvtvgr1yc7.fsf-monnier+emacs@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=acm@muc.de \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.