From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Alan Mackenzie <acm@muc.de>
Cc: emacs-devel@gnu.org
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 01 Mar 2019 08:41:31 -0500
Message-ID: <jwvzhqesosd.fsf-monnier+emacs@gnu.org>
In-Reply-To: <20190301111018.GA5674@ACM> (Alan Mackenzie's message of "Fri, 1 Mar 2019 11:10:18 +0000")

> down", other times it "rounds it up" to a character position.  I think
> it should be defined as rounding it down.  It would be a relatively
> simple correction (at least, technically ;-).

When moving forward, rounding it up is more natural ;-)

> But I'm still a little worried about buf_bytepos_to_charpos.  Perhaps it
> should state that the result is undefined when the bytepos is "invalid".

Yes, I think that's the intention.  Even better would be to signal an
error (when built with --enable-checking).
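
To make the "round down" vs "round up" distinction concrete, here is a
minimal sketch.  It assumes plain UTF-8 text in a flat array, where
continuation bytes have the form 10xxxxxx.  The function names are
hypothetical and this is not the code of buf_bytepos_to_charpos; it only
shows what snapping an "invalid" byte position to a character boundary
could look like in either direction.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if BYTE can start a character, i.e. it is not a UTF-8
   continuation byte (10xxxxxx).  */
static bool
is_char_head (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical helper: move BYTEPOS back to the start of the
   character that contains it.  LEN is the length of TEXT in bytes;
   positions 0 and LEN always count as boundaries.  */
ptrdiff_t
round_down_to_char (const unsigned char *text, ptrdiff_t len,
                    ptrdiff_t bytepos)
{
  while (bytepos > 0 && bytepos < len && !is_char_head (text[bytepos]))
    bytepos--;
  return bytepos;
}

/* Hypothetical helper: move BYTEPOS forward to the start of the
   next character, or to LEN if there is none.  */
ptrdiff_t
round_up_to_char (const unsigned char *text, ptrdiff_t len,
                  ptrdiff_t bytepos)
{
  while (bytepos > 0 && bytepos < len && !is_char_head (text[bytepos]))
    bytepos++;
  return bytepos;
}
```

A checking build could instead compare the input with the rounded-down
result and signal an error when they differ.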

> For that matter, how many charpos <-> bytepos functions are there in
> Emacs?  Just this one?

I think so, yes.

>> Worse, in notwordbound we do:
>
>> 		ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
>> 		ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
>> 		UPDATE_SYNTAX_TABLE (charpos);
>
>> which seems even more broken because `d` might point to the first byte
>> after the gap, so `d - 1` will point in the middle of the gap, so it's
>> simply an invalid argument to PTR_TO_OFFSET.
>
> I don't think this is right.  Both `d' and `offset' are byte
> measurements, not character measurements, so it shouldn't matter whether
> the "- 1" is inside or outside the parens.  However, it would be less
> confusing if they were both (?all) the same.

The difference between `d` and `offset` is just an offset, indeed, but
it can be one of two different offsets, depending on whether `d` is
before or after the gap.  So what happens when `d` is within the gap
depends on how the test for "before/after the gap" is implemented.

More specifically, when `d` is N bytes before the end of the gap, the
code could consider it as being N bytes before the beginning of the
second part, or being "gap-size - N" bytes after the end of the
first part.
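
The two readings can be sketched as follows.  This is only an
illustration under an assumed layout (first part, then gap, then second
part, all in one allocation); `struct gapbuf` and both function names
are made up for the example and are not the definitions of
PTR_TO_OFFSET or POINTER_TO_OFFSET.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: text occupies raw indices [0, gap_start) and
   [gap_end, ...); the gap is [gap_start, gap_end).  */
struct gapbuf
{
  const char *base;
  ptrdiff_t gap_start;
  ptrdiff_t gap_end;
};

/* Reading 1: anything at or past gap_start belongs to the second
   part.  A pointer N bytes before the end of the gap maps to N bytes
   before the beginning of the second part.  */
ptrdiff_t
offset_split_at_gap_start (const struct gapbuf *b, const char *p)
{
  ptrdiff_t raw = p - b->base;
  ptrdiff_t gap_size = b->gap_end - b->gap_start;
  return raw < b->gap_start ? raw : raw - gap_size;
}

/* Reading 2: anything before gap_end belongs to the first part.
   The same pointer maps to "gap-size - N" bytes after the end of
   the first part.  */
ptrdiff_t
offset_split_at_gap_end (const struct gapbuf *b, const char *p)
{
  ptrdiff_t raw = p - b->base;
  ptrdiff_t gap_size = b->gap_end - b->gap_start;
  return raw < b->gap_end ? raw : raw - gap_size;
}

/* `d` points at the first byte after a 5-byte gap.  */
void
check_example (void)
{
  char storage[12] = { 0 };
  struct gapbuf b = { storage, 3, 8 };
  const char *d = storage + 8;

  /* Reading 1: decrementing before or after gives the same offset.  */
  assert (offset_split_at_gap_start (&b, d - 1)
          == offset_split_at_gap_start (&b, d) - 1);
  /* Reading 2: `d - 1` lands past the end of the first part.  */
  assert (offset_split_at_gap_end (&b, d - 1) == 7);
}
```

Under the first reading `d - 1` happens to give the right answer; under
the second it does not, which is why doing the decrement on the offset
rather than on the pointer is the safer form.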

>> According to the definition of PTR_TO_OFFSET and POINTER_TO_OFFSET,
>> the result may be the same as if we did the decrement after the fact,
>> but it still looks fishy.  WDYT?
>
> I think it is suboptimal to have both PTR_TO_OFFSET and
> POINTER_TO_OFFSET meaning different things in the same source file.  ;-)

I'm so glad you're volunteering to clean this up.
Thank you, really.

> There are eight occurrences of SYNTAX_TABLE_BYTE_TO_CHAR in
> regex-emacs.c.  I think I will check them all, amending them as in your
> patch.
> What do you say?

Thanks,


        Stefan


