From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Alan Mackenzie <acm@muc.de>
Cc: emacs-devel@gnu.org
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 01 Mar 2019 08:41:31 -0500
Message-ID: <jwvzhqesosd.fsf-monnier+emacs@gnu.org>
In-Reply-To: <20190301111018.GA5674@ACM> (Alan Mackenzie's message of "Fri, 1 Mar 2019 11:10:18 +0000")
> down", other times it "rounds it up" to a character position. I think
> it should be defined as rounding it down. It would be a relatively
> simple correction (at least, technically ;-).
When moving forward, rounding it up is more natural ;-)
> But I'm still a little worried about buf_bytepos_to_charpos. Perhaps it
> should state that the result is undefined when the bytepos is "invalid".
Yes, I think that's the intention. Even better would be to signal an
error (when built with --enable-checking).
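To make the "round down", "round up" and "signal an error" options concrete, here is a minimal standalone sketch. It is not the Emacs code: it assumes a UTF-8-style encoding in which continuation bytes have the form 10xxxxxx, all function names are hypothetical, and the standard `assert` (compiled out under NDEBUG) merely stands in for a check that is only active in an --enable-checking build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if BYTE starts a character, i.e. it is not a continuation
   byte of the form 10xxxxxx (assumption: UTF-8-style encoding).  */
static bool
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Round BYTEPOS down to the first byte of the character containing it.
   BYTEPOS == LEN (end of text) is already a boundary.  */
static ptrdiff_t
round_down_to_char (const unsigned char *text, ptrdiff_t len,
                    ptrdiff_t bytepos)
{
  while (bytepos > 0 && bytepos < len && !char_head_p (text[bytepos]))
    bytepos--;
  return bytepos;
}

/* Round BYTEPOS up to the first byte of the next character, or to LEN.  */
static ptrdiff_t
round_up_to_char (const unsigned char *text, ptrdiff_t len,
                  ptrdiff_t bytepos)
{
  while (bytepos < len && !char_head_p (text[bytepos]))
    bytepos++;
  return bytepos;
}

/* True if BYTEPOS is within [0, LEN] and sits on a character boundary.  */
static bool
valid_bytepos_p (const unsigned char *text, ptrdiff_t len,
                 ptrdiff_t bytepos)
{
  return 0 <= bytepos && bytepos <= len
         && (bytepos == len || char_head_p (text[bytepos]));
}

/* Hypothetical checked variant: instead of silently rounding, abort on
   an invalid BYTEPOS unless assertions are disabled with NDEBUG.  */
static ptrdiff_t
checked_bytepos (const unsigned char *text, ptrdiff_t len,
                 ptrdiff_t bytepos)
{
  assert (valid_bytepos_p (text, len, bytepos));
  return bytepos;
}
```

For the four bytes 'a', 0xC3, 0xA9, 'z' (the middle two encoding a single character), byte position 2 rounds down to 1 and up to 3, and the checked variant would reject it.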
> For that matter, how many charpos <-> bytepos functions are there in
> Emacs? Just this one?
I think so, yes.
>> Worse, in notwordbound we do:
>
>> ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
>> ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
>> UPDATE_SYNTAX_TABLE (charpos);
>
>> which seems even more broken because `d` might point to the first byte
>> after the gap, so `d - 1` will point in the middle of the gap, so it's
>> simply an invalid argument to PTR_TO_OFFSET.
>
> I don't think this is right. Both `d' and `offset' are byte
> measurements, not character measurements, so it shouldn't matter whether
> the "- 1" is inside or outside the parens. However, it would be less
> confusing if they were both (?all) the same.
The difference between `d` and `offset` is indeed just an offset, but
it can be one of two different offsets depending on whether `d` is before
or after the gap, so what happens when `d` is within the gap depends on how
the test for "before/after the gap" is implemented.
More specifically, when `d` is N bytes before the end of the gap, the
code could consider it as being N bytes before the beginning of the
second part, or being "gap-size - N" bytes after the end of the
first part.
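A small standalone sketch of those two interpretations, in case it helps. The struct and both functions are hypothetical and only model a gap buffer; they are not the actual definitions of PTR_TO_OFFSET or POINTER_TO_OFFSET. The only difference between the two variants is which edge of the gap the "before/after" test uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a gap buffer: SIZE bytes of storage starting at
   BASE, with a gap of GAP_SIZE bytes beginning at index GAP_START.  */
struct gap_buffer
{
  const unsigned char *base;
  ptrdiff_t size;
  ptrdiff_t gap_start;
  ptrdiff_t gap_size;
};

/* Variant A: test against the START of the gap.  A pointer inside the
   gap is treated as belonging to the second part, so it ends up
   N bytes before the beginning of the second part.  */
static ptrdiff_t
offset_test_gap_start (const struct gap_buffer *b, const unsigned char *p)
{
  ptrdiff_t i = p - b->base;
  assert (0 <= i && i <= b->size);
  return i < b->gap_start ? i : i - b->gap_size;
}

/* Variant B: test against the END of the gap.  A pointer inside the
   gap is treated as belonging to the first part, so it ends up
   "gap-size - N" bytes after the end of the first part.  */
static ptrdiff_t
offset_test_gap_end (const struct gap_buffer *b, const unsigned char *p)
{
  ptrdiff_t i = p - b->base;
  assert (0 <= i && i <= b->size);
  return i < b->gap_start + b->gap_size ? i : i - b->gap_size;
}
```

With 16 bytes of storage, 4 bytes of text before the gap and an 8-byte gap, a `d` pointing at the first byte after the gap maps to offset 4 in both variants. For `d - 1` (N = 1), variant A yields 3, the same as decrementing after the conversion, while variant B yields 11, which is past the 8 bytes of actual text.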
>> According to the definition of PTR_TO_OFFSET and POINTER_TO_OFFSET,
>> the result may be the same as if we did the decrement after the fact,
>> but it still looks fishy. WDYT?
>
> I think it is suboptimal to have both PTR_TO_OFFSET and
> POINTER_TO_OFFSET meaning different things in the same source file. ;-)
I'm so glad you're volunteering to clean this up.
Thank you, really.
> There are eight occurrences of SYNTAX_TABLE_BYTE_TO_CHAR in
> regex-emacs.c. I think I will check them all, amending them as in your
> patch.
> What do you say?
Thanks,
Stefan