From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: 58558@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org
Subject: bug#58558: 29.0.50; re-search-forward is slow in some buffers
Date: Tue, 13 Dec 2022 12:38:13 -0500 [thread overview]
Message-ID: <jwv5yefb556.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <87fsdjwb4e.fsf@localhost> (Ihor Radchenko's message of "Tue, 13 Dec 2022 15:56:33 +0000")
>>> I will look how to do it. Maybe perf probe.
>>> I guess, it will be useful to compile Emacs with debug symbols at this
>>> point.
>>
>> AFAIR, you can ask perf to profile a single function, and you can ask
>> it to annotate the profile with the source code.
>
> I now compiled Emacs with debug symbols, waited enough to see observable
> increase in the benchmark-run timing, and recorded the perf data.
>
> buf_bytepos_to_charpos is still on the top
>
> 78.06% emacs emacs [.] buf_bytepos_to_charpos
> 3.00% emacs emacs [.] re_match_2_internal
> 1.05% emacs emacs [.] find_interval
> 1.04% emacs emacs [.] CHAR_TABLE_REF_ASCII
> 0.85% emacs emacs [.] make_lisp_symbol
> 0.80% emacs emacs [.] re_search_2
> 0.76% emacs emacs [.] builtin_lisp_symbol
> 0.62% emacs emacs [.] PSEUDOVECTORP
AFAIK the main places where we call `buf_bytepos_to_charpos` from
`re_match_2_internal` is via the `SYNTAX_TABLE_BYTE_TO_CHAR` macro, used
for regexp elements that depend on syntax tables (i.e. \<, \>, \_<, ...).
But I'd expect those to be executed "frequently&closely" enough that the
`cached_(byte|char)pos` data should almost always be nearby, making the
call to `buf_bytepos_to_charpos` fairly cheap (more specifically
the `for (tail = BUF_MARKERS (b);...` loop should not iterate many
times, regardless how many markers there are).
> My guess: number of markers is growing somehow?
`buf_bytepos_to_charpos` itself creates markers (using them as a cache
of previous conversions), so that might be why.
But we only look at the first N markers where N*50 is the distance to
the closest marker found so far. So growth is not sufficient (it's
clearly a part of the reason, tho).
Regarding growth: could you call `garbage-collect` between the calls to
`re-search-forward` to see if that avoids the accumulation?
[ I presume here that those markers are created/added by
`buf_bytepos_to_charpos` itself, so they should be recovered by the GC
because they're not referenced from anywhere else. ]
I'd be interested to know how many iterations of the `for (tail =
BUF_MARKERS (b);...` loop are executed on average during your
`re-search-forward` (and how that average changes between runs of
`re-search-forward`).
Stefan
PS: Of course, another approach would be to replace this code with
something else. Using markers as a cache of bytepos/charpos conversions
has been a source of a few performance issues over the year.
Another approach could be to use a "vector with gap" managed alongside
the actual buffer text. It could be indexed by "charpos divided by
1024", so conversion from charpos to bytepos could be a simple vector
lookup followed by scanning at most 1kB, and conversion in the other
direction would use a binary search in that same vector (or we could use
2 "vectors with gap", one per direction of conversion).
next prev parent reply other threads:[~2022-12-13 17:38 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-16 1:26 bug#58558: 29.0.50; re-search-forward is slow in some buffers Ihor Radchenko
2022-10-16 9:19 ` Lars Ingebrigtsen
2022-10-16 9:34 ` Ihor Radchenko
2022-10-16 9:37 ` Lars Ingebrigtsen
2022-10-16 10:02 ` Ihor Radchenko
2022-10-16 10:04 ` Lars Ingebrigtsen
2022-10-16 10:53 ` Ihor Radchenko
2022-10-16 11:01 ` Lars Ingebrigtsen
2022-10-16 11:21 ` Eli Zaretskii
2022-10-16 14:23 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-17 0:56 ` Ihor Radchenko
2022-10-18 11:50 ` Lars Ingebrigtsen
2022-10-18 14:58 ` Eli Zaretskii
2022-10-18 18:19 ` Lars Ingebrigtsen
2022-10-18 18:38 ` Eli Zaretskii
2022-12-13 10:28 ` Ihor Radchenko
2022-12-13 13:11 ` Eli Zaretskii
2022-12-13 13:32 ` Ihor Radchenko
2022-12-13 14:28 ` Eli Zaretskii
2022-12-13 15:56 ` Ihor Radchenko
2022-12-13 16:08 ` Eli Zaretskii
2022-12-13 17:43 ` Ihor Radchenko
2022-12-13 17:52 ` Eli Zaretskii
2022-12-13 18:03 ` Ihor Radchenko
2022-12-13 20:02 ` Eli Zaretskii
2022-12-14 11:40 ` Ihor Radchenko
2022-12-14 13:06 ` Eli Zaretskii
2022-12-14 13:23 ` Ihor Radchenko
2022-12-14 13:32 ` Eli Zaretskii
2022-12-14 13:39 ` Ihor Radchenko
2022-12-14 14:12 ` Eli Zaretskii
2022-12-13 18:15 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-13 18:40 ` Ihor Radchenko
2022-12-13 19:55 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-13 20:21 ` Eli Zaretskii
2022-12-14 11:42 ` Ihor Radchenko
2022-12-13 17:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2022-12-14 12:00 ` Ihor Radchenko
2022-12-14 12:23 ` Ihor Radchenko
2022-12-14 13:10 ` Eli Zaretskii
2022-12-14 13:26 ` Ihor Radchenko
2022-12-14 13:57 ` Eli Zaretskii
2022-12-14 14:01 ` Ihor Radchenko
2023-04-06 11:49 ` Ihor Radchenko
2023-04-06 12:05 ` Eli Zaretskii
2023-04-09 19:54 ` Ihor Radchenko
2023-04-10 4:14 ` Eli Zaretskii
2023-04-10 12:24 ` Ihor Radchenko
2023-04-10 13:40 ` Eli Zaretskii
2023-04-10 14:55 ` Ihor Radchenko
2023-04-10 16:04 ` Eli Zaretskii
2023-04-10 14:27 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-11 11:29 ` Ihor Radchenko
2023-04-11 11:51 ` Eli Zaretskii
2023-04-12 13:39 ` Ihor Radchenko
2023-04-12 14:06 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-12 14:30 ` Eli Zaretskii
2023-04-12 14:38 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-12 15:22 ` Eli Zaretskii
2023-04-12 15:59 ` Alan Mackenzie
2023-04-12 14:38 ` Stephen Berman
2023-04-12 14:42 ` Ihor Radchenko
2023-04-12 14:39 ` Ihor Radchenko
2023-04-12 15:20 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-12 23:23 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-13 4:33 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-13 20:05 ` Ihor Radchenko
2023-04-13 4:52 ` Eli Zaretskii
2023-04-13 5:15 ` Eli Zaretskii
2023-04-12 18:31 ` Alan Mackenzie
2023-04-12 23:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-13 4:43 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-04-13 12:09 ` Ihor Radchenko
2022-12-13 13:27 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-10-16 10:36 ` Eli Zaretskii
2023-02-19 12:17 ` Dmitry Gutov
2023-02-20 10:24 ` Ihor Radchenko
2023-02-20 14:54 ` Dmitry Gutov
2023-04-10 8:48 ` Mattias Engdegård
2023-04-10 9:57 ` Ihor Radchenko
2023-04-10 10:05 ` Mattias Engdegård
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=jwv5yefb556.fsf-monnier+emacs@gnu.org \
--to=bug-gnu-emacs@gnu.org \
--cc=58558@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=larsi@gnus.org \
--cc=monnier@iro.umontreal.ca \
--cc=yantar92@posteo.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.