all messages for Emacs-related lists mirrored at yhetil.org
From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: 58558@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org>, larsi@gnus.org
Subject: bug#58558: 29.0.50; re-search-forward is slow in some buffers
Date: Tue, 13 Dec 2022 12:38:13 -0500
Message-ID: <jwv5yefb556.fsf-monnier+emacs@gnu.org>
In-Reply-To: <87fsdjwb4e.fsf@localhost> (Ihor Radchenko's message of "Tue, 13 Dec 2022 15:56:33 +0000")

>>> I will look how to do it. Maybe perf probe.
>>> I guess, it will be useful to compile Emacs with debug symbols at this
>>> point.
>>
>> AFAIR, you can ask perf to profile a single function, and you can ask
>> it to annotate the profile with the source code.
>
> I now compiled Emacs with debug symbols, waited enough to see observable
> increase in the benchmark-run timing, and recorded the perf data.
>
> buf_bytepos_to_charpos is still on the top
>
>     78.06%  emacs         emacs                       [.] buf_bytepos_to_charpos
>      3.00%  emacs         emacs                       [.] re_match_2_internal
>      1.05%  emacs         emacs                       [.] find_interval
>      1.04%  emacs         emacs                       [.] CHAR_TABLE_REF_ASCII
>      0.85%  emacs         emacs                       [.] make_lisp_symbol
>      0.80%  emacs         emacs                       [.] re_search_2
>      0.76%  emacs         emacs                       [.] builtin_lisp_symbol
>      0.62%  emacs         emacs                       [.] PSEUDOVECTORP

AFAIK the main place where we call `buf_bytepos_to_charpos` from
`re_match_2_internal` is via the `SYNTAX_TABLE_BYTE_TO_CHAR` macro, used
for regexp elements that depend on syntax tables (e.g. \<, \>, \_<, ...).

But I'd expect those to be executed "frequently&closely" enough that the
`cached_(byte|char)pos` data should almost always be nearby, making the
call to `buf_bytepos_to_charpos` fairly cheap (more specifically
the `for (tail = BUF_MARKERS (b);...` loop should not iterate many
times, regardless of how many markers there are).

> My guess: number of markers is growing somehow?

`buf_bytepos_to_charpos` itself creates markers (using them as a cache
of previous conversions), so that might be why.

But we only look at the first N markers where N*50 is the distance to
the closest marker found so far.  So growth is not sufficient (it's
clearly a part of the reason, tho).
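
To make the "first N markers where N*50 is the distance" rule concrete,
here is a simplified, self-contained model of that loop.  It is a sketch,
not the code in marker.c: `struct marker`, `markers_examined` and the
hard-coded 50 are stand-ins chosen for illustration, and it only counts
how many markers the walk looks at before it gives up.

```c
#include <stddef.h>

/* Hypothetical stand-in for an Emacs marker: only the cached
   position pair and the list link.  */
struct marker {
    long bytepos;
    long charpos;
    struct marker *next;
};

/* Walk the marker list looking for cached positions near TARGET (a
   byte position).  The allowed gap grows by 50 for every marker
   examined, so the walk stops after N markers as soon as some known
   position lies within about N*50 bytes of TARGET.
   Returns the number of markers examined.  */
static size_t
markers_examined (const struct marker *head, long target,
                  long buf_begin, long buf_end)
{
    long best_below = buf_begin;  /* closest known position <= target */
    long best_above = buf_end;    /* closest known position >= target */
    long distance = 50;
    size_t examined = 0;

    for (const struct marker *m = head; m; m = m->next) {
        examined++;
        if (m->bytepos <= target && m->bytepos > best_below)
            best_below = m->bytepos;
        if (m->bytepos >= target && m->bytepos < best_above)
            best_above = m->bytepos;

        /* Close enough: scanning the text directly is now cheaper
           than looking at more markers.  */
        if (best_above - target < distance
            || target - best_below < distance)
            break;
        distance += 50;
    }
    return examined;
}
```

In this model a nearby marker at the head of the list ends the walk after
one step, while a list whose nearby markers sit far from the head costs
one iteration per marker until the growing threshold catches up.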

Regarding growth: could you call `garbage-collect` between the calls to
`re-search-forward` to see if that avoids the accumulation?
[ I presume here that those markers are created/added by
  `buf_bytepos_to_charpos` itself, so they should be recovered by the GC
  because they're not referenced from anywhere else.  ]

I'd be interested to know how many iterations of the `for (tail =
BUF_MARKERS (b);...` loop are executed on average during your
`re-search-forward` (and how that average changes between runs of
`re-search-forward`).


        Stefan


PS: Of course, another approach would be to replace this code with
something else.  Using markers as a cache of bytepos/charpos conversions
has been a source of a few performance issues over the years.

Another approach could be to use a "vector with gap" managed alongside
the actual buffer text.  It could be indexed by "charpos divided by
1024", so conversion from charpos to bytepos could be a simple vector
lookup followed by scanning at most 1kB, and conversion in the other
direction would use a binary search in that same vector (or we could use
2 "vectors with gap", one per direction of conversion).
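
A rough sketch of the indexing idea, under several assumptions: the text
is valid UTF-8 in one contiguous array, positions are 0-based, and the
index is built once and never edited, so the "gap" part (keeping the
vector cheap to update on insertion/deletion) is left out entirely.
The names `pos_index`, `char_to_byte` and `byte_to_char` are made up for
this example and do not exist in Emacs.

```c
#include <stddef.h>
#include <stdlib.h>

#define STRIDE 1024  /* characters per index entry */

struct pos_index {
    const unsigned char *text;  /* UTF-8 text (assumed valid) */
    size_t nbytes;
    size_t *byte_at;            /* byte_at[i] = byte offset of char i*STRIDE */
    size_t nentries;            /* always >= 1 after a successful build */
};

/* A byte starts a character unless it is a UTF-8 continuation byte.  */
static int is_char_start (unsigned char c) { return (c & 0xC0) != 0x80; }

static int
pos_index_build (struct pos_index *ix, const unsigned char *text, size_t nbytes)
{
    /* Every character takes at least one byte, so this is enough room.  */
    ix->byte_at = malloc ((nbytes / STRIDE + 1) * sizeof *ix->byte_at);
    if (!ix->byte_at)
        return -1;
    ix->text = text;
    ix->nbytes = nbytes;
    ix->byte_at[0] = 0;
    ix->nentries = 1;
    size_t chars = 0;
    for (size_t b = 0; b < nbytes; b++)
        if (is_char_start (text[b])) {
            if (chars > 0 && chars % STRIDE == 0)
                ix->byte_at[ix->nentries++] = b;
            chars++;
        }
    return 0;
}

/* charpos -> bytepos: one vector lookup, then scan < STRIDE characters.  */
static size_t
char_to_byte (const struct pos_index *ix, size_t charpos)
{
    size_t slot = charpos / STRIDE;
    if (slot >= ix->nentries)
        slot = ix->nentries - 1;
    size_t b = ix->byte_at[slot];
    for (size_t left = charpos - slot * STRIDE; left > 0 && b < ix->nbytes; left--) {
        b++;
        while (b < ix->nbytes && !is_char_start (ix->text[b]))
            b++;
    }
    return b;
}

/* bytepos -> charpos: binary search for the last entry at or before
   BYTEPOS (assumed to be on a character boundary), then count.  */
static size_t
byte_to_char (const struct pos_index *ix, size_t bytepos)
{
    size_t lo = 0, hi = ix->nentries;  /* byte_at[lo] <= bytepos holds */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->byte_at[mid] <= bytepos)
            lo = mid;
        else
            hi = mid;
    }
    size_t chars = lo * STRIDE;
    for (size_t b = ix->byte_at[lo]; b < bytepos && b < ix->nbytes; b++)
        if (is_char_start (ix->text[b]))
            chars++;
    return chars;
}
```

With this layout the cost of a conversion is bounded by the stride (plus
a logarithmic search in one direction) and does not depend on how many
markers the buffer has accumulated.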