From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#58558: 29.0.50; re-search-forward is slow in some buffers
Date: Tue, 13 Dec 2022 15:11:17 +0200
Message-ID: <83h6xzphxm.fsf@gnu.org>
To: Ihor Radchenko
Cc: 58558@debbugs.gnu.org, larsi@gnus.org, monnier@iro.umontreal.ca
In-Reply-To: <87tu1zd2c6.fsf@localhost> (message from Ihor Radchenko on Tue, 13 Dec 2022 10:28:57 +0000)

> From: Ihor Radchenko
> Cc: Eli Zaretskii, monnier@iro.umontreal.ca,
>  58558@debbugs.gnu.org
> Date: Tue, 13 Dec 2022 10:28:57 +0000
>
> Ok. I got around to try perf, and it turned out to be very easy to get
> started.
>
> perf record -p + perf report already appear to give some clue:
>
>   88.27%  emacs  emacs-30-vcs  [.] buf_bytepos_to_charpos
>    3.75%  emacs  emacs-30-vcs  [.] re_match_2_internal
>    1.35%  emacs  emacs-30-vcs  [.] scan_sexps_forward
>    1.03%  emacs  emacs-30-vcs  [.] re_search_2
>    0.65%  emacs  emacs-30-vcs  [.] find_interval
>    0.56%  emacs  emacs-30-vcs  [.] sub_char_table_ref
>    0.55%  emacs  emacs-30-vcs  [.] lookup_char_property
>
> The fraction of buf_bytepos_to_charpos increases over repeated benchmark
> runs. So buf_bytepos_to_charpos is the main suspect now, I guess.

This could happen because either (a) buf_bytepos_to_charpos is called
more times as session uptime progresses, or (b) each call to
buf_bytepos_to_charpos becomes more and more expensive.  So I think
the first question is: how many times is buf_bytepos_to_charpos called
for each search, or, equivalently, does the CPU time used per call of
buf_bytepos_to_charpos stay stable, or does it go up?  I think perf
can answer these questions if you ask nicely.

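To make the (a)-versus-(b) distinction concrete, here is a minimal Python sketch of the arithmetic involved.  It assumes you have somehow obtained, for each benchmark run, a call count and the total time attributed to the function; the names Run and diagnose, the 20% tolerance, and the sample numbers are all made up for illustration and are not perf output.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Run:
    """One benchmark run (hypothetical record, not a perf data format)."""
    calls: int      # how many times the function was called in this run
    seconds: float  # total time attributed to the function in this run


def diagnose(runs: Sequence[Run], tolerance: float = 0.2) -> str:
    """Compare the first and last run and say what grew.

    Returns "more calls" for case (a), "costlier calls" for case (b),
    "both" if the two effects combine, and "neither" otherwise.
    The tolerance is an arbitrary noise threshold.
    """
    first, last = runs[0], runs[-1]
    call_growth = last.calls / first.calls
    cost_growth = (last.seconds / last.calls) / (first.seconds / first.calls)
    more_calls = call_growth > 1 + tolerance
    costlier = cost_growth > 1 + tolerance
    if more_calls and costlier:
        return "both"
    if more_calls:
        return "more calls"
    if costlier:
        return "costlier calls"
    return "neither"


# Made-up numbers: same call count, four times the total time.
print(diagnose([Run(calls=1000, seconds=0.5), Run(calls=1000, seconds=2.0)]))
```

The point is only that the total time is the product of the two factors, so the sample fraction alone cannot tell them apart; one of the two has to be measured separately.
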
If the number of calls is the same, but each call becomes more and
more expensive, then the next step is to ask perf to produce a
detailed profile for each line of buf_bytepos_to_charpos, and see
which parts of it become more expensive.  I can think of a couple of
possible reasons for that, but I'd rather not speculate about
profiles, as that is known to produce wrong guesses.

Is the buffer in question being edited as time advances?  Or are the
buffer text and everything else in the buffer left unchanged?

> In contrast, using find-file-literally produces
>
>   34.44%  emacs  emacs-30-vcs  [.] re_match_2_internal
>   25.55%  emacs  emacs-30-vcs  [.] scan_sexps_forward
>   11.09%  emacs  emacs-30-vcs  [.] re_search_2
>   ...
>    0.59%  emacs  emacs-30-vcs  [.] buf_bytepos_to_charpos
>
> with buf_bytepos_to_charpos taking diminishing cpu sample fraction.

That find-file-literally yields a buffer with a much faster
buf_bytepos_to_charpos is not surprising: when each character is a
single byte, the conversion is trivial, and buf_bytepos_to_charpos
returns immediately.  The puzzling part is not that
buf_bytepos_to_charpos is much more expensive in a buffer with
non-ASCII text; the puzzle is why it becomes more and more expensive
with time.

Thanks.
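As an illustration of the single-byte versus multibyte point above, here is a toy Python model of a byte-position to character-position conversion over UTF-8 data.  It is not the Emacs implementation: the function name, the 0-based offsets, and the all_single_byte flag are assumptions made for this sketch, and it scans from the start of the data on every call.

```python
def bytepos_to_charpos(data: bytes, bytepos: int, all_single_byte: bool) -> int:
    """Toy conversion of a 0-based byte offset to a 0-based character offset.

    `data` is assumed to be valid UTF-8 and `bytepos` is assumed to fall
    on a character boundary.
    """
    if all_single_byte:
        # Every character occupies exactly one byte, so the two
        # positions coincide and no scanning is needed.
        return bytepos
    # Otherwise count the bytes that start a character, i.e. skip
    # UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
    return sum(1 for b in data[:bytepos] if (b & 0xC0) != 0x80)


ascii_data = b"hello world"
mixed_data = "naïve café".encode("utf-8")

# Single-byte text: the answer is the input, with no work done.
print(bytepos_to_charpos(ascii_data, 6, all_single_byte=True))
# Multibyte text: the bytes before the offset must be examined.
print(bytepos_to_charpos(mixed_data, 7, all_single_byte=False))
```

In this model the cost of the multibyte case depends on how many bytes have to be scanned to reach the target position, while the single-byte case costs the same regardless of position.
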