From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores Date: Wed, 13 Sep 2023 17:27:49 +0300 Message-ID: References: <937c3b8e-7742-91b7-c2cf-4cadd0782f0c@gutov.dev> <83a5vlsanw.fsf@gnu.org> <69a98e2a-5816-d36b-9d04-8609291333cd@gutov.dev> <87351cs8no.fsf@localhost> <35163e56-607d-9c5b-e3e8-5d5b548b3cb7@gutov.dev> <878rb3m43b.fsf@localhost> <83v8e6lyi4.fsf@gnu.org> <35f8b664-0241-9f96-1aa0-20ca51b2d34c@gutov.dev> <59c30342-a7e0-d83b-a128-0faae4cbd633@gutov.dev> <83pm4bi6qa.fsf@gnu.org> <83bkfs2tw5.fsf@gnu.org> <18a0b4d8-32bd-3ecd-8db4-32608a1ebba7@gutov.dev> <83il8lxjcu.fsf@gnu.org> <2e21ec81-8e4f-4c02-ea15-43bd6da3daa7@gutov.dev> <8334zmtwwi.fsf@gnu.org> <83tts0rkh5.fsf@gnu.org> <831qf3pd1y.fsf@gnu.org> <28a7916e-92d5-77ab-a61e-f85b59ac76b1@gutov.dev> <83sf7jnq0m.fsf@gnu.org> <5c493f86-0af5-256f-41a7-7d886ab4c5e4@gutov.dev> <83ledanvzw.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="37190"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed Sep 13 16:29:17 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qgQrk-0009Rm-6a for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 13 Sep 2023 16:29:17 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qgQrW-0006kP-Ni; Wed, 13 Sep 2023 10:29:02 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qgQrV-0006jX-AM for bug-gnu-emacs@gnu.org; Wed, 13 Sep 2023 10:29:01 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qgQrU-00015r-Vi for bug-gnu-emacs@gnu.org; Wed, 13 Sep 2023 10:29:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1qgQrY-0001Ri-5r for bug-gnu-emacs@gnu.org; Wed, 13 Sep 2023 10:29:04 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 13 Sep 2023 14:29:04 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 64735 X-GNU-PR-Package: emacs Original-Received: via spool by 64735-submit@debbugs.gnu.org id=B64735.16946152935453 (code B ref 64735); Wed, 13 Sep 2023 14:29:04 +0000 Original-Received: (at 64735) by debbugs.gnu.org; 13 Sep 2023 14:28:13 +0000 Original-Received: from localhost ([127.0.0.1]:35535 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qgQqi-0001Pr-Ro for submit@debbugs.gnu.org; Wed, 13 Sep 2023 10:28:13 -0400 Original-Received: from wout2-smtp.messagingengine.com ([64.147.123.25]:55661) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qgQqb-0001P9-5Y for 64735@debbugs.gnu.org; Wed, 13 Sep 2023 10:28:12 -0400 Original-Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id C4A3332007BE; Wed, 13 Sep 2023 10:27:53 -0400 (EDT) Original-Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 13 Sep 2023 10:27:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to; s=fm3; t= 1694615273; x=1694701673; bh=qw2CUkx912C0EhU21xm0F0yGgOIkoPNpXaq 0dSd83CU=; b=gdYalQe+frSoSrAKBZoRc6hx11c7rEThP5nn9LASD57AjVT1tda 2+3ktvA+S6RlwvXsxvcNe/DJSFa5kPO+fbgmlGnRkPQuqfJ61oLzJNzkCyx+g7e4 kx1FLKf5TE9YkXqJING17bTVFdKPxxM4hyPbV/f1INeXSuSSRLBOi4QlPfJmipLy dDLsqYGWzsfp+WqbyxG4raHHrccGOIBei18G8MASOK9at1a/BYPj5u/HoybPYh/J fH0Gw/dleI4D9lmibsk7UxErry8X4TuTr81VmjEs70k2wTprnUNo7iCUauDV1bdE Ij6x3cbJosj/RFDHKkowaFP3nxBMjLMcPlg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:sender:subject:subject:to:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t= 1694615273; x=1694701673; bh=qw2CUkx912C0EhU21xm0F0yGgOIkoPNpXaq 0dSd83CU=; b=Sz8bHqbZoWoVobpfkxe8AEylesT6As0o+iHqKhr0GbB7rlzt5yi M/OEw3ISn+x0R2kder2XAve0l5gYbthg/SQaapoZvRXohLCV5WKTf2ZQLDbjBxxQ ZKFIAv2naipmpkiKGP+k3OiVv7k5rAkAp8+MfEu2V4fHOa8xcDmkB1WNtySsUQ7K nyStORqb6pAAEdvb8RHTQrE/KPnqGaRHkuy7blIAilKxzNZQaDF2dx0Cygmz8rpU ATLkcGEqkohM/Xn2cRVhUXHRUhWzq7BGdnKFNMwgBbY40jTobIryBoSRrx8qKIE3 6FFBhZE3Ua1zKBpNtDtNfZmaEqm6nfjfZAQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedrudeikedgjeejucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtfeejnecuhfhrohhmpeffmhhi thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth htvghrnhepiefgteevheevveffheeltdeukeeiieekueefgedugfefgefhudelgfefveel vdevnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug hmihhtrhihsehguhhtohhvrdguvghv X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 13 Sep 2023 10:27:51 -0400 (EDT) Content-Language: en-US In-Reply-To: <83ledanvzw.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:270295 Archived-At: On 13/09/2023 14:38, Eli Zaretskii wrote: >> Date: Tue, 12 Sep 2023 23:27:49 +0300 >> Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, >> 64735@debbugs.gnu.org >> From: Dmitry Gutov >> >>> That's why I said "or writing a new filter function". >>> read_and_dispose_of_process_output will have to call this new filter >>> differently, passing it the raw text read from the subprocess, where >>> read_and_dispose_of_process_output current first decodes the text and >>> produces a Lisp string from it. Then the filter would need to do >>> something similar to what insert-file-contents does: insert the raw >>> input into the gap, then call decode_coding_gap to decode that >>> in-place. >> >> Does the patch from my last patch-bearing email look similar enough to >> what you're describing? >> >> The one called read_and_insert_process_output.diff > > No, not entirely: it still produces a Lisp string when decoding is > needed, and then inserts that string into the buffer. Are you sure? IIUC the fact that is passes 'curbuf' as the last argument to 'decode_coding_c_string' means that decoding happens inside the buffer. This has been my explanation for the performance improvement anyway. If it still generated a Lisp string, I think that would mean that we could save the general shape of internal-default-process-filter and just improve its implementation for the same measured benefit. > Did you look at what insert-file-contents does? If not I suggest to > have a look, starting from this comment: > > /* Here, we don't do code conversion in the loop. It is done by > decode_coding_gap after all data are read into the buffer. */ > > and ending here: > > if (CODING_MAY_REQUIRE_DECODING (&coding) > && (inserted > 0 || CODING_REQUIRE_FLUSHING (&coding))) > { > /* Now we have all the new bytes at the beginning of the gap, > but `decode_coding_gap` can't have them at the beginning of the gap, > so we need to move them. */ > memmove (GAP_END_ADDR - inserted, GPT_ADDR, inserted); > decode_coding_gap (&coding, inserted); > inserted = coding.produced_char; > coding_system = CODING_ID_NAME (coding.id); > } > else if (inserted > 0) > { > /* Make the text read part of the buffer. */ > eassert (NILP (BVAR (current_buffer, enable_multibyte_characters))); > insert_from_gap_1 (inserted, inserted, false); > > invalidate_buffer_caches (current_buffer, PT, PT + inserted); > adjust_after_insert (PT, PT_BYTE, PT + inserted, PT_BYTE + inserted, > inserted); > } That does look different. I'm not sure how long it would take me to adapt this code (if you have an alternative patch to suggest right away, please go ahead), but if this method turns out to be faster, it sounds like we could improve the performance of 'call_process' the same way. That would be a win-win. >> The result there, though, is that a "filter" (in the sense that >> make-process uses that term) is not used at all. > > Sure, but in this case we don't need any filtering. It's basically > the same idea as internal-default-process-filter: we just need to > insert the process output into a buffer, and optionally decode it. Pretty much. But that raises the question of what to do with the existing function internal-default-process-filter. Looking around, it doesn't seem to be used with advice (a good thing: the proposed change would break that), but it is called directly in some packages like magit-blame, org-assistant, with-editor, wisi, sweeprolog, etc. I suppose we'd just keep it around unchanged. >>> And if we want to use it in production, we should >>> probably work on adding that special default filter which inserts and >>> decodes directly into the buffer, because that will probably lower the >>> GC pressure and thus has hope of being faster. Or even replace the >>> default filter implementation with that new one. >> >> But a filter must be a Lisp function, which can't help but accept only >> Lisp strings (not C string) as argument. Isn't that right? > > We can provide a special filter identified by a symbol. Such a filter > will not be Lisp-callable, it will exist for the cases where we need > to insert the output into the process buffer. The would be the safest alternative. OTOH, this way we'd pass up on the opportunity to make all existing asynchronous processes without custom filters, a little bit faster in one fell swoop. > Any Lisp callback could > then access the process output as the text of that buffer, no Lisp > strings needed. I thought this was a worthy goal; apologies if I > misunderstood. Sorry, I was just quibbling about the terminology, to make sure we are on the same page on what is being proposed. If the patch and evidence look good to people, that is. And I'd like to explore that improvement venue to the max. But note that it has limitations as well (e.g. filter is the only way to get in-process callbacks from the process, and avoiding it for best performance will require external callback such as timers), so if someone has any better ideas how to improve GC time to a comparable extent but keep design unchanged, that's also welcome. Should we also discuss increasing the default of read-process-output-max? Even increasing it 10x (not necessarily 100x) creates a noticeable difference, especially combined with the proposed change.