From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Francesco =?UTF-8?Q?Potort=C3=AC?= Newsgroups: gmane.emacs.bugs Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions Date: Thu, 10 Oct 2024 16:25:28 +0200 Organization: The GNU project Message-ID: <874j5khw6f.fsf@tucano.isti.cnr.it> References: <87tteaznog.fsf@zephyr.silentflame.com> <87jzezzg87.fsf_-_@zephyr.silentflame.com> <37e4b3cd-6363-4f55-9921-92a1182679dc@gutov.dev> <86ttdy50ja.fsf@gnu.org> <75fe4289-da41-454d-ba92-22a92ea7002f@gutov.dev> <86frpe2186.fsf@gnu.org> <8e305b6d-8ca8-4437-990f-183ebc007d18@gutov.dev> <865xqa1ggi.fsf@gnu.org> <86ttdtzoof.fsf@gnu.org> <8d7dc133-9828-4023-821f-e4403f899f81@gutov.dev> <86ttdsxt6x.fsf@gnu.org> <52cb1caa-9e7e-45df-b328-d60948d397f6@gutov.dev> <864j5rxca1.fsf@gnu.org> <87a5fiijy9.fsf@tucano.isti.cnr.it> <86jzelvjh4.fsf@gnu.org> <8b6560a9-e2d6-42ae-ac1d-014700f21804@gutov.dev> <86wmiktzez.fsf@gnu.org> <86ldyzucdd.fsf@gnu.org> <021c625b-adc9-4e19-819c-fe929583e503@gutov.dev> <86ed4ru41x.fsf@gnu.org> <8d86f23e-fdc3-45a5-b3c8-cd4670813e21@gutov.dev> <86ploasq35.fsf@gnu.org> <3e63f532-c6af-4923-880b-01a32cc667ec@gutov.dev> <878quwix4c.fsf@tucano.isti.cnr.it> <86ldyw3467.fsf@gnu.org> <875xq0icqa.fsf@tucano.isti.cnr.it> <86y12w1hjp.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="38655"; mail-complaints-to="usenet@ciao.gmane.io" Cc: dmitry@gutov.dev, 73484@debbugs.gnu.org, spwhitton@spwhitton.name To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Thu Oct 10 16:29:07 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1syuA6-0009rN-Re for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 10 Oct 2024 16:29:07 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1syu9t-0006OA-O2; Thu, 10 Oct 2024 10:28:53 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1syu9r-0006Nm-GA for bug-gnu-emacs@gnu.org; Thu, 10 Oct 2024 10:28:51 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1syu9r-00084p-7P for bug-gnu-emacs@gnu.org; Thu, 10 Oct 2024 10:28:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debbugs.gnu.org; s=debbugs-gnu-org; h=MIME-Version:References:In-Reply-To:Date:From:To:Subject; bh=jzpwtLuufN0WlzSpynCAMjPVm7WWIS9CI/wT5NUeOe0=; b=LBJ2dzVayUw2YClJuWQHJw/GdYT9kG8P57bkr/KE2vCydSPR/FvfYIQm0uivzstK7ki0fQIRha+EgpnTkk5oDYDzNaSm9VBTyzshQ5Ia1nLXYzlw1D7jkhDGRT72W7bQZ/1MWDGQ9uq8scc26XCleFpLEc51zl5cWyqqaVjJBbzBZylIZfqyNGzH7TzVsffdnaBFCfxkFm4SW+KvIJTHFWIwCmAWDxyYvMIOGqDYidOvgykIA43ogiwBDweKN9oIArnqa25AwgHlNoeHHKKpnhnTv3LB6BDxBYsmbJEMdMqmymG4WB6D+biwX3F3jxKAubnV0k4DdkexQ4/Nat/8ow==; Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1syuA2-00059c-B4 for bug-gnu-emacs@gnu.org; Thu, 10 Oct 2024 10:29:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Francesco =?UTF-8?Q?Potort=C3=AC?= Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 10 Oct 2024 14:29:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 73484 X-GNU-PR-Package: emacs Original-Received: via spool by 73484-submit@debbugs.gnu.org id=B73484.172857049219748 (code B ref 73484); Thu, 10 Oct 2024 14:29:02 +0000 Original-Received: (at 73484) by debbugs.gnu.org; 10 Oct 2024 14:28:12 +0000 Original-Received: from localhost ([127.0.0.1]:60167 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1syu9D-00058Q-HP for submit@debbugs.gnu.org; Thu, 10 Oct 2024 10:28:12 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:54268) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1syu9B-000588-7O for 73484@debbugs.gnu.org; Thu, 10 Oct 2024 10:28:10 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1syu6m-0007ih-Tu; Thu, 10 Oct 2024 10:25:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:References:Subject:In-Reply-To:To:Date: From; bh=jzpwtLuufN0WlzSpynCAMjPVm7WWIS9CI/wT5NUeOe0=; b=U5WjZGC2ZnqhHVfSmH2Y dC2wbNCkb6NvBiaH9qFDcc5WmfbpuBRaRmFLDJ+U0EEdaZfNXsPMObhZXemTDAbunuVUcdfbFUogS RYpeV39qTOZ+KGqWHt3rN02VYXj2p93PV7VHl8ud/NRTK2s/s1TGJigtywOSDAQELB8Uh3EwdhvUy m8Gk7fOBVnm04DGBDJMwYgeAE9o2u201t06Qg/Z6XilM5mSd06H+3Wuior6mf5D+0s0f8KnHMI84A SvXh3H2/6Mj1T8mHL671g2KXLIu+0ZAUxHdkcFHIySFB5pkZPHneM5POf/m+LDGs8FCeE0gDg3niZ Jd9es10vJ9Fh3A==; In-Reply-To: <86y12w1hjp.fsf@gnu.org> (eliz@gnu.org) X-fingerprint: 4B02 6187 5C03 D6B1 2E31 7666 09DF 2DC9 BE21 6115 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:293285 Archived-At: >> I had a quick look at the whole code and in fact the only place I can find where ou have O^2 behaviour seems to be file name comparison, and it still looks so strange to me that this can in facrt cause significant delay. > >We are using etags on a huge tree: about 375K files. I think that's >the reason, because non-linear behaviors are like that: they are >insignificant with small sets, but huge with larger ones... > >Profiles don't lie... Ok, makes sense. I must have missed the number of files in your previous explanations, sorry. The only other place where I found O^2 behaviour is when managing #line directives, but you already tried to disable them without much change. So let's concentrate on file name comparison which is done in process_file_name at for (fdp = fdhead; fdp != NULL; fdp = fdp->next) { assert (fdp->infname != NULL); if (streq (uncompressed_name, fdp->infname)) goto cleanup; } This is a simple O^2 comparison, which is repeated sum(1,N,N-1)=~N^2/2, which for ~375k files means ~70G comparisons. If you can count the number of times streq is called and 70G is a substantial portion of that number, then we have the culprit. To check, just remove the above test and see if the running time drops. In that case, using a hash rather than a comparison would probably make sense. Alternatively, rather than managing file names in a single loop, do a first loop on all file names to canonicalise them, but without searching for tags (essentially, remove the call to process_file from process_file_name), then uniquify the list of canonicalised file names, then run process_file on them.