From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time Date: Mon, 27 Sep 2021 14:48:06 +0300 Message-ID: <83y27ikznt.fsf@gnu.org> References: <03aa81b5-6077-c35c-1a5f-ec4d867b59ac@yandex.ru> <63300a34-e487-02d1-c182-2b84438654d7@yandex.ru> <83k0j6trau.fsf@gnu.org> <838rzmtf00.fsf@gnu.org> <83sfxurr67.fsf@gnu.org> <83ilypsw31.fsf@gnu.org> <83h7e9stbm.fsf@gnu.org> <835yuprx1d.fsf@gnu.org> <2391d2ad6f5d14256480@heytings.org> <83fstqmwgu.fsf@gnu.org> <2391d2ad6f1301eed00b@heytings.org> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="32893"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es To: Gregory Heytings Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Sep 27 13:49:25 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mUp8O-0008OT-SS for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 27 Sep 2021 13:49:24 +0200 Original-Received: from localhost ([::1]:58464 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mUp8N-0000U5-Fc for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 27 Sep 2021 07:49:23 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:54356) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mUp82-0000RU-33 for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 07:49:02 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:56593) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mUp81-0002sf-RZ for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 07:49:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1mUp81-0006PL-Ni for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 07:49:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 27 Sep 2021 11:49:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 50733 X-GNU-PR-Package: emacs Original-Received: via spool by 50733-submit@debbugs.gnu.org id=B50733.163274329724561 (code B ref 50733); Mon, 27 Sep 2021 11:49:01 +0000 Original-Received: (at 50733) by debbugs.gnu.org; 27 Sep 2021 11:48:17 +0000 Original-Received: from localhost ([127.0.0.1]:39906 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mUp7I-0006O5-L9 for submit@debbugs.gnu.org; Mon, 27 Sep 2021 07:48:17 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:49038) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mUp7G-0006Nq-Ff for 50733@debbugs.gnu.org; Mon, 27 Sep 2021 07:48:15 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:51878) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mUp7A-00024H-Fl; Mon, 27 Sep 2021 07:48:08 -0400 Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:2735 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mUp77-0004V7-2m; Mon, 27 Sep 2021 07:48:06 -0400 In-Reply-To: <2391d2ad6f1301eed00b@heytings.org> (message from Gregory Heytings on Mon, 27 Sep 2021 09:23:28 +0000) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:215651 Archived-At: > Date: Mon, 27 Sep 2021 09:23:28 +0000 > From: Gregory Heytings > cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es > > > To get back to the issue at hand: we are talking (or at least I was > > talking) about scalability of an algorithm, not about some particular > > implementation of the algorithm. > > Are you now again shifting the discussion to something else, a theoretical > comparison between various algorithms? That was the original intent, so there's no change in subject on my side. Apologies if that wasn't clear enough. > > Ripgrep is a multithreaded program, whereas idutils is single-threaded. > > So for a fair comparison of scalability of these two main ideas: > > file-based search vs DB search, you need at the very least to limit > > ripgrep to a single thread. And then you need to run each program on > > code bases of various sizes, preferably those which differ by orders of > > magnitude or close to that, and see their O(n) behavior. And exclude > > from your comparison command-line options that require IDUtils to access > > the files in addition to the DB. That would be at least an > > approximation to comparing apples to apples. > > You're asking me to disable everything that makes ripgrep a modern tool, > and to disable everything that makes idutils an outdated tool, to make the > outdated tool shine in comparison? Interesting viewpoint. That's the _only_ valid viewpoint when talking about scalability of algorithms. Aspects not relevant to the algorithm being examined should not affect the comparison. Comparing a multithreaded implementation with a single-threaded one is like comparing programs running on two very different computers, from the performance POV. > > IDUtils is an example of the latter, and it beats many utilities that > > search the files, including ripgrep, as long as it doesn't need to > > access the files themselves. But even if it doesn't always beat them > > (which you didn't yet demonstrate), it just means the ideas of its > > design should be taken further and/or implemented better, that's all. > > I provided you with many numbers and comparisons, which IMO demonstrate > what they were meant to demonstrate. A tool which finds matches for a > regexp in a O(100 MB) code base in O(10 ms), and in a O(1 GB) code base in > O(100 ms), is clearly good enough in practice. "Good enough in practice" is not the issue at hand. No one in their right mind will argue that ripgrep's performance is not good enough in practice, for the tree sizes that we are talking about. But the scalability you measured is approximately O(n), as expected from a tool that accesses the files while searching. IDUtils' gid basically runs in O(1) time. I've just run it on a tree with 17MB of code in 400 files (ID database size 54KB) and on a tree with 1.4GB of code in 40,000 files (DB size 13.5MB), and I get the same time for the same search: 15ms. No algorithm that actually accesses most or all of the files in the tree can scale like that. And if the above still doesn't convince you, then please forgive me, but I'll bow out of this.