From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.bugs Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time Date: Mon, 27 Sep 2021 08:14:09 +0300 Message-ID: <83fstqmwgu.fsf@gnu.org> References: <03aa81b5-6077-c35c-1a5f-ec4d867b59ac@yandex.ru> <63300a34-e487-02d1-c182-2b84438654d7@yandex.ru> <83k0j6trau.fsf@gnu.org> <838rzmtf00.fsf@gnu.org> <83sfxurr67.fsf@gnu.org> <83ilypsw31.fsf@gnu.org> <83h7e9stbm.fsf@gnu.org> <835yuprx1d.fsf@gnu.org> <2391d2ad6f5d14256480@heytings.org> Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="9056"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es To: Gregory Heytings Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Sep 27 07:16:27 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mUj06-0002BV-OP for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 27 Sep 2021 07:16:26 +0200 Original-Received: from localhost ([::1]:34152 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mUj05-0001ZF-MI for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 27 Sep 2021 01:16:25 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:33794) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mUiyl-0000kH-0k for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 01:15:03 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:56142) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mUiyk-0005j2-Os for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 01:15:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1mUiyk-00083l-AE for bug-gnu-emacs@gnu.org; Mon, 27 Sep 2021 01:15:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 27 Sep 2021 05:15:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 50733 X-GNU-PR-Package: emacs Original-Received: via spool by 50733-submit@debbugs.gnu.org id=B50733.163271965730826 (code B ref 50733); Mon, 27 Sep 2021 05:15:02 +0000 Original-Received: (at 50733) by debbugs.gnu.org; 27 Sep 2021 05:14:17 +0000 Original-Received: from localhost ([127.0.0.1]:39455 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mUiy1-000816-2z for submit@debbugs.gnu.org; Mon, 27 Sep 2021 01:14:17 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:56830) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mUixy-00080i-Pd for 50733@debbugs.gnu.org; Mon, 27 Sep 2021 01:14:15 -0400 Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:43534) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mUixs-0004ue-Cu; Mon, 27 Sep 2021 01:14:09 -0400 Original-Received: from 84.94.185.95.cable.012.net.il ([84.94.185.95]:2333 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mUixr-0003YV-Qy; Mon, 27 Sep 2021 01:14:08 -0400 In-Reply-To: <2391d2ad6f5d14256480@heytings.org> (message from Gregory Heytings on Mon, 27 Sep 2021 00:43:05 +0000) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:215637 Archived-At: > Date: Mon, 27 Sep 2021 00:43:05 +0000 > From: Gregory Heytings > cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es > > >> rg O_CREAT takes 0.18 seconds > >> gid O_CREAT takes 0.02 seconds > >> rg O.?CREAT takes 0.18 seconds > >> gid O.?CREAT takes 0.93 seconds > >> rg O.*CREAT takes 0.19 seconds > >> gid O.*CREAT takes 1.73 seconds > >> > >> Isn't idutils the one that doesn't scale? > > > > No. You compare apples with oranges. > > No. I compare apples with apples. I compare regexp searches in a code > base with regexp searches in a code base. There are different implementations of regexp searches in a code base. Are you saying that they all are "apples" for the purpose of comparing scalability? Then why not compare IDUtils with Grep? To get back to the issue at hand: we are talking (or at least I was talking) about scalability of an algorithm, not about some particular implementation of the algorithm. Ripgrep is a multithreaded program, whereas idutils is single-threaded. So for a fair comparison of scalability of these two main ideas: file-based search vs DB search, you need at the very least to limit ripgrep to a single thread. And then you need to run each program on code bases of various sizes, preferably those which differ by orders of magnitude or close to that, and see their O(n) behavior. And exclude from your comparison command-line options that require IDUtils to access the files in addition to the DB. That would be at least an approximation to comparing apples to apples. For an even better approximation, one should analyze the implementations of regexp search in each program and see if there aren't any significant differences there, or in any other aspects of the implementation that could significantly affect the speed. If there are such aspects, then the study of scalability should either account for the differences or implement the same regexp etc. algorithms in both programs before comparing them. But frankly, I don't understand why this all would be needed at all, because it should be absolutely clear that searching the files in the filesystem will always scale worse than reading a well-indexed DB. IDUtils is an example of the latter, and it beats many utilities that search the files, including ripgrep, as long as it doesn't need to access the files themselves. But even if it doesn't always beat them (which you didn't yet demonstrate), it just means the ideas of its design should be taken further and/or implemented better, that's all. > Because this is a thread about regexp searches in a code base. As should be clear from what I wrote at the beginning of this sub-thread, it is about scalability of the various algorithms for such searches, not about specific programs: > > Btw, I don't understand why we focus on general-purpose text-searching > > tools for these features. Why not focus on packages like ID Utils > > instead, they are so much faster. Daniel, could you time the same > > search in that large tree when xref-search-program is 'gid'? (You'd > > need to run 'mkid' first, to create the ID database, but that is > > one-time, and is very fast.) As I told many times, I think this is the > > future: program language sensitive tools that use a precomputed DB. > > It should now be clear that idutils is not "so much faster", it is > marginally faster in one case, and slower in all other cases. And it > doesn't do what project-find-regexp needs, because it ignores (most, but > not all) tokens in comments (oh, BTW, including tokens in comments has > been on its TODO for at least 20 years). Creating the ID database is also > not "very fast", and the ID database cannot be updated incrementally (oh, > BTW, incremental database updates has been on its TODO list for at least > 20 years). In short, it's an outdated tool, that isn't maintained > anymore, and that can't be the future. I said that such tools are the future, not that IDUtils itself is necessarily the future (though it could be, if someone picks up its development). Again, this is about looking for the best tools for this job, and I still stand by my opinion: focusing only on general-purpose search tools is sub-optimal. We should be on the lookout for the other kind, and support those of them that exist at least as well as we support Grep and its ilk.