bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Eli Zaretskii <eliz@gnu.org>
To: Gregory Heytings <gregory@heytings.org>
Cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 08:14:09 +0300	[thread overview]
Message-ID: <83fstqmwgu.fsf@gnu.org> (raw)
In-Reply-To: <2391d2ad6f5d14256480@heytings.org> (message from Gregory Heytings on Mon, 27 Sep 2021 00:43:05 +0000)

> Date: Mon, 27 Sep 2021 00:43:05 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
> 
> >> rg O_CREAT takes 0.18 seconds
> >> gid O_CREAT takes 0.02 seconds
> >> rg O.?CREAT takes 0.18 seconds
> >> gid O.?CREAT takes 0.93 seconds
> >> rg O.*CREAT takes 0.19 seconds
> >> gid O.*CREAT takes 1.73 seconds
> >>
> >> Isn't idutils the one that doesn't scale?
> >
> > No.  You compare apples with oranges.
> 
> No.  I compare apples with apples.  I compare regexp searches in a code 
> base with regexp searches in a code base.

There are different implementations of regexp searches in a code
base.  Are you saying that they all are "apples" for the purpose of
comparing scalability?  Then why not compare IDUtils with Grep?

To get back to the issue at hand: we are talking (or at least I was
talking) about scalability of an algorithm, not about some particular
implementation of the algorithm.  Ripgrep is a multithreaded program,
whereas idutils is single-threaded.  So for a fair comparison of
scalability of these two main ideas: file-based search vs DB search,
you need at the very least to limit ripgrep to a single thread.  And
then you need to run each program on code bases of various sizes,
preferably those which differ by orders of magnitude or close to that,
and see their O(n) behavior.  And exclude from your comparison
command-line options that require IDUtils to access the files in
addition to the DB.  That would be at least an approximation to
comparing apples to apples.  For an even better approximation, one
should analyze the implementations of regexp search in each program
and see if there aren't any significant differences there, or in any
other aspects of the implementation that could significantly affect
the speed.  If there are such aspects, then the study of scalability
should either account for the differences or implement the same regexp
etc. algorithms in both programs before comparing them.

But frankly, I don't understand why this all would be needed at all,
because it should be absolutely clear that searching the files in the
filesystem will always scale worse than reading a well-indexed DB.
IDUtils is an example of the latter, and it beats many utilities that
search the files, including ripgrep, as long as it doesn't need to
access the files themselves.  But even if it doesn't always beat them
(which you didn't yet demonstrate), it just means the ideas of its
design should be taken further and/or implemented better, that's all.

> Because this is a thread about regexp searches in a code base.

As should be clear from what I wrote at the beginning of this
sub-thread, it is about scalability of the various algorithms for such
searches, not about specific programs:

> > Btw, I don't understand why we focus on general-purpose text-searching 
> > tools for these features.  Why not focus on packages like ID Utils 
> > instead, they are so much faster.  Daniel, could you time the same 
> > search in that large tree when xref-search-program is 'gid'?  (You'd 
> > need to run 'mkid' first, to create the ID database, but that is 
> > one-time, and is very fast.)  As I told many times, I think this is the 
> > future: program language sensitive tools that use a precomputed DB.
> 
> It should now be clear that idutils is not "so much faster", it is 
> marginally faster in one case, and slower in all other cases.  And it 
> doesn't do what project-find-regexp needs, because it ignores (most, but 
> not all) tokens in comments (oh, BTW, including tokens in comments has 
> been on its TODO for at least 20 years).  Creating the ID database is also 
> not "very fast", and the ID database cannot be updated incrementally (oh, 
> BTW, incremental database updates has been on its TODO list for at least 
> 20 years).  In short, it's an outdated tool, that isn't maintained 
> anymore, and that can't be the future.

I said that such tools are the future, not that IDUtils itself is
necessarily the future (though it could be, if someone picks up its
development).  Again, this is about looking for the best tools for
this job, and I still stand by my opinion: focusing only on
general-purpose search tools is sub-optimal.  We should be on the
lookout for the other kind, and support those of them that exist at
least as well as we support Grep and its ilk.

next prev parent reply	other threads:[~2021-09-27  5:14 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <m1h7edq7sc.fsf.ref@yahoo.es>
2021-09-22  9:27 ` bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-22 19:09   ` Dmitry Gutov
2021-09-22 21:58     ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-22 23:09       ` Dmitry Gutov
2021-09-23 17:41         ` Dmitry Gutov
2021-09-23 21:17         ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-23 22:40           ` Dmitry Gutov
2021-09-24  6:34             ` Eli Zaretskii
2021-09-24  7:02               ` Juri Linkov
2021-09-24 10:43                 ` Eli Zaretskii
2021-09-24 15:30                   ` Juri Linkov
2021-09-24 16:08                     ` Eli Zaretskii
2021-09-24  9:00               ` Gregory Heytings
2021-09-24 11:00                 ` Eli Zaretskii
2021-09-24 11:43                   ` Dmitry Gutov
2021-09-24 13:18                   ` Gregory Heytings
2021-09-24 14:20                     ` Eli Zaretskii
2021-09-24 15:50                       ` Dmitry Gutov
2021-09-24 16:18                         ` Eli Zaretskii
2021-09-24 16:22                           ` Dmitry Gutov
2021-09-24 16:24                       ` Gregory Heytings
2021-09-24 17:49                         ` Eli Zaretskii
2021-09-24 17:52                           ` Gregory Heytings
2021-09-24 18:48                             ` Eli Zaretskii
2021-09-24 21:49                               ` Gregory Heytings
2021-09-25  6:26                                 ` Eli Zaretskii
2021-09-27  0:43                                   ` Gregory Heytings
2021-09-27  5:14                                     ` Eli Zaretskii [this message]
2021-09-27  9:23                                       ` Gregory Heytings
2021-09-27 11:48                                         ` Eli Zaretskii
2021-09-27 12:25                                           ` Gregory Heytings
2021-09-27 12:44                                             ` Eli Zaretskii
2021-09-27 19:36                                               ` Gregory Heytings
2021-09-28  5:17                                                 ` Eli Zaretskii
2021-09-28  8:23                                                   ` Gregory Heytings
2021-09-27 12:58                                             ` Dmitry Gutov
2021-09-27 13:05                                               ` Eli Zaretskii
2021-09-27 19:37                                                 ` Gregory Heytings
2021-09-24 18:17                           ` Dmitry Gutov
2021-09-24 18:51                             ` Eli Zaretskii
2021-09-24 11:40               ` Dmitry Gutov
2021-09-24 12:02                 ` Eli Zaretskii
2021-09-24 12:22                   ` Dmitry Gutov
2021-09-24 13:06                     ` Eli Zaretskii
2021-09-24 14:05                       ` Dmitry Gutov
2021-09-23  6:13       ` Eli Zaretskii
2021-09-23 20:42         ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-23 22:18           ` Dmitry Gutov
2021-09-24  6:03           ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83fstqmwgu.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=50733@debbugs.gnu.org \
    --cc=dgutov@yandex.ru \
    --cc=gregory@heytings.org \
    --cc=mardani29@yahoo.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.