all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Gregory Heytings <gregory@heytings.org>
Cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 14:48:06 +0300	[thread overview]
Message-ID: <83y27ikznt.fsf@gnu.org> (raw)
In-Reply-To: <2391d2ad6f1301eed00b@heytings.org> (message from Gregory Heytings on Mon, 27 Sep 2021 09:23:28 +0000)

> Date: Mon, 27 Sep 2021 09:23:28 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
> 
> > To get back to the issue at hand: we are talking (or at least I was 
> > talking) about scalability of an algorithm, not about some particular 
> > implementation of the algorithm.
> 
> Are you now again shifting the discussion to something else, a theoretical 
> comparison between various algorithms?

That was the original intent, so there's no change in subject on my
side.  Apologies if that wasn't clear enough.

> > Ripgrep is a multithreaded program, whereas idutils is single-threaded. 
> > So for a fair comparison of scalability of these two main ideas: 
> > file-based search vs DB search, you need at the very least to limit 
> > ripgrep to a single thread.  And then you need to run each program on 
> > code bases of various sizes, preferably those which differ by orders of 
> > magnitude or close to that, and see their O(n) behavior.  And exclude 
> > from your comparison command-line options that require IDUtils to access 
> > the files in addition to the DB.  That would be at least an 
> > approximation to comparing apples to apples.
> 
> You're asking me to disable everything that makes ripgrep a modern tool, 
> and to disable everything that makes idutils an outdated tool, to make the 
> outdated tool shine in comparison?  Interesting viewpoint.

That's the _only_ valid viewpoint when talking about scalability of
algorithms.  Aspects not relevant to the algorithm being examined
should not affect the comparison.  Comparing a multithreaded
implementation with a single-threaded one is like comparing programs
running on two very different computers, from the performance POV.

> > IDUtils is an example of the latter, and it beats many utilities that 
> > search the files, including ripgrep, as long as it doesn't need to 
> > access the files themselves.  But even if it doesn't always beat them 
> > (which you didn't yet demonstrate), it just means the ideas of its 
> > design should be taken further and/or implemented better, that's all.
> 
> I provided you with many numbers and comparisons, which IMO demonstrate 
> what they were meant to demonstrate.  A tool which finds matches for a 
> regexp in a O(100 MB) code base in O(10 ms), and in a O(1 GB) code base in 
> O(100 ms), is clearly good enough in practice.

"Good enough in practice" is not the issue at hand.  No one in their
right mind will argue that ripgrep's performance is not good enough in
practice, for the tree sizes that we are talking about.

But the scalability you measured is approximately O(n), as expected
from a tool that accesses the files while searching.  IDUtils' gid
basically runs in O(1) time.  I've just run it on a tree with 17MB of
code in 400 files (ID database size 54KB) and on a tree with 1.4GB of
code in 40,000 files (DB size 13.5MB), and I get the same time for the
same search: 15ms.  No algorithm that actually accesses most or all of
the files in the tree can scale like that.

And if the above still doesn't convince you, then please forgive me,
but I'll bow out of this.





  reply	other threads:[~2021-09-27 11:48 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <m1h7edq7sc.fsf.ref@yahoo.es>
2021-09-22  9:27 ` bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-22 19:09   ` Dmitry Gutov
2021-09-22 21:58     ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-22 23:09       ` Dmitry Gutov
2021-09-23 17:41         ` Dmitry Gutov
2021-09-23 21:17         ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-23 22:40           ` Dmitry Gutov
2021-09-24  6:34             ` Eli Zaretskii
2021-09-24  7:02               ` Juri Linkov
2021-09-24 10:43                 ` Eli Zaretskii
2021-09-24 15:30                   ` Juri Linkov
2021-09-24 16:08                     ` Eli Zaretskii
2021-09-24  9:00               ` Gregory Heytings
2021-09-24 11:00                 ` Eli Zaretskii
2021-09-24 11:43                   ` Dmitry Gutov
2021-09-24 13:18                   ` Gregory Heytings
2021-09-24 14:20                     ` Eli Zaretskii
2021-09-24 15:50                       ` Dmitry Gutov
2021-09-24 16:18                         ` Eli Zaretskii
2021-09-24 16:22                           ` Dmitry Gutov
2021-09-24 16:24                       ` Gregory Heytings
2021-09-24 17:49                         ` Eli Zaretskii
2021-09-24 17:52                           ` Gregory Heytings
2021-09-24 18:48                             ` Eli Zaretskii
2021-09-24 21:49                               ` Gregory Heytings
2021-09-25  6:26                                 ` Eli Zaretskii
2021-09-27  0:43                                   ` Gregory Heytings
2021-09-27  5:14                                     ` Eli Zaretskii
2021-09-27  9:23                                       ` Gregory Heytings
2021-09-27 11:48                                         ` Eli Zaretskii [this message]
2021-09-27 12:25                                           ` Gregory Heytings
2021-09-27 12:44                                             ` Eli Zaretskii
2021-09-27 19:36                                               ` Gregory Heytings
2021-09-28  5:17                                                 ` Eli Zaretskii
2021-09-28  8:23                                                   ` Gregory Heytings
2021-09-27 12:58                                             ` Dmitry Gutov
2021-09-27 13:05                                               ` Eli Zaretskii
2021-09-27 19:37                                                 ` Gregory Heytings
2021-09-24 18:17                           ` Dmitry Gutov
2021-09-24 18:51                             ` Eli Zaretskii
2021-09-24 11:40               ` Dmitry Gutov
2021-09-24 12:02                 ` Eli Zaretskii
2021-09-24 12:22                   ` Dmitry Gutov
2021-09-24 13:06                     ` Eli Zaretskii
2021-09-24 14:05                       ` Dmitry Gutov
2021-09-23  6:13       ` Eli Zaretskii
2021-09-23 20:42         ` Daniel Martín via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-09-23 22:18           ` Dmitry Gutov
2021-09-24  6:03           ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83y27ikznt.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=50733@debbugs.gnu.org \
    --cc=dgutov@yandex.ru \
    --cc=gregory@heytings.org \
    --cc=mardani29@yahoo.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.