unofficial mirror of bug-gnu-emacs@gnu.org 
From: sbaugh@catern.com
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: Spencer Baugh <sbaugh@janestreet.com>, 62837@debbugs.gnu.org
Subject: bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
Date: Sat, 15 Apr 2023 21:56:24 +0000 (UTC)
Message-ID: <871qkkn720.fsf@catern.com>
In-Reply-To: <5e6eddd5-4b38-5765-05f3-dd6c1927edd3@yandex.ru> (Dmitry Gutov's message of "Sat, 15 Apr 2023 01:38:18 +0300")

Dmitry Gutov <dgutov@yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback.  Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around 0.8s. 'find' is not that slow, actually:
>
>   time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- fewer files) can take 3 seconds or more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of file names from the first process's stdout, to a buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).

Yes, this is a very good point.

> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.

In short: I have a project.el backend for a large monorepo whose
project-files implementation returns only the subset of files that are
relevant to the work happening in a given clone.  (Generally a user will
have many clones and be doing different work in each one.)  The
relevant-files subset is determined by integration with the build
system.

So the find-based fallback enumerates a vast number of files and then
searches all of them, whereas a search over project-files covers a much
smaller number of files.
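
To make the difference concrete, here is a minimal shell sketch of the
two strategies, run against a throwaway directory.  The directory
layout, the file names and the relevant.txt list (standing in for what a
build-system-aware project-files might return) are all invented for
illustration; this is not what either backend literally runs.

```shell
#!/bin/sh
# Throwaway tree: three .cpp files mention "needle", but only one of
# them belongs to the (hypothetical) relevant-files subset.
demo=$(mktemp -d)
mkdir -p "$demo/app" "$demo/vendor"
printf 'int needle;\n' > "$demo/app/main.cpp"
printf 'int needle;\n' > "$demo/vendor/a.cpp"
printf 'int needle;\n' > "$demo/vendor/b.cpp"

# Assumed stand-in for the output of a project-files implementation.
printf '%s\n' "$demo/app/main.cpp" > "$demo/relevant.txt"

# Strategy 1: enumerate by glob with find, then grep everything found.
find_hits=$(find "$demo" -type f -name '*.cpp' -exec grep -l needle {} + \
              | wc -l | tr -d ' ')

# Strategy 2: grep only the files named in the precomputed list.
list_hits=$(xargs grep -l needle < "$demo/relevant.txt" | wc -l | tr -d ' ')

echo "find+grep matched $find_hits files; list+grep matched $list_hits"
rm -rf "$demo"
```

In the sketch the second strategy both reads fewer files and returns
fewer irrelevant hits, which is the effect I see in the monorepo.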

Regarding your medium-term plans to improve project-files performance:
this is a wild guess, but perhaps you have in mind a way to run a
subprocess that outputs the project-files list?  Let's call it
"project-files-process".  Its output could then be piped straight to
grep, skipping the round trip through Lisp.  If that is the idea, then
my own backend could certainly provide a project-files-process
implementation too.
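
In shell terms, the shape I am guessing at looks roughly like the
sketch below.  The project_files_process function is purely
hypothetical (nothing by that name exists in project.el today, and here
it is just a wrapper around find); the point is only that the file
names flow from one process to the next without being collected in
between.

```shell
#!/bin/sh
# Throwaway directory with one matching and one non-matching file.
demo=$(mktemp -d)
printf 'needle here\n' > "$demo/one.txt"
printf 'nothing\n'     > "$demo/two.txt"

# Hypothetical stand-in for "project-files-process": any command that
# prints the project's file names, NUL-separated, on stdout.
project_files_process() {
    find "$1" -type f -print0
}

# The listing is piped directly into grep via xargs; no intermediate
# list of names is built up anywhere.
matches=$(project_files_process "$demo" | xargs -0 grep -l needle)

echo "$matches"
rm -rf "$demo"
```

A backend like mine would swap the find call for whatever command
queries the build system.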

> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?

Nope, Linux.





