From: sbaugh@catern.com
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: Spencer Baugh <sbaugh@janestreet.com>, 62837@debbugs.gnu.org
Subject: bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
Date: Sat, 15 Apr 2023 21:56:24 +0000 (UTC)
Message-ID: <871qkkn720.fsf@catern.com>
In-Reply-To: <5e6eddd5-4b38-5765-05f3-dd6c1927edd3@yandex.ru> (Dmitry Gutov's message of "Sat, 15 Apr 2023 01:38:18 +0300")
Dmitry Gutov <dgutov@yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback. Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around 0.8s. 'find' is not that slow, actually:
>
> time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- fewer files) can take 3 seconds and more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of file names from the first process's stdout, to buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).
Yes, this is a very good point.
> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.
In short: I have a project.el backend for a large monorepo whose
project-files implementation returns only the subset of files that are
relevant to the work happening in a given clone. (Generally a user will
have many clones and be doing different work in each one.) The
relevant-files subset is determined by integration with the build
system.

So the find-based fallback enumerates a vast number of files and then
searches all of them, whereas a search over project-files covers a much
smaller set of files.
Regarding your medium-term plans to improve project-files performance:
I'm guessing wildly, but perhaps you have in mind a way to run a
subprocess that outputs the project-files list? Let's call it
"project-files-process". Its output could then be piped directly to
grep, avoiding the round trip through a list of Lisp strings. If that
is the idea, then my own backend could certainly provide a
project-files-process implementation too.
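
To make the comparison concrete, here is a rough shell sketch of the two
search strategies discussed above. It is only an illustration, not what
Emacs actually runs: the names project_files_process, search_with_find,
search_with_file_list and the manifest file relevant-files.txt are all
hypothetical, and it assumes a find and xargs that support -print0 and -0
(GNU and BSD both do).

```shell
# Hypothetical stand-in for "project-files-process": print the files
# relevant to this clone, one path per line, relative to the project root.
# Assumption: the list lives in a manifest file; a real backend would
# query the build system instead.
project_files_process() {
    cat "$1/relevant-files.txt"
}

# Find-based search: walk the whole tree, keep only files matching a
# glob, then grep those files.
# $1 = project root, $2 = file glob, $3 = fixed string to search for
search_with_find() {
    find "$1" -type f -name "$2" -print0 | xargs -0 grep -nH -F -e "$3"
}

# File-list search: stream the list straight into grep through a pipe,
# so it is never materialized as a list of strings in between.
# File names containing newlines are not handled by this sketch.
# $1 = project root, $2 = fixed string to search for
search_with_file_list() {
    (cd "$1" && project_files_process . | tr '\n' '\0' \
        | xargs -0 grep -nH -F -e "$2")
}
```

For example, `search_with_find . '*.cpp' foo` scans every .cpp file under
the current directory, while `search_with_file_list . foo` only looks at
the files the manifest names. When the relevant subset is small, the
second form does far less work, and the file list never leaves the pipe.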
> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?
Nope, Linux.