* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
@ 2023-04-14 15:37 Spencer Baugh
2023-04-14 22:38 ` Dmitry Gutov
0 siblings, 1 reply; 7+ messages in thread
From: Spencer Baugh @ 2023-04-14 15:37 UTC (permalink / raw)
To: 62837; +Cc: Dmitry Gutov
[-- Attachment #1: Type: text/plain, Size: 804 bytes --]
Tags: patch
When project-files is available, this is a much more efficient
fallback than the current grep fallback. Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.
* lisp/cedet/semantic/symref/project.el:
Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist):
Add project tool.
In GNU Emacs 29.0.60 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.15.12, Xaw scroll bars) of 2023-03-13 built on
igm-qws-u22796a
Repository revision: e759905d2e0828eac4c8164b09113b40f6899656
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: CentOS Linux 7 (Core)
Configured using:
'configure --with-x-toolkit=lucid --with-modules
--with-gif=ifavailable'
[-- Attachment #2: 0001-Add-a-semantic-symref-backend-which-uses-xref-matche.patch --]
[-- Type: text/patch, Size: 4244 bytes --]
From 2241f428f0d4809d00f397aafd97270272e966e0 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 14 Apr 2023 11:26:49 -0400
Subject: [PATCH] Add a semantic-symref backend which uses
xref-matches-in-files
When project-files is available, this is a much more efficient
fallback than the current grep fallback. Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.
* lisp/cedet/semantic/symref/project.el:
Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist):
Add project tool.
---
lisp/cedet/semantic/symref.el | 2 +
lisp/cedet/semantic/symref/project.el | 73 +++++++++++++++++++++++++++
2 files changed, 75 insertions(+)
create mode 100644 lisp/cedet/semantic/symref/project.el
diff --git a/lisp/cedet/semantic/symref.el b/lisp/cedet/semantic/symref.el
index 1ebd7ea154b..7dfe892b7e8 100644
--- a/lisp/cedet/semantic/symref.el
+++ b/lisp/cedet/semantic/symref.el
@@ -93,6 +93,8 @@ semantic-symref-tool-alist
idutils)
( (lambda (rootdir) (file-exists-p (expand-file-name "cscope.out" rootdir))) .
cscope )
+ ( (lambda (rootdir) (project-current nil rootdir)) .
+ project)
)
"Alist of tools usable by `semantic-symref'.
Each entry is of the form:
diff --git a/lisp/cedet/semantic/symref/project.el b/lisp/cedet/semantic/symref/project.el
new file mode 100644
index 00000000000..e822e7a2ba3
--- /dev/null
+++ b/lisp/cedet/semantic/symref/project.el
@@ -0,0 +1,73 @@
+;;; semantic/symref/project.el --- Symref implementation using project and xref -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+
+;; Author: Spencer Baugh <sbaugh@janestreet.com>
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;
+;; Implement the symref tool API using project-files and
+;; xref-matches-in-files, which in turn use grep or more efficient
+;; tools if available.
+;;
+;; This is basically a replacement for the symref GREP tool, as a new
+;; lowest-common-denominator which works without indices or
+;; project-specific configuration. It has better performance than the
+;; GREP tool because project-files provides a narrower set of files to
+;; search, and xref-matches-in-files is highly efficient.
+
+(require 'semantic/symref)
+(require 'project)
+(require 'xref)
+
+;;; Code:
+
+;;;###autoload
+(defclass semantic-symref-tool-project (semantic-symref-tool-baseclass) ()
+ "A symref tool implementation using project.el.
+This uses `xref-matches-in-files' over `project-files'.")
+
+(cl-defmethod semantic-symref-perform-search ((tool semantic-symref-tool-project))
+ (pcase-let
+ (((eieio
+ searchfor
+ (searchtype (or 'symbol 'regexp))
+ ;; for now, we only really support being called by
+ ;; xref-backend-references, and this is what it passes.
+ (resulttype 'line-and-text))
+ tool))
+ (mapcar
+ (pcase-lambda
+ ((cl-struct xref-match-item
+ summary (location
+ (cl-struct xref-file-location file line))))
+ (list line file summary))
+ (xref-matches-in-files searchfor (project-files (project-current))))))
+
+(add-to-list 'semantic-symref-tool-alist
+ '((lambda (rootdir) (project-current nil rootdir))
+ . project))
+
+(provide 'semantic/symref/project)
+
+;; Local variables:
+;; generated-autoload-file: "../loaddefs.el"
+;; generated-autoload-load-name: "semantic/symref/project"
+;; End:
+
+;;; semantic/symref/project.el ends here
--
2.30.2
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-14 15:37 bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files Spencer Baugh
@ 2023-04-14 22:38 ` Dmitry Gutov
2023-04-15 6:50 ` Eli Zaretskii
2023-04-15 21:56 ` sbaugh
0 siblings, 2 replies; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-14 22:38 UTC (permalink / raw)
To: Spencer Baugh, 62837
Hi!
On 14/04/2023 18:37, Spencer Baugh wrote:
> When project-files is available, this is a much more efficient
> fallback than the current grep fallback. Ultimately, this is
> motivated by making xref-find-references faster by default even in the
> absence of an index.
It's a clever enough idea, but unfortunately it doesn't look like the
performance is always improved by this change.
E.g. I have this checkout of gecko-dev (a big project, just for testing:
https://github.com/mozilla/gecko-dev) which contains different types of
files: cpp, js, py.
If I do an xref-find-references search with the current code, it
finishes in about 0.8s. 'find' is not that slow, actually:

  time find . -type f -name "*.cpp" >/dev/null

reports just 400 ms here.

Whereas with your patch the search, depending on the language (cpp --
more files, py -- fewer files), can take 3 seconds or more.
Why? First of all, project-files returns all files (which are then all
searched), whereas semantic-symref-filepattern-alist contains a mapping
from modes to file globs, limiting both the scan and subsequent search
to those.
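The first point can be sketched in Emacs Lisp. This is only an illustration and not part of the patch: `my-symref-filter-files-by-globs` is a hypothetical helper, and the globs are passed in by hand rather than looked up in semantic-symref-filepattern-alist. It narrows a list of file names to those matching shell-style globs, which is roughly what a project-files based backend would need to do before searching.

```elisp
(require 'seq)

;; Hypothetical helper, not part of Emacs or of the patch.
(defun my-symref-filter-files-by-globs (files globs)
  "Return the members of FILES whose base name matches one of GLOBS.
GLOBS is a list of shell-style wildcards such as \"*.cpp\"."
  (let ((regexps (mapcar #'wildcard-to-regexp globs)))
    (seq-filter
     (lambda (file)
       (let ((name (file-name-nondirectory file)))
         ;; `string-match-p' returns a match position, which is non-nil.
         (seq-some (lambda (re) (string-match-p re name)) regexps)))
     files)))
```

The call in the patch could then become something like `(xref-matches-in-files searchfor (my-symref-filter-files-by-globs (project-files (project-current)) '("*.cpp" "*.h")))`. Filtering in Lisp still pays for listing every file first, so it only addresses the search half of the cost.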
Second -- using project-files means we're forced to round-trip the list
of file names from the first process's stdout, to a buffer, then to a
list of Lisp strings, and then back to another buffer, to use as stdin.
I have a couple of things planned in the medium term to improve that,
but some overhead is probably unavoidable (unless we get some new
primitive that would allow "piping" between process buffers).
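To make the second half of that round trip concrete, here is a minimal sketch, assuming `xargs` and `grep` are available on PATH. `my-grep-file-list` is a hypothetical name, and this is not a copy of how xref-matches-in-files is written; it only has the same general shape, in which a Lisp list of file names is written into a buffer and handed to an external process as stdin.

```elisp
;; Hypothetical helper for illustration only.
(defun my-grep-file-list (files regexp)
  "Search FILES for REGEXP with grep, passing the file names on stdin.
Return grep's raw output as a string."
  (with-temp-buffer
    ;; Lisp strings -> buffer text, NUL-separated so odd file names survive.
    (insert (mapconcat #'identity files "\0"))
    ;; Buffer text -> stdin of xargs; the output replaces the buffer contents.
    (call-process-region (point-min) (point-max)
                         "xargs" t t nil
                         "-0" "grep" "-nH" "-e" regexp)
    (buffer-string)))
```

Every file name is allocated as a Lisp string and then copied back into a buffer before grep sees it, which is the overhead described above.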
Perhaps you could describe your case where you *did* see a significant
improvement from this patch, and we can discuss the best steps to
address that.
BTW, at first I figured you're using MacOS (which historically has
bundled outdated versions of find and grep, with worse performance). But
apparently not?
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-14 22:38 ` Dmitry Gutov
@ 2023-04-15 6:50 ` Eli Zaretskii
2023-04-15 12:37 ` Dmitry Gutov
2023-04-15 21:56 ` sbaugh
1 sibling, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-04-15 6:50 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: sbaugh, 62837
> Date: Sat, 15 Apr 2023 01:38:18 +0300
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
> > When project-files is available, this is a much more efficient
> > fallback than the current grep fallback. Ultimately, this is
> > motivated by making xref-find-references faster by default even in the
> > absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
Maybe we could offer that as optional behavior, turned on by some user
option? Then people who do experience a performance boost could use it.
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-15 6:50 ` Eli Zaretskii
@ 2023-04-15 12:37 ` Dmitry Gutov
0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-15 12:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: sbaugh, 62837
On 15/04/2023 09:50, Eli Zaretskii wrote:
>> Date: Sat, 15 Apr 2023 01:38:18 +0300
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>> On 14/04/2023 18:37, Spencer Baugh wrote:
>>> When project-files is available, this is a much more efficient
>>> fallback than the current grep fallback. Ultimately, this is
>>> motivated by making xref-find-references faster by default even in the
>>> absence of an index.
>> It's a clever enough idea, but unfortunately it doesn't look like the
>> performance is always improved by this change.
> Maybe we could offer that as optional behavior, turned on by some user
> option? Then people who do experience performance boost could use it.
Sure. That's also possible. But I'd like more info anyway, for example,
to be able to make the choice about which value of said option should be
the default.
Or if the scenario with the improvement turns out to be a rare one,
concentrate on what project.el needs to provide to make it better.
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-14 22:38 ` Dmitry Gutov
2023-04-15 6:50 ` Eli Zaretskii
@ 2023-04-15 21:56 ` sbaugh
2023-04-19 1:10 ` Dmitry Gutov
1 sibling, 1 reply; 7+ messages in thread
From: sbaugh @ 2023-04-15 21:56 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Spencer Baugh, 62837
Dmitry Gutov <dgutov@yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback. Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around ~0.8s. 'find' is not that slow, actually:
>
> time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- less files) can take 3 seconds and more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of files names from the first project's stdout, to buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).
Yes, this is a very good point.
> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.
In short: I have a project.el backend for a large monorepo whose
project-files implementation returns only the subset of files
relevant to the work happening in a given clone. (Generally a user will
have many clones and be doing different work in each one.) The
relevant-files subset is determined by integration with the build
system.
So the find-based fallback enumerates a vast number of files and then
searches all of them, whereas a search over project-files covers a much
smaller number of files.
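For readers unfamiliar with custom project.el backends, the setup described above might look roughly like the following. Every `my-monorepo-` name is hypothetical, as is the `.my-monorepo-root` marker file, and the build-system integration is stubbed out as a function variable. Only `project-find-functions`, `project-root` and `project-files` are actual project.el API.

```elisp
(require 'cl-lib)
(require 'project)

;; Hypothetical hook into the build system: called with the project root,
;; it should return the list of files relevant to this clone.
(defvar my-monorepo-relevant-files-function #'ignore
  "Function returning the relevant files for a given root directory.")

(defun my-monorepo-project-try (dir)
  "Return a project instance if DIR is inside a monorepo clone."
  (let ((root (locate-dominating-file dir ".my-monorepo-root")))
    (when root
      (cons 'my-monorepo root))))

(cl-defmethod project-root ((project (head my-monorepo)))
  (cdr project))

(cl-defmethod project-files ((project (head my-monorepo)) &optional _dirs)
  ;; Ignores DIRS for brevity; a real backend should honor it.
  (funcall my-monorepo-relevant-files-function (project-root project)))

(add-hook 'project-find-functions #'my-monorepo-project-try)
```

With a backend of this shape, `(project-files (project-current))` returns only the build-relevant subset, so a search over it touches far fewer files than a recursive find from the root.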
Regarding your medium-term plans to improve project-files performance:
this is a wild guess, but perhaps you have in mind a way to run a
subprocess that outputs the project-files list? Let's call it
"project-files-process". Its output could then be piped straight to
grep, avoiding the round trip through Lisp. If that was the idea, then
my own backend could certainly provide a project-files-process
implementation too.
> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?
Nope, Linux.
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-15 21:56 ` sbaugh
@ 2023-04-19 1:10 ` Dmitry Gutov
2023-04-19 1:26 ` Spencer Baugh
0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-19 1:10 UTC (permalink / raw)
To: sbaugh; +Cc: Spencer Baugh, 62837
On 16/04/2023 00:56, sbaugh@catern.com wrote:
>> Perhaps you could describe your case where you *did* see a significant
>> improvement from this patch, and we can discuss the best steps to
>> address that.
>
> In short: I have a project.el backend for a large monorepo which has a
> project-files backend which returns only the subset of files which are
> relevant to work happening in a given clone. (Generally a user will
> have many clones and be doing different work in each one.) The
> relevant-files subset is determined by integration with the build
> system.
>
> So running find returns a vast number of files and then searches over
> those, whereas running a search over project-files searches a much
> smaller number of files.
Neat.
> Regarding your medium-term plans to improve project-files performance -
> wildly guessing, but perhaps you have in mind a way to run a subprocess
> that outputs the project-files list? Let's call it
> "project-files-process". And then project-files-process could be piped
> to grep instead, for maximum efficiency? If that was the idea, then my
> own backend could certainly have a project-files-process implementation
> too, for maximum efficiency.
That might be step number 3, although I'm not sure yet which kind of
code will be required for the piping to be done efficiently enough.
The other two things I was looking at are:
- Use relative file names (less text to parse, less memory to allocate,
less GC thrashing). The awkward part is how to merge that with the idea
that project-files can include files from other directories ("external
roots"). Split those off into a different method? Treat them as separate
projects and flat-map over their lists of files?
- Add arguments to allow filtering the files using the underlying tool.
That can also result in far fewer files to parse in the output under
suitable circumstances (e.g. we'd be able to pass a list of globs here).
There is one implementation of the second item in the branch
scratch/etags-regen.
And both items need to be done carefully enough to maintain some
backward compatibility.
So unless you're in a hurry, give me a few weeks to get around to this.
Further suggestions and patches are welcome, of course.
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
2023-04-19 1:10 ` Dmitry Gutov
@ 2023-04-19 1:26 ` Spencer Baugh
0 siblings, 0 replies; 7+ messages in thread
From: Spencer Baugh @ 2023-04-19 1:26 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: sbaugh, 62837
Dmitry Gutov <dgutov@yandex.ru> writes:
> On 16/04/2023 00:56, sbaugh@catern.com wrote:
>
>>> Perhaps you could describe your case where you *did* see a significant
>>> improvement from this patch, and we can discuss the best steps to
>>> address that.
>> In short: I have a project.el backend for a large monorepo which has
>> a
>> project-files backend which returns only the subset of files which are
>> relevant to work happening in a given clone. (Generally a user will
>> have many clones and be doing different work in each one.) The
>> relevant-files subset is determined by integration with the build
>> system.
>> So running find returns a vast number of files and then searches
>> over
>> those, whereas running a search over project-files searches a much
>> smaller number of files.
>
> Neat.
>
>> Regarding your medium-term plans to improve project-files performance -
>> wildly guessing, but perhaps you have in mind a way to run a subprocess
>> that outputs the project-files list? Let's call it
>> "project-files-process". And then project-files-process could be piped
>> to grep instead, for maximum efficiency? If that was the idea, then my
>> own backend could certainly have a project-files-process implementation
>> too, for maximum efficiency.
>
> That might be step number 3, although I'm not sure yet which kind of
> code will be required for the piping to be done efficiently enough.
>
> The other two things I was looking at are:
>
> - Use relative file names (less text to parse, memory to allocate, GC
> to thrash). The awkward part is how to merge that with the idea that
> project-files can include files from directories ("external
> roots"). Split those off into a different method? Treat them as
> separate projects to flat-map the lists of files at?
>
> - Add arguments to allow filtering the files using the underlying
> tool. That can also result in far fewer files to parse in the
> output under suitable circumstances (e.g. we'd be able to pass a
> list of globs here).
>
> There is one implementation of the second item in the branch
> scratch/etags-regen.
>
> And both items need to be done carefully enough to maintain some
> backward compatibility.
>
> So unless you're in a hurry, give me a few weeks to get around to this.
>
> Further suggestions and patches are welcome, of course.
I'm in no hurry. I will probably add this backend locally at my site in
the meantime. We have no existing (non-trivial) xref-find-references
backend, so speeding this one up isn't too urgent (it's not competing
with anything), but I am definitely interested in speed improvements to
project-files (and project.el in general) and will try to help out as it
becomes relevant.