unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
@ 2023-04-14 15:37 Spencer Baugh
  2023-04-14 22:38 ` Dmitry Gutov
  0 siblings, 1 reply; 7+ messages in thread
From: Spencer Baugh @ 2023-04-14 15:37 UTC (permalink / raw)
  To: 62837; +Cc: Dmitry Gutov

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

Tags: patch


When project-files is available, this is a much more efficient
fallback than the current grep fallback.  Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.

* lisp/cedet/semantic/symref/project.el:
Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist):
Add project tool.

In GNU Emacs 29.0.60 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2023-03-13 built on
 igm-qws-u22796a
Repository revision: e759905d2e0828eac4c8164b09113b40f6899656
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: CentOS Linux 7 (Core)

Configured using:
 'configure --with-x-toolkit=lucid --with-modules
 --with-gif=ifavailable'


[-- Attachment #2: 0001-Add-a-semantic-symref-backend-which-uses-xref-matche.patch --]
[-- Type: text/patch, Size: 4244 bytes --]

From 2241f428f0d4809d00f397aafd97270272e966e0 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 14 Apr 2023 11:26:49 -0400
Subject: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files

When project-files is available, this is a much more efficient
fallback than the current grep fallback.  Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.

* lisp/cedet/semantic/symref/project.el:
Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist):
Add project tool.
---
 lisp/cedet/semantic/symref.el         |  2 +
 lisp/cedet/semantic/symref/project.el | 73 +++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 lisp/cedet/semantic/symref/project.el

diff --git a/lisp/cedet/semantic/symref.el b/lisp/cedet/semantic/symref.el
index 1ebd7ea154b..7dfe892b7e8 100644
--- a/lisp/cedet/semantic/symref.el
+++ b/lisp/cedet/semantic/symref.el
@@ -93,6 +93,8 @@ semantic-symref-tool-alist
        idutils)
      ( (lambda (rootdir) (file-exists-p (expand-file-name "cscope.out" rootdir))) .
        cscope )
+     ( (lambda (rootdir) (project-current nil rootdir)) .
+       project)
     )
   "Alist of tools usable by `semantic-symref'.
 Each entry is of the form:
diff --git a/lisp/cedet/semantic/symref/project.el b/lisp/cedet/semantic/symref/project.el
new file mode 100644
index 00000000000..e822e7a2ba3
--- /dev/null
+++ b/lisp/cedet/semantic/symref/project.el
@@ -0,0 +1,73 @@
+;;; semantic/symref/project.el --- Symref implementation using project and xref  -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2008-2023 Free Software Foundation, Inc.
+
+;; Author: Spencer Baugh <sbaugh@janestreet.com>
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;
+;; Implement the symref tool API using project-files and
+;; xref-matches-in-files, which in turn use grep or more efficient
+;; tools if available.
+;;
+;; This is basically a replacement for the symref GREP tool, as a new
+;; lowest-common-denominator which works without indices or
+;; project-specific configuration.  It has better performance than the
+;; GREP tool because project-files provides a narrower set of files to
+;; search, and xref-matches-in-files is highly efficient.
+
+(require 'semantic/symref)
+(require 'project)
+(require 'xref)
+
+;;; Code:
+
+;;;###autoload
+(defclass semantic-symref-tool-project (semantic-symref-tool-baseclass) ()
+  "A symref tool implementation using project.el.
+This uses `xref-matches-in-files' over `project-files'")
+
+(cl-defmethod semantic-symref-perform-search ((tool semantic-symref-tool-project))
+  (pcase-let
+      (((eieio
+         searchfor
+         (searchtype (or 'symbol 'regexp))
+         ;; for now, we only really support being called by
+         ;; xref-backend-references, and this is what it passes.
+         (resulttype 'line-and-text))
+        tool))
+    (mapcar
+     (pcase-lambda
+       ((cl-struct xref-match-item
+                  summary (location
+                           (cl-struct xref-file-location file line))))
+       (list line file summary))
+     (xref-matches-in-files searchfor (project-files (project-current))))))
+
+(add-to-list 'semantic-symref-tool-alist
+             '((lambda (rootdir) (project-current nil rootdir))
+               . project))
+
+(provide 'semantic/symref/project)
+
+;; Local variables:
+;; generated-autoload-file: "../loaddefs.el"
+;; generated-autoload-load-name: "semantic/symref/project"
+;; End:
+
+;;; semantic/symref/project.el ends here
-- 
2.30.2



* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-14 15:37 bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files Spencer Baugh
@ 2023-04-14 22:38 ` Dmitry Gutov
  2023-04-15  6:50   ` Eli Zaretskii
  2023-04-15 21:56   ` sbaugh
  0 siblings, 2 replies; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-14 22:38 UTC (permalink / raw)
  To: Spencer Baugh, 62837

Hi!

On 14/04/2023 18:37, Spencer Baugh wrote:
> When project-files is available, this is a much more efficient
> fallback than the current grep fallback.  Ultimately, this is
> motivated by making xref-find-references faster by default even in the
> absence of an index.

It's a clever enough idea, but unfortunately it doesn't look like the 
performance is always improved by this change.

E.g. I have this checkout of gecko-dev (a big project, just for testing: 
https://github.com/mozilla/gecko-dev) which contains different types of 
files: cpp, js, py.

If I do an xref-find-references search with the current code, it
finishes in around 0.8s. 'find' is not that slow, actually:

   time find . -type f -name "*.cpp" >/dev/null

reports just 400 ms here.

Whereas with your patch the search, depending on the language (cpp --
more files, py -- fewer files) can take 3 seconds or more.

Why? First of all, project-files returns all files (which are then all 
searched), whereas semantic-symref-filepattern-alist contains a mapping 
from modes to file globs, limiting both the scan and subsequent search 
to those.
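
A minimal sketch of the glob-limiting idea, not code from the patch: it
narrows a list of file names to those matching a set of wildcards before
they are handed to the search.  The helper name
`my/filter-files-by-globs' is hypothetical; `wildcard-to-regexp',
`seq-filter' and `seq-some' are standard Emacs functions.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'seq)

(defun my/filter-files-by-globs (files globs)
  "Return the members of FILES whose base name matches one of GLOBS.
GLOBS is a list of shell-style wildcards such as \"*.cpp\".
This helper is hypothetical; it is not part of project.el or xref."
  (let ((regexps (mapcar #'wildcard-to-regexp globs))
        ;; Match case-sensitively, as a shell glob would.
        (case-fold-search nil))
    (seq-filter
     (lambda (file)
       (let ((name (file-name-nondirectory file)))
         ;; Keep FILE if any of the glob regexps matches its base name.
         (seq-some (lambda (re) (string-match-p re name)) regexps)))
     files)))
```

Under these assumptions the new method could call
(xref-matches-in-files searchfor (my/filter-files-by-globs (project-files (project-current)) globs)),
with GLOBS taken from the semantic-symref-filepattern-alist entry for
the current major mode.  Note that this limits only the search, not the
listing: project-files still has to return every file first.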

Second -- using project-files means we're forced to round-trip the list
of file names from the first process's stdout, to a buffer, then to a
list of Lisp strings, and then back to another buffer, to use as stdin.
I have a couple of things planned in the medium term to improve that,
but some overhead is probably unavoidable (unless we get some new
primitive that would allow "piping" between process buffers).

Perhaps you could describe your case where you *did* see a significant 
improvement from this patch, and we can discuss the best steps to 
address that.

BTW, at first I figured you're using macOS (which historically has
bundled outdated versions of find and grep, with worse performance). But
apparently not?






* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-14 22:38 ` Dmitry Gutov
@ 2023-04-15  6:50   ` Eli Zaretskii
  2023-04-15 12:37     ` Dmitry Gutov
  2023-04-15 21:56   ` sbaugh
  1 sibling, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-04-15  6:50 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: sbaugh, 62837

> Date: Sat, 15 Apr 2023 01:38:18 +0300
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> On 14/04/2023 18:37, Spencer Baugh wrote:
> > When project-files is available, this is a much more efficient
> > fallback than the current grep fallback.  Ultimately, this is
> > motivated by making xref-find-references faster by default even in the
> > absence of an index.
> 
> It's a clever enough idea, but unfortunately it doesn't look like the 
> performance is always improved by this change.

Maybe we could offer that as optional behavior, turned on by some user
option?  Then people who do experience performance boost could use it.






* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-15  6:50   ` Eli Zaretskii
@ 2023-04-15 12:37     ` Dmitry Gutov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-15 12:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: sbaugh, 62837

On 15/04/2023 09:50, Eli Zaretskii wrote:
>> Date: Sat, 15 Apr 2023 01:38:18 +0300
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>> On 14/04/2023 18:37, Spencer Baugh wrote:
>>> When project-files is available, this is a much more efficient
>>> fallback than the current grep fallback.  Ultimately, this is
>>> motivated by making xref-find-references faster by default even in the
>>> absence of an index.
>> It's a clever enough idea, but unfortunately it doesn't look like the
>> performance is always improved by this change.
> Maybe we could offer that as optional behavior, turned on by some user
> option?  Then people who do experience performance boost could use it.

Sure. That's also possible. But I'd like more info anyway, for example, 
to be able to make the choice about which value of said option should be 
the default.

Or if the scenario with the improvement turns out to be a rare one, 
concentrate on what project.el needs to provide to make it better.






* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-14 22:38 ` Dmitry Gutov
  2023-04-15  6:50   ` Eli Zaretskii
@ 2023-04-15 21:56   ` sbaugh
  2023-04-19  1:10     ` Dmitry Gutov
  1 sibling, 1 reply; 7+ messages in thread
From: sbaugh @ 2023-04-15 21:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Spencer Baugh, 62837

Dmitry Gutov <dgutov@yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback.  Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around ~0.8s. 'find' is not that slow, actually:
>
>   time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- less files) can take 3 seconds and more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of file names from the first process's stdout, to a buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).

Yes, this is a very good point.

> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.

In short: I have a project.el backend for a large monorepo which has a
project-files backend which returns only the subset of files which are
relevant to work happening in a given clone.  (Generally a user will
have many clones and be doing different work in each one.)  The
relevant-files subset is determined by integration with the build
system.

So running find returns a vast number of files and then searches over
those, whereas running a search over project-files searches a much
smaller number of files.
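
A rough sketch of what such a backend could look like, assuming a
build-system query that is not shown in this thread.  The names
`my-monorepo', `my/try-monorepo', `my/relevant-files-function' and the
".my-monorepo" marker file are all hypothetical;
`project-find-functions', `project-root' and `project-files' are the
project.el extension points.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'project)

(defvar my/relevant-files-function (lambda (_root) nil)
  "HYPOTHETICAL: function called with the project root.
It should return the absolute names of the files relevant to the
current clone, e.g. by asking the build system.")

(defun my/try-monorepo (dir)
  "Return a `my-monorepo' project for DIR, or nil.
The \".my-monorepo\" marker file is an assumption for illustration."
  (let ((root (locate-dominating-file dir ".my-monorepo")))
    (and root (cons 'my-monorepo root))))

(cl-defmethod project-root ((project (head my-monorepo)))
  "Return the root directory stored in PROJECT."
  (cdr project))

(cl-defmethod project-files ((project (head my-monorepo)) &optional _dirs)
  "Return only the files relevant to the work in this clone."
  (funcall my/relevant-files-function (project-root project)))

;; Register the backend so that `project-current' can find it.
(add-hook 'project-find-functions #'my/try-monorepo)
```

With a backend of this shape, anything that searches over project-files
only ever sees the relevant subset, which is where the speedup over a
find-based scan of the whole tree would come from.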

Regarding your medium-term plans to improve project-files performance -
wildly guessing, but perhaps you have in mind a way to run a subprocess
that outputs the project-files list?  Let's call it
"project-files-process".  And then project-files-process could be piped
to grep instead, for maximum efficiency?  If that was the idea, then my
own backend could certainly have a project-files-process implementation
too, for maximum efficiency.

> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?

Nope, Linux.






* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-15 21:56   ` sbaugh
@ 2023-04-19  1:10     ` Dmitry Gutov
  2023-04-19  1:26       ` Spencer Baugh
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Gutov @ 2023-04-19  1:10 UTC (permalink / raw)
  To: sbaugh; +Cc: Spencer Baugh, 62837

On 16/04/2023 00:56, sbaugh@catern.com wrote:

>> Perhaps you could describe your case where you *did* see a significant
>> improvement from this patch, and we can discuss the best steps to
>> address that.
> 
> In short: I have a project.el backend for a large monorepo which has a
> project-files backend which returns only the subset of files which are
> relevant to work happening in a given clone.  (Generally a user will
> have many clones and be doing different work in each one.)  The
> relevant-files subset is determined by integration with the build
> system.
> 
> So running find returns a vast number of files and then searches over
> those, whereas running a search over project-files searches a much
> smaller number of files.

Neat.

> Regarding your medium-term plans to improve project-files performance -
> wildly guessing, but perhaps you have in mind a way to run a subprocess
> that outputs the project-files list?  Let's call it
> "project-files-process".  And then project-files-process could be piped
> to grep instead, for maximum efficiency?  If that was the idea, then my
> own backend could certainly have a project-files-process implementation
> too, for maximum efficiency.

That might be step number 3, although I'm not sure yet which kind of 
code will be required for the piping to be done efficiently enough.

The other two things I was looking at are:

- Use relative file names (less text to parse, memory to allocate, GC to 
thrash). The awkward part is how to merge that with the idea that 
project-files can include files from directories ("external roots"). 
Split those off into a different method? Treat them as separate projects 
to flat-map the lists of files at?

- Add arguments to allow filtering the files using the underlying tool.
That can also result in much fewer files to parse in the output under
suitable circumstances (e.g. we'd be able to pass a list of globs here).

There is one implementation of the second item in the branch 
scratch/etags-regen.

And both items need to be done carefully enough to maintain some 
backward compatibility.
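
One way to picture the "split those off" option for the first item is
sketched below.  It is only an illustration under assumed names
(`my/split-by-root' does not exist in project.el): files under the root
get relative names, while files from external roots keep their absolute
names in a separate list.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/split-by-root (root files)
  "Split absolute FILES into (RELATIVE . EXTERNAL) with respect to ROOT.
RELATIVE holds names relative to ROOT for the files inside it.
EXTERNAL keeps the absolute names of the files outside ROOT.
This helper is hypothetical; it is not part of project.el."
  (let ((root (file-name-as-directory (expand-file-name root)))
        relative external)
    (dolist (file files)
      (if (string-prefix-p root file)
          ;; Inside the project: drop the common root prefix.
          (push (substring file (length root)) relative)
        ;; Outside the project ("external root"): leave untouched.
        (push file external)))
    (cons (nreverse relative) (nreverse external))))
```

The relative list carries less text per entry, which is the saving
described above; the cost is that callers now have to handle two lists
instead of one.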

So unless you're in a hurry, give me a few weeks to get around to this.

Further suggestions and patches are welcome, of course.






* bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
  2023-04-19  1:10     ` Dmitry Gutov
@ 2023-04-19  1:26       ` Spencer Baugh
  0 siblings, 0 replies; 7+ messages in thread
From: Spencer Baugh @ 2023-04-19  1:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: sbaugh, 62837

Dmitry Gutov <dgutov@yandex.ru> writes:
> On 16/04/2023 00:56, sbaugh@catern.com wrote:
>
>>> Perhaps you could describe your case where you *did* see a significant
>>> improvement from this patch, and we can discuss the best steps to
>>> address that.
>> In short: I have a project.el backend for a large monorepo which has a
>> project-files backend which returns only the subset of files which are
>> relevant to work happening in a given clone.  (Generally a user will
>> have many clones and be doing different work in each one.)  The
>> relevant-files subset is determined by integration with the build
>> system.
>>
>> So running find returns a vast number of files and then searches over
>> those, whereas running a search over project-files searches a much
>> smaller number of files.
>
> Neat.
>
>> Regarding your medium-term plans to improve project-files performance -
>> wildly guessing, but perhaps you have in mind a way to run a subprocess
>> that outputs the project-files list?  Let's call it
>> "project-files-process".  And then project-files-process could be piped
>> to grep instead, for maximum efficiency?  If that was the idea, then my
>> own backend could certainly have a project-files-process implementation
>> too, for maximum efficiency.
>
> That might be step number 3, although I'm not sure yet which kind of
> code will be required for the piping to be done efficiently enough.
>
> The other two things I was looking at are:
>
> - Use relative file names (less text to parse, memory to allocate, GC
>   to thrash). The awkward part is how to merge that with the idea that
>   project-files can include files from directories ("external
>   roots"). Split those off into a different method? Treat them as
>   separate projects to flat-map the lists of files at?
>
> - Add arguments to allow filtering the files using the underlying
>   tool. That can also result in much fewer files to parse in the
>   output under suitable circumstances (e.g. we'd be able to pass a
>   list of globs here).
>
> There is one implementation of the second item in the branch
> scratch/etags-regen.
>
> And both items need to be done carefully enough to maintain some
> backward compatibility.
>
> So unless you're in a hurry, give me a few weeks to get around to this.
>
> Further suggestions and patches are welcome, of course.

I'm in no hurry.  I will probably add this backend locally at my site in
the meantime.  We have no existing (non-trivial) xref-find-references
backend, so speeding this one up isn't too urgent (it's not competing
with anything), but definitely I am interested in project-files (and
project.el in general) speed improvements and will try to help out as it
becomes relevant.







Thread overview: 7+ messages
2023-04-14 15:37 bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files Spencer Baugh
2023-04-14 22:38 ` Dmitry Gutov
2023-04-15  6:50   ` Eli Zaretskii
2023-04-15 12:37     ` Dmitry Gutov
2023-04-15 21:56   ` sbaugh
2023-04-19  1:10     ` Dmitry Gutov
2023-04-19  1:26       ` Spencer Baugh
