* Re: master 544db1e: Faster grep pattern for identifiers
@ 2021-09-15 15:56 Eli Zaretskii
2021-09-15 16:25 ` Dmitry Gutov
2021-09-15 16:29 ` Mattias Engdegård
0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-15 15:56 UTC (permalink / raw)
To: Mattias Engdegård; +Cc: emacs-devel
> branch: master
> commit 544db1ee8679eec9edd5cee81a340ee1c4d70158
> Author: Mattias Engdegård <mattiase@acm.org>
>
> Faster grep pattern for identifiers
>
> * lisp/cedet/semantic/symref/grep.el (semantic-symref-perform-search):
> Use the `-w` flag instead of wrapping the pattern in regexps that make
> matching much slower. This speeds up `xref-find-references` by about
> 3× on macOS.
Doesn't this change the semantics of the "word"? The Grep notion of
the word is not necessarily identical to that of Emacs, since the
latter depends on the major mode. The comment in the deleted code
says that much, AFAICT. Or what am I missing?
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 15:56 master 544db1e: Faster grep pattern for identifiers Eli Zaretskii
@ 2021-09-15 16:25 ` Dmitry Gutov
2021-09-15 16:33 ` Eli Zaretskii
2021-09-15 16:29 ` Mattias Engdegård
1 sibling, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-15 16:25 UTC (permalink / raw)
To: Eli Zaretskii, Mattias Engdegård; +Cc: emacs-devel
On 15.09.2021 18:56, Eli Zaretskii wrote:
>> branch: master
>> commit 544db1ee8679eec9edd5cee81a340ee1c4d70158
>> Author: Mattias Engdegård<mattiase@acm.org>
>>
>> Faster grep pattern for identifiers
>>
>> * lisp/cedet/semantic/symref/grep.el (semantic-symref-perform-search):
>> Use the `-w` flag instead of wrapping the pattern in regexps that make
>> matching much slower. This speeds up `xref-find-references` by about
>> 3× on macOS.
> Doesn't this change the semantics of the "word"? The Grep notion of
> the word is not necessarily identical to that of Emacs, since the
> latter depends on the major mode. The comment in the deleted code
> says that much, AFAICT. Or what am I missing?
Luckily, -w actually corresponds to the regexp which the previous
version of the code was using. Rather than to \<...\> which one might
surmise from reading the docs for some versions of Grep (or Ripgrep).
And the comment was about \< and \>.
The latest Grep manual describes it correctly:
-w, --word-regexp
Select only those lines containing matches that
form whole words. The test is that the matching substring must either
be at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at
the end of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 16:25 ` Dmitry Gutov
@ 2021-09-15 16:33 ` Eli Zaretskii
2021-09-15 18:06 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-15 16:33 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: mattiase, emacs-devel
> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 15 Sep 2021 19:25:25 +0300
>
> The latest Grep manual describes it correctly:
>
> -w, --word-regexp
> Select only those lines containing matches that
> form whole words. The test is that the matching substring must either
> be at the beginning of the line, or preceded by a non-word
> constituent character. Similarly, it must be either at
> the end of the line or followed by a non-word constituent character.
> Word-constituent characters are letters, digits, and the
> underscore.
That's only GNU Grep, no?
And what about the "alternative Grep's"?
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 16:33 ` Eli Zaretskii
@ 2021-09-15 18:06 ` Dmitry Gutov
2021-09-15 18:14 ` Eli Zaretskii
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-15 18:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mattiase, emacs-devel
On 15.09.2021 19:33, Eli Zaretskii wrote:
> And what about the "alternative Grep's"?
The author of the commit uses one such Grep.
I also tested with ripgrep, to similar success.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 18:06 ` Dmitry Gutov
@ 2021-09-15 18:14 ` Eli Zaretskii
2021-09-15 18:39 ` Dmitry Gutov
2021-09-16 7:28 ` master 544db1e: Faster grep pattern for identifiers Omar Polo
0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-15 18:14 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: mattiase, emacs-devel
> Cc: mattiase@acm.org, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 15 Sep 2021 21:06:09 +0300
>
> On 15.09.2021 19:33, Eli Zaretskii wrote:
> > And what about the "alternative Grep's"?
>
> The author of the commit uses one such Grep.
>
> I also tested with ripgrep, to similar success.
So they all have, miraculously, the same notion of what is a word,
regardless of the programming language and the script/character set?
Amazing.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 18:14 ` Eli Zaretskii
@ 2021-09-15 18:39 ` Dmitry Gutov
2021-09-17 16:07 ` bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
2021-09-16 7:28 ` master 544db1e: Faster grep pattern for identifiers Omar Polo
1 sibling, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-15 18:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mattiase, emacs-devel
On 15.09.2021 21:14, Eli Zaretskii wrote:
>> Cc:mattiase@acm.org,emacs-devel@gnu.org
>> From: Dmitry Gutov<dgutov@yandex.ru>
>> Date: Wed, 15 Sep 2021 21:06:09 +0300
>>
>> On 15.09.2021 19:33, Eli Zaretskii wrote:
>>> And what about the "alternative Grep's"?
>> The author of the commit uses one such Grep.
>>
>> I also tested with ripgrep, to similar success.
> So they all have, miraculously, the same notion of what is a word,
> regardless of the programming language and the script/character set?
> Amazing.
Not exactly (e.g. Grep includes international chars in the "word" set,
and Ripgrep does not), but the notions of "not word" are compatible
enough for our purposes.
Speaking of Ripgrep, the compatible behavior of -w is only with recent
versions (reported and fixed in
https://github.com/BurntSushi/ripgrep/issues/389), starting with 0.10.0.
Debian 10 and Fedora 31 include that versions or newer
(https://repology.org/project/ripgrep/versions).
Not that it's really important: we don't support Ripgrep officially.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 18:14 ` Eli Zaretskii
2021-09-15 18:39 ` Dmitry Gutov
@ 2021-09-16 7:28 ` Omar Polo
1 sibling, 0 replies; 29+ messages in thread
From: Omar Polo @ 2021-09-16 7:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mattiase, emacs-devel, Dmitry Gutov
Eli Zaretskii <eliz@gnu.org> writes:
>> Cc: mattiase@acm.org, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 15 Sep 2021 21:06:09 +0300
>>
>> On 15.09.2021 19:33, Eli Zaretskii wrote:
>> > And what about the "alternative Grep's"?
>>
>> The author of the commit uses one such Grep.
>>
>> I also tested with ripgrep, to similar success.
>
> So they all have, miraculously, the same notion of what is a word,
> regardless of the programming language and the script/character set?
> Amazing.
It seems to be available on OpenBSD' grep
> -w The expression is searched for as a word (as if surrounded by
> ‘[[:<:]]’ and ‘[[:>:]]’; see re_format(7)).
it also seems to be available on NetBSD and FreeBSD judging from the
manpages.
don't know about other grep implementations and honestly haven't tested
the code (yet).
Cheers,
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: master 544db1e: Faster grep pattern for identifiers
2021-09-15 15:56 master 544db1e: Faster grep pattern for identifiers Eli Zaretskii
2021-09-15 16:25 ` Dmitry Gutov
@ 2021-09-15 16:29 ` Mattias Engdegård
1 sibling, 0 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-09-15 16:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Emacs developers
15 sep. 2021 kl. 17.56 skrev Eli Zaretskii <eliz@gnu.org>:
> Doesn't this change the semantics of the "word"? The Grep notion of
> the word is not necessarily identical to that of Emacs, since the
> latter depends on the major mode. The comment in the deleted code
> says that much, AFAICT. Or what am I missing?
Sorry, I should have written a more descriptive commit message.
First of all, there is no risk for false positives because the grep output is filtered for occurrence of the sought identifier in post-processing. Thus, the only correctness risk is for false negatives.
The effect of -w is to reject matches with a word char immediately before or after a match. This is exactly what the previous glued-on regexps did.
Both the old and new approaches are sound with respect to the programming languages they are used for, because what grep considers to be word chars are alphanumeric characters (as determined by the locale) and underline. Thus, a false negative would require an identifier to occur immediately before or after such a character, and the lexical rules for supported languages don't allow that.
There could be exceptions. For example, ancient Smalltalk used _ as assignment operator because Xerox's character set was based on the 1963 ASCII draft where that code was used for a left-pointing arrow. That wouldn't work with our scheme, now or before.
One might wonder why we use -w at all given the post-processing. It reduces the grep output so that the post-processor isn't overwhelmed by false positives: consider a search for the identifier `i`. That said, -w has a nonzero cost, so omitting it for searches of identifiers above a certain length is likely to be advantageous, especially when the grep tool is slow. We haven't doe that at this time.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
@ 2021-08-02 21:05 Juri Linkov
2021-08-03 8:10 ` Juri Linkov
2021-09-18 13:53 ` Mattias Engdegård
0 siblings, 2 replies; 29+ messages in thread
From: Juri Linkov @ 2021-08-02 21:05 UTC (permalink / raw)
To: 49836
[-- Attachment #1: Type: text/plain, Size: 3642 bytes --]
Creating a separate bug report from bug#49731 because this is a real problem.
Now grep.el completely supports ripgrep when 'grep-find-template'
is customized to a command line that uses 'rg' such as e.g.
"find <D> <X> -type f <F> -print0 | sort -z | xargs -0 -e
rg <C> -nH --no-heading -j8 --sort path -M 200 --max-columns-preview -e <R>"
But such grep setting breaks the command 'xref-find-references':
>>>> 1. while xref-find-references works fine in `emacs -Q`,
>>>> I don't know why with my customization typing e.g.
>>>> 'M-? isearch-lazy-highlight RET' reports
>>>> "No references found for: isearch-lazy-highlight".
>>>
>>> Try and see which of the "tools" semantic-symref-perform-search ends
>>> up using.
>>
>> Thanks for the pointers to semantic-symref-perform-search.
>> It prepends "-n " to my customized pattern "rg -nH",
>> so the arg "-n" is duplicated on the command line:
>> `rg -n -nH`
>> and signals the error:
>> error: The argument '--line-number' was provided more than once, but
>> cannot be used multiple times
>> This error is caused by the bug in the command line parser used by
>> ripgrep:
>> https://github.com/clap-rs/clap/issues/2171
>> that was fixed only 6 months ago, so it will take much time
>> before this fix will reach ripgrep, and this bug will be closed:
>> https://github.com/BurntSushi/ripgrep/issues/1701
>
> The above might be worked around with creating a symref-grep specific user
> option for grep-find-template which would default to the "global" value of
> that variable.
Maybe like the existing option 'semantic-symref-grep-shell', e.g.:
(defcustom semantic-symref-grep-program 'grep
"The program to use for regexp search inside files."
:type `(choice
(const :tag "Use Grep" grep)
(const :tag "Use ripgrep" ripgrep)
(symbol :tag "User defined"))
:version "28.1")
But the problem is that for users it's hard to see the connection
between the broken 'xref-find-references' and the need to customize an option
with unrelated name 'semantic-symref-grep'.
>> But even without duplicated "-n" semantic-symref-perform-search
>> doesn't work with ripgrep because it doesn't find such pattern:
>> \\\\\\(\\^\\\\\\|\\\\W\\\\\\)isearch-lazy-highlight\\\\\\(\\\\W\\\\\\|\\$\\\\\\)
>> Maybe semantic-symref-perform-search could be improved to
>> support ripgrep?
>> Because without these two problems it works fine with ripgrep.
>
> ...but the above tells us (I think) that semantic-symref-perform-search is
> trying to use the basic regexp syntax, and ripgrep doesn't support that
> (only Extended, or PCRE).
>
> For your personal consumption, perhaps the best approach is to create
> a separate "tool", like Grep (by copying symref/grep.el and tweaking some
> of its definitions), and then register it in semantic-symref-tool-alist.
>
> I don't know if ripgrep is that much faster for this particular purpose. So
> maybe it's too much work for little benefit.
A more general solution would be to add to grep.el the same options
that you added to xref:
xref-search-program grep/ripgrep
xref-search-program-alist
'((grep . "xargs -0 grep <C> -snHE -e <R>")
(ripgrep . "xargs -0 rg <C> -nH --no-messages -g '!*/' -e <R> | sort -t: -k1,1 -k2n,2"))
This means to turn the existing variable 'grep-program' into the user option
as the following patch does.
Also later grep.el could use the value "rg" of 'grep-program'
to create the corresponding grep-find-template in grep-compute-defaults.
But I don't know if it's ok to mention rigrep in grep.el?
Anyway, here is the patch that fixes 'xref-find-references':
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: grep-program.patch --]
[-- Type: text/x-diff, Size: 2029 bytes --]
diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 8f0a5acf70..aba4d59371 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -484,9 +484,13 @@ grep-mode-font-lock-keywords
This gets tacked on the end of the generated expressions.")
;;;###autoload
-(defvar grep-program (purecopy "grep")
+(defcustom grep-program (purecopy "grep")
"The default grep program for `grep-command' and `grep-find-command'.
-This variable's value takes effect when `grep-compute-defaults' is called.")
+This variable's value takes effect when `grep-compute-defaults' is called."
+ :type `(choice
+ (const :tag "Use Grep" "grep")
+ (string :tag "User defined"))
+ :version "28.1")
;;;###autoload
(defvar find-program (purecopy "find")
diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 180d779a78..e13c21bc07 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -150,15 +150,17 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")
- (t "-n ")))
+ (t (if (equal grep-program "rg") "" "-n "))))
(greppat (cond ((eq (oref tool searchtype) 'regexp)
(oref tool searchfor))
(t
;; Can't use the word boundaries: Grep
;; doesn't always agree with the language
;; syntax on those.
- (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
- (oref tool searchfor)))))
+ (if (equal grep-program "rg")
+ (oref tool searchfor)
+ (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
+ (oref tool searchfor))))))
;; Misc
(b (get-buffer-create "*Semantic SymRef*"))
(ans nil)
^ permalink raw reply related [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-02 21:05 bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
@ 2021-08-03 8:10 ` Juri Linkov
2021-08-04 3:14 ` Dmitry Gutov
2021-09-18 13:53 ` Mattias Engdegård
1 sibling, 1 reply; 29+ messages in thread
From: Juri Linkov @ 2021-08-03 8:10 UTC (permalink / raw)
To: 49836
[-- Attachment #1: Type: text/plain, Size: 395 bytes --]
tags 49836 + patch
quit
> This means to turn the existing variable 'grep-program' into the user option
> as the following patch does.
>
> Also later grep.el could use the value "rg" of 'grep-program'
> to create the corresponding grep-find-template in grep-compute-defaults.
>
> But I don't know if it's ok to mention rigrep in grep.el?
So here is the complete support for rigrep in grep.el:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: grep-program-ripgrep.patch --]
[-- Type: text/x-diff, Size: 5196 bytes --]
diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 7453dbed99..c4adee609c 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1544,7 +1544,7 @@ xref-search-program
"The program to use for regexp search inside files.
This must reference a corresponding entry in `xref-search-program-alist'."
- :type `(choice
+ :type '(choice
(const :tag "Use Grep" grep)
(const :tag "Use ripgrep" ripgrep)
(symbol :tag "User defined"))
diff --git a/lisp/progmodes/grep.el b/lisp/progmodes/grep.el
index 8f0a5acf70..cc3375a284 100644
--- a/lisp/progmodes/grep.el
+++ b/lisp/progmodes/grep.el
@@ -484,9 +484,14 @@ grep-mode-font-lock-keywords
This gets tacked on the end of the generated expressions.")
;;;###autoload
-(defvar grep-program (purecopy "grep")
+(defcustom grep-program (purecopy "grep")
"The default grep program for `grep-command' and `grep-find-command'.
-This variable's value takes effect when `grep-compute-defaults' is called.")
+This variable's value takes effect when `grep-compute-defaults' is called."
+ :type '(choice
+ (const :tag "Use Grep" "grep")
+ (const :tag "Use ripgrep" "rg")
+ (string :tag "User defined"))
+ :version "28.1")
;;;###autoload
(defvar find-program (purecopy "find")
@@ -709,13 +714,14 @@ grep-compute-defaults
(let ((grep-options
(concat (if grep-use-null-device "-n" "-nH")
(if grep-use-null-filename-separator " --null")
+ (if (equal grep-program "rg") " --no-heading")
(when (grep-probe grep-program
`(nil nil nil "-e" "foo" ,(null-device))
nil 1)
" -e"))))
(unless grep-command
(setq grep-command
- (format "%s %s %s " grep-program
+ (format "%s%s %s " grep-program
(or
(and grep-highlight-matches
(grep-probe
@@ -723,7 +729,7 @@ grep-compute-defaults
`(nil nil nil "--color" "x" ,(null-device))
nil 1)
(if (eq grep-highlight-matches 'always)
- "--color=always" "--color=auto"))
+ " --color=always" " --color=auto"))
"")
grep-options)))
(unless grep-template
@@ -983,6 +989,8 @@ grep-expand-template
(push "--color=always" opts))
((eq grep-highlight-matches 'auto)
(push "--color=auto" opts)))
+ (when (equal grep-program "rg")
+ (push "--no-heading" opts))
opts))
(excl . ,excl)
(dir . ,dir)
@@ -1131,7 +1139,7 @@ lgrep
files
nil
(and grep-find-ignored-files
- (concat " --exclude="
+ (concat (if (equal grep-program "rg") " -g=!" " --exclude=")
(mapconcat
(lambda (ignore)
(cond ((stringp ignore)
@@ -1141,7 +1149,7 @@ lgrep
(shell-quote-argument
(cdr ignore))))))
grep-find-ignored-files
- " --exclude=")))
+ (if (equal grep-program "rg") " -g=!" " --exclude="))))
(and (eq grep-use-directories-skip t)
'("--directories=skip"))))
(when command
@@ -1339,7 +1347,8 @@ zrgrep
nil default-directory t))
(confirm (equal current-prefix-arg '(4))))
(list regexp files dir confirm grep-find-template)))))))
- (let ((grep-find-template template)
+ (let ((grep-program "zgrep")
+ (grep-find-template template)
;; Set `grep-highlight-matches' to `always'
;; since `zgrep' puts filters in the grep output.
(grep-highlight-matches 'always))
diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 180d779a78..034f797076 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -150,15 +150,14 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")
- (t "-n ")))
+ (t (if (equal grep-program "rg") "" "-n "))))
(greppat (cond ((eq (oref tool searchtype) 'regexp)
(oref tool searchfor))
(t
;; Can't use the word boundaries: Grep
;; doesn't always agree with the language
;; syntax on those.
- (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
- (oref tool searchfor)))))
+ (format "\\b%s\\b" (oref tool searchfor)))))
;; Misc
(b (get-buffer-create "*Semantic SymRef*"))
(ans nil)
^ permalink raw reply related [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-03 8:10 ` Juri Linkov
@ 2021-08-04 3:14 ` Dmitry Gutov
2021-08-04 21:23 ` Juri Linkov
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-08-04 3:14 UTC (permalink / raw)
To: Juri Linkov, 49836
I think we can improve this part:
On 03.08.2021 11:10, Juri Linkov wrote:
> diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
> index 180d779a78..034f797076 100644
> --- a/lisp/cedet/semantic/symref/grep.el
> +++ b/lisp/cedet/semantic/symref/grep.el
> @@ -150,15 +150,14 @@ semantic-symref-perform-search
> "-l ")
> ((eq (oref tool searchtype) 'regexp)
> "-nE ")
> - (t "-n ")))
> + (t (if (equal grep-program "rg") "" "-n "))))
It might be cleaner to see whether grep-find-template already includes
that flag, and if so, omit it. Though the search might be non-trivial if
it's in the form like "-abcn", still, that's searchable by regexp.
> (greppat (cond ((eq (oref tool searchtype) 'regexp)
> (oref tool searchfor))
> (t
> ;; Can't use the word boundaries: Grep
> ;; doesn't always agree with the language
> ;; syntax on those.
> - (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
> - (oref tool searchfor)))))
> + (format "\\b%s\\b" (oref tool searchfor)))))
> ;; Misc
> (b (get-buffer-create "*Semantic SymRef*"))
> (ans nil)
I think the original idea (surrounding with \W) is sound: after all, not
every symbol boundary in Emacs sense is a word boundary in Grep or RG.
If a method, say, ends with ?, then it won't be.
The problem with the above regexp is that it uses the basic syntax,
instead of Extended. But we can flip it.
As long as we're able to ask Grep to search with Extended syntax, we can
use (format "(^|\\W)%s(\\W|$)" (oref tool searchfor)). And that can be
achieved with the same method as is used in xref-matches-in-directory:
Something like (replace-regexp-in-string "grep <C>" "grep <C> -E"
grep-find-template t t),
to be sure it's not ripgrep in there.
The new user option can be used too, but I'd probably prefer a more
independent solution here.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-04 3:14 ` Dmitry Gutov
@ 2021-08-04 21:23 ` Juri Linkov
2021-08-05 3:03 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Juri Linkov @ 2021-08-04 21:23 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 49836
[-- Attachment #1: Type: text/plain, Size: 2264 bytes --]
>> @@ -150,15 +150,14 @@ semantic-symref-perform-search
>> "-l ")
>> ((eq (oref tool searchtype) 'regexp)
>> "-nE ")
>> - (t "-n ")))
>> + (t (if (equal grep-program "rg") "" "-n "))))
>
> It might be cleaner to see whether grep-find-template already includes that
> flag, and if so, omit it. Though the search might be non-trivial if it's in
> the form like "-abcn", still, that's searchable by regexp.
Indeed, but such a hack is a temporary measure and can be removed later
after ripgrep will be fixed.
>> ;; Can't use the word boundaries: Grep
>> ;; doesn't always agree with the language
>> ;; syntax on those.
>> - (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
>> - (oref tool searchfor)))))
>> + (format "\\b%s\\b" (oref tool searchfor)))))
>
> I think the original idea (surrounding with \W) is sound: after all, not
> every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
> a method, say, ends with ?, then it won't be.
I tried to search for 'soap-type-is-array?' in the Emacs tree,
and ripgrep can find it with "\\b%s\\b", but Grep can't.
> The problem with the above regexp is that it uses the basic syntax, instead
> of Extended. But we can flip it.
>
> As long as we're able to ask Grep to search with Extended syntax, we can
> use (format "(^|\\W)%s(\\W|$)" (oref tool searchfor)). And that can be
> achieved with the same method as is used in xref-matches-in-directory:
>
> Something like (replace-regexp-in-string "grep <C>" "grep <C> -E"
> grep-find-template t t), to be sure it's not ripgrep in there.
>
> The new user option can be used too, but I'd probably prefer a more
> independent solution here.
It would be more preferable not to change the existing default logic
to avoid possible troubles. Since Grep with Basic syntax works fine,
then better not to switch to Extended syntax.
The new user option is already used in many places in grep.el
in the previous patch, so it should be ok to use it in semantic-symref
as well:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: semantic-symref-perform-search-rg.patch --]
[-- Type: text/x-diff, Size: 1573 bytes --]
diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 180d779a78..b7d08409aa 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -150,15 +150,22 @@ semantic-symref-perform-search
"-l ")
((eq (oref tool searchtype) 'regexp)
"-nE ")
- (t "-n ")))
+ (t (if (equal grep-program "rg")
+ ;; TODO: remove this after ripgrep is fixed (bug#49836)
+ (unless (string-search "rg <C> -nH" grep-find-template)
+ "-n ")
+ "-n "))))
(greppat (cond ((eq (oref tool searchtype) 'regexp)
(oref tool searchfor))
(t
;; Can't use the word boundaries: Grep
;; doesn't always agree with the language
;; syntax on those.
- (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
- (oref tool searchfor)))))
+ (if (equal grep-program "rg")
+ (format "(^|\\W)%s(\\W|$)"
+ (oref tool searchfor))
+ (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
+ (oref tool searchfor))))))
;; Misc
(b (get-buffer-create "*Semantic SymRef*"))
(ans nil)
^ permalink raw reply related [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-04 21:23 ` Juri Linkov
@ 2021-08-05 3:03 ` Dmitry Gutov
2021-08-06 0:35 ` Juri Linkov
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-08-05 3:03 UTC (permalink / raw)
To: Juri Linkov; +Cc: 49836
On 05.08.2021 00:23, Juri Linkov wrote:
>> I think the original idea (surrounding with \W) is sound: after all, not
>> every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
>> a method, say, ends with ?, then it won't be.
> I tried to search for 'soap-type-is-array?' in the Emacs tree,
> and ripgrep can find it with "\\b%s\\b", but Grep can't.
Did you search through symref, or in console? If the former, it seems
some regexp-quoting is missing somewhere (the question mark was no
escaped). Because see what the console says:
$ rg "\bsoap-type-is-array?\b"
lisp/net/soap-client.el
950:(defun soap-type-is-array? (type)
990: (if (soap-type-is-array? type)
ChangeLog.2
19080: * lisp/net/soap-client.el (soap-type-is-array?): new defun
$ rg "\bsoap-type-is-array\?\b"
^^ no matches
And
$ rg "\bsoap-type-is-array\?"
has matches, of course.
> It would be more preferable not to change the existing default logic
> to avoid possible troubles. Since Grep with Basic syntax works fine,
> then better not to switch to Extended syntax.
See above. But also consider what happens if a user sees that
grep-program is now customizable and ripgrep is an officially supported
value. They change it to "rg", and then suddenly their 'M-x rgrep' input
has to use the extended regexp format?
Worse than that, any third-party package that uses grep-find-template
will suddenly have a high chance of failing if they pass any nontrivial
regexps to it, especially if those have groupings or alternations.
It's a hard problem: grep.el is not prepared for abstracting like that.
If we at least standardized it internally on Extended format, that would
at least remove one source of uncertainty and bugs. The user still can
input basic regexps interactively, we can translate them easily.
Further: seeing xref-search-program-alist, people asked for support for
other similar programs, such as ag and pt. Any solution we end up with,
we should try to ensure they are valid values of grep-program as well.
> The new user option is already used in many places in grep.el
> in the previous patch, so it should be ok to use it in semantic-symref
> as well:
>
> diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
> index 180d779a78..b7d08409aa 100644
> --- a/lisp/cedet/semantic/symref/grep.el
> +++ b/lisp/cedet/semantic/symref/grep.el
> @@ -150,15 +150,22 @@ semantic-symref-perform-search
> "-l ")
> ((eq (oref tool searchtype) 'regexp)
> "-nE ")
> - (t "-n ")))
> + (t (if (equal grep-program "rg")
> + ;; TODO: remove this after ripgrep is fixed (bug#49836)
> + (unless (string-search "rg <C> -nH" grep-find-template)
> + "-n ")
> + "-n "))))
I'm actually fine with this part.
> (greppat (cond ((eq (oref tool searchtype) 'regexp)
> (oref tool searchfor))
> (t
> ;; Can't use the word boundaries: Grep
> ;; doesn't always agree with the language
> ;; syntax on those.
> - (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
> - (oref tool searchfor)))))
> + (if (equal grep-program "rg")
> + (format "(^|\\W)%s(\\W|$)"
> + (oref tool searchfor))
> + (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
> + (oref tool searchfor))))))
This can work. Except the comparison should be with "grep", I think: all
other alternatives only work with the Extended format.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-05 3:03 ` Dmitry Gutov
@ 2021-08-06 0:35 ` Juri Linkov
2021-08-07 14:12 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Juri Linkov @ 2021-08-06 0:35 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 49836
>>> I think the original idea (surrounding with \W) is sound: after all, not
>>> every symbol boundary in Emacs sense is a word boundary in Grep or RG. If
>>> a method, say, ends with ?, then it won't be.
>> I tried to search for 'soap-type-is-array?' in the Emacs tree,
>> and ripgrep can find it with "\\b%s\\b", but Grep can't.
>
> Did you search through symref, or in console? If the former, it seems some
> regexp-quoting is missing somewhere (the question mark was no
> escaped). Because see what the console says:
>
> $ rg "\bsoap-type-is-array?\b"
> lisp/net/soap-client.el
> 950:(defun soap-type-is-array? (type)
> 990: (if (soap-type-is-array? type)
>
> ChangeLog.2
> 19080: * lisp/net/soap-client.el (soap-type-is-array?): new defun
>
> $ rg "\bsoap-type-is-array\?\b"
>
> ^^ no matches
>
> And
>
> $ rg "\bsoap-type-is-array\?"
>
> has matches, of course.
semantic-symref-grep-use-template constructs such command line:
"rg ... -e \\\\bsoap-type-is-array\\?\\\\b"
that finds matches.
>> It would be more preferable not to change the existing default logic
>> to avoid possible troubles. Since Grep with Basic syntax works fine,
>> then better not to switch to Extended syntax.
>
> See above. But also consider what happens if a user sees that grep-program
> is now customizable and ripgrep is an officially supported value. They
> change it to "rg", and then suddenly their 'M-x rgrep' input has to use the
> extended regexp format?
This difference could be explained in the documentation.
> Worse than that, any third-party package that uses grep-find-template will
> suddenly have a high chance of failing if they pass any nontrivial regexps
> to it, especially if those have groupings or alternations.
This already happened after trying to customize grep-find-template
to use rg broke xref-find-references, so the problem already exists
and needs to be solved.
> It's a hard problem: grep.el is not prepared for abstracting like that. If
> we at least standardized it internally on Extended format, that would at
> least remove one source of uncertainty and bugs. The user still can input
> basic regexps interactively, we can translate them easily.
Is there a package that can translate between them reliably?
> Further: seeing xref-search-program-alist, people asked for support for
> other similar programs, such as ag and pt. Any solution we end up with, we
> should try to ensure they are valid values of grep-program as well.
Why not, semantic-symref already supports alternative tools
such as cscope, global, idutils. So xref could support more too.
>> + (if (equal grep-program "rg")
>> + (format "(^|\\W)%s(\\W|$)"
>> + (oref tool searchfor))
>> + (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
>> + (oref tool searchfor))))))
>
> This can work. Except the comparison should be with "grep", I think: all
> other alternatives only work with the Extended format.
I'm worried about the case when the user customizes
'grep-program' to e.g. an absolute path "/bin/grep"
or "/usr/local/bin/grep", etc.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-06 0:35 ` Juri Linkov
@ 2021-08-07 14:12 ` Dmitry Gutov
0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2021-08-07 14:12 UTC (permalink / raw)
To: Juri Linkov; +Cc: 49836
On 06.08.2021 03:35, Juri Linkov wrote:
> semantic-symref-grep-use-template constructs such command line:
>
> "rg ... -e \\\\bsoap-type-is-array\\?\\\\b"
>
> that finds matches.
The correct one will probably look like
"rg ... -e \\\\bsoap-type-is-array\\\\?\\\\b"
(same number of backslashes before '?' as before 'b'), and it won't find
any. The one you mentioned will find false positives.
E.g., try searching for 'file-name-as-directory?'. Or 'carr?'.
>> See above. But also consider what happens if a user sees that grep-program
>> is now customizable and ripgrep is an officially supported value. They
>> change it to "rg", and then suddenly their 'M-x rgrep' input has to use the
>> extended regexp format?
>
> This difference could be explained in the documentation.
If it comes to that, yes, but it's usually better to fix usability
problems that just document them.
>> Worse than that, any third-party package that uses grep-find-template will
>> suddenly have a high chance of failing if they pass any nontrivial regexps
>> to it, especially if those have groupings or alternations.
>
> This already happened after trying to customize grep-find-template
> to use rg broke xref-find-references, so the problem already exists
> and needs to be solved.
The problem exists, and has been for a long time: grep.el doesn't
properly support the "alternative" search programs, which are very
popular now. Its abstraction is leaky and doesn't work with anything but
grep. But I think that means we need a better abstraction.
Let's try to make sure we don't create bigger problems when fixing it.
And "packages stop working when I customize grep-program" sounds worse
than "I can't customize grep-program to 'rg', so my searches are a bit
slower than they could have been".
>> It's a hard problem: grep.el is not prepared for abstracting like that. If
>> we at least standardized it internally on Extended format, that would at
>> least remove one source of uncertainty and bugs. The user still can input
>> basic regexps interactively, we can translate them easily.
>
> Is there a package that can translate between them reliably?
For the limited purpose of symref/grep, we could use
xref--regexp-to-extended. It's already used in xref-matches-in-directory
and xref-matches-in-files. Better name/documentation and tests are pending.
Note that it actually translates from a (subset of) Emacs regexp to
Extended and back (it's reversible). The proper basic regexp syntax
treats '+' and '?' as normal characters unless escaped, but they're
special in Emacs regexps.
The above function is how one can use Emacs syntax (though only limited
a subset, for now) in project-find-regexp.
I also saw some commits to ELPA yesterday, that show that Consult
includes a more advanced version of this feature:
https://git.savannah.gnu.org/cgit/emacs/elpa.git/commit/?h=externals/consult&id=7bd3e44929d44cf0e17f38e943e9be2bd6014237
https://git.savannah.gnu.org/cgit/emacs/elpa.git/commit/?h=externals/consult&id=95dadd98a6a0f08955f67f1e9a7cc312435a86b8
Not sure how mature it is (seems still in development), but perhaps we
could move it to the core sooner or later. And use it instead, if it
does provide any improvement for our use case here.
>> Further: seeing xref-search-program-alist, people asked for support for
>> other similar programs, such as ag and pt. Any solution we end up with, we
>> should try to ensure they are valid values of grep-program as well.
>
> Why not, semantic-symref already supports alternative tools
> such as cscope, global, idutils. So xref could support more too.
It's easy enough for Xref, yes. It only has to support one single,
well-defined scenario.
>>> + (if (equal grep-program "rg")
>>> + (format "(^|\\W)%s(\\W|$)"
>>> + (oref tool searchfor))
>>> + (format "\\(^\\|\\W\\)%s\\(\\W\\|$\\)"
>>> + (oref tool searchfor))))))
>>
>> This can work. Except the comparison should be with "grep", I think: all
>> other alternatives only work with the Extended format.
>
> I'm worried about the case when the user customizes
> 'grep-program' to e.g. an absolute path "/bin/grep"
> or "/usr/local/bin/grep", etc.
(string-match "\\bgrep\\b" grep-program) could take care of this.
To sum up, I'm all for adding some clutches to symref/grep.el, to
support your advanced scenario, right now.
As for having grep-program customizable, perhaps we should add some new
feature/abstraction/package? To avoid breakage, and for it to be opt-in
for any new callers from Lisp.
Or indeed have templates use Extended syntax, and grep-expand-template
translate REGEXP to it. That can cause breakage for existing users,
though, those who already customize grep-find-template, etc, to their
particular programs.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-08-02 21:05 bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
2021-08-03 8:10 ` Juri Linkov
@ 2021-09-18 13:53 ` Mattias Engdegård
2021-09-18 14:14 ` Eli Zaretskii
2021-09-18 23:53 ` Dmitry Gutov
1 sibling, 2 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-09-18 13:53 UTC (permalink / raw)
To: Juri Linkov; +Cc: Lars Ingebrigtsen, 49836, Dmitry Gutov
[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]
Attached is a straw-man patch that is not production-quality but illustrates some of the concepts I'd like to see:
* ripgrep supported for M-?
* ripgrep auto-detected and used by default if available
* treat ripgrep as a search method of its own and not just a different grep program
* corollary: selection of search method should be made symbolically and not by supplying shell command strings
Concerns not addressed by the patch:
* unify back-ends and customisation options in xref.el and symref/grep.el
* tramp
* correct way to auto-detect ripgrep -- I have no idea, really, and would gladly settle for something dead simple, perhaps looking at the output of rg --version, instead of the voodoo code in the patch
* other search methods -- for example, it would be interesting to allow use of Spotlight or similar indexed searching tools in some cases.
* speeding up the parts that are not ripgrep. If the actual command (ripgrep or something else) takes zero seconds, what if anything prevents a crisp snappy response from Emacs?
That said, it appears to work. Right now I'm not near my Linux machine and can just compare against the fairly slow BSD grep that comes with macOS, so obviously the speed-up is tremendous.
[-- Attachment #2: 0001-xref-ripgrep-support.patch --]
[-- Type: application/octet-stream, Size: 7394 bytes --]
From 0d3fa647fd6a0c3a27bc5f9905e24103a3d6b617 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 16 Sep 2021 17:14:33 +0200
Subject: [PATCH] xref ripgrep support
Auto-detect presence of ripgrep and use it if available.
It speeds up `xref-find-references` (M-q), between a little and very much.
---
lisp/cedet/semantic/symref/grep.el | 121 +++++++++++++++++++++--------
1 file changed, 88 insertions(+), 33 deletions(-)
diff --git a/lisp/cedet/semantic/symref/grep.el b/lisp/cedet/semantic/symref/grep.el
index 1e282c3052..106d46dc76 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -79,16 +79,7 @@ semantic-symref-derive-find-filepatterns
;; Only take in simple patterns, so try to convert this one.
(string-match "\\\\\\.\\([^\\'>]+\\)\\\\'" (car X)))
(push (concat "*." (match-string 1 (car X))) pat))))
- ;; Convert the list into some find-flags.
- (if (null pat)
- (error "Customize `semantic-symref-filepattern-alist' for %S"
- major-mode)
- (let ((args `("-name" ,(car pat))))
- (if (null (cdr pat))
- args
- `("(" ,@args
- ,@(mapcan (lambda (s) `("-o" "-name" ,s)) (cdr pat))
- ")"))))))
+ pat))
(defvar semantic-symref-grep-flags)
@@ -133,12 +124,91 @@ semantic-symref-grep-shell
:group 'semantic
:type 'string)
+(defcustom semantic-symref-grep-tool 'auto-detect
+ "The tool to use for searching in files."
+ :type '(choice (const :tag "Auto-detect" auto-detect)
+ (const :tag "Ripgrep" ripgrep)
+ (const :tag "find + grep" find-grep))
+ :group 'semantic)
+
+(defcustom semantic-symref-grep-ripgrep-command "rg"
+ "Name of the ripgrep command to use."
+ :type 'file
+ :group 'semantic)
+
+(defun semantic-symref-grep--auto-detect-tool ()
+ (let ((have-rg
+ (with-temp-buffer
+ (let ((hello-file (grep-hello-file)))
+ (let ((process-file-side-effects nil))
+ (unwind-protect
+ (eql (ignore-errors
+ (process-file
+ semantic-symref-grep-ripgrep-command
+ nil t nil "^Copyright"
+ (file-local-name hello-file)))
+ 0)
+ (when (file-remote-p hello-file)
+ (delete-file hello-file))))))))
+ (if have-rg
+ 'ripgrep
+ 'find-grep)))
+
(defun semantic-symref-grep--quote-extended (string)
"Quote STRING as an extended-syntax regexp."
(replace-regexp-in-string (rx (in ".^$*+?|{}[]()|\\"))
(lambda (s) (concat "\\" s))
string nil t))
+(defun semantic-symref-grep--command (rootdir filepatterns file-name-only
+ pattern-type pattern)
+ "Compute search command to run.
+ROOTDIR is the root of the tree to search. FILEPATTERNS is a
+list of glob patterns that match the files to search.
+If FILE-NAME-ONLY is non-nil, the search only wants the names of files.
+PATTERN-TYPE indicates the type of PATTERN:
+ `regexp' -- PATTERN is an extended (egrep) regexp.
+ `symbol' -- PATTERN is a literal identifier.
+
+Return the command and arguments as a list of strings."
+ (when (eq semantic-symref-grep-tool 'auto-detect)
+ (setq semantic-symref-grep-tool
+ (semantic-symref-grep--auto-detect-tool)))
+ (cond
+ ((eq semantic-symref-grep-tool 'ripgrep)
+ (let ((filepat-args
+ (mapcan (lambda (s) (list "-g" s)) filepatterns))
+ (flags (cond (file-name-only "-l")
+ ((eq pattern-type 'symbol) "-nw")
+ (t "-n")))
+ (pat-arg (if (eq pattern-type 'symbol)
+ (semantic-symref-grep--quote-extended pattern)
+ pattern))
+ (dir (expand-file-name rootdir)))
+ `(,semantic-symref-grep-ripgrep-command
+ ,@filepat-args ,flags "-e" ,pat-arg ,dir)))
+ (t
+ (let* ((filepat-args
+ ;; Convert the list into some find-flags.
+ (let ((args `("-name" ,(car filepatterns))))
+ (if (cdr filepatterns)
+ `("(" ,@args
+ ,@(mapcan (lambda (s) (list "-o" "-name" s))
+ (cdr filepatterns))
+ ")")
+ args)))
+ (filepattern (mapconcat #'shell-quote-argument filepat-args " "))
+ (grepflags (cond (file-name-only "-l ")
+ ((eq pattern-type 'regexp) "-nE ")
+ (t "-nwE ")))
+ (pat-arg (if (eq pattern-type 'symbol)
+ (semantic-symref-grep--quote-extended pattern)
+ pattern))
+ (cmd (semantic-symref-grep-use-template
+ (directory-file-name (file-local-name rootdir))
+ filepattern grepflags pat-arg)))
+ `(,semantic-symref-grep-shell ,shell-command-switch ,cmd)))))
+
(cl-defmethod semantic-symref-perform-search ((tool semantic-symref-tool-grep))
"Perform a search with Grep."
;; Grep doesn't support some types of searches.
@@ -150,33 +220,18 @@ semantic-symref-perform-search
(let* (;; Find the file patterns to use.
(rootdir (semantic-symref-calculate-rootdir))
(filepatterns (semantic-symref-derive-find-filepatterns))
- (filepattern (mapconcat #'shell-quote-argument filepatterns " "))
- ;; Grep based flags.
- (grepflags (cond ((eq (oref tool resulttype) 'file)
- "-l ")
- ((eq (oref tool searchtype) 'regexp)
- "-nE ")
- (t "-nw ")))
- (searchfor (oref tool searchfor))
- (greppat (if (eq (oref tool searchtype) 'regexp)
- searchfor
- (semantic-symref-grep--quote-extended searchfor)))
- ;; Misc
- (b (get-buffer-create "*Semantic SymRef*"))
- (ans nil)
- )
+ (greppat (oref tool searchfor))
+ (file-names-only (eq (oref tool resulttype) 'file))
+ (search-type (oref tool searchtype))
+ (command (semantic-symref-grep--command
+ rootdir filepatterns file-names-only search-type greppat))
+ (b (get-buffer-create "*Semantic SymRef*")))
(with-current-buffer b
(erase-buffer)
(setq default-directory rootdir)
- (let ((cmd (semantic-symref-grep-use-template
- (directory-file-name (file-local-name rootdir))
- filepattern grepflags greppat)))
- (process-file semantic-symref-grep-shell nil b nil
- shell-command-switch cmd)))
- (setq ans (semantic-symref-parse-tool-output tool b))
- ;; Return the answer
- ans))
+ (apply #'process-file (car command) nil b nil (cdr command)))
+ (semantic-symref-parse-tool-output tool b)))
(cl-defmethod semantic-symref-parse-tool-output-one-line ((tool semantic-symref-tool-grep))
"Parse one line of grep output, and return it as a match list.
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply related [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 13:53 ` Mattias Engdegård
@ 2021-09-18 14:14 ` Eli Zaretskii
2021-09-18 14:18 ` Mattias Engdegård
2021-09-18 23:53 ` Dmitry Gutov
1 sibling, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-18 14:14 UTC (permalink / raw)
To: Mattias Engdegård; +Cc: juri, larsi, 49836, dgutov
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 18 Sep 2021 15:53:51 +0200
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, 49836@debbugs.gnu.org,
> Dmitry Gutov <dgutov@yandex.ru>
>
> If the actual command (ripgrep or something else) takes zero seconds, what if anything prevents a crisp snappy response from Emacs?
The sub-process communications infrastructure, of course.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 14:14 ` Eli Zaretskii
@ 2021-09-18 14:18 ` Mattias Engdegård
2021-09-18 14:25 ` Eli Zaretskii
2021-09-18 21:48 ` Dmitry Gutov
0 siblings, 2 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-09-18 14:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: juri, larsi, 49836, dgutov
18 sep. 2021 kl. 16.14 skrev Eli Zaretskii <eliz@gnu.org>:
>> If the actual command (ripgrep or something else) takes zero seconds, what if anything prevents a crisp snappy response from Emacs?
>
> The sub-process communications infrastructure, of course.
Yes, very likely, but also post-processing the output from the search process (matching, sorting, etc) and preparing it for display. It would be interesting to see a break-down and to see what if anything can be down to make to go faster.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 14:18 ` Mattias Engdegård
@ 2021-09-18 14:25 ` Eli Zaretskii
2021-09-18 21:48 ` Dmitry Gutov
1 sibling, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-18 14:25 UTC (permalink / raw)
To: Mattias Engdegård; +Cc: juri, larsi, 49836, dgutov
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 18 Sep 2021 16:18:54 +0200
> Cc: juri@linkov.net, larsi@gnus.org, 49836@debbugs.gnu.org, dgutov@yandex.ru
>
> > The sub-process communications infrastructure, of course.
>
> Yes, very likely, but also post-processing the output from the search process (matching, sorting, etc) and preparing it for display. It would be interesting to see a break-down and to see what if anything can be down to make to go faster.
You could build Emacs with profiling and run that, I guess.
And on GNU/Linux, there's Perf.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 14:18 ` Mattias Engdegård
2021-09-18 14:25 ` Eli Zaretskii
@ 2021-09-18 21:48 ` Dmitry Gutov
1 sibling, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-18 21:48 UTC (permalink / raw)
To: Mattias Engdegård, Eli Zaretskii; +Cc: larsi, 49836, juri
On 18.09.2021 17:18, Mattias Engdegård wrote:
> 18 sep. 2021 kl. 16.14 skrev Eli Zaretskii<eliz@gnu.org>:
>
>>> If the actual command (ripgrep or something else) takes zero seconds, what if anything prevents a crisp snappy response from Emacs?
>> The sub-process communications infrastructure, of course.
> Yes, very likely, but also post-processing the output from the search process (matching, sorting, etc) and preparing it for display. It would be interesting to see a break-down and to see what if anything can be down to make to go faster.
>
Before you dig in with OS-level debugger, here are things to try to
narrow down the problems:
- Do the search for a common term (many matches).
- Do the search for a rare term (with 1-2 matches).
See how the inside/outside timings compare. I'm guessing the latter case
should be snappy, with a relatively small ratio. Whatever overhead Emacs
has, will probably be on the sub-process infrastructure.
To investigate the former case (many matches), I suggest starting with
'benchmark-progn'.
Wrapping the 'process-file' call will measure the shell invocation and
getting the output into the buffer.
diff --git a/lisp/cedet/semantic/symref/grep.el
b/lisp/cedet/semantic/symref/grep.el
index 53745b429a..7db5e79c91 100644
--- a/lisp/cedet/semantic/symref/grep.el
+++ b/lisp/cedet/semantic/symref/grep.el
@@ -163,8 +163,9 @@ semantic-symref-perform-search
(let ((cmd (semantic-symref-grep-use-template
(directory-file-name (file-local-name rootdir))
filepattern grepflags greppat)))
- (process-file semantic-symref-grep-shell nil b nil
- shell-command-switch cmd)))
+ (benchmark-progn
+ (process-file semantic-symref-grep-shell nil b nil
+ shell-command-switch cmd))))
(setq ans (semantic-symref-parse-tool-output tool b))
;; Return the answer
ans))
Measuring the subsequent semantic-symref-parse-tool-output call can also
show something, but it's usually fast.
Wrapping the xref--convert-hits call alone inside
xref-references-in-directory should be more interesting: its work is
less trivial. I don't know how much further it can be optimized, but
help is welcome, of course.
diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 69cabd0b5a..ab0476b2bb 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1548,9 +1548,11 @@ xref-references-in-directory
(inst (semantic-symref-instantiate :searchfor symbol
:searchtype 'symbol
:searchscope 'subdirs
- :resulttype 'line-and-text)))
- (xref--convert-hits (semantic-symref-perform-search inst)
- (format "\\_<%s\\_>" (regexp-quote symbol)))))
+ :resulttype 'line-and-text))
+ (search-hits (semantic-symref-perform-search inst)))
+ (benchmark-progn
+ (xref--convert-hits search-hits
+ (format "\\_<%s\\_>" (regexp-quote symbol))))))
(define-obsolete-function-alias
'xref-collect-references
^ permalink raw reply related [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 13:53 ` Mattias Engdegård
2021-09-18 14:14 ` Eli Zaretskii
@ 2021-09-18 23:53 ` Dmitry Gutov
2021-09-19 0:21 ` Jim Porter
1 sibling, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-18 23:53 UTC (permalink / raw)
To: Mattias Engdegård, Juri Linkov; +Cc: Lars Ingebrigtsen, 49836
On 18.09.2021 16:53, Mattias Engdegård wrote:
> Attached is a straw-man patch that is not production-quality but illustrates some of the concepts I'd like to see:
>
> * ripgrep supported for M-?
> * ripgrep auto-detected and used by default if available
> * treat ripgrep as a search method of its own and not just a different grep program
> * corollary: selection of search method should be made symbolically and not by supplying shell command strings
Perhaps we should split off the auto-detection feature and consider the
patch without it first. If people don't mind adding yet another
grep-or-ripgrep custom variable, this can be a reasonable change.
After landing that we could discuss the auto-detection approach, on
local and remote machines, and whether we could manage to do it only
once per host.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-18 23:53 ` Dmitry Gutov
@ 2021-09-19 0:21 ` Jim Porter
2021-09-19 10:11 ` Mattias Engdegård
0 siblings, 1 reply; 29+ messages in thread
From: Jim Porter @ 2021-09-19 0:21 UTC (permalink / raw)
To: Dmitry Gutov, Mattias Engdegård, Juri Linkov
Cc: Lars Ingebrigtsen, 49836
On 9/18/2021 4:53 PM, Dmitry Gutov wrote:
> Perhaps we should split off the auto-detection feature and consider the
> patch without it first. If people don't mind adding yet another
> grep-or-ripgrep custom variable, this can be a reasonable change.
>
> After landing that we could discuss the auto-detection approach, on
> local and remote machines, and whether we could manage to do it only
> once per host.
I've done something along these lines for `urgrep'[1], a package to
provide something like `M-x rgrep' that works across the seemingly
ever-growing list of grep-like tools out there. I'm still working on
improving the documentation before I think about putting it on ELPA, but
maybe there are some bits in there that would be useful here. I'd be
happy to coordinate on this if there's interest.
[1] https://github.com/jimporter/urgrep
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-19 0:21 ` Jim Porter
@ 2021-09-19 10:11 ` Mattias Engdegård
2021-09-20 0:14 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Mattias Engdegård @ 2021-09-19 10:11 UTC (permalink / raw)
To: Jim Porter; +Cc: Lars Ingebrigtsen, 49836, Dmitry Gutov, Juri Linkov
19 sep. 2021 kl. 02.21 skrev Jim Porter <jporterbugs@gmail.com>:
> [1] https://github.com/jimporter/urgrep
Thank you -- many fine ideas here and I'll be sure to steal at least something, even if the goals are slightly different.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-19 10:11 ` Mattias Engdegård
@ 2021-09-20 0:14 ` Dmitry Gutov
2021-09-20 5:09 ` Jim Porter
0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-20 0:14 UTC (permalink / raw)
To: Mattias Engdegård, Jim Porter; +Cc: Lars Ingebrigtsen, 49836, Juri Linkov
On 19.09.2021 13:11, Mattias Engdegård wrote:
> even if the goals are slightly different
Are the goals different?
It seems like a good direction: search program specified as a symbol,
and the package knows how to generate search queries based on given
requirements (with a certain space).
I'm not sure it's flexible enough to be used in both
xref-matches-in-files and semantic/symref (yet?), but when I tried to
imagine a package that would, it looked fairly similar.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-20 0:14 ` Dmitry Gutov
@ 2021-09-20 5:09 ` Jim Porter
2021-09-20 17:04 ` Dmitry Gutov
0 siblings, 1 reply; 29+ messages in thread
From: Jim Porter @ 2021-09-20 5:09 UTC (permalink / raw)
To: Dmitry Gutov, Mattias Engdegård
Cc: Lars Ingebrigtsen, 49836, Juri Linkov
On 9/19/2021 5:14 PM, Dmitry Gutov wrote:
> On 19.09.2021 13:11, Mattias Engdegård wrote:
>> even if the goals are slightly different
>
> Are the goals different?
I think in the long run, the goals are very much the same. But in the
short run, my goal with urgrep was just to make something that would
work like `rgrep', but support multiple tools. There are already a
multitude of Emacs packages that provide `rgrep'-like functionality for
a particular tool, but I wanted something that worked (almost) the same
no matter what happens to be installed on the system.
> I'm not sure it's flexible enough to be used in both
> xref-matches-in-files and semantic/symref (yet?), but when I tried to
> imagine a package that would, it looked fairly similar.
If there are any (useful) commands that can't be generated with
`urgrep-command', but which most grep-like tools support, I definitely
want to add that capability. The current set of options is really just
what I use semi-regularly, so there's bound to be stuff I missed,
especially regarding semantic/symref.
That said, I don't want to slow things down too much in this bug. Maybe
for Emacs 29 though, it would make sense to put (parts of?) urgrep into
Emacs, since a unified solution would probably be helpful. I'll try to
find some time to post a message to emacs-devel to discuss this and get
some feedback.
^ permalink raw reply [flat|nested] 29+ messages in thread
* bug#49836: Support ripgrep in semantic-symref-tool-grep
2021-09-20 5:09 ` Jim Porter
@ 2021-09-20 17:04 ` Dmitry Gutov
0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Gutov @ 2021-09-20 17:04 UTC (permalink / raw)
To: Jim Porter, Mattias Engdegård; +Cc: Lars Ingebrigtsen, 49836, Juri Linkov
On 20.09.2021 08:09, Jim Porter wrote:
> On 9/19/2021 5:14 PM, Dmitry Gutov wrote:
>> On 19.09.2021 13:11, Mattias Engdegård wrote:
>>> even if the goals are slightly different
>>
>> Are the goals different?
>
> I think in the long run, the goals are very much the same. But in the
> short run, my goal with urgrep was just to make something that would
> work like `rgrep', but support multiple tools. There are already a
> multitude of Emacs packages that provide `rgrep'-like functionality for
> a particular tool, but I wanted something that worked (almost) the same
> no matter what happens to be installed on the system.
>
>> I'm not sure it's flexible enough to be used in both
>> xref-matches-in-files and semantic/symref (yet?), but when I tried to
>> imagine a package that would, it looked fairly similar.
>
> If there are any (useful) commands that can't be generated with
> `urgrep-command', but which most grep-like tools support, I definitely
> want to add that capability. The current set of options is really just
> what I use semi-regularly, so there's bound to be stuff I missed,
> especially regarding semantic/symref.
semantic/symref/grep is not too complicated in that regard: the command
looks like, for example
find -H ~/vc/emacs-master -type f \( -name \*..\*emacs -o -name
\*.ede -o -name \*.el \) -exec grep -nw -nH --null -e mhtml-mode \{\} +
xref's use is slightly different, but ultimately simpler: it assumes
files are piped from stdin. So it's either
xargs -0 rg -i -nH --no-messages -g '!*/' -e xref-search-program
or
xargs -0 grep -i -snHE -e <R>
And the xargs prefix can be just added on outside of your package.
> That said, I don't want to slow things down too much in this bug. Maybe
> for Emacs 29 though, it would make sense to put (parts of?) urgrep into
> Emacs, since a unified solution would probably be helpful. I'll try to
> find some time to post a message to emacs-devel to discuss this and get
> some feedback.
Sure. Thank you.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2021-09-20 17:04 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-09-15 15:56 master 544db1e: Faster grep pattern for identifiers Eli Zaretskii
2021-09-15 16:25 ` Dmitry Gutov
2021-09-15 16:33 ` Eli Zaretskii
2021-09-15 18:06 ` Dmitry Gutov
2021-09-15 18:14 ` Eli Zaretskii
2021-09-15 18:39 ` Dmitry Gutov
2021-09-17 16:07 ` bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
2021-09-17 16:24 ` Lars Ingebrigtsen
2021-09-18 18:37 ` Juri Linkov
2021-09-16 7:28 ` master 544db1e: Faster grep pattern for identifiers Omar Polo
2021-09-15 16:29 ` Mattias Engdegård
-- strict thread matches above, loose matches on Subject: below --
2021-08-02 21:05 bug#49836: Support ripgrep in semantic-symref-tool-grep Juri Linkov
2021-08-03 8:10 ` Juri Linkov
2021-08-04 3:14 ` Dmitry Gutov
2021-08-04 21:23 ` Juri Linkov
2021-08-05 3:03 ` Dmitry Gutov
2021-08-06 0:35 ` Juri Linkov
2021-08-07 14:12 ` Dmitry Gutov
2021-09-18 13:53 ` Mattias Engdegård
2021-09-18 14:14 ` Eli Zaretskii
2021-09-18 14:18 ` Mattias Engdegård
2021-09-18 14:25 ` Eli Zaretskii
2021-09-18 21:48 ` Dmitry Gutov
2021-09-18 23:53 ` Dmitry Gutov
2021-09-19 0:21 ` Jim Porter
2021-09-19 10:11 ` Mattias Engdegård
2021-09-20 0:14 ` Dmitry Gutov
2021-09-20 5:09 ` Jim Porter
2021-09-20 17:04 ` Dmitry Gutov
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.