bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
@ 2021-03-01 20:40 Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-01 22:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 19:25 ` Juri Linkov
  0 siblings, 2 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-01 20:40 UTC (permalink / raw)
  To: 46859

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]


Hi!

When using the xref buffer, especially in combination with
'project-find-regexp', sometimes my projects has huge one-line
files. The simplest example of these kind of files are the minified
".js" files that are compiled. Right now I have one at 500 000 columns,
which admittedly is a lot. However, when 'project-find-regexp' searches
these files and finds a hit in one of them, the search takes a long
time. In addition, navigating the xref buffer when the results show up
also takes a long time, because of the troubles emacs has with long
lines.

Before the supplied patch, one search with 'project-find-regexp' with
ripgrep enabled takes around 3-4 seconds. With the supplied patch, the
search is almost instantaneous.

The added functionality is created to not kick in before a certain
threshold, where 500 columns seems reasonably long. Anything above that
will be truncated, but xref will still show that there was a hit.

I'm sure the patch can be improved, so please, don't hesitate to tell
me.


I consider this a great improvement, and I hope you will to§

Have a nice day,

--
Theodor Thornhill


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-option-to-truncate-long-lines.patch --]
[-- Type: text/x-patch, Size: 3451 bytes --]

From dcc2078c20d8edd32407498c16d65ea46d44a3a0 Mon Sep 17 00:00:00 2001
From: Theodor Thornhill <theo@thornhill.no>
Date: Mon, 1 Mar 2021 21:26:44 +0100
Subject: [PATCH] Add option to truncate long lines

* (xref-truncate-line-to): New option.  Set cut-off point for long
lines in xref buffer

* (xref--truncate-long-lines-p): Helper function for xref--insert-xrefs.

* (xref--insert-xrefs): By truncating long lines we get a huge speedup
both when invoking xref, and while navigating the xref buffer.
---
 lisp/progmodes/xref.el | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..98dacef94f 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -872,6 +872,19 @@ beginning of the line."
       (xref--search-property 'xref-item))
   (xref-show-location-at-point))
 
+(defcustom xref-truncate-line-to 500
+  "Max number of columns to display in xref buffer."
+  :type '(choice
+          (fixnum :tag "Number of lines")
+          (null :tag "Don't truncate"))
+  :version "28.1"
+  :package-version '(xref . "1.0.5"))
+
+(defun xref--truncate-long-lines-p (summary)
+  (cl-check-type summary xref-item)
+  (and (numberp xref-truncate-line-to)
+       (> (length summary) xref-truncate-line-to)))
+
 (defun xref--insert-xrefs (xref-alist)
   "Insert XREF-ALIST in the current-buffer.
 XREF-ALIST is of the form ((GROUP . (XREF ...)) ...), where
@@ -902,14 +915,24 @@ GROUP is a string for decoration purposes and XREF is an
                                 "  ")))
                         ;; Render multiple matches on the same line, together.
                         (when (and line (equal prev-line-key line-key))
-                          (when-let ((column (xref-location-column location)))
-                            (delete-region
-                             (save-excursion
-                               (forward-line -1)
-                               (move-to-column (+ (length prefix) column))
+                          ;; Overwrite if we want to truncate long lines.
+                          (if (xref-truncate-long-lines-p new-summary)
+                              (delete-region
+                               (save-excursion (forward-line -1) (point))
+                               (point))
+                            (when-let ((column (xref-location-column location)))
+                              (delete-region
+                               (save-excursion
+                                 (forward-line -1)
+                                 (move-to-column (+ (length prefix) column))
+                                 (point))
                                (point))
-                             (point))
-                            (setq new-summary (substring summary column) prefix "")))
+                              (setq new-summary (substring summary column) prefix ""))))
+                        ;; Truncate
+                        (when (xref-truncate-long-lines-p new-summary)
+                          (setq new-summary
+                                (concat (substring new-summary 0 xref-truncate-line-to)
+                                        " (...truncated)")))
                         (xref--insert-propertized
                          (list 'xref-item xref
                                'mouse-face 'highlight
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-01 20:40 bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-01 22:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 19:25 ` Juri Linkov
  1 sibling, 0 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-01 22:07 UTC (permalink / raw)
  To: 46859

[-- Attachment #1: Type: text/plain, Size: 70 bytes --]


There was a typo in the previous patch, sorry about that!

--
Theo



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-option-to-truncate-long-lines.patch --]
[-- Type: text/x-patch, Size: 3453 bytes --]

From 1697033f216a9792e374bda20816ff91491aa38e Mon Sep 17 00:00:00 2001
From: Theodor Thornhill <theo@thornhill.no>
Date: Mon, 1 Mar 2021 23:05:32 +0100
Subject: [PATCH] Add option to truncate long lines

* (xref-truncate-line-to): New option.  Set cut-off point for long
lines in xref buffer

* (xref--truncate-long-lines-p): Helper function for xref--insert-xrefs.

* (xref--insert-xrefs): By truncating long lines we get a huge speedup
both when invoking xref, and while navigating the xref buffer.
---
 lisp/progmodes/xref.el | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..c893137973 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -872,6 +872,19 @@ beginning of the line."
       (xref--search-property 'xref-item))
   (xref-show-location-at-point))
 
+(defcustom xref-truncate-line-to 500
+  "Max number of columns to display in xref buffer."
+  :type '(choice
+          (fixnum :tag "Number of lines")
+          (null :tag "Don't truncate"))
+  :version "28.1"
+  :package-version '(xref . "1.0.5"))
+
+(defun xref--truncate-long-lines-p (summary)
+  (cl-check-type summary xref-item)
+  (and (numberp xref-truncate-line-to)
+       (> (length summary) xref-truncate-line-to)))
+
 (defun xref--insert-xrefs (xref-alist)
   "Insert XREF-ALIST in the current-buffer.
 XREF-ALIST is of the form ((GROUP . (XREF ...)) ...), where
@@ -902,14 +915,24 @@ GROUP is a string for decoration purposes and XREF is an
                                 "  ")))
                         ;; Render multiple matches on the same line, together.
                         (when (and line (equal prev-line-key line-key))
-                          (when-let ((column (xref-location-column location)))
-                            (delete-region
-                             (save-excursion
-                               (forward-line -1)
-                               (move-to-column (+ (length prefix) column))
+                          ;; Overwrite if we want to truncate long lines.
+                          (if (xref--truncate-long-lines-p new-summary)
+                              (delete-region
+                               (save-excursion (forward-line -1) (point))
+                               (point))
+                            (when-let ((column (xref-location-column location)))
+                              (delete-region
+                               (save-excursion
+                                 (forward-line -1)
+                                 (move-to-column (+ (length prefix) column))
+                                 (point))
                                (point))
-                             (point))
-                            (setq new-summary (substring summary column) prefix "")))
+                              (setq new-summary (substring summary column) prefix ""))))
+                        ;; Truncate
+                        (when (xref--truncate-long-lines-p new-summary)
+                          (setq new-summary
+                                (concat (substring new-summary 0 xref-truncate-line-to)
+                                        " (...truncated)")))
                         (xref--insert-propertized
                          (list 'xref-item xref
                                'mouse-face 'highlight
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-01 20:40 bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-01 22:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-02 19:25 ` Juri Linkov
  2021-03-02 21:13   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 74+ messages in thread
From: Juri Linkov @ 2021-03-02 19:25 UTC (permalink / raw)
  To: 46859; +Cc: theo

> The added functionality is created to not kick in before a certain
> threshold, where 500 columns seems reasonably long. Anything above that
> will be truncated, but xref will still show that there was a hit.
>
> I'm sure the patch can be improved, so please, don't hesitate to tell
> me.
>
> I consider this a great improvement, and I hope you will to§

I've customized 'xref-search-program-alist' that defines ripgrep
command line to contain additional options:

  -M 500 --max-columns-preview

that truncates long lines of ripgrep output
after we discussed this in https://debbugs.gnu.org/44983

But maybe your new option would be easier to customize.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 19:25 ` Juri Linkov
@ 2021-03-02 21:13   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 21:37     ` Dmitry Gutov
  2021-03-03  9:05     ` Juri Linkov
  0 siblings, 2 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-02 21:13 UTC (permalink / raw)
  To: juri, 46859


Hi!

Juri Linkov <juri@linkov.net> writes:
>
> I've customized 'xref-search-program-alist' that defines ripgrep
> command line to contain additional options:
>
>   -M 500 --max-columns-preview
>
Oh, right. That's pretty smart - and fast!

> that truncates long lines of ripgrep output
> after we discussed this in https://debbugs.gnu.org/44983
>

I'm sorry - completely missed that bug, it seems.

> But maybe your new option would be easier to customize.

Yeah, maybe.  However, without benchmarking, it is quite clear that
adding your option is faster than my patch, since ripgrep has to search
the whole minified file.  I assume it short circuits, so that results
are delivered quicker to emacs.  Maybe this bug can be closed.

Thanks for linking to that thread, and sorry for the noise :)

--
Theo





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 21:13   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-02 21:37     ` Dmitry Gutov
  2021-03-02 21:45       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 22:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-03  9:05     ` Juri Linkov
  1 sibling, 2 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-02 21:37 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

Hi Theodor,

On 02.03.2021 23:13, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:
> Yeah, maybe.  However, without benchmarking, it is quite clear that
> adding your option is faster than my patch, since ripgrep has to search
> the whole minified file.  I assume it short circuits, so that results
> are delivered quicker to emacs.  Maybe this bug can be closed.

Could you try benchmarking both approaches?

If the performance improvement from yours is at all comparable with 
Juri's, I'm inclined to prefer that direction for reasons described in 
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=44983#71.

In both cases Ripgrep (or Grep) will search the whole file. The -M flag 
just affects its output.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 21:37     ` Dmitry Gutov
@ 2021-03-02 21:45       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 22:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-02 21:45 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859, juri



> 2. mar. 2021 kl. 22:37 skrev Dmitry Gutov <dgutov@yandex.ru>:
> 
> Hi Theodor,
> 
>> On 02.03.2021 23:13, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors wrote:
>> Yeah, maybe.  However, without benchmarking, it is quite clear that
>> adding your option is faster than my patch, since ripgrep has to search
>> the whole minified file.  I assume it short circuits, so that results
>> are delivered quicker to emacs.  Maybe this bug can be closed.
> 
> Could you try benchmarking both approaches?
> 

Absolutely, I will see what I can come up with.

> If the performance improvement from yours is at all comparable with Juri's, I'm inclined to prefer that direction for reasons described in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=44983#71.
> 

Yeah, that was my initial motivation for this change as well. 

> In both cases Ripgrep (or Grep) will search the whole file. The -M flag just affects its output.

Oh, ok!

Ill get back to you.

—
Theo




^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 21:37     ` Dmitry Gutov
  2021-03-02 21:45       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-02 22:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 22:37         ` Dmitry Gutov
  1 sibling, 1 reply; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-02 22:14 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859


Hi, Dmitry,

Dmitry Gutov <dgutov@yandex.ru> writes:

> Hi Theodor,
>
> On 02.03.2021 23:13, Theodor Thornhill via Bug reports for GNU Emacs, 
> the Swiss army knife of text editors wrote:
>> Yeah, maybe.  However, without benchmarking, it is quite clear that
>> adding your option is faster than my patch, since ripgrep has to search
>> the whole minified file.  I assume it short circuits, so that results
>> are delivered quicker to emacs.  Maybe this bug can be closed.
>
> Could you try benchmarking both approaches?
>
Ok, so here are some numbers:

;; With nothing
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (11.748253 14 0.23526199999999997)

;; With -M 500
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (0.293626 0 0.0)

;; My patch
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (1.230833 8 0.13783999999999996)

> If the performance improvement from yours is at all comparable with 
> Juri's, I'm inclined to prefer that direction for reasons described in 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=44983#71.
>

So, now it looks like my patch is an improvement, but not as much as
limiting from ripgrep.  I think that is because in my version, we still
loop over the whole file, we just delete the contents so that we always
show 500 columns.  I'm interested in seeing if I could gain some more
performance by short circuiting after the first iteration of a match on
the same line.  In my test scenario there are a lot of matches on the
same huge line.  What do you think?

--
Theo





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 22:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-02 22:37         ` Dmitry Gutov
  2021-03-03 16:13           ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-02 22:37 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

On 03.03.2021 00:14, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:
> I'm interested in seeing if I could gain some more
> performance by short circuiting after the first iteration of a match on
> the same line.  In my test scenario there are a lot of matches on the
> same huge line.  What do you think?

You probably mean to short-circuit as soon as you reach the target 
column (there might be multiple matches within those 500 chars), 
skipping the rest of the matches on the same line.

Sounds worth a try.

Another approach would be to truncate the line sometime earlier, like:

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..63a17a8521 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1531,7 +1531,10 @@ xref-matches-in-files
        (while (re-search-forward grep-re nil t)
          (push (list (string-to-number (match-string line-group))
                      (match-string file-group)
-                    (buffer-substring-no-properties (point) 
(line-end-position)))
+                    (buffer-substring-no-properties (point)
+                                                    (min
+                                                     (+ (point) 500)
+                                                     (line-end-position))))
                hits)))
      (xref--convert-hits (nreverse hits) regexp)))

...of course, ideally we would keep all contents of the line somewhere 
in memory and truncate with (setq truncate-line t). But IIRC Juri said 
this didn't give us as much of a speedup as we'd want.

Another question: how many hits do you usually have in that huge 
one-line file? If it's more than 2-3, it might be that our current 
algorithm which creates "match objects" will duplicate this string 
unnecessarily N times (which is the number of hits), in 
xref--collect-matches-1, to then cut it up and merge into one line again 
when printing the buffer. In which case the patch above should also show 
a healthy improvement, but we could figure out something better instead.

Anyway, please benchmark your "earlier short-circuit" approach and then 
the above patch too.

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 21:13   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-02 21:37     ` Dmitry Gutov
@ 2021-03-03  9:05     ` Juri Linkov
  2021-03-03  9:52       ` Gregory Heytings
  2021-03-03 16:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 74+ messages in thread
From: Juri Linkov @ 2021-03-03  9:05 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: 46859

>> But maybe your new option would be easier to customize.
>
> Yeah, maybe.  However, without benchmarking, it is quite clear that
> adding your option is faster than my patch, since ripgrep has to search
> the whole minified file.  I assume it short circuits, so that results
> are delivered quicker to emacs.  Maybe this bug can be closed.

Your new option is still necessary for the default case when
'xref-search-program' is 'grep' since GNU grep has no option
to truncate output, so xref should do post-processing for grep.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03  9:05     ` Juri Linkov
@ 2021-03-03  9:52       ` Gregory Heytings
  2021-03-03 12:47         ` Dmitry Gutov
  2021-03-03 16:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03  9:52 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 46859


>>> But maybe your new option would be easier to customize.
>>
>> Yeah, maybe.  However, without benchmarking, it is quite clear that 
>> adding your option is faster than my patch, since ripgrep has to search 
>> the whole minified file.  I assume it short circuits, so that results 
>> are delivered quicker to emacs.  Maybe this bug can be closed.
>
> Your new option is still necessary for the default case when 
> 'xref-search-program' is 'grep' since GNU grep has no option to truncate 
> output, so xref should do post-processing for grep.
>

Actually, it is possible to truncate output with GNU grep:

grep -oE '.{0,100}PATTERN.{0,100}'

prints at most 100 characters before and after PATTERN.  I find this much 
better than ripgrep -M.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03  9:52       ` Gregory Heytings
@ 2021-03-03 12:47         ` Dmitry Gutov
  2021-03-03 13:53           ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 12:47 UTC (permalink / raw)
  To: Gregory Heytings, Juri Linkov; +Cc: 46859

On 03.03.2021 11:52, Gregory Heytings wrote:
> Actually, it is possible to truncate output with GNU grep:
> 
> grep -oE '.{0,100}PATTERN.{0,100}'
> 
> prints at most 100 characters before and after PATTERN.  I find this 
> much better than ripgrep -M.

I'm not sure how to parse that output (it would be quite different from 
what we get now), and if the one-long-line file has many matches inside, 
we'll still get them all, which we might or might not want.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 12:47         ` Dmitry Gutov
@ 2021-03-03 13:53           ` Gregory Heytings
  2021-03-03 14:00             ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 13:53 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]

>> Actually, it is possible to truncate output with GNU grep:
>> 
>> grep -oE '.{0,100}PATTERN.{0,100}'
>> 
>> prints at most 100 characters before and after PATTERN.  I find this 
>> much better than ripgrep -M.
>
> I'm not sure how to parse that output (it would be quite different from 
> what we get now),
>

How so?  AFAICS, it's the exact same kind of output, except that it gets 
truncated.  And it's (obviously?) better to see the context of the pattern 
you are searching for, instead of the first characters of the lines on 
which the pattern is found, in which the pattern might not be present.

>
> and if the one-long-line file has many matches inside, we'll still get 
> them all, which we might or might not want.
>

Indeed, if one-long-line has many matches inside, you'll get them all, 
which IMO makes perfect sense.

Note that this does not happen when all matches are inside the boundaries. 
For example, if your search for '.{0,100}b.{0,100}' on "aaaabbbbcccc", you 
get a single match; if you search for '.{0,1}b.{0,1}' on that same string 
you get two matches.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 13:53           ` Gregory Heytings
@ 2021-03-03 14:00             ` Dmitry Gutov
  2021-03-03 15:04               ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 14:00 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 03.03.2021 15:53, Gregory Heytings wrote:
> How so?  AFAICS, it's the exact same kind of output, except that it gets 
> truncated.  And it's (obviously?) better to see the context of the 
> pattern you are searching for, instead of the first characters of the 
> lines on which the pattern is found, in which the pattern might not be 
> present.

Since Grep doesn't return the column number of the match, we get it from 
parsing the string again. And if the string is now modified to be 
truncated from both sides, the column number will become wrong.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 14:00             ` Dmitry Gutov
@ 2021-03-03 15:04               ` Gregory Heytings
  2021-03-03 17:11                 ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 15:04 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]


>> How so?  AFAICS, it's the exact same kind of output, except that it 
>> gets truncated.  And it's (obviously?) better to see the context of the 
>> pattern you are searching for, instead of the first characters of the 
>> lines on which the pattern is found, in which the pattern might not be 
>> present.
>
> Since Grep doesn't return the column number of the match, we get it from 
> parsing the string again. And if the string is now modified to be 
> truncated from both sides, the column number will become wrong.
>

I did not understand that you need the column number of the match.  That 
could perhaps become a feature request for GNU grep: with -o and -n, also 
print the column number of the first character.

That being said, if you truncate the N first characters of the matching 
line, you have to parse the original line (that is, not the line that grep 
or another tool outputs) to find the matches anyway.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-02 22:37         ` Dmitry Gutov
@ 2021-03-03 16:13           ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-03 17:29             ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 16:13 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859

Hi again!
Dmitry Gutov <dgutov@yandex.ru> writes:

> On 03.03.2021 00:14, Theodor Thornhill via Bug reports for GNU Emacs, 
> the Swiss army knife of text editors wrote:
>> I'm interested in seeing if I could gain some more
>> performance by short circuiting after the first iteration of a match on
>> the same line.  In my test scenario there are a lot of matches on the
>> same huge line.  What do you think?

I couldn't really find any approaches that yielded better results with
short-circuiting in mind, so I dropped that idea.

> You probably mean to short-circuit as soon as you reach the target 

>
> ...of course, ideally we would keep all contents of the line somewhere 
> in memory and truncate with (setq truncate-line t). But IIRC Juri said 
> this didn't give us as much of a speedup as we'd want.
>
> Another question: how many hits do you usually have in that huge 
> one-line file? If it's more than 2-3, it might be that our current 
> algorithm which creates "match objects" will duplicate this string 
> unnecessarily N times (which is the number of hits), in 
> xref--collect-matches-1, to then cut it up and merge into one line again 
> when printing the buffer. In which case the patch above should also show 
> a healthy improvement, but we could figure out something better instead.
>

This long line has 25 matches, so yeah, that takes some time. With your
hint here I tried another approach which yielded some nice results.

Ok, some benchmarks:

;; With nothing
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (11.748253 14 0.23526199999999997)

;; With -M 500
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (0.293626 0 0.0)

;; My first patch
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (1.230833 8 0.13783999999999996)

;; Dmitrys patch
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (1.007787 0 0.0)

;; Latest diff (attached at the bottom)
(benchmark-run 10 (project-find-regexp "UrlChange")) ;; (1.0351299999999999 0 0.0)


So there are some interesting findings here:

- There are some improvements to gain
- None so far kills "-M 500"
- Pretty close between Dmitrys and my last patch

However, only my patch actually renders the long file as a match in the
output buffer. All the others seem to drop it altogether. IMO that is
one point in favour of my approaches. In addition, we could add another
defcustom for the xref--collect-matches-1 function,
"xref--collect-all-matches-p" or something like that. Retrofitting the
current variable seems a little off. That means you could customize xref
to render the whole long line if you want, while not bothering about
multiple matches. Not sure if that has a great benefit, though.

What do you think? Are any of these approaches worth pursuing further?

--
Theo


diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..fb422dcffa 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -872,6 +872,18 @@ beginning of the line."
       (xref--search-property 'xref-item))
   (xref-show-location-at-point))
 
+(defcustom xref-truncate-line-to 500
+  "Max number of columns to display in xref buffer."
+  :type '(choice
+          (fixnum :tag "Number of lines")
+          (null :tag "Don't truncate"))
+  :version "28.1"
+  :package-version '(xref . "1.0.5"))
+
+(defun xref--truncate-long-lines-p (summary)
+  (and (numberp xref-truncate-line-to)
+       (> (length summary) xref-truncate-line-to)))
+
 (defun xref--insert-xrefs (xref-alist)
   "Insert XREF-ALIST in the current-buffer.
 XREF-ALIST is of the form ((GROUP . (XREF ...)) ...), where
@@ -902,14 +914,22 @@ GROUP is a string for decoration purposes and XREF is an
                                 "  ")))
                         ;; Render multiple matches on the same line, together.
                         (when (and line (equal prev-line-key line-key))
-                          (when-let ((column (xref-location-column location)))
-                            (delete-region
-                             (save-excursion
-                               (forward-line -1)
-                               (move-to-column (+ (length prefix) column))
+                          (if (xref--truncate-long-lines-p summary)
+                              (delete-region
+                               (save-excursion (forward-line -1) (point))
+                               (point))
+                            (when-let ((column (xref-location-column location)))
+                              (delete-region
+                               (save-excursion
+                                 (forward-line -1)
+                                 (move-to-column (+ (length prefix) column))
+                                 (point))
                                (point))
-                             (point))
-                            (setq new-summary (substring summary column) prefix "")))
+                              (setq new-summary (substring summary column) prefix ""))))
+                        (when (xref--truncate-long-lines-p new-summary)
+                          (setq new-summary
+                                (concat (substring new-summary 0 xref-truncate-line-to)
+                                        " (...truncated)")))
                         (xref--insert-propertized
                          (list 'xref-item xref
                                'mouse-face 'highlight
@@ -1678,7 +1698,7 @@ Such as the current syntax table and the applied syntax properties."
                                  syntax-needed)))))
 
 (defun xref--collect-matches-1 (regexp file line line-beg line-end syntax-needed)
-  (let (matches)
+  (let (matches prev-line)
     (when syntax-needed
       (syntax-propertize line-end))
     ;; FIXME: This results in several lines with the same
@@ -1688,14 +1708,18 @@ Such as the current syntax table and the applied syntax properties."
             (or (null matches)
                 (> (point) line-beg))
             (re-search-forward regexp line-end t))
-      (let* ((beg-column (- (match-beginning 0) line-beg))
-             (end-column (- (match-end 0) line-beg))
-             (loc (xref-make-file-location file line beg-column))
-             (summary (buffer-substring line-beg line-end)))
-        (add-face-text-property beg-column end-column 'xref-match
-                                t summary)
-        (push (xref-make-match summary loc (- end-column beg-column))
-              matches)))
+
+      (unless (and (eq prev-line line)
+                   (numberp xref-truncate-line-to))
+        (let* ((beg-column (- (match-beginning 0) line-beg))
+               (end-column (- (match-end 0) line-beg))
+               (loc (xref-make-file-location file line beg-column))
+               (summary (buffer-substring line-beg line-end)))
+          (add-face-text-property beg-column end-column 'xref-match
+                                  t summary)
+          (push (xref-make-match summary loc (- end-column beg-column))
+                matches)))
+      (setq prev-line line))
     (nreverse matches)))
 
 (defun xref--find-file-buffer (file)





^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03  9:05     ` Juri Linkov
  2021-03-03  9:52       ` Gregory Heytings
@ 2021-03-03 16:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 16:14 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 46859

Hi Juri,

Juri Linkov <juri@linkov.net> writes:

>>> But maybe your new option would be easier to customize.
>>
>> Yeah, maybe.  However, without benchmarking, it is quite clear that
>> adding your option is faster than my patch, since ripgrep has to search
>> the whole minified file.  I assume it short circuits, so that results
>> are delivered quicker to emacs.  Maybe this bug can be closed.
>
> Your new option is still necessary for the default case when
> 'xref-search-program' is 'grep' since GNU grep has no option
> to truncate output, so xref should do post-processing for grep.

Yeah, agreed

--
Theo





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 15:04               ` Gregory Heytings
@ 2021-03-03 17:11                 ` Gregory Heytings
  2021-03-03 17:26                   ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 17:11 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]


>>> How so?  AFAICS, it's the exact same kind of output, except that it 
>>> gets truncated.  And it's (obviously?) better to see the context of 
>>> the pattern you are searching for, instead of the first characters of 
>>> the lines on which the pattern is found, in which the pattern might 
>>> not be present.
>> 
>> Since Grep doesn't return the column number of the match, we get it 
>> from parsing the string again. And if the string is now modified to be 
>> truncated from both sides, the column number will become wrong.
>
> I did not understand that you need the column number of the match. 
> That could perhaps become a feature request for GNU grep: with -o and 
> -n, also print the column number of the first character.
>

I wrote too fast.  In fact you can get the column number with GNU grep 
without parsing the original line:

grep -nb -oE '.{0,100}PATTERN.{0,100}'

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 17:11                 ` Gregory Heytings
@ 2021-03-03 17:26                   ` Dmitry Gutov
  2021-03-03 17:42                     ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 17:26 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 03.03.2021 19:11, Gregory Heytings wrote:
> I wrote too fast.  In fact you can get the column number with GNU grep 
> without parsing the original line:
> 
> grep -nb -oE '.{0,100}PATTERN.{0,100}'

This outputs byte offset from the beginning of the file, doesn't it?

Which will require at least reading the file into memory to convert.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 16:13           ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-03 17:29             ` Dmitry Gutov
  2021-03-03 19:54               ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 17:29 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

On 03.03.2021 18:13, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:
> This long line has 25 matches, so yeah, that takes some time. With your
> hint here I tried another approach which yielded some nice results.

Thank you.

Could you also try this benchmark with an input string that has no more 
than, say, 3 matches in the big one-line file? Or maybe just 1.

I'd like to compare the relative performance in such scenario, too.

 > However, only my patch actually renders the long file as a match in the
output buffer. All the others seem to drop it altogether. IMO that is
one point in favour of my approaches.

Indeed.

 > In addition, we could add another
defcustom for the xref--collect-matches-1 function,

That can be done already by the user customizing 
xref-search-program-alist, I think.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 17:26                   ` Dmitry Gutov
@ 2021-03-03 17:42                     ` Gregory Heytings
  2021-03-03 19:14                       ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 17:42 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 841 bytes --]

>> I wrote too fast.  In fact you can get the column number with GNU grep 
>> without parsing the original line:
>> 
>> grep -nb -oE '.{0,100}PATTERN.{0,100}'
>
> This outputs byte offset from the beginning of the file, doesn't it?
>

Yes.  You get, for each match: the line number (from the beginning of the 
file), the byte offset (from the beginning of the file) of the first 
displayed character, and the context of the match.

>
> Which will require at least reading the file into memory to convert.
>

I don't understand what you mean by that, but it seems to me that in any 
case it's much more efficient than parsing the output of grep with Elisp.

And you can easily get the byte offset of each beginning of line with 
"grep -nbo '^.'", so calculating the byte offset from the beginning of the 
line is easy.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 17:42                     ` Gregory Heytings
@ 2021-03-03 19:14                       ` Dmitry Gutov
  2021-03-03 19:34                         ` Gregory Heytings
  2021-03-03 19:59                         ` Juri Linkov
  0 siblings, 2 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 19:14 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 03.03.2021 19:42, Gregory Heytings wrote:
> 
>>> I wrote too fast.  In fact you can get the column number with GNU 
>>> grep without parsing the original line:
>>>
>>> grep -nb -oE '.{0,100}PATTERN.{0,100}'
>>
>> This outputs byte offset from the beginning of the file, doesn't it?
>>
> 
> Yes.  You get, for each match: the line number (from the beginning of 
> the file), the byte offset (from the beginning of the file) of the first 
> displayed character, and the context of the match.

OK, so we get the byte offset, but not the length of the match (which 
we'll also need later, for purposes such as highlighting and 
replacement). And what happens if there are several matches on the same 
line? We need columns for all of them.

>> Which will require at least reading the file into memory to convert.
>>
> 
> I don't understand what you mean by that, but it seems to me that in any 
> case it's much more efficient than parsing the output of grep with Elisp.

We currently don't visit the file buffer if it's not already visited, 
parsing the line in a temp buffer instead. That approach resulted in a 
nice perf improvement.

> And you can easily get the byte offset of each beginning of line with 
> "grep -nbo '^.'", so calculating the byte offset from the beginning of 
> the line is easy.

Do you mean to suggest we call grep one more time for each matching line?





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:14                       ` Dmitry Gutov
@ 2021-03-03 19:34                         ` Gregory Heytings
  2021-03-03 19:52                           ` Juri Linkov
  2021-03-03 20:30                           ` Dmitry Gutov
  2021-03-03 19:59                         ` Juri Linkov
  1 sibling, 2 replies; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 19:34 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

>> Yes.  You get, for each match: the line number (from the beginning of 
>> the file), the byte offset (from the beginning of the file) of the 
>> first displayed character, and the context of the match.
>
> OK, so we get the byte offset, but not the length of the match (which 
> we'll also need later, for purposes such as highlighting and 
> replacement). And what happens if there are several matches on the same 
> line? We need columns for all of them.
>

I don't know exactly what you want to do, I initially chimed in this 
conversation to react to Juri's "GNU grep has no option to truncate 
output", to mention that GNU grep does have an option to do this; perhaps 
it doesn't do exactly what you want.

I could be wrong, but I believe that adapting what you want to what GNU 
grep provides will always be more efficient than the opposite.

>> And you can easily get the byte offset of each beginning of line with 
>> "grep -nbo '^.'", so calculating the byte offset from the beginning of 
>> the line is easy.
>
> Do you mean to suggest we call grep one more time for each matching 
> line?
>

No, once for each file.  "grep -nbo '^.' FILE" returns a "<line>:<offset 
of first char>:<first char>" line for each line in FILE.  With this you 
can easily calculate the offset of a match on a given line.  This will be 
more efficient than calculating the offset of a match by parsing each line 
with Elisp code.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:34                         ` Gregory Heytings
@ 2021-03-03 19:52                           ` Juri Linkov
  2021-03-03 20:34                             ` Gregory Heytings
  2021-03-03 20:30                           ` Dmitry Gutov
  1 sibling, 1 reply; 74+ messages in thread
From: Juri Linkov @ 2021-03-03 19:52 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859, Dmitry Gutov

>> OK, so we get the byte offset, but not the length of the match (which
>> we'll also need later, for purposes such as highlighting and
>> replacement). And what happens if there are several matches on the same
>> line? We need columns for all of them.
>
> I don't know exactly what you want to do, I initially chimed in this
> conversation to react to Juri's "GNU grep has no option to truncate
> output", to mention that GNU grep does have an option to do this; perhaps
> it doesn't do exactly what you want.

By an option I meant a command line switch of GNU grep,
not something that looks like a hack.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 17:29             ` Dmitry Gutov
@ 2021-03-03 19:54               ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-06 22:26                 ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-03 19:54 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859

Hi again,

>> I tried another approach which yielded some nice results.
>
> Thank you.
>
> Could you also try this benchmark with an input string that has no more 
> than, say, 3 matches in the big one-line file? Or maybe just 1.
>
> I'd like to compare the relative performance in such scenario, too.
>

Curiously, it doesn't seem to affect things much, neither for your patch
or mine. 

>> However, only my patch actually renders the long file as a match in the
> output buffer. All the others seem to drop it altogether. IMO that is
> one point in favour of my approaches.
>
> Indeed.
>


>> In addition, we could add another
>> defcustom for the xref--collect-matches-1 function,
>
> That can be done already by the user customizing 
> xref-search-program-alist, I think.

Oh? How so?

--
Theo





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:14                       ` Dmitry Gutov
  2021-03-03 19:34                         ` Gregory Heytings
@ 2021-03-03 19:59                         ` Juri Linkov
  2021-03-04  2:50                           ` Dmitry Gutov
  1 sibling, 1 reply; 74+ messages in thread
From: Juri Linkov @ 2021-03-03 19:59 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Gregory Heytings, 46859

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

> We currently don't visit the file buffer if it's not already visited,
> parsing the line in a temp buffer instead. That approach resulted in a nice
> perf improvement.

Reusing visited files is a nice feature, but still needs a fix.

Test case: visit emacs/src/xdisp.c and type

  C-x p g expose_frame RET

See that not all lines from xdisp.c have font-lock highlighting.
After applying this patch, all xref output lines from xdisp.c
have font-lock faces:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: xref-font-lock-ensure.patch --]
[-- Type: text/x-diff, Size: 834 bytes --]

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..6a5361f852 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1691,7 +1701,10 @@ xref--collect-matches-1
       (let* ((beg-column (- (match-beginning 0) line-beg))
              (end-column (- (match-end 0) line-beg))
              (loc (xref-make-file-location file line beg-column))
-             (summary (buffer-substring line-beg line-end)))
+             (summary (progn
+                        (unless syntax-needed
+                          (font-lock-ensure line-beg line-end))
+                        (buffer-substring line-beg line-end))))
         (add-face-text-property beg-column end-column 'xref-match
                                 t summary)
         (push (xref-make-match summary loc (- end-column beg-column))

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:34                         ` Gregory Heytings
  2021-03-03 19:52                           ` Juri Linkov
@ 2021-03-03 20:30                           ` Dmitry Gutov
  2021-03-03 21:06                             ` Gregory Heytings
  1 sibling, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-03 20:30 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 03.03.2021 21:34, Gregory Heytings wrote:
> 
>>> Yes.  You get, for each match: the line number (from the beginning of 
>>> the file), the byte offset (from the beginning of the file) of the 
>>> first displayed character, and the context of the match.
>>
>> OK, so we get the byte offset, but not the length of the match (which 
>> we'll also need later, for purposes such as highlighting and 
>> replacement). And what happens if there are several matches on the 
>> same line? We need columns for all of them.
>>
> 
> I don't know exactly what you want to do, I initially chimed in this 
> conversation to react to Juri's "GNU grep has no option to truncate 
> output", to mention that GNU grep does have an option to do this; 
> perhaps it doesn't do exactly what you want.
> 
> I could be wrong, but I believe that adapting what you want to what GNU 
> grep provides will always be more efficient than the opposite.

That's the general principle I have tried to follow, but Grep has proved 
suboptimal for a number of purposes (matching one regexp to multiple 
lines among them).

>>> And you can easily get the byte offset of each beginning of line with 
>>> "grep -nbo '^.'", so calculating the byte offset from the beginning 
>>> of the line is easy.
>>
>> Do you mean to suggest we call grep one more time for each matching line?
>>
> 
> No, once for each file.  "grep -nbo '^.' FILE" returns a "<line>:<offset 
> of first char>:<first char>" line for each line in FILE.  With this you 
> can easily calculate the offset of a match on a given line.  This will 
> be more efficient than calculating the offset of a match by parsing each 
> line with Elisp code.

That's still +1 Grep invocation per file, right? Can't say for sure, 
perhaps it will be more efficient than parsing in Lisp, but at least 
with Lisp I know that parsing 10-20 matches is fast (and, actually, it's 
fairly instantaneous with 1000s of matches, as long as we don't 
encounter pathological files where all contents reside on one line).

With your approach we'll have to deal with interpreting Grep outputs 
which list every line in the searched files. This will almost certainly 
be slower in the case when there are only handful of matches. But 
benchmarks welcome.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:52                           ` Juri Linkov
@ 2021-03-03 20:34                             ` Gregory Heytings
  2021-03-04  3:36                               ` Eli Zaretskii
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 20:34 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 46859


>>> OK, so we get the byte offset, but not the length of the match (which 
>>> we'll also need later, for purposes such as highlighting and 
>>> replacement). And what happens if there are several matches on the 
>>> same line? We need columns for all of them.
>>
>> I don't know exactly what you want to do, I initially chimed in this 
>> conversation to react to Juri's "GNU grep has no option to truncate 
>> output", to mention that GNU grep does have an option to do this; 
>> perhaps it doesn't do exactly what you want.
>
> By an option I meant a command line switch of GNU grep, not something 
> that looks like a hack.
>

It's not a hack at all, it's a command line switch: -o.  The amount of 
context, which defaults to zero, is given in the regexp instead of as an 
argument to the command line switch.

This -o option has been present since GNU grep 2.5, twenty years ago.

You can use it together with other options:

grep -o PATTERN FILE prints the matches
grep -no PATTERN FILE prints the matches and their line number
grep -bo PATTERN FILE prints the matches and their offset
grep -bo '.\{0,BEFORE\}PATTERN.\{0,AFTER\}' FILE prints the matches with a given BEFORE and AFTER context

and so forth.

And the -o option is supported by ripgrep, ag and ack, with almost the 
same syntax.

It's perhaps not what you want, but at least to me it seems more powerful 
than ripgrep's -M option.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 20:30                           ` Dmitry Gutov
@ 2021-03-03 21:06                             ` Gregory Heytings
  2021-03-06 12:44                               ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-03 21:06 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859


>
> With your approach we'll have to deal with interpreting Grep outputs 
> which list every line in the searched files. This will almost certainly 
> be slower in the case when there are only handful of matches. But 
> benchmarks welcome.
>

I don't know what you exactly need (I don't (yet?) use project), so I 
can't elaborate further or provide benchmarks alas.

Could you perhaps tell me what you need?





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:59                         ` Juri Linkov
@ 2021-03-04  2:50                           ` Dmitry Gutov
  2021-03-04  9:24                             ` Juri Linkov
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-04  2:50 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Gregory Heytings, 46859

Hi Juri,

On 03.03.2021 21:59, Juri Linkov wrote:
>> We currently don't visit the file buffer if it's not already visited,
>> parsing the line in a temp buffer instead. That approach resulted in a nice
>> perf improvement.
> 
> Reusing visited files is a nice feature, but still needs a fix.
> 
> Test case: visit emacs/src/xdisp.c and type
> 
>    C-x p g expose_frame RET
> 
> See that not all lines from xdisp.c have font-lock highlighting.
> After applying this patch, all xref output lines from xdisp.c
> have font-lock faces:

I thought about this, but applying font-lock rules is not exactly a 
trivial action. So I figured we better avoid it (and only call 
syntax-propertize when necessary) to get the best performance possible.

Have you tried benchmarking with and without your patch? Particular case 
of interest: many files, each already visited, with 1 match in each of 
them. Or few matches.

The opposite would be one file with many matches inside of it. This case 
should be relatively inexpensive for this patch, but it's worth 
measuring to compare too.

P.S. The "(unless syntax-needed" guard in the proposed patch is not needed.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 20:34                             ` Gregory Heytings
@ 2021-03-04  3:36                               ` Eli Zaretskii
  2021-03-04  9:19                                 ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Eli Zaretskii @ 2021-03-04  3:36 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859, juri

> Date: Wed, 03 Mar 2021 20:34:46 +0000
> From: Gregory Heytings <gregory@heytings.org>
> Cc: 46859@debbugs.gnu.org
> 
> > By an option I meant a command line switch of GNU grep, not something 
> > that looks like a hack.
> 
> It's not a hack at all, it's a command line switch: -o.  The amount of 
> context, which defaults to zero, is given in the regexp instead of as an 
> argument to the command line switch.
> 
> This -o option has been present since GNU grep 2.5, twenty years ago.
> 
> You can use it together with other options:
> 
> grep -o PATTERN FILE prints the matches
> grep -no PATTERN FILE prints the matches and their line number
> grep -bo PATTERN FILE prints the matches and their offset
> grep -bo '.\{0,BEFORE\}PATTERN.\{0,AFTER\}' FILE prints the matches with a given BEFORE and AFTER context
> 
> and so forth.
> 
> And the -o option is supported by ripgrep, ag and ack, with almost the 
> same syntax.

While you discuss all those possibilities, please be aware that byte
offsets have one more problem: converting them to character offsets or
columns might not be trivial, especially if the encoding of the file
is not UTF-8.  (Apologies if you already discussed this.)





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04  3:36                               ` Eli Zaretskii
@ 2021-03-04  9:19                                 ` Gregory Heytings
  2021-03-04 14:08                                   ` Eli Zaretskii
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-04  9:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46859

>
> While you discuss all those possibilities, please be aware that byte 
> offsets have one more problem: converting them to character offsets or 
> columns might not be trivial, especially if the encoding of the file is 
> not UTF-8.  (Apologies if you already discussed this.)
>

We did not discuss this, thanks for pointing that out.

Is this not easy to do with byte-to-position?

What I would suggest is to use "grep -nbo '.\{0,50\}PATTERN.\{0,50\}'", to 
hide the byte position in the xref buffer, and when the user jumps to an 
occurrence to use something like (goto-char (byte-to-position 
(get-byte-position))).  Does that make sense?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04  2:50                           ` Dmitry Gutov
@ 2021-03-04  9:24                             ` Juri Linkov
  2021-03-04 17:20                               ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Juri Linkov @ 2021-03-04  9:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Gregory Heytings, 46859

>> Reusing visited files is a nice feature, but still needs a fix.
>> Test case: visit emacs/src/xdisp.c and type
>>    C-x p g expose_frame RET
>> See that not all lines from xdisp.c have font-lock highlighting.
>> After applying this patch, all xref output lines from xdisp.c
>> have font-lock faces:
>
> I thought about this, but applying font-lock rules is not exactly a trivial
> action. So I figured we better avoid it (and only call syntax-propertize
> when necessary) to get the best performance possible.
>
> Have you tried benchmarking with and without your patch? Particular case of
> interest: many files, each already visited, with 1 match in each of
> them. Or few matches.
>
> The opposite would be one file with many matches inside of it. This case
> should be relatively inexpensive for this patch, but it's worth measuring
> to compare too.
>
> P.S. The "(unless syntax-needed" guard in the proposed patch is not needed.

;; Without patch
(benchmark-run 10 (project-find-regexp "expose_frame")) ;; (1.936206149 21 0.27307954999999995)

;; With patch without the "(unless syntax-needed" guard
(benchmark-run 10 (project-find-regexp "expose_frame")) ;; (2.195018443 31 0.354854643)

I don't know if extra font-locking is worth worse performance.
But I took this idea for consistency from occur that uses
'font-lock-ensure' in 'occur-engine-line'.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04  9:19                                 ` Gregory Heytings
@ 2021-03-04 14:08                                   ` Eli Zaretskii
  2021-03-04 14:39                                     ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Eli Zaretskii @ 2021-03-04 14:08 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

> Date: Thu, 04 Mar 2021 09:19:50 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 46859@debbugs.gnu.org
> 
> > While you discuss all those possibilities, please be aware that byte 
> > offsets have one more problem: converting them to character offsets or 
> > columns might not be trivial, especially if the encoding of the file is 
> > not UTF-8.  (Apologies if you already discussed this.)
> >
> 
> We did not discuss this, thanks for pointing that out.
> 
> Is this not easy to do with byte-to-position?

No.  byte-to-position works for text in an Emacs buffer, whereas we
are talking about the text in its original file on disk.  Unless that
file is encoded in UTF-8, byte-to-position will give you wrong
results.  You need to use filepos-to-bufferpos, and you will need to
specify the file's encoding.  And it's relatively slow for non-UTF-8
encoded files.

> What I would suggest is to use "grep -nbo '.\{0,50\}PATTERN.\{0,50\}'", to 
> hide the byte position in the xref buffer, and when the user jumps to an 
> occurrence to use something like (goto-char (byte-to-position 
> (get-byte-position))).  Does that make sense?

Yes, but see above about encodings other than UTF-8.  For example, if
the original file is in Latin-1, each character is 1 byte, but in an
Emacs buffer non-ASCII Latin-1 characters will take 2 bytes.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 14:08                                   ` Eli Zaretskii
@ 2021-03-04 14:39                                     ` Gregory Heytings
  2021-03-04 15:13                                       ` Eli Zaretskii
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-04 14:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46859

>
> No.  byte-to-position works for text in an Emacs buffer, whereas we are 
> talking about the text in its original file on disk.  Unless that file 
> is encoded in UTF-8, byte-to-position will give you wrong results.  You 
> need to use filepos-to-bufferpos, and you will need to specify the 
> file's encoding.  And it's relatively slow for non-UTF-8 encoded files.
>

Thank you, I was not aware of that subtlety.

But you provide the solution: when an xref is followed, the file is opened 
in a buffer, at which point buffer-file-coding-system is set.  So it seems 
that it suffices to do (goto-char (filepos-to-bufferpos 
(get-byte-position))).

I just did a filepos-to-bufferpos for one of the last bytes of a 6 MB 
Latin-1 file, and it took only ~2 ms.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 14:39                                     ` Gregory Heytings
@ 2021-03-04 15:13                                       ` Eli Zaretskii
  2021-03-04 16:47                                         ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Eli Zaretskii @ 2021-03-04 15:13 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

> Date: Thu, 04 Mar 2021 14:39:23 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 46859@debbugs.gnu.org
> 
> But you provide the solution: when an xref is followed, the file is opened 
> in a buffer, at which point buffer-file-coding-system is set.  So it seems 
> that it suffices to do (goto-char (filepos-to-bufferpos 
> (get-byte-position))).

Yes.  But it can be slow.

> I just did a filepos-to-bufferpos for one of the last bytes of a 6 MB 
> Latin-1 file, and it took only ~2 ms.

Which value of QUALITY did you use?

Also, what happens with multibyte encodings that are not UTF-8, like
iso-2022, for example?





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 15:13                                       ` Eli Zaretskii
@ 2021-03-04 16:47                                         ` Gregory Heytings
  2021-03-04 17:13                                           ` Eli Zaretskii
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-04 16:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46859

>> But you provide the solution: when an xref is followed, the file is 
>> opened in a buffer, at which point buffer-file-coding-system is set. 
>> So it seems that it suffices to do (goto-char (filepos-to-bufferpos 
>> (get-byte-position))).
>
> Yes.  But it can be slow.
>

Can it become so slow that it would have an impact on user experience? 
filepos-to-bufferpos would be called only when the xref link is followed, 
so I guess that even a 0.1 or 0.2 second delay should be okay.

>> I just did a filepos-to-bufferpos for one of the last bytes of a 6 MB 
>> Latin-1 file, and it took only ~2 ms.
>
> Which value of QUALITY did you use?
>

I just tried again on a 25 MB Latin-1 file, on one of the last bytes it 
took ~13 ms without specifying a quality.  I tried with nil, 'approximate 
and 'best, but do not see any difference, the result with benchmark-run is 
always ~13 ms.

>
> Also, what happens with multibyte encodings that are not UTF-8, like 
> iso-2022, for example?
>

Well, the Latin-1 file is already different from UTF-8.

I don't know anything about ISO-2022, but tried with a 25 MB file, created 
with iconv, which Emacs recognizes as an iso-2022-7bit-dos one.  In that 
case filepos-to-bufferpos does not work at all, with 'approximate you get 
a position that is about 2 million characters away from the correct one, 
and with 'best or nil you get nil...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 16:47                                         ` Gregory Heytings
@ 2021-03-04 17:13                                           ` Eli Zaretskii
  2021-03-04 17:35                                             ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Eli Zaretskii @ 2021-03-04 17:13 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

> Date: Thu, 04 Mar 2021 16:47:30 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 46859@debbugs.gnu.org
> 
> > Yes.  But it can be slow.
> 
> Can it become so slow that it would have an impact on user experience? 

I don't know, Grep is a somewhat special application.

> filepos-to-bufferpos would be called only when the xref link is followed, 
> so I guess that even a 0.1 or 0.2 second delay should be okay.

Yes, 0.1 sec is definitely okay.

> I just tried again on a 25 MB Latin-1 file, on one of the last bytes it 
> took ~13 ms without specifying a quality.  I tried with nil, 'approximate 
> and 'best, but do not see any difference, the result with benchmark-run is 
> always ~13 ms.
> 
> > Also, what happens with multibyte encodings that are not UTF-8, like 
> > iso-2022, for example?
> 
> Well, the Latin-1 file is already different from UTF-8.

AFAIR, with single-byte encoding we take a shortcut there.

> I don't know anything about ISO-2022, but tried with a 25 MB file, created 
> with iconv, which Emacs recognizes as an iso-2022-7bit-dos one.  In that 
> case filepos-to-bufferpos does not work at all, with 'approximate you get 
> a position that is about 2 million characters away from the correct one, 
> and with 'best or nil you get nil...

Not 'best, 'exact.  Did you try 'exact?  It should have worked,
barring any bugs.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04  9:24                             ` Juri Linkov
@ 2021-03-04 17:20                               ` Dmitry Gutov
  2021-03-04 17:56                                 ` Juri Linkov
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-04 17:20 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Gregory Heytings, 46859

On 04.03.2021 11:24, Juri Linkov wrote:
> ;; Without patch
> (benchmark-run 10 (project-find-regexp "expose_frame")) ;; (1.936206149 21 0.27307954999999995)
> 
> ;; With patch without the "(unless syntax-needed" guard
> (benchmark-run 10 (project-find-regexp "expose_frame")) ;; (2.195018443 31 0.354854643)

Thanks.

Note that running the benchmark 10 times means that on the last 9 
iterations the lines are already fontified.

The cost of doing this should be apparent with more search hits, though. 
For example:

   (benchmark-run 1 (project-find-regexp "expose"))

> I don't know if extra font-locking is worth worse performance.
> But I took this idea for consistency from occur that uses
> 'font-lock-ensure' in 'occur-engine-line'.

If you're sure you want this behavior, we can make it a user option.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:13                                           ` Eli Zaretskii
@ 2021-03-04 17:35                                             ` Gregory Heytings
  2021-03-04 18:28                                               ` Eli Zaretskii
  2021-03-06 12:31                                               ` Dmitry Gutov
  0 siblings, 2 replies; 74+ messages in thread
From: Gregory Heytings @ 2021-03-04 17:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 46859


>> I don't know anything about ISO-2022, but tried with a 25 MB file, 
>> created with iconv, which Emacs recognizes as an iso-2022-7bit-dos one. 
>> In that case filepos-to-bufferpos does not work at all, with 
>> 'approximate you get a position that is about 2 million characters away 
>> from the correct one, and with 'best or nil you get nil...
>
> Not 'best, 'exact.  Did you try 'exact?  It should have worked, barring 
> any bugs.
>

Sorry for mixing things up.  Just tried 'exact, it works, it's slow (but 
not horribly slow: 1.2 seconds on that 25 MB file), but it doesn't give 
the correct answer alas, it's about ~1000 characters away from the correct 
position.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:20                               ` Dmitry Gutov
@ 2021-03-04 17:56                                 ` Juri Linkov
  2021-03-04 18:57                                   ` Dmitry Gutov
  2021-03-06 12:39                                   ` Dmitry Gutov
  0 siblings, 2 replies; 74+ messages in thread
From: Juri Linkov @ 2021-03-04 17:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Gregory Heytings, 46859

>> I don't know if extra font-locking is worth worse performance.
>> But I took this idea for consistency from occur that uses
>> 'font-lock-ensure' in 'occur-engine-line'.
>
> If you're sure you want this behavior, we can make it a user option.

I'm not sure because most hits are not fontified anyway
when not all files are visited especially in a large codebase.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:35                                             ` Gregory Heytings
@ 2021-03-04 18:28                                               ` Eli Zaretskii
  2021-03-06 12:31                                               ` Dmitry Gutov
  1 sibling, 0 replies; 74+ messages in thread
From: Eli Zaretskii @ 2021-03-04 18:28 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

> Date: Thu, 04 Mar 2021 17:35:42 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 46859@debbugs.gnu.org
> 
> > Not 'best, 'exact.  Did you try 'exact?  It should have worked, barring 
> > any bugs.
> 
> Sorry for mixing things up.  Just tried 'exact, it works, it's slow (but 
> not horribly slow: 1.2 seconds on that 25 MB file), but it doesn't give 
> the correct answer alas, it's about ~1000 characters away from the correct 
> position.

Sounds like a bug, I'd appreciate a bug report with the details
(including the file, if you can share it).

Thanks.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:56                                 ` Juri Linkov
@ 2021-03-04 18:57                                   ` Dmitry Gutov
  2021-03-06 12:39                                   ` Dmitry Gutov
  1 sibling, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-04 18:57 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Gregory Heytings, 46859

On 04.03.2021 19:56, Juri Linkov wrote:
> I'm not sure because most hits are not fontified anyway
> when not all files are visited especially in a large codebase.

Yes, so highlighting won't be universal anyway.

OTOH, if we have a long file which is visited somewhere near the 
beginning, and there are a lot of hits inside, font-lock-ensure can 
create some noticeable overhead.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:35                                             ` Gregory Heytings
  2021-03-04 18:28                                               ` Eli Zaretskii
@ 2021-03-06 12:31                                               ` Dmitry Gutov
  2021-03-06 12:37                                                 ` Dmitry Gutov
  2021-03-06 12:49                                                 ` Gregory Heytings
  1 sibling, 2 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 12:31 UTC (permalink / raw)
  To: Gregory Heytings, Eli Zaretskii; +Cc: 46859

On 04.03.2021 19:35, Gregory Heytings wrote:
> Just tried 'exact, it works, it's slow (but not horribly slow: 1.2 
> seconds on that 25 MB file)

1.2s is pretty slow for this purpose.

Otherwise it would be possible to use this approach (by introducing a 
Grep-specific location type) and avoid parsing the line contents for 
every match.

One downside would be missing on the syntax highlighting which we 
sometimes get for already-fontified parts of buffers, but it's 
unreliable anyway.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:31                                               ` Dmitry Gutov
@ 2021-03-06 12:37                                                 ` Dmitry Gutov
  2021-03-06 12:54                                                   ` Gregory Heytings
  2021-03-06 12:49                                                 ` Gregory Heytings
  1 sibling, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 12:37 UTC (permalink / raw)
  To: Gregory Heytings, Eli Zaretskii; +Cc: 46859

On 06.03.2021 14:31, Dmitry Gutov wrote:
> Otherwise it would be possible to use this approach (by introducing a 
> Grep-specific location type) and avoid parsing the line contents for 
> every match.

Except, um, we still need to fill in the "summary" attribute for all 
matches, so that there is something to display in the Xref buffer (the 
line contents around the match), and the -o flag strips those.

And if we were to use the '.{0,100}file.{0,100}' trick, it messes up the 
location of the match, the reported byte offset become unreliable. Also: 
grepping for that kind of regexp is noticeably slower than grepping for 
'file'. Or even '.file'. Like 85ms vs 7ms slower.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-04 17:56                                 ` Juri Linkov
  2021-03-04 18:57                                   ` Dmitry Gutov
@ 2021-03-06 12:39                                   ` Dmitry Gutov
  1 sibling, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 12:39 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Gregory Heytings, 46859

On 04.03.2021 19:56, Juri Linkov wrote:
>>> I don't know if extra font-locking is worth worse performance.
>>> But I took this idea for consistency from occur that uses
>>> 'font-lock-ensure' in 'occur-engine-line'.
>> If you're sure you want this behavior, we can make it a user option.
> I'm not sure because most hits are not fontified anyway
> when not all files are visited especially in a large codebase.

Other values for this option for better consistency could mean "strip 
all fontifications" and "make sure to visit every file and fontify every 
line".

I'm not sure anybody would want either of them, though.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 21:06                             ` Gregory Heytings
@ 2021-03-06 12:44                               ` Dmitry Gutov
  2021-03-06 12:58                                 ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 12:44 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 03.03.2021 23:06, Gregory Heytings wrote:
> 
>>
>> With your approach we'll have to deal with interpreting Grep outputs 
>> which list every line in the searched files. This will almost 
>> certainly be slower in the case when there are only handful of 
>> matches. But benchmarks welcome.
>>
> 
> I don't know what you exactly need (I don't (yet?) use project), so I 
> can't elaborate further or provide benchmarks alas.

The same code is also used in the default implementation of 
xref-find-references, in case you ever tried it.

> Could you perhaps tell me what you need?

We are discussing changes to xref.el, because project-find-regexp 
delegates a lot of its logic to it.

Check out xref-matches-in-files and xref--convert-hits which it calls at 
the end. What you're thinking of seems to require a Grep-specific 
version of xref--convert-hits logic, which in the end constructs 
specialized xref items with a new type of location (alternative to 
xref-file-location).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:31                                               ` Dmitry Gutov
  2021-03-06 12:37                                                 ` Dmitry Gutov
@ 2021-03-06 12:49                                                 ` Gregory Heytings
  2021-03-06 14:07                                                   ` Dmitry Gutov
  1 sibling, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 12:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

>> Just tried 'exact, it works, it's slow (but not horribly slow: 1.2 
>> seconds on that 25 MB file)
>
> 1.2s is pretty slow for this purpose.
>

It is: (1) on a 25 MB file (not the typical case), (2) on a file with an 
exotic encoding (not a typical case either).  On typical files (UTF-8 or 
single byte encoding) the delay is not noticeable (a few milliseconds).

>
> Otherwise it would be possible to use this approach (by introducing a 
> Grep-specific location type) and avoid parsing the line contents for 
> every match.
>

It would be much faster, especially with very long lines, which was the 
question with which this bug report started.

>
> One downside would be missing on the syntax highlighting which we 
> sometimes get for already-fontified parts of buffers, but it's 
> unreliable anyway.
>

Yes, I don't understand why syntax highlighting is used in such buffers, 
IMO it's useless, it would be useful only if (1) it were reliable and (2) 
all lines were highlighted.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:37                                                 ` Dmitry Gutov
@ 2021-03-06 12:54                                                   ` Gregory Heytings
  2021-03-06 14:26                                                     ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 12:54 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859


>
> Except, um, we still need to fill in the "summary" attribute for all 
> matches, so that there is something to display in the Xref buffer (the 
> line contents around the match), and the -o flag strips those.
>

That's the purpose of the '.{0,100}' context.  In typical cases (souce 
code files with lines that do not have more than 80 colums) you don't even 
see the difference in the result buffer: you see the whole line.

>
> And if we were to use the '.{0,100}file.{0,100}' trick, it messes up the 
> location of the match, the reported byte offset become unreliable.
>

That's a problem to solve, indeed.  At first sight it doesn't seem 
unsolvable.

>
> Also: grepping for that kind of regexp is noticeably slower than 
> grepping for 'file'. Or even '.file'. Like 85ms vs 7ms slower.
>

Well, the bug report mentioned delays of 3-4 seconds on files with very 
long lines, so I'd guess that 85 ms is a pretty reasonable speed...





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:44                               ` Dmitry Gutov
@ 2021-03-06 12:58                                 ` Gregory Heytings
  2021-03-06 14:06                                   ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 12:58 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859


>> I don't know what you exactly need (I don't (yet?) use project), so I 
>> can't elaborate further or provide benchmarks alas.
>
> The same code is also used in the default implementation of 
> xref-find-references, in case you ever tried it.
>
>> Could you perhaps tell me what you need?
>
> We are discussing changes to xref.el, because project-find-regexp 
> delegates a lot of its logic to it.
>
> Check out xref-matches-in-files and xref--convert-hits which it calls at 
> the end.
>

Okay, thanks, what you need is clearer to me, I'll have a look.

>
> What you're thinking of seems to require a Grep-specific version of 
> xref--convert-hits logic, which in the end constructs specialized xref 
> items with a new type of location (alternative to xref-file-location).
>

Yes, that's what I'm thinking of indeed, but it's not specific to "grep" 
because it would work the same way with ripgrep, ag and ack.  But indeed 
it's specific to grep-like tools.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:58                                 ` Gregory Heytings
@ 2021-03-06 14:06                                   ` Dmitry Gutov
  2021-03-06 22:55                                     ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 14:06 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 06.03.2021 14:58, Gregory Heytings wrote:
>> What you're thinking of seems to require a Grep-specific version of 
>> xref--convert-hits logic, which in the end constructs specialized xref 
>> items with a new type of location (alternative to xref-file-location).
>>
> 
> Yes, that's what I'm thinking of indeed, but it's not specific to "grep" 
> because it would work the same way with ripgrep, ag and ack.  But indeed 
> it's specific to grep-like tools.

Ah, ok. If ripgrep can do this as well, it would be more general (a good 
thing).

Note that it mentions the following in the description of its -b argument:

   If ripgrep does transcoding, then the byte offset is in terms of the
   the result of transcoding and not the original data. This applies
   similarly to another transformation on the source,
   such as decompression or a --pre filter. Note that when the PCRE2
   regex engine is used, then UTF-8 transcoding is done by default.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:49                                                 ` Gregory Heytings
@ 2021-03-06 14:07                                                   ` Dmitry Gutov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 14:07 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 06.03.2021 14:49, Gregory Heytings wrote:
> 
>>> Just tried 'exact, it works, it's slow (but not horribly slow: 1.2 
>>> seconds on that 25 MB file)
>>
>> 1.2s is pretty slow for this purpose.
>>
> 
> It is: (1) on a 25 MB file (not the typical case), (2) on a file with an 
> exotic encoding (not a typical case either).  On typical files (UTF-8 or 
> single byte encoding) the delay is not noticeable (a few milliseconds).

A few milliseconds is much better, as long as we're not too mistaken 
about the set of "typical files". Or we'd be exchanging one set of 
tradeoffs for another.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 12:54                                                   ` Gregory Heytings
@ 2021-03-06 14:26                                                     ` Dmitry Gutov
  2021-03-06 22:47                                                       ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 14:26 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 06.03.2021 14:54, Gregory Heytings wrote:
>> Also: grepping for that kind of regexp is noticeably slower than 
>> grepping for 'file'. Or even '.file'. Like 85ms vs 7ms slower.
>>
> 
> Well, the bug report mentioned delays of 3-4 seconds on files with very 
> long lines, so I'd guess that 85 ms is a pretty reasonable speed...

We do want fast searches to remain fast, too.

I got that 85ms timing when searching just one file. A project can often 
contain thousands of files.

In my further testing, the difference is not as stark because of other 
Elisp overhead on file listing, serialization/deserialization/process 
calls/parsing output, but even with all that in my work project the 
difference between such searches can be 0.27s vs 0.43s.

With further optimizations of project file listing logic, the difference 
can become even more pronounced (project-files in the same project takes 
0.14s).

You can benchmark it yourself with this form:

(benchmark 1 '(project-find-regexp ".\\{0,100\\}file.\\{0,100\\}"))

vs

(benchmark 1 '(project-find-regexp "file"))

(I get 9s vs 7s in the same project for this particular search).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-03 19:54               ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-06 22:26                 ` Dmitry Gutov
  2021-03-07  1:29                   ` Dmitry Gutov
                                     ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 22:26 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

Hi Theodor,

On 03.03.2021 21:54, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:
> Hi again,
> 
>>> I tried another approach which yielded some nice results.
>>
>> Thank you.
>>
>> Could you also try this benchmark with an input string that has no more
>> than, say, 3 matches in the big one-line file? Or maybe just 1.
>>
>> I'd like to compare the relative performance in such scenario, too.
>>
> 
> Curiously, it doesn't seem to affect things much, neither for your patch
> or mine.

I just got around to testing this properly (sorry), and so far I've been 
able to reproduce the slow behavior only when there are many matches in 
the "big long line" file. I'm using a 500KB minified CSS as an example.

When there are only a few matches, the search is relatively 
instantaneous. So that's a weird mismatch with your reports. If you have 
some details to add to reproduce the slowdown in the "few matches" case, 
that could be helpful too.

I'm currently looking at the patch and trying to figure out whether we 
could apply some smaller change, or a change not in xref--insert-xrefs 
(which is relatively complex already) with the same benefits.

Also:

- Could you explain the change to xref--collect-matches-1 in the most 
recent patch? In my testing it doesn't move the needle at all, and it 
seems unnecessary because neither Grep or Ripgrep report matches on the 
same line separately with the current arguments that we pass to them. 
But if we did... what's the idea? Skip all subsequent matches, no matter 
if they're far or close?

- What do you think about making an effort to actually retain all the 
matches in the output? That would mean interpreting the 
xref-truncate-line-to value (or however the var could be renamed) as the 
maximum number of chars to render on the line *per match*. And if there 
is too much text between them, those parts can become "(truncated...)". 
Your current implementation can cut off valid matches, and we probably 
want to preserve them if feasible. OTOH, the default value could go down 
to 200 with this approach.

>>> In addition, we could add another
>>> defcustom for the xref--collect-matches-1 function,
>>
>> That can be done already by the user customizing
>> xref-search-program-alist, I think.
> 
> Oh? How so?

One can add " -M300 --max-columns-preview" in the middle of the ripgrep 
entry in xref-search-program-alist, as well as set xref-search-program 
to 'ripgrep'.

What I mean is, we can provide the "fullest featured" default behavior, 
one which never omits any valid matches and just truncated the line 
context around them, and the users who want even faster searches (at the 
cost of missing matches, esp. in find-replace scenarios) have something 
else to customize too.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 14:26                                                     ` Dmitry Gutov
@ 2021-03-06 22:47                                                       ` Gregory Heytings
  2021-03-06 23:00                                                         ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 22:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859


>>> Also: grepping for that kind of regexp is noticeably slower than 
>>> grepping for 'file'. Or even '.file'. Like 85ms vs 7ms slower.
>> 
>> Well, the bug report mentioned delays of 3-4 seconds on files with very 
>> long lines, so I'd guess that 85 ms is a pretty reasonable speed...
>
> We do want fast searches to remain fast, too.
>
> I got that 85ms timing when searching just one file. A project can often 
> contain thousands of files.
>

I just did a number of timing tests.  The timings were done in a shell, on 
a fresh clone of the Emacs repository, which contains ~5000 files, and in 
which one searches for the 43 occurrences of "expose_frame".

The timings are (in seconds):

with GNU grep (version 3.6):

0.124 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i -snHE expose_frame"
0.178 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i -snobHE '.{0,50}expose_frame.{0,50}'"
0.253 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i -snobHE '.{0,80}expose_frame.{0,80}'"
0.325 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i -snobHE '.{0,100}expose_frame.{0,100}'"

with ripgrep (version 12.1.1):

0.045 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nH --no-messages expose_frame"
0.079 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH --no-messages '.{0,50}expose_frame.{0,50}'"
0.109 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH --no-messages '.{0,80}expose_frame.{0,80}'"
0.113 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH --no-messages '.{0,100}expose_frame.{0,100}'"

It seems that a reasonable compromise is a context of 80 characters, which 
is only two times slower than a string search with both GNU grep and 
ripgrep, and still very fast.

(FTR, I also compared these performances with ack, ag and git grep.  To my 
surprise, they are much slower: ack is about three times slower than GNU 
grep on a string search, ag is a bit slower than GNU grep on string 
searches but much much slower on regexp searches, and git grep is a bit 
faster than ripgrep (and GNU grep) on string searches but again much much 
slower on regexp searches.)





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 14:06                                   ` Dmitry Gutov
@ 2021-03-06 22:55                                     ` Gregory Heytings
  0 siblings, 0 replies; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 22:55 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859


>
> Ah, ok. If ripgrep can do this as well, it would be more general (a good 
> thing).
>

Yes, indeed.

>
> Note that it mentions the following in the description of its -b 
> argument:
>
>  If ripgrep does transcoding, then the byte offset is in terms of the
>  the result of transcoding and not the original data. This applies
>  similarly to another transformation on the source, such as
>  decompression or a --pre filter. Note that when the PCRE2 regex engine
>  is used, then UTF-8 transcoding is done by default.
>

As the manpage mentions, this transcoding is done by default _only_ when 
the PCRE2 regex engine is used, that is, when ripgrep was built with PCRE2 
(the Debian package for example is built without PCRE2) and when the 
--pcre2 flag is passed.  And even in that case it is possible to disable 
the transcoding with --no-encoding.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 22:47                                                       ` Gregory Heytings
@ 2021-03-06 23:00                                                         ` Dmitry Gutov
  2021-03-06 23:24                                                           ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-06 23:00 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

Hi Gregory,

On 07.03.2021 00:47, Gregory Heytings wrote:

> I just did a number of timing tests.  The timings were done in a shell, 
> on a fresh clone of the Emacs repository, which contains ~5000 files, 
> and in which one searches for the 43 occurrences of "expose_frame".
> 
> The timings are (in seconds):
> 
> with GNU grep (version 3.6):
> 
> 0.124 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i 
> -snHE expose_frame"
> 0.178 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i 
> -snobHE '.{0,50}expose_frame.{0,50}'"
> 0.253 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i 
> -snobHE '.{0,80}expose_frame.{0,80}'"
> 0.325 | "find  -name '.?*' -prune -o -type f -print | xargs grep -i 
> -snobHE '.{0,100}expose_frame.{0,100}'"
> 
> with ripgrep (version 12.1.1):
> 
> 0.045 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nH 
> --no-messages expose_frame"
> 0.079 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH 
> --no-messages '.{0,50}expose_frame.{0,50}'"
> 0.109 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH 
> --no-messages '.{0,80}expose_frame.{0,80}'"
> 0.113 | "find  -name '.?*' -prune -o -type f -print | xargs rg -i -nobH 
> --no-messages '.{0,100}expose_frame.{0,100}'"
> 
> It seems that a reasonable compromise is a context of 80 characters, 
> which is only two times slower than a string search with both GNU grep 
> and ripgrep, and still very fast.

'find' is rarely the fastest way to list all the files in the project. 
Have you timed it alone?

'git ls-files' is usually much faster, and that's what 'project-files' 
uses under the covers. So if you redo your test with 
'project-find-regexp' as I suggested, you might discover a different 
slowdown multiplier.

> (FTR, I also compared these performances with ack, ag and git grep.  To 
> my surprise, they are much slower: ack is about three times slower than 
> GNU grep on a string search, ag is a bit slower than GNU grep on string 
> searches but much much slower on regexp searches, and git grep is a bit 
> faster than ripgrep (and GNU grep) on string searches but again much 
> much slower on regexp searches.)

ripgrep is generally the best all-arounder these days, though it might 
be slower in certain odd cases.

'git grep' is not a real option because it uses Git's list of tracked 
files directly, and we can't really do that. And that skews the comparisons.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 23:00                                                         ` Dmitry Gutov
@ 2021-03-06 23:24                                                           ` Gregory Heytings
  2021-03-07  3:08                                                             ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-06 23:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

>
> 'find' is rarely the fastest way to list all the files in the project. 
> Have you timed it alone?
>
> 'git ls-files' is usually much faster, and that's what 'project-files' 
> uses under the covers.
>

I don't see a big difference: find takes 0.006 s, git ls-files 0.002 s. 
Okay, that's three times slower, but those four milliseconds are not the 
bottleneck here.  I just ran some of the timing tests again, and they are 
about ten milliseconds faster with git ls-files, which is not a huge 
difference.  (Of course I do not object to the use of git ls-files.)

>
> So if you redo your test with 'project-find-regexp' as I suggested, you 
> might discover a different slowdown multiplier.
>

I wanted to first time these things outside of Emacs, it seems to me that 
it's a more objective comparison.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 22:26                 ` Dmitry Gutov
@ 2021-03-07  1:29                   ` Dmitry Gutov
  2021-03-07  3:22                   ` Dmitry Gutov
  2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-07  1:29 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

On 07.03.2021 00:26, Dmitry Gutov wrote:
> - Could you explain the change to xref--collect-matches-1 in the most 
> recent patch? In my testing it doesn't move the needle at all, and it 
> seems unnecessary because neither Grep or Ripgrep report matches on the 
> same line separately with the current arguments that we pass to them. 
> But if we did... what's the idea? Skip all subsequent matches, no matter 
> if they're far or close?

I misread the code, so please skip the second sentence.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 23:24                                                           ` Gregory Heytings
@ 2021-03-07  3:08                                                             ` Dmitry Gutov
  2021-03-07  8:13                                                               ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-07  3:08 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 07.03.2021 01:24, Gregory Heytings wrote:
> 
>>
>> 'find' is rarely the fastest way to list all the files in the project. 
>> Have you timed it alone?
>>
>> 'git ls-files' is usually much faster, and that's what 'project-files' 
>> uses under the covers.
>>
> 
> I don't see a big difference: find takes 0.006 s, git ls-files 0.002 s. 
> Okay, that's three times slower, but those four milliseconds are not the 
> bottleneck here.  I just ran some of the timing tests again, and they 
> are about ten milliseconds faster with git ls-files, which is not a huge 
> difference.  (Of course I do not object to the use of git ls-files.)

Sounds like you're testing the case of a project with not many files 
which compensate for their number in (larger) size.

That would indeed be sweet sport for using find in this scenario, so 
please consider that objection withdrawn.

>> So if you redo your test with 'project-find-regexp' as I suggested, 
>> you might discover a different slowdown multiplier.
>>
> 
> I wanted to first time these things outside of Emacs, it seems to me 
> that it's a more objective comparison.

Very well.

But testing inside Emacs is important, too. Because with the results you 
shown as of yet, the proposed alternative is twice as slow as the 
existing code in the average case. Is that right? I wouldn't like 
searches that take 200ms now take 400ms.

Emacs's overhead could alter that picture, however.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 22:26                 ` Dmitry Gutov
  2021-03-07  1:29                   ` Dmitry Gutov
@ 2021-03-07  3:22                   ` Dmitry Gutov
  2021-03-07 20:03                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-07  3:22 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On 07.03.2021 00:26, Dmitry Gutov wrote:
> - What do you think about making an effort to actually retain all the 
> matches in the output? That would mean interpreting the 
> xref-truncate-line-to value (or however the var could be renamed) as the 
> maximum number of chars to render on the line *per match*. And if there 
> is too much text between them, those parts can become "(truncated...)". 
> Your current implementation can cut off valid matches, and we probably 
> want to preserve them if feasible. OTOH, the default value could go down 
> to 200 with this approach.

Please try out the attached preparation patch.

It improves the performance of the "very long line" case drastically 
over here, while not doing any truncation yet. Looks like we regressed 
that case when we added rendering of multiple matches on the same line.

We can add the truncation feature on top of it.

Probably also in xref--collect-matches-1 (truncating the value of 
SUMMARY just before the xref-make-match call).

Alternatively, we could experiment with hiding parts of the long line 
using some display/visibility features (except the truncate-lines 
variable, that one keeps things slow). That could be done in 
xref--insert-xrefs or somewhere nearby. That is trickier, though, given 
that we'll probably want to unhide it (wholly or partially) when 
iterating over matches inside.

[-- Attachment #2: xref-insert-xrefs-sparingly.diff --]
[-- Type: text/x-patch, Size: 4839 bytes --]

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 18fdd963fb..5c5a0508de 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -886,30 +886,24 @@ xref--insert-xrefs
                                (length (and line (format "%d" line)))))
            for line-format = (and max-line-width
                                   (format "%%%dd: " max-line-width))
-           with prev-line-key = nil
+           with prev-group = nil
+           with prev-line = nil
            do
            (xref--insert-propertized '(face xref-file-header xref-group t)
                                      group "\n")
            (cl-loop for (xref . more2) on xrefs do
                     (with-slots (summary location) xref
                       (let* ((line (xref-location-line location))
-                             (new-summary summary)
-                             (line-key (list (xref-location-group location) line))
                              (prefix
-                              (if line
-                                  (propertize (format line-format line)
-                                              'face 'xref-line-number)
-                                "  ")))
+                              (cond
+                               ((not line) "  ")
+                               ((equal line prev-line) "")
+                              (t (propertize (format line-format line)
+                                             'face 'xref-line-number)))))
                         ;; Render multiple matches on the same line, together.
-                        (when (and line (equal prev-line-key line-key))
-                          (when-let ((column (xref-location-column location)))
-                            (delete-region
-                             (save-excursion
-                               (forward-line -1)
-                               (move-to-column (+ (length prefix) column))
-                               (point))
-                             (point))
-                            (setq new-summary (substring summary column) prefix "")))
+                        (when (and (equal prev-group group)
+                                   (not (equal prev-line line)))
+                          (insert "\n"))
                         (xref--insert-propertized
                          (list 'xref-item xref
                                'mouse-face 'highlight
@@ -917,9 +911,10 @@ xref--insert-xrefs
                                'help-echo
                                (concat "mouse-2: display in another window, "
                                        "RET or mouse-1: follow reference"))
-                         prefix new-summary)
-                        (setq prev-line-key line-key)))
-                    (insert "\n"))))
+                         prefix summary)
+                        (setq prev-line line
+                              prev-group group))))
+           (insert "\n")))
 
 (defun xref--analyze (xrefs)
   "Find common filenames in XREFS.
@@ -1678,20 +1673,30 @@ xref--collect-matches
                                  syntax-needed)))))
 
 (defun xref--collect-matches-1 (regexp file line line-beg line-end syntax-needed)
-  (let (matches)
+  (let (match-pairs matches)
     (when syntax-needed
       (syntax-propertize line-end))
-    ;; FIXME: This results in several lines with the same
-    ;; summary. Solve with composite pattern?
     (while (and
             ;; REGEXP might match an empty string.  Or line.
-            (or (null matches)
+            (or (null match-pairs)
                 (> (point) line-beg))
             (re-search-forward regexp line-end t))
-      (let* ((beg-column (- (match-beginning 0) line-beg))
-             (end-column (- (match-end 0) line-beg))
+      (push (cons (match-beginning 0)
+                  (match-end 0))
+            match-pairs))
+    (setq match-pairs (nreverse match-pairs))
+    (while match-pairs
+      (let* ((beg-end (pop match-pairs))
+             (beg-column (- (car beg-end) line-beg))
+             (end-column (- (cdr beg-end) line-beg))
              (loc (xref-make-file-location file line beg-column))
-             (summary (buffer-substring line-beg line-end)))
+             (summary (buffer-substring (if matches (car beg-end) line-beg)
+                                        (if match-pairs
+                                            (caar match-pairs)
+                                          line-end))))
+        (when matches
+          (cl-decf beg-column (- (car beg-end) line-beg))
+          (cl-decf end-column (- (car beg-end) line-beg)))
         (add-face-text-property beg-column end-column 'xref-match
                                 t summary)
         (push (xref-make-match summary loc (- end-column beg-column))

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07  3:08                                                             ` Dmitry Gutov
@ 2021-03-07  8:13                                                               ` Gregory Heytings
  2021-03-08  3:24                                                                 ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-07  8:13 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 1829 bytes --]


>> I don't see a big difference: find takes 0.006 s, git ls-files 0.002 s. 
>> Okay, that's three times slower, but those four milliseconds are not 
>> the bottleneck here.  I just ran some of the timing tests again, and 
>> they are about ten milliseconds faster with git ls-files, which is not 
>> a huge difference.  (Of course I do not object to the use of git 
>> ls-files.)
>
> Sounds like you're testing the case of a project with not many files 
> which compensate for their number in (larger) size.
>

As I said, my tests are performed on a fresly cloned copy of the Emacs 
repository (~5000 files).  It's not a huge project, but it's not a small 
one either.

>>> So if you redo your test with 'project-find-regexp' as I suggested, 
>>> you might discover a different slowdown multiplier.
>> 
>> I wanted to first time these things outside of Emacs, it seems to me 
>> that it's a more objective comparison.
>
> Very well.
>
> But testing inside Emacs is important, too.
>

Yes.  It is important to test at each step of the pipe; step N can't 
become faster than step N-1.

>
> Because with the results you shown as of yet, the proposed alternative 
> is twice as slow as the existing code in the average case. Is that 
> right? I wouldn't like searches that take 200ms now take 400ms.
>

Of course you can't get a benefit without paying a certain price.  The 
tests show that, on the Emacs repository, a search takes 250 ms instead of 
125 ms with GNU grep, and 100 ms instead of 50 ms with ripgrep.  IMO that 
price is not too high, especially not for a user dialog (I don't see how a 
user could be annoyed, or even notice, that something takes 250 ms instead 
of 125 ms), but it's just my opinion.

>
> Emacs's overhead could alter that picture, however.
>

Indeed.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07  3:22                   ` Dmitry Gutov
@ 2021-03-07 20:03                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08  2:48                       ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 20:03 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859

Hi!

>
> Please try out the attached preparation patch.
>
> It improves the performance of the "very long line" case drastically 
> over here, while not doing any truncation yet. Looks like we regressed 
> that case when we added rendering of multiple matches on the same line.
>

Yes, this seems to help a lot. Now the search is down from 11 seconds to
1.09. It is comparable to the other good efforts. 

> We can add the truncation feature on top of it.

I think we should, since moving around in the xref-buffer is still very slow.

>
> Probably also in xref--collect-matches-1 (truncating the value of 
> SUMMARY just before the xref-make-match call).
>
> Alternatively, we could experiment with hiding parts of the long line 
> using some display/visibility features (except the truncate-lines 
> variable, that one keeps things slow). That could be done in 
> xref--insert-xrefs or somewhere nearby. That is trickier, though, given 
> that we'll probably want to unhide it (wholly or partially) when 
> iterating over matches inside.

At this point I'm really thinking that truncating without bothering too
much about losing information is worth it, and the added complexity by
retaining information would only make regressions more feasible. I
assume these files are _actually_ read once every blue moon. To maximize
the speedup should be at a higher priority than retaining the matches,
IMO. In any case, if there is a hit on one of these long lines, the
current efforts will render them as results to the xref
buffer. Searching or editing these files wouldn't be emacs' strength
anyways :)

My proposal for the "best" fix would be:

- truncating long lines by default, both for grep and ripgrep
- adding some variation of the "-M <n>" value for ripgrep by default

What do you think?

--
Theo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-06 22:26                 ` Dmitry Gutov
  2021-03-07  1:29                   ` Dmitry Gutov
  2021-03-07  3:22                   ` Dmitry Gutov
@ 2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 20:26                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08  2:56                     ` Dmitry Gutov
  2 siblings, 2 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 20:16 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859


Hi Dmitry,

> I just got around to testing this properly (sorry),

No worries

> and so far I've been 
> able to reproduce the slow behavior only when there are many matches in 
> the "big long line" file. I'm using a 500KB minified CSS as an example.
>
> When there are only a few matches, the search is relatively 
> instantaneous. So that's a weird mismatch with your reports. If you have 
> some details to add to reproduce the slowdown in the "few matches" case, 
> that could be helpful too.

Hmm. Theres always a possibility of a human error on my part during the
benchmarks, of course!

>
> I'm currently looking at the patch and trying to figure out whether we 
> could apply some smaller change, or a change not in xref--insert-xrefs 
> (which is relatively complex already) with the same benefits.
>

Yeah, I also wanted to not add too much to that function, but I couldn't
get improvements other places :)


> Also:
>
> - Could you explain the change to xref--collect-matches-1 in the most 
> recent patch? In my testing it doesn't move the needle at all, and it 
> seems unnecessary because neither Grep or Ripgrep report matches on the 
> same line separately with the current arguments that we pass to them. 
> But if we did... what's the idea? Skip all subsequent matches, no matter 
> if they're far or close?
>

Yeah, skipping subsequent matches yielded an improvement from ~1.09 to
~1.03 seconds, so not the biggest improvement, but it was consistent.
Thus I kept it. 

> - What do you think about making an effort to actually retain all the 
> matches in the output?

I see why we would want to do that, but as I mentioned in the last mail
I sent, these files are mostly "junk" anyways. However, it is probably
best to be able to retain them if we can. I just think speed should be
more important

> That would mean interpreting the xref-truncate-line-to value (or
> however the var could be renamed) as the maximum number of chars to
> render on the line *per match*. And if there is too much text between
> them, those parts can become "(truncated...)".  Your current
> implementation can cut off valid matches, and we probably want to
> preserve them if feasible. OTOH, the default value could go down to
> 200 with this approach.
>

Yeah, I had an implementation where I "snipped" between matches and
concatenated them together, but that still yielded too large a line for
my emacs on a 3 million char length file, so I scrapped that idea. I
guess it still is possible, though!

> What I mean is, we can provide the "fullest featured" default behavior, 
> one which never omits any valid matches and just truncated the line 
> context around them, and the users who want even faster searches (at the 
> cost of missing matches, esp. in find-replace scenarios) have something 
> else to customize too.

Yeah, I think this is the best approach too. Especially for grep users.

I'll still probably use
(add-to-list 
  'xref-search-program-alist
  '(ripgrep . "xargs -0 rg <C> -nH --no-messages -g '!*/' -e <R> -M 400
  --max-columns-preview | sort -t: -k1,1 -k2n,2"))

Or something to that effect anyways :)

--
Theo





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-07 20:26                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-08  2:56                     ` Dmitry Gutov
  1 sibling, 0 replies; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-03-07 20:26 UTC (permalink / raw)
  To: Dmitry Gutov, juri, 46859



Forgot to say: I'll probably try to make some new benchmarks with all
the approaches when I get some free time.

--
Theo






^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07 20:03                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08  2:48                       ` Dmitry Gutov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-08  2:48 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

Hi again,

On 07.03.2021 22:03, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:

>> Please try out the attached preparation patch.
>>
>> It improves the performance of the "very long line" case drastically
>> over here, while not doing any truncation yet. Looks like we regressed
>> that case when we added rendering of multiple matches on the same line.
>>
> 
> Yes, this seems to help a lot. Now the search is down from 11 seconds to
> 1.09. It is comparable to the other good efforts.

Excellent!

I've pushed this change, along with some stuff it depends on, to master 
in commit 8e103ebef1.

>> We can add the truncation feature on top of it.
> 
> I think we should, since moving around in the xref-buffer is still very slow.

I see that too.

>> Probably also in xref--collect-matches-1 (truncating the value of
>> SUMMARY just before the xref-make-match call).
>>
>> Alternatively, we could experiment with hiding parts of the long line
>> using some display/visibility features (except the truncate-lines
>> variable, that one keeps things slow). That could be done in
>> xref--insert-xrefs or somewhere nearby. That is trickier, though, given
>> that we'll probably want to unhide it (wholly or partially) when
>> iterating over matches inside.
> 
> At this point I'm really thinking that truncating without bothering too
> much about losing information is worth it, and the added complexity by
> retaining information would only make regressions more feasible.

With the latest change, retaining that info should be particularly 
difficult: you adjust the SUMMARY values inside xref--collect-matches-1 
using the context information at hand, and that's almost it (mostly note 
to self: also need to update xref--outdated-p accordingly).

> I
> assume these files are _actually_ read once every blue moon. To maximize
> the speedup should be at a higher priority than retaining the matches,
> IMO. In any case, if there is a hit on one of these long lines, the
> current efforts will render them as results to the xref
> buffer. Searching or editing these files wouldn't be emacs' strength
> anyways :)

Depends on the performance improvement multiplier, I suppose.

But I'm inclined to believe that if the user did search those files, and 
did not include them in, say, project-vc-ignores value, they probably 
want to be able to see all matches. Sometimes losing valid hits can be a 
significant problem, and since Xref is implemented in an opaque way it 
is, we should make an effort not to omit information in the name of 
performance, at least while feasible.

> My proposal for the "best" fix would be:
> 
> - truncating long lines by default, both for grep and ripgrep
> - adding some variation of the "-M <n>" value for ripgrep by default
> 
> What do you think?

I think that would be a good non-default option for users who really 
know what they're doing. And it's already available for those who use 
ripgrep.

After all, there can be files out there with some long lines (not 
kilobytes long, probably, but >500 chars? why not) that aren't minified 
CSS or JS. If those were the only problem, we could as well recommend 
everybody add those to their project ignores and be done with it (which 
is what I usually do personally).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-03-07 20:26                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-03-08  2:56                     ` Dmitry Gutov
  2021-03-10  2:06                       ` Dmitry Gutov
  1 sibling, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-08  2:56 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

On 07.03.2021 22:16, Theodor Thornhill via Bug reports for GNU Emacs, 
the Swiss army knife of text editors wrote:
>> That would mean interpreting the xref-truncate-line-to value (or
>> however the var could be renamed) as the maximum number of chars to
>> render on the line*per match*. And if there is too much text between
>> them, those parts can become "(truncated...)".  Your current
>> implementation can cut off valid matches, and we probably want to
>> preserve them if feasible. OTOH, the default value could go down to
>> 200 with this approach.
>>
> Yeah, I had an implementation where I "snipped" between matches and
> concatenated them together, but that still yielded too large a line for
> my emacs on a 3 million char length file, so I scrapped that idea. I
> guess it still is possible, though!

It should be able to perform better now, now that xref--insert-xrefs 
doesn't have to delete most of the text its inserted in these scenarios. 
We didn't really anticipate summary lines this long and the memory churn 
that came with them.

If you still get lines that are loo long in these cases, even with all 
extra text snipped away, hiding parts of the summary using text 
properties should be possible. I just tried putting 'invisible' on the 
whole line after column 600, and scrolling became instantaneous again.

As long as we undo these properties (or, perhaps, scroll the visible 
part?) when xref-next-line is called, the user would still be able to 
visit all matches.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-07  8:13                                                               ` Gregory Heytings
@ 2021-03-08  3:24                                                                 ` Dmitry Gutov
  2021-03-08  8:26                                                                   ` Gregory Heytings
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-08  3:24 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

Hi Gregory,

On 07.03.2021 10:13, Gregory Heytings wrote:

> As I said, my tests are performed on a fresly cloned copy of the Emacs 
> repository (~5000 files).  It's not a huge project, but it's not a small 
> one either.

Hmm, both 'find' and 'git ls-files' take a little more than that on the 
Emacs repository.

But my impression on 'find' is skewed because it performs much worse as 
soon as we try to ignore files with it. When no predicates are used, 
it's fairly fast and shouldn't be too much of a problem in this comparison.

>> Because with the results you shown as of yet, the proposed alternative 
>> is twice as slow as the existing code in the average case. Is that 
>> right? I wouldn't like searches that take 200ms now take 400ms.
>>
> 
> Of course you can't get a benefit without paying a certain price.

Well, yes and no. I have just improved performance in the case under 
discussion significantly with no loss in functionality or measurable 
loss of performance in "normal" cases.

I don't mean to be discouraging, but the benefits should be pretty great 
for us to pay the price of 2x slower matching speed.

And it wouldn't be necessary of Grep had an option to limit the 
displayed context around the match without us mucking with the regexp. 
It would solve the issue of incorrect byte position, too.

 > The
 > tests show that, on the Emacs repository, a search takes 250 ms instead
 > of 125 ms with GNU grep, and 100 ms instead of 50 ms with ripgrep.  IMO
 > that price is not too high, especially not for a user dialog (I don't
 > see how a user could be annoyed, or even notice, that something takes
 > 250 ms instead of 125 ms), but it's just my opinion.

The bigger the project, the longer it will take. Emacs is not the 
biggest project size we want to support.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-08  3:24                                                                 ` Dmitry Gutov
@ 2021-03-08  8:26                                                                   ` Gregory Heytings
  2021-03-08 11:47                                                                     ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Gregory Heytings @ 2021-03-08  8:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]


>> As I said, my tests are performed on a fresly cloned copy of the Emacs 
>> repository (~5000 files).  It's not a huge project, but it's not a 
>> small one either.
>
> Hmm, both 'find' and 'git ls-files' take a little more than that on the 
> Emacs repository.
>

Not on my (not recent) development laptop...

>
> And it wouldn't be necessary of Grep had an option to limit the 
> displayed context around the match without us mucking with the regexp. 
> It would solve the issue of incorrect byte position, too.
>

I just submitted a feature request for GNU grep: 
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=47001 .  If it's accepted / 
implemented, I'll do the same for ripgrep.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-08  8:26                                                                   ` Gregory Heytings
@ 2021-03-08 11:47                                                                     ` Dmitry Gutov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-08 11:47 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: 46859

On 08.03.2021 10:26, Gregory Heytings wrote:
> 
>>> As I said, my tests are performed on a fresly cloned copy of the 
>>> Emacs repository (~5000 files).  It's not a huge project, but it's 
>>> not a small one either.
>>
>> Hmm, both 'find' and 'git ls-files' take a little more than that on 
>> the Emacs repository.
>>
> 
> Not on my (not recent) development laptop...

~/v/emacs-master (master|…) $ time find . >/dev/null

________________________________________________________
Executed in   31,65 millis    fish           external
    usr time    8,78 millis    0,00 millis    8,78 millis
    sys time   23,05 millis    1,10 millis   21,95 millis

*shrug*

>> And it wouldn't be necessary of Grep had an option to limit the 
>> displayed context around the match without us mucking with the regexp. 
>> It would solve the issue of incorrect byte position, too.
>>
> 
> I just submitted a feature request for GNU grep: 
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=47001 .  If it's accepted / 
> implemented, I'll do the same for ripgrep.

Thank you.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-08  2:56                     ` Dmitry Gutov
@ 2021-03-10  2:06                       ` Dmitry Gutov
  2021-05-17 15:27                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-03-10  2:06 UTC (permalink / raw)
  To: Theodor Thornhill, juri, 46859

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On 08.03.2021 04:56, Dmitry Gutov wrote:
> If you still get lines that are loo long in these cases, even with all 
> extra text snipped away, hiding parts of the summary using text 
> properties should be possible. I just tried putting 'invisible' on the 
> whole line after column 600, and scrolling became instantaneous again.
> 
> As long as we undo these properties (or, perhaps, scroll the visible 
> part?) when xref-next-line is called, the user would still be able to 
> visit all matches.

Here's an experimental patch that does this.

As long as there are no matches near the end of the long string, 
everything seems snappy. If we do end up scrolling to near its end, 
though, moving around that line becomes slower.

[-- Attachment #2: xref-long-line-visibility-truncation.diff --]
[-- Type: text/x-patch, Size: 2689 bytes --]

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index c066d9dc02..8decfcbb6c 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -891,6 +891,39 @@ xref--mouse-2
       (xref--search-property 'xref-item))
   (xref-show-location-at-point))
 
+(defcustom xref-truncate-to-width 400
+  "The column to visually \"truncate\" an Xref buffer line."
+  :type 'integer)
+
+(defun xref--apply-truncation ()
+  (add-to-invisibility-spec '(ellipsis . t))
+  (let (;; FIXME: Treat the line number prefix specially.
+        (bol (line-beginning-position))
+        (eol (line-end-position))
+        (inhibit-read-only t)
+        pos)
+    (when (and (> (- eol bol)
+                  xref-truncate-to-width)
+               ;; Either truncation not applied yet, or it hides the current
+               ;; position: need to refresh.
+               (or (and (null (get-text-property (1- eol) 'invisible))
+                        (null (get-text-property bol 'invisible)))
+                   (get-text-property (point) 'invisible)))
+      (cond
+       ((< (- (point) bol) xref-truncate-to-width)
+        (setq pos (+ bol xref-truncate-to-width))
+        (remove-text-properties bol pos '(invisible))
+        (put-text-property pos eol 'invisible 'ellipsis))
+       ((< (- eol (point)) xref-truncate-to-width)
+        (setq pos (- eol xref-truncate-to-width))
+        (remove-text-properties pos eol '(invisible))
+        (put-text-property bol pos 'invisible 'ellipsis))
+       (t
+        (setq pos (- (point) (/ xref-truncate-to-width 2)))
+        (put-text-property bol pos 'invisible 'ellipsis)
+        (remove-text-properties pos (+ pos xref-truncate-to-width) '(invisible))
+        (put-text-property (+ pos xref-truncate-to-width) eol 'invisible 'ellipsis))))))
+
 (defun xref--insert-xrefs (xref-alist)
   "Insert XREF-ALIST in the current-buffer.
 XREF-ALIST is of the form ((GROUP . (XREF ...)) ...), where
@@ -934,6 +967,11 @@ xref--insert-xrefs
                         (setq prev-line line
                               prev-group group))))
            (insert "\n"))
+  (save-excursion
+    (goto-char (point-min))
+    (while (not (eobp))
+      (forward-line 1)
+      (xref--apply-truncation)))
   (run-hooks 'xref-after-update-hook))
 
 (defun xref--analyze (xrefs)
@@ -971,6 +1009,7 @@ xref--show-common-initialize
         (buffer-undo-list t))
     (erase-buffer)
     (xref--insert-xrefs xref-alist)
+    (add-hook 'post-command-hook 'xref--apply-truncation nil t)
     (goto-char (point-min))
     (setq xref--original-window (assoc-default 'window alist)
           xref--original-window-intent (assoc-default 'display-action alist))

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-03-10  2:06                       ` Dmitry Gutov
@ 2021-05-17 15:27                         ` Lars Ingebrigtsen
  2021-05-17 15:44                           ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-17 15:27 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 46859, Theodor Thornhill, juri

Dmitry Gutov <dgutov@yandex.ru> writes:

> Here's an experimental patch that does this.
>
> As long as there are no matches near the end of the long string,
> everything seems snappy. If we do end up scrolling to near its end,
> though, moving around that line becomes slower.

I think that's a fair trade-off.  I only lightly skimmed this long
thread, but this patch was never applied?  (And it was the final message
here.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-05-17 15:27                         ` Lars Ingebrigtsen
@ 2021-05-17 15:44                           ` Dmitry Gutov
  2021-05-17 16:57                             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 74+ messages in thread
From: Dmitry Gutov @ 2021-05-17 15:44 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 46859, Theodor Thornhill, juri

On 17.05.2021 18:27, Lars Ingebrigtsen wrote:
> I think that's a fair trade-off.  I only lightly skimmed this long
> thread, but this patch was never applied?  (And it was the final message
> here.)

I was looking for some user experience feedback (with possible 
subsequent tweaks, etc), but the current behavior is indeed annoying 
enough to install this anyway.

Will do that shortly.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-05-17 15:44                           ` Dmitry Gutov
@ 2021-05-17 16:57                             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-05-18  0:39                               ` Dmitry Gutov
  0 siblings, 1 reply; 74+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-05-17 16:57 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Lars Ingebrigtsen, 46859, juri

Hi and sorry for the late reply,

> I was looking for some user experience feedback (with possible subsequent tweaks, etc), but the current behavior is indeed annoying enough to install this anyway.

Agreed. When testing I found it to be a nice improvement.

> Will do that shortly.

Nice :)

—
Theo






^ permalink raw reply	[flat|nested] 74+ messages in thread

* bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el
  2021-05-17 16:57                             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-05-18  0:39                               ` Dmitry Gutov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Gutov @ 2021-05-18  0:39 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Lars Ingebrigtsen, 46859-done, juri

Version: 28.1

On 17.05.2021 19:57, Theodor Thornhill wrote:
> Hi and sorry for the late reply,
> 
>> I was looking for some user experience feedback (with possible subsequent tweaks, etc), but the current behavior is indeed annoying enough to install this anyway.
> 
> Agreed. When testing I found it to be a nice improvement.

Very good.

I've pushed an updated version in d83db639d3:

- It make sure not to hide the line number with the ellipsis anymore.
- The option was renamed to xref-truncation-width.
- It can be set to nil to disable the feature.

Please try the new version when you have the time.





^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2021-05-18  0:39 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 20:40 bug#46859: 28.0.50; [PATCH]: Add option to truncate long lines in xref.el Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-01 22:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-02 19:25 ` Juri Linkov
2021-03-02 21:13   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-02 21:37     ` Dmitry Gutov
2021-03-02 21:45       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-02 22:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-02 22:37         ` Dmitry Gutov
2021-03-03 16:13           ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-03 17:29             ` Dmitry Gutov
2021-03-03 19:54               ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-06 22:26                 ` Dmitry Gutov
2021-03-07  1:29                   ` Dmitry Gutov
2021-03-07  3:22                   ` Dmitry Gutov
2021-03-07 20:03                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08  2:48                       ` Dmitry Gutov
2021-03-07 20:16                   ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-07 20:26                     ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-03-08  2:56                     ` Dmitry Gutov
2021-03-10  2:06                       ` Dmitry Gutov
2021-05-17 15:27                         ` Lars Ingebrigtsen
2021-05-17 15:44                           ` Dmitry Gutov
2021-05-17 16:57                             ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-05-18  0:39                               ` Dmitry Gutov
2021-03-03  9:05     ` Juri Linkov
2021-03-03  9:52       ` Gregory Heytings
2021-03-03 12:47         ` Dmitry Gutov
2021-03-03 13:53           ` Gregory Heytings
2021-03-03 14:00             ` Dmitry Gutov
2021-03-03 15:04               ` Gregory Heytings
2021-03-03 17:11                 ` Gregory Heytings
2021-03-03 17:26                   ` Dmitry Gutov
2021-03-03 17:42                     ` Gregory Heytings
2021-03-03 19:14                       ` Dmitry Gutov
2021-03-03 19:34                         ` Gregory Heytings
2021-03-03 19:52                           ` Juri Linkov
2021-03-03 20:34                             ` Gregory Heytings
2021-03-04  3:36                               ` Eli Zaretskii
2021-03-04  9:19                                 ` Gregory Heytings
2021-03-04 14:08                                   ` Eli Zaretskii
2021-03-04 14:39                                     ` Gregory Heytings
2021-03-04 15:13                                       ` Eli Zaretskii
2021-03-04 16:47                                         ` Gregory Heytings
2021-03-04 17:13                                           ` Eli Zaretskii
2021-03-04 17:35                                             ` Gregory Heytings
2021-03-04 18:28                                               ` Eli Zaretskii
2021-03-06 12:31                                               ` Dmitry Gutov
2021-03-06 12:37                                                 ` Dmitry Gutov
2021-03-06 12:54                                                   ` Gregory Heytings
2021-03-06 14:26                                                     ` Dmitry Gutov
2021-03-06 22:47                                                       ` Gregory Heytings
2021-03-06 23:00                                                         ` Dmitry Gutov
2021-03-06 23:24                                                           ` Gregory Heytings
2021-03-07  3:08                                                             ` Dmitry Gutov
2021-03-07  8:13                                                               ` Gregory Heytings
2021-03-08  3:24                                                                 ` Dmitry Gutov
2021-03-08  8:26                                                                   ` Gregory Heytings
2021-03-08 11:47                                                                     ` Dmitry Gutov
2021-03-06 12:49                                                 ` Gregory Heytings
2021-03-06 14:07                                                   ` Dmitry Gutov
2021-03-03 20:30                           ` Dmitry Gutov
2021-03-03 21:06                             ` Gregory Heytings
2021-03-06 12:44                               ` Dmitry Gutov
2021-03-06 12:58                                 ` Gregory Heytings
2021-03-06 14:06                                   ` Dmitry Gutov
2021-03-06 22:55                                     ` Gregory Heytings
2021-03-03 19:59                         ` Juri Linkov
2021-03-04  2:50                           ` Dmitry Gutov
2021-03-04  9:24                             ` Juri Linkov
2021-03-04 17:20                               ` Dmitry Gutov
2021-03-04 17:56                                 ` Juri Linkov
2021-03-04 18:57                                   ` Dmitry Gutov
2021-03-06 12:39                                   ` Dmitry Gutov
2021-03-03 16:14       ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).