unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#49127: Performance degradation in encode_coding_object
@ 2021-06-20  6:30 Victor Nawothnig via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-20  9:04 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Victor Nawothnig via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-20  6:30 UTC (permalink / raw)
  To: 49127

Hi,

All of the following applies to Emacs 27.1 and 28.0.50.

Im currently debugging a performance degradation in haskell-mode. When editing code in a terminal frame, over time as I make modifications to source code, redrawing lines during scrolling becomes continuously slower to the point that it sometimes takes up 200ms for a single line to draw. This problem disappears once the GC runs.

With gprof/prof_events I have nailed the problem to be encode_coding_object looping over all markers. In degenerate cases this list can contain millions of markers. Traversing this list is particularly slow because of the indirection being a singly linked list. Based on the fact that a GC remedies this, I’m assuming this list contains mostly  unreachable markers. When stepping through encode_coding_object with GDB after a GC this list of markers shrinks to small double digit numbers from millions.

The source of these markers appears to be looking-at in the font locking code of haskell-mode, this assumption is based on the fact that commenting out the uses of looking-at in haskell-mode prevents the accumulation of markers and thus the slowdown.

One contributing factor to all of this, is that for lsp-mode to perform adequately, one needs a relatively high gc-cons-threshold, which means GCs that would clean up the markers run more rarely, leading to higher accumulation of markers over time.

This problem only triggers in terminal frames, but not in GUI frames. Setting GDB breakpoints suggests that the GUI frame never even calls into encode_coding_object.

So far I’m torn on whether this is a bug in the haskell-mode font locking code or in Emacs. What do you think?

Kind regards,
Victor




^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-06-20  6:30 bug#49127: Performance degradation in encode_coding_object Victor Nawothnig via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-20  9:04 ` Eli Zaretskii
  2021-06-24 16:49   ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-06-20  9:04 UTC (permalink / raw)
  To: Victor Nawothnig; +Cc: 49127

> Date: Sun, 20 Jun 2021 08:30:24 +0200
> From:  Victor Nawothnig via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> With gprof/prof_events I have nailed the problem to be encode_coding_object looping over all markers. In degenerate cases this list can contain millions of markers. Traversing this list is particularly slow because of the indirection being a singly linked list. Based on the fact that a GC remedies this, I’m assuming this list contains mostly  unreachable markers. When stepping through encode_coding_object with GDB after a GC this list of markers shrinks to small double digit numbers from millions.
> 
> The source of these markers appears to be looking-at in the font locking code of haskell-mode, this assumption is based on the fact that commenting out the uses of looking-at in haskell-mode prevents the accumulation of markers and thus the slowdown.

Do you understand why using looking-at causes creation of markers?  If
so, can you show the details of why this happens?

> One contributing factor to all of this, is that for lsp-mode to perform adequately, one needs a relatively high gc-cons-threshold, which means GCs that would clean up the markers run more rarely, leading to higher accumulation of markers over time.

Yes, playing with GC threshold is usually a bad idea, but it is hard
to explain to people why, and they keep doing that, to their cost.

> This problem only triggers in terminal frames, but not in GUI frames. Setting GDB breakpoints suggests that the GUI frame never even calls into encode_coding_object.

Can you should a backtrace from the call to encode_coding_object,
including the Lisp backtrace (via the "xbacktrace" command)?

> So far I’m torn on whether this is a bug in the haskell-mode font locking code or in Emacs. What do you think?

Let's revisit this question after we have all the data I requested
above, okay?

Thanks.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-06-20  9:04 ` Eli Zaretskii
@ 2021-06-24 16:49   ` Eli Zaretskii
  2021-07-25  7:10     ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-06-24 16:49 UTC (permalink / raw)
  To: victor.nawothnig; +Cc: 49127

Ping!  Could you please respond to my requests below?  I'd like to
make some progress with this bug report.

> Date: Sun, 20 Jun 2021 12:04:59 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 49127@debbugs.gnu.org
> 
> > Date: Sun, 20 Jun 2021 08:30:24 +0200
> > From:  Victor Nawothnig via "Bug reports for GNU Emacs,
> >  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> > 
> > With gprof/prof_events I have nailed the problem to be encode_coding_object looping over all markers. In degenerate cases this list can contain millions of markers. Traversing this list is particularly slow because of the indirection being a singly linked list. Based on the fact that a GC remedies this, I’m assuming this list contains mostly  unreachable markers. When stepping through encode_coding_object with GDB after a GC this list of markers shrinks to small double digit numbers from millions.
> > 
> > The source of these markers appears to be looking-at in the font locking code of haskell-mode, this assumption is based on the fact that commenting out the uses of looking-at in haskell-mode prevents the accumulation of markers and thus the slowdown.
> 
> Do you understand why using looking-at causes creation of markers?  If
> so, can you show the details of why this happens?
> 
> > One contributing factor to all of this, is that for lsp-mode to perform adequately, one needs a relatively high gc-cons-threshold, which means GCs that would clean up the markers run more rarely, leading to higher accumulation of markers over time.
> 
> Yes, playing with GC threshold is usually a bad idea, but it is hard
> to explain to people why, and they keep doing that, to their cost.
> 
> > This problem only triggers in terminal frames, but not in GUI frames. Setting GDB breakpoints suggests that the GUI frame never even calls into encode_coding_object.
> 
> Can you should a backtrace from the call to encode_coding_object,
> including the Lisp backtrace (via the "xbacktrace" command)?
> 
> > So far I’m torn on whether this is a bug in the haskell-mode font locking code or in Emacs. What do you think?
> 
> Let's revisit this question after we have all the data I requested
> above, okay?
> 
> Thanks.
> 
> 
> 
> 






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-06-24 16:49   ` Eli Zaretskii
@ 2021-07-25  7:10     ` Eli Zaretskii
  2021-08-15 15:07       ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-07-25  7:10 UTC (permalink / raw)
  To: victor.nawothnig; +Cc: 49127

Ping! Ping!  Please respond, so we could take care of this issue.

To recap: I would like to have a backtrace from the call to
encode_coding_object, including the Lisp backtrace (via the
"xbacktrace" command), so that we could see how this performance issue
happens.

TIA

> Date: Thu, 24 Jun 2021 19:49:41 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 49127@debbugs.gnu.org
> 
> Ping!  Could you please respond to my requests below?  I'd like to
> make some progress with this bug report.
> 
> > Date: Sun, 20 Jun 2021 12:04:59 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 49127@debbugs.gnu.org
> > 
> > > Date: Sun, 20 Jun 2021 08:30:24 +0200
> > > From:  Victor Nawothnig via "Bug reports for GNU Emacs,
> > >  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> > > 
> > > With gprof/prof_events I have nailed the problem to be encode_coding_object looping over all markers. In degenerate cases this list can contain millions of markers. Traversing this list is particularly slow because of the indirection being a singly linked list. Based on the fact that a GC remedies this, I’m assuming this list contains mostly  unreachable markers. When stepping through encode_coding_object with GDB after a GC this list of markers shrinks to small double digit numbers from millions.
> > > 
> > > The source of these markers appears to be looking-at in the font locking code of haskell-mode, this assumption is based on the fact that commenting out the uses of looking-at in haskell-mode prevents the accumulation of markers and thus the slowdown.
> > 
> > Do you understand why using looking-at causes creation of markers?  If
> > so, can you show the details of why this happens?
> > 
> > > One contributing factor to all of this, is that for lsp-mode to perform adequately, one needs a relatively high gc-cons-threshold, which means GCs that would clean up the markers run more rarely, leading to higher accumulation of markers over time.
> > 
> > Yes, playing with GC threshold is usually a bad idea, but it is hard
> > to explain to people why, and they keep doing that, to their cost.
> > 
> > > This problem only triggers in terminal frames, but not in GUI frames. Setting GDB breakpoints suggests that the GUI frame never even calls into encode_coding_object.
> > 
> > Can you should a backtrace from the call to encode_coding_object,
> > including the Lisp backtrace (via the "xbacktrace" command)?
> > 
> > > So far I’m torn on whether this is a bug in the haskell-mode font locking code or in Emacs. What do you think?
> > 
> > Let's revisit this question after we have all the data I requested
> > above, okay?
> > 
> > Thanks.
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-07-25  7:10     ` Eli Zaretskii
@ 2021-08-15 15:07       ` Eli Zaretskii
  2021-08-17 12:35         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-15 15:07 UTC (permalink / raw)
  To: victor.nawothnig; +Cc: 49127

Ping! Ping! Ping! Please respond!

> Date: Sun, 25 Jul 2021 10:10:40 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 49127@debbugs.gnu.org
> 
> Ping! Ping!  Please respond, so we could take care of this issue.
> 
> To recap: I would like to have a backtrace from the call to
> encode_coding_object, including the Lisp backtrace (via the
> "xbacktrace" command), so that we could see how this performance issue
> happens.
> 
> TIA
> 
> > Date: Thu, 24 Jun 2021 19:49:41 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 49127@debbugs.gnu.org
> > 
> > Ping!  Could you please respond to my requests below?  I'd like to
> > make some progress with this bug report.
> > 
> > > Date: Sun, 20 Jun 2021 12:04:59 +0300
> > > From: Eli Zaretskii <eliz@gnu.org>
> > > Cc: 49127@debbugs.gnu.org
> > > 
> > > > Date: Sun, 20 Jun 2021 08:30:24 +0200
> > > > From:  Victor Nawothnig via "Bug reports for GNU Emacs,
> > > >  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> > > > 
> > > > With gprof/prof_events I have nailed the problem to be encode_coding_object looping over all markers. In degenerate cases this list can contain millions of markers. Traversing this list is particularly slow because of the indirection being a singly linked list. Based on the fact that a GC remedies this, I’m assuming this list contains mostly  unreachable markers. When stepping through encode_coding_object with GDB after a GC this list of markers shrinks to small double digit numbers from millions.
> > > > 
> > > > The source of these markers appears to be looking-at in the font locking code of haskell-mode, this assumption is based on the fact that commenting out the uses of looking-at in haskell-mode prevents the accumulation of markers and thus the slowdown.
> > > 
> > > Do you understand why using looking-at causes creation of markers?  If
> > > so, can you show the details of why this happens?
> > > 
> > > > One contributing factor to all of this, is that for lsp-mode to perform adequately, one needs a relatively high gc-cons-threshold, which means GCs that would clean up the markers run more rarely, leading to higher accumulation of markers over time.
> > > 
> > > Yes, playing with GC threshold is usually a bad idea, but it is hard
> > > to explain to people why, and they keep doing that, to their cost.
> > > 
> > > > This problem only triggers in terminal frames, but not in GUI frames. Setting GDB breakpoints suggests that the GUI frame never even calls into encode_coding_object.
> > > 
> > > Can you should a backtrace from the call to encode_coding_object,
> > > including the Lisp backtrace (via the "xbacktrace" command)?
> > > 
> > > > So far I’m torn on whether this is a bug in the haskell-mode font locking code or in Emacs. What do you think?
> > > 
> > > Let's revisit this question after we have all the data I requested
> > > above, okay?
> > > 
> > > Thanks.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
       [not found] <EC5DED64-8465-45A3-B20C-8D21F70E0A34@acm.org>
@ 2021-08-16 17:43 ` Eli Zaretskii
  2021-08-16 18:06   ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-16 17:43 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Mon, 16 Aug 2021 19:32:46 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>, 49127@debbugs.gnu.org
> 
> It looks very much like haskell-mode is producing way too many markers, and that should be easily fixed (I wrote a tentative pull request). However, the code in encode_coding_object appears broken. Look at the first loop in that function:
> 
>   if (EQ (src_object, dst_object))
>     {
>       struct Lisp_Marker *tail;
> 
>       for (tail = BUF_MARKERS (current_buffer); tail; tail = tail->next)
> 	{
> 	  tail->need_adjustment
> 	    = tail->charpos == (tail->insertion_type ? from : to);
> 	  need_marker_adjustment |= tail->need_adjustment;
> 	}
>     }
> 
> I don't know how this could ever work. We loop through the markers in the current buffer?

Yes.  Why do you think this loop is broken?

> It is called from term.c with src_object and dst_object both being Qnil; no buffer is involved at all. At the very least the outer condition should have an "&& BUFFERP (dst_object)" added, and the loop should run through the markers of XBUFFER (dst_object) instead of current_buffer, no?
> 
> `decode_coding_object` seem to get these things right but I only took a superficial look. It's probably copy-paste errors.

It isn't a copy/paste, I think it's a simple omission.

But if you look up-thread, you will see that I asked for xbacktrace
results from the OP, to make sure I completely understand the reasons
for the performance problem.  I never got any responses for that.  So
if you can produce the Lisp backtraces in that case, and later test a
fix, I'm quite sure I know how to fix this problem.  TIA.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-16 17:43 ` Eli Zaretskii
@ 2021-08-16 18:06   ` Mattias Engdegård
  2021-08-16 18:50     ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-16 18:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

16 aug. 2021 kl. 19.43 skrev Eli Zaretskii <eliz@gnu.org>:

>> I don't know how this could ever work. We loop through the markers in the current buffer?
> 
> Yes.  Why do you think this loop is broken?

Because unless I misunderstood the code entirely, the current buffer has nothing to do with the operation at hand.

It's easy to reproduce the original problem: run Emacs in a terminal and make a buffer with many markers. See how the text displays slower with more markers. I've attached a short example; try (make-test-buffer 1000).

The attached patch fixes this problem.


[-- Attachment #2: lus.el --]
[-- Type: application/octet-stream, Size: 350 bytes --]

;;; -*- lexical-binding -*-

(defvar my-markers nil)

(defun make-test-buffer (n)
  (setq my-markers nil)
  (with-current-buffer (get-buffer-create "test-buffer")
    (erase-buffer)
    (dotimes (i 1000)
      (insert (format "%8d " i))
      (dotimes (_ n)
        (push (point-marker) my-markers))
      (insert "abcdefghijklmnopqrstuvwxyz\n"))))


[-- Attachment #3: 0001-Fix-marker-traversion-in-encode_coding_object.patch --]
[-- Type: application/octet-stream, Size: 1289 bytes --]

From d38eace17a24ed2d15a3bf069e1aaf05a495e077 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Mon, 16 Aug 2021 19:57:34 +0200
Subject: [PATCH] Fix marker traversion in encode_coding_object

* src/coding.c (encode_coding_object): Only traverse markers if we are
encoding to and from buffers, and use the right buffer when doing so.

This also fixes a performance problem when running in a terminal and
the displayed buffer has many markers (bug#49127).
Reported by Victor Nawothnig.
---
 src/coding.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/coding.c b/src/coding.c
index 87b55aecc0..599ed48b2e 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -8275,11 +8275,11 @@ encode_coding_object (struct coding_system *coding,
 
   attrs = CODING_ID_ATTRS (coding->id);
 
-  if (EQ (src_object, dst_object))
+  if (EQ (src_object, dst_object) && BUFFERP (dst_object))
     {
       struct Lisp_Marker *tail;
 
-      for (tail = BUF_MARKERS (current_buffer); tail; tail = tail->next)
+      for (tail = BUF_MARKERS (XBUFFER (dst_object)); tail; tail = tail->next)
 	{
 	  tail->need_adjustment
 	    = tail->charpos == (tail->insertion_type ? from : to);
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-16 18:06   ` Mattias Engdegård
@ 2021-08-16 18:50     ` Eli Zaretskii
  2021-08-16 20:04       ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-16 18:50 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Mon, 16 Aug 2021 20:06:32 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> > Yes.  Why do you think this loop is broken?
> 
> Because unless I misunderstood the code entirely, the current buffer has nothing to do with the operation at hand.

It does, if both src_object and dst_object are the same (current)
buffer.

> It's easy to reproduce the original problem: run Emacs in a terminal and make a buffer with many markers. See how the text displays slower with more markers. I've attached a short example; try (make-test-buffer 1000).

How can we be sure this reproduces the original issue?  The original
issue was reported from a real-life use case, not from a toy program.

> The attached patch fixes this problem.

Thanks, but this is not the whole story with that problem in
encode_coding_object.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-16 18:50     ` Eli Zaretskii
@ 2021-08-16 20:04       ` Mattias Engdegård
  2021-08-17 12:34         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-16 20:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

16 aug. 2021 kl. 20.50 skrev Eli Zaretskii <eliz@gnu.org>:

> It does, if both src_object and dst_object are the same (current)
> buffer.

There is nothing in that function or in the comment describing it which says that the current buffer must be either. It costs us nothing to use the correct value so that's what we should do.

> How can we be sure this reproduces the original issue?  The original
> issue was reported from a real-life use case, not from a toy program.

Because I have read and tested the original code as well, which is how that toy program that reproduces the problem in a simpler but qualitatively equivalent way came to be. The patch fixes the symptoms of both the toy program and the real-life use case.

> Thanks, but this is not the whole story with that problem in
> encode_coding_object.

What is the whole story then?






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-16 20:04       ` Mattias Engdegård
@ 2021-08-17 12:34         ` Eli Zaretskii
  2021-08-17 13:06           ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-17 12:34 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Mon, 16 Aug 2021 22:04:36 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> > Thanks, but this is not the whole story with that problem in
> > encode_coding_object.
> 
> What is the whole story then?

See the change I installed a few minutes ago.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-15 15:07       ` Eli Zaretskii
@ 2021-08-17 12:35         ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-17 12:35 UTC (permalink / raw)
  To: victor.nawothnig; +Cc: 49127-done

> Date: Sun, 15 Aug 2021 18:07:23 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 49127@debbugs.gnu.org
> 
> Ping! Ping! Ping! Please respond!

This should now be fixed on the master branch.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-17 12:34         ` Eli Zaretskii
@ 2021-08-17 13:06           ` Mattias Engdegård
  2021-08-17 14:05             ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-17 13:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

17 aug. 2021 kl. 14.34 skrev Eli Zaretskii <eliz@gnu.org>

> See the change I installed a few minutes ago.

Let's see, you took my patch and fixed one more condition that I missed (line 8301) -- thank you!
Anyway, I'm happy that it got done at last, and it's good that you took the time to write the documenting comment.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-17 13:06           ` Mattias Engdegård
@ 2021-08-17 14:05             ` Eli Zaretskii
  2021-08-17 16:07               ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-17 14:05 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> Feedback-ID:mattiase@acm.or
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Tue, 17 Aug 2021 15:06:58 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> 17 aug. 2021 kl. 14.34 skrev Eli Zaretskii <eliz@gnu.org>
> 
> > See the change I installed a few minutes ago.
> 
> Let's see, you took my patch and fixed one more condition that I missed (line 8301) -- thank you!

No, it's you who took _my_ patch, removed one of its parts, and deleted
the commentary.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-17 14:05             ` Eli Zaretskii
@ 2021-08-17 16:07               ` Mattias Engdegård
  2021-08-17 17:16                 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-17 16:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

17 aug. 2021 kl. 16.05 skrev Eli Zaretskii <eliz@gnu.org>:

> No, it's you who took _my_ patch, removed one of its parts, and deleted
> the commentary.

Yes, I wanted to have all the credit to my name! All mine!

Seriously, I didn't mean to imply that you did anything improper, and I'm genuinely pleased that we ended up with a good solution.

This business makes me wonder if there are more cases where the linear marker list causes bad performance. Probably not very often but it's not unreasonable for a mode to have many data structures with many (live) markers into the text, and that would lead to text changes going from O(1) to O(text size). In any case, not anything to worry about right now.







^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-17 16:07               ` Mattias Engdegård
@ 2021-08-17 17:16                 ` Eli Zaretskii
  2021-08-18 11:04                   ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-17 17:16 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Tue, 17 Aug 2021 18:07:10 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> This business makes me wonder if there are more cases where the linear marker list causes bad performance. Probably not very often but it's not unreasonable for a mode to have many data structures with many (live) markers into the text, and that would lead to text changes going from O(1) to O(text size). In any case, not anything to worry about right now.

It's a problem to have many markers in a buffer.  Not only for code
that searches them linearly, but also for stuff like converting
between character and byte positions, something that we do a lot in
the most inner loops of our code.  Lisp programs that produce gobs of
markers should be ideally redesigned not to do so.  Unless we change
the way we store markers to make that a non-issue.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-17 17:16                 ` Eli Zaretskii
@ 2021-08-18 11:04                   ` Mattias Engdegård
  2021-08-18 11:43                     ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-18 11:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

17 aug. 2021 kl. 19.16 skrev Eli Zaretskii <eliz@gnu.org>:

> It's a problem to have many markers in a buffer.  Not only for code
> that searches them linearly, but also for stuff like converting
> between character and byte positions, something that we do a lot in
> the most inner loops of our code.  Lisp programs that produce gobs of
> markers should be ideally redesigned not to do so.

I agree; the haskell-mode usage was misguided in this respect. Marker-intensive modes could still be justified occasionally: consider an index of all definitions and uses of each variable, function and type in a program, produced by an expensive compilation that cannot be re-run for each little change. This could easily amount to thousands of markers in a big file.

>  Unless we change
> the way we store markers to make that a non-issue.

We probably should but I agree that marker reduction should be the first remedy to try.
Improving the GC so that it's cheaper to run frequent collections would also help in the common case where most markers are dead (as in this case).






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 11:04                   ` Mattias Engdegård
@ 2021-08-18 11:43                     ` Eli Zaretskii
  2021-08-18 12:21                       ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-18 11:43 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 18 Aug 2021 13:04:12 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> consider an index of all definitions and uses of each variable, function and type in a program, produced by an expensive compilation that cannot be re-run for each little change. This could easily amount to thousands of markers in a big file.

They could have used text properties instead, no?





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 11:43                     ` Eli Zaretskii
@ 2021-08-18 12:21                       ` Mattias Engdegård
  2021-08-18 13:23                         ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-18 12:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

18 aug. 2021 kl. 13.43 skrev Eli Zaretskii <eliz@gnu.org>:

>> consider an index of all definitions and uses of each variable, function and type in a program, produced by an expensive compilation that cannot be re-run for each little change. This could easily amount to thousands of markers in a big file.
> 
> They could have used text properties instead, no?

Yes and no. That works for the direction from a point in the buffer but not the other way: go to a specific place in the buffer, such as a definition or all references of something. It could be something selected from a tree in the sidebar etc or be implicit (as in "go to the superclass of the class at point").

The alternative is to use plain offsets instead, and update them inside buffer-change callback functions -- but then we are essentially doing exactly what the Emacs marker implementation would be doing, only in Elisp instead of C so probably slower, unless we use more clever data structures.

Emacs isn't badly designed -- the marker implementation is fine if we assume there are only a few of them per buffer, which is mostly the case. But it isn't scalable, and also adversely affected by large amounts of garbage markers.







^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 12:21                       ` Mattias Engdegård
@ 2021-08-18 13:23                         ` Eli Zaretskii
  2021-08-18 13:32                           ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-18 13:23 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 18 Aug 2021 14:21:22 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> > They could have used text properties instead, no?
> 
> Yes and no. That works for the direction from a point in the buffer but not the other way: go to a specific place in the buffer, such as a definition or all references of something. It could be something selected from a tree in the sidebar etc or be implicit (as in "go to the superclass of the class at point").

Text property search doesn't fit the bill?

> Emacs isn't badly designed -- the marker implementation is fine if we assume there are only a few of them per buffer, which is mostly the case. But it isn't scalable, and also adversely affected by large amounts of garbage markers.

These situations usually mean we lack some infrastructure, and the
Lisp program uses what's available, with bad results.  A better
solution is to design and implement the missing infrastructure
instead.

The problem with Emacs is not the design, it's that in many cases,
instead of extending the design where something is missing, Lisp
programmers tend to use the existing features outside of their
intended purpose.  IOW, our design is not evolving enough according to
the needs, not in a small measure because those needs are not always
communicated to us.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:23                         ` Eli Zaretskii
@ 2021-08-18 13:32                           ` Mattias Engdegård
  2021-08-18 13:39                             ` Eli Zaretskii
  2021-08-18 14:34                             ` Lars Ingebrigtsen
  0 siblings, 2 replies; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-18 13:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

18 aug. 2021 kl. 15.23 skrev Eli Zaretskii <eliz@gnu.org>:

> Text property search doesn't fit the bill?

Oh it does, except that it's linear in the size of the buffer, and that doesn't really scale either. For a single interactive lookup this might not be too bad but there might be programmatic uses that iterate.

> These situations usually mean we lack some infrastructure, and the
> Lisp program uses what's available, with bad results.  A better
> solution is to design and implement the missing infrastructure
> instead.

Could be, but markers is one type of infrastructure, and implementing something else for the same purpose is a bit of a waste compared to just making markers faster.

> The problem with Emacs is not the design, it's that in many cases,
> instead of extending the design where something is missing, Lisp
> programmers tend to use the existing features outside of their
> intended purpose.

Very true. We have probably done our job for the time being, but let's keep our eyes open for uses (legitimate or not) that stress the marker system to the point of disappointment, and consider what to do then.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:32                           ` Mattias Engdegård
@ 2021-08-18 13:39                             ` Eli Zaretskii
  2021-08-18 13:54                               ` Mattias Engdegård
  2021-08-18 14:34                             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-18 13:39 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 18 Aug 2021 15:32:11 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> 18 aug. 2021 kl. 15.23 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Text property search doesn't fit the bill?
> 
> Oh it does, except that it's linear in the size of the buffer

It is?  Don't we use the fact that properties are stored in an
interval tree?





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:39                             ` Eli Zaretskii
@ 2021-08-18 13:54                               ` Mattias Engdegård
  2021-08-18 13:59                                 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-18 13:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

18 aug. 2021 kl. 15.39 skrev Eli Zaretskii <eliz@gnu.org>:

> It is?  Don't we use the fact that properties are stored in an
> interval tree?

I could very well be wrong about this, but I believe that the interval tree is indexed by location, so that we can quickly find a property given an offset. Searching for a particular property or property value requires going through all properties.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:54                               ` Mattias Engdegård
@ 2021-08-18 13:59                                 ` Eli Zaretskii
  2021-08-18 15:24                                   ` Mattias Engdegård
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-18 13:59 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 18 Aug 2021 15:54:21 +0200
> Cc: victor.nawothnig@icloud.com, 49127@debbugs.gnu.org
> 
> 18 aug. 2021 kl. 15.39 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > It is?  Don't we use the fact that properties are stored in an
> > interval tree?
> 
> I could very well be wrong about this, but I believe that the interval tree is indexed by location, so that we can quickly find a property given an offset. Searching for a particular property or property value requires going through all properties.

next_interval doesn't look like linear search to me.





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:32                           ` Mattias Engdegård
  2021-08-18 13:39                             ` Eli Zaretskii
@ 2021-08-18 14:34                             ` Lars Ingebrigtsen
  2021-08-20 23:24                               ` Michael Welsh Duggan
  1 sibling, 1 reply; 27+ messages in thread
From: Lars Ingebrigtsen @ 2021-08-18 14:34 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 49127, victor.nawothnig

Mattias Engdegård <mattiase@acm.org> writes:

> Could be, but markers is one type of infrastructure, and implementing
> something else for the same purpose is a bit of a waste compared to
> just making markers faster.

I'm all for making markers faster -- users do stumble onto this problem,
is my impression.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 13:59                                 ` Eli Zaretskii
@ 2021-08-18 15:24                                   ` Mattias Engdegård
  0 siblings, 0 replies; 27+ messages in thread
From: Mattias Engdegård @ 2021-08-18 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49127, victor.nawothnig

18 aug. 2021 kl. 15.59 skrev Eli Zaretskii <eliz@gnu.org>:

> next_interval doesn't look like linear search to me.

That's right, but it doesn't search for a particular property or property value -- it is used by next-property-change which only wants the change of some property. To find the position of a particular property, we would at least need to call something like next-single-property-change, which calls next_interval in a loop.






^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-18 14:34                             ` Lars Ingebrigtsen
@ 2021-08-20 23:24                               ` Michael Welsh Duggan
  2021-08-21  6:34                                 ` Eli Zaretskii
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Welsh Duggan @ 2021-08-20 23:24 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 49127, Mattias Engdegård, victor.nawothnig

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Mattias Engdegård <mattiase@acm.org> writes:
>
>> Could be, but markers is one type of infrastructure, and implementing
>> something else for the same purpose is a bit of a waste compared to
>> just making markers faster.
>
> I'm all for making markers faster -- users do stumble onto this problem,
> is my impression.

Could this be added to etc/TODO?  I have some ideas regarding this, but
I don't know if I will be able to get to it (or get release permission
from my place of work).

-- 
Michael Welsh Duggan
(md5i@md5i.com)





^ permalink raw reply	[flat|nested] 27+ messages in thread

* bug#49127: Performance degradation in encode_coding_object
  2021-08-20 23:24                               ` Michael Welsh Duggan
@ 2021-08-21  6:34                                 ` Eli Zaretskii
  0 siblings, 0 replies; 27+ messages in thread
From: Eli Zaretskii @ 2021-08-21  6:34 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: 49127, mattiase, larsi, victor.nawothnig

> From: Michael Welsh Duggan <mwd@md5i.com>
> Date: Fri, 20 Aug 2021 19:24:05 -0400
> Cc: 49127@debbugs.gnu.org,
>  Mattias Engdegård <mattiase@acm.org>,
>  victor.nawothnig@icloud.com
> 
> > I'm all for making markers faster -- users do stumble onto this problem,
> > is my impression.
> 
> Could this be added to etc/TODO?

Thanks, done.





^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-08-21  6:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-20  6:30 bug#49127: Performance degradation in encode_coding_object Victor Nawothnig via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-20  9:04 ` Eli Zaretskii
2021-06-24 16:49   ` Eli Zaretskii
2021-07-25  7:10     ` Eli Zaretskii
2021-08-15 15:07       ` Eli Zaretskii
2021-08-17 12:35         ` Eli Zaretskii
     [not found] <EC5DED64-8465-45A3-B20C-8D21F70E0A34@acm.org>
2021-08-16 17:43 ` Eli Zaretskii
2021-08-16 18:06   ` Mattias Engdegård
2021-08-16 18:50     ` Eli Zaretskii
2021-08-16 20:04       ` Mattias Engdegård
2021-08-17 12:34         ` Eli Zaretskii
2021-08-17 13:06           ` Mattias Engdegård
2021-08-17 14:05             ` Eli Zaretskii
2021-08-17 16:07               ` Mattias Engdegård
2021-08-17 17:16                 ` Eli Zaretskii
2021-08-18 11:04                   ` Mattias Engdegård
2021-08-18 11:43                     ` Eli Zaretskii
2021-08-18 12:21                       ` Mattias Engdegård
2021-08-18 13:23                         ` Eli Zaretskii
2021-08-18 13:32                           ` Mattias Engdegård
2021-08-18 13:39                             ` Eli Zaretskii
2021-08-18 13:54                               ` Mattias Engdegård
2021-08-18 13:59                                 ` Eli Zaretskii
2021-08-18 15:24                                   ` Mattias Engdegård
2021-08-18 14:34                             ` Lars Ingebrigtsen
2021-08-20 23:24                               ` Michael Welsh Duggan
2021-08-21  6:34                                 ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).