all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
@ 2020-08-24  8:25 ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2020-08-24 10:37 ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2020-08-24  8:25 UTC (permalink / raw)
  To: 43016

[-- Attachment #1: Type: text/plain, Size: 719 bytes --]

I have a code which pretty prints data in json:

...
(with-temp-buffer
(insert data)
(json-pretty-print-buffer)
(write-file "data.json"))
...

This code runs for several minutes (I didn't time it, but it's about 1-2 minutes) with Emacs 27.1 It takes several seconds with Emacs 26. Data is about 1MB on disk in pretty printed elisp format.

I tried profiling with this result:

- progn 7486 98%
- json-pretty-print-buffer 7438 98%
- json-pretty-print 7438 98%
- replace-region-contents 7438 98%
+ #<compiled 0x3854909> 198 2%
+ insert 43 0%
+ write-file 5 0%

Apparently, replace-region-contents takes 98% of the CPU time.

Using Windows version from GNU ftp :

GNU Emacs 27.1 (build 1, x86_64-w64-mingw32) of 2020-08-12

[-- Attachment #2: Type: text/html, Size: 3216 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24  8:25 bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2020-08-24 10:37 ` Eli Zaretskii
  2020-08-24 12:13   ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2020-08-24 10:37 UTC (permalink / raw)
  To: ljell, Tassilo Horn; +Cc: 43016

> Date: Mon, 24 Aug 2020 08:25:14 +0000
> From: ljell via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I have a code which pretty prints data in json:
> 
> ...
>   (with-temp-buffer
>     (insert data)
>     (json-pretty-print-buffer)
>     (write-file "data.json"))
> ...
> 
> This code runs for several minutes (I didn't time it, but it's about 1-2 minutes) with Emacs 27.1  It takes
> several seconds with Emacs 26.  Data is about 1MB on disk in pretty printed elisp format.
> 
> I tried profiling with this result:
> 
>           - progn                                                7486  98%
>            - json-pretty-print-buffer                            7438  98%
>             - json-pretty-print                                  7438  98%
>              - replace-region-contents                           7438  98%
>               + #<compiled 0x3854909>                             198   2%
>            + insert                                                43   0%
>            + write-file                                             5   0%
> 
> Apparently, replace-region-contents  takes 98% of the CPU time.

Thank you for your report and the data.  Is it possible to have the
file with which you've seen this problem?

Tassilo, could you please look into this regression?





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 10:37 ` Eli Zaretskii
@ 2020-08-24 12:13   ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2020-08-24 12:29     ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2020-08-24 17:14     ` Tassilo Horn
  0 siblings, 2 replies; 15+ messages in thread
From: ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2020-08-24 12:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43016@debbugs.gnu.org, Tassilo Horn

[-- Attachment #1: Type: text/plain, Size: 525 bytes --]

>
> Thank you for your report and the data. Is it possible to have the
> file with which you've seen this problem?
>

Looks like the problem occurs only with accented characters present, so I created a json file with them. Attached.

Warning: Long lines!

After opening the file, just run M-x json-pretty-print-buffer  on it to see how long it takes.


If you run M-x json-pretty-print-buffer on the already pretty file then it's much faster,so maybe it's related to the single long line generated by json-encode.

[-- Attachment #2: test.rar --]
[-- Type: application/octet-stream, Size: 516805 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 12:13   ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2020-08-24 12:29     ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2020-08-24 17:14     ` Tassilo Horn
  1 sibling, 0 replies; 15+ messages in thread
From: ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2020-08-24 12:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 43016@debbugs.gnu.org, Tassilo Horn


>
> If you run M-x json-pretty-print-buffer on the already pretty file then it's much faster,so maybe it's related to the single long line generated by json-encode.


The json-encode part was missing in my first post, so the relevant code looks like this:

...
  (with-temp-buffer
    (insert (json-encode data))
    (json-pretty-print-buffer)
    (write-file "data.json"))
...


So I insert json encoded lisp data into a temp buffer and then run prettyfing on it, so the json file is more readable and there are no long lines. This code took a few seconds on 26 and it takes much longer with 27.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 12:13   ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2020-08-24 12:29     ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2020-08-24 17:14     ` Tassilo Horn
  2020-08-24 17:21       ` Lars Ingebrigtsen
                         ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Tassilo Horn @ 2020-08-24 17:14 UTC (permalink / raw)
  To: ljell; +Cc: 43016@debbugs.gnu.org, Paul Eggert

[-- Attachment #1: Type: text/plain, Size: 2817 bytes --]

ljell <laszlomail@protonmail.com> writes:

Hi all,

>> Thank you for your report and the data. Is it possible to have the
>> file with which you've seen this problem?
>
> Looks like the problem occurs only with accented characters present,
> so I created a json file with them. Attached.

I can easily reproduce the problem using this file.

Back when I introduced that feature, i.e., that json pretty printing
uses replace-region-contents (which in turn uses
replace-buffer-contents), I added a parameter MAX-SECS to those
functions which should restrict the compareseq call's runtime to that
amount of seconds and give up if it takes longer, in which case the
replacement is just a delete + insert (which hasn't the benefit of
retaining point and marks but is fast).  That's controllable via the
variable json-pretty-print-max-secs whose default is 2.0 seconds.

In 975893b2290 Paul (in Cc) improved that change so that the context
struct given to compareseq doesn't store the MAX-SECS but a timespec
time_limit computed beforehand and used later in the
compareseq_early_abort tests later on saving conversions there.  (The
change looks good to me with my very limited C knowledge.)

The actual problem here is that for the specific test file, there are
(only) 321 compareseq_early_abort tests performed and it seems that the
first 320 are executed almost immediately (before MAX-SECS are over),
then no test is performed for minutes, and then a last test is performed
leading to an early_abort of compareseq followed by delete + insert.

This "early_abort if it takes too long" thingy doesn't work if the
compareseq_early_abort tests aren't performed somewhat regularly.  If
there can be minutes between two consecutive tests like here, then this
whole thing doesn't work out.

replace-region-contents and replace-buffer-contents have another
MAX-COSTS parameter which json.el sets to 64 (with a FIXME since I had
no clue what a sensible value was but extracted that value from
edidfns.c where it was hard-coded before).  I've reduced the value in
the json.el call to 16 for testing and indeed it became faster but still
too slow for the very same reason.  Now there were only 65
compareseq_early_abort tests but again 64 occurred almost immediately
whereas the last one aborting the comparison happened after ~40 seconds.

So basically I'd say the problem is in gnulib's compareseq.  If it can't
be fixed there, I see no other possibility than to stop using
replace-buffer/region-contents in json.el (and wherever it might also be
used).  That would be sad because except for the performance in some
cases, it's very nice. :-(

Maybe Paul has some idea (or knows compareseq better)?

I'm attaching a patch adding some printfs which I used to diagnose the
issue.  (Yes, I'm cannot operate GDB without guidance...)

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: replace_buffer_contents_debugging.patch --]
[-- Type: text/x-patch, Size: 1580 bytes --]

diff --git a/src/editfns.c b/src/editfns.c
index 949f3825a3..2037d1a133 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -2000,6 +2000,7 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
   else
     CHECK_FIXNUM (max_costs);
 
+  printf("max_secs: %f\n", XFLOAT_DATA (max_secs));
   struct timespec time_limit = make_timespec (0, -1);
   if (!NILP (max_secs))
     {
@@ -2008,7 +2009,11 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
 			     lisp_time_argument (max_secs)),
 	tmax = make_timespec (TYPE_MAXIMUM (time_t), TIMESPEC_HZ - 1);
       if (timespec_cmp (tlim, tmax) < 0)
-	time_limit = tlim;
+        {
+          time_limit = tlim;
+          printf("time_limit: %lld.%.9ld\n",
+                 (long long)time_limit.tv_sec, time_limit.tv_nsec);
+        }
     }
 
   /* Micro-optimization: Casting to size_t generates much better
@@ -2038,6 +2043,10 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
      later.  */
   bool early_abort = compareseq (0, size_a, 0, size_b, false, &ctx);
 
+  printf("early_abort: %d\n", early_abort);
+  printf("early_abort_tests: %u\n", ctx.early_abort_tests);
+  printf("size_a: %ld, size_b: %ld\n", size_a, size_b);
+
   if (early_abort)
     {
       del_range (min_a, ZV);
@@ -2186,6 +2195,8 @@ buffer_chars_equal (struct context *ctx,
 static bool
 compareseq_early_abort (struct context *ctx)
 {
+  printf("early_abort_tests\n");
+  ctx->early_abort_tests++;
   if (ctx->time_limit.tv_nsec < 0)
     return false;
   return timespec_cmp (ctx->time_limit, current_timespec ()) < 0;

[-- Attachment #3: Type: text/plain, Size: 14 bytes --]


Bye,
Tassilo

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 17:14     ` Tassilo Horn
@ 2020-08-24 17:21       ` Lars Ingebrigtsen
  2020-08-24 17:25         ` Philipp Stephani
  2020-08-24 17:27       ` Eli Zaretskii
  2020-08-24 23:35       ` Paul Eggert
  2 siblings, 1 reply; 15+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-24 17:21 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: ljell, 43016@debbugs.gnu.org, Paul Eggert

Tassilo Horn <tsdh@gnu.org> writes:

> That would be sad because except for the performance in some
> cases, it's very nice. :-(

It is indeed very nice.  My guess is that most of the usages of the
function is for shortish JSON structures.  Perhaps we could just
introduce an arbitrary switchover between using replace-region-contents
and plain replacing -- like a hundred K or something?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 17:21       ` Lars Ingebrigtsen
@ 2020-08-24 17:25         ` Philipp Stephani
  0 siblings, 0 replies; 15+ messages in thread
From: Philipp Stephani @ 2020-08-24 17:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: ljell, Tassilo Horn, 43016@debbugs.gnu.org, Paul Eggert

Am Mo., 24. Aug. 2020 um 19:22 Uhr schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
> Tassilo Horn <tsdh@gnu.org> writes:
>
> > That would be sad because except for the performance in some
> > cases, it's very nice. :-(
>
> It is indeed very nice.  My guess is that most of the usages of the
> function is for shortish JSON structures.  Perhaps we could just
> introduce an arbitrary switchover between using replace-region-contents
> and plain replacing -- like a hundred K or something?
>


That's basically what the MAX-SECS and MAX-COST parameters are
supposed to do. We should definitely fix compareseq so that it takes
these costs into account more strictly (e.g. checking at least once
per million iterations or so).





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 17:14     ` Tassilo Horn
  2020-08-24 17:21       ` Lars Ingebrigtsen
@ 2020-08-24 17:27       ` Eli Zaretskii
  2020-08-24 19:15         ` Tassilo Horn
  2020-08-24 23:35       ` Paul Eggert
  2 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2020-08-24 17:27 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: laszlomail, 43016, eggert

> From: Tassilo Horn <tsdh@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,
> 	"43016@debbugs.gnu.org" <43016@debbugs.gnu.org>,
> 	Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 24 Aug 2020 19:14:50 +0200
> 
> So basically I'd say the problem is in gnulib's compareseq.  If it can't
> be fixed there, I see no other possibility than to stop using
> replace-buffer/region-contents in json.el (and wherever it might also be
> used).  That would be sad because except for the performance in some
> cases, it's very nice. :-(

Could we decide whether to use replace-* functions dynamically, based
on the size of the region/buffer being prettified?

Btw, there's another problem with compareseq, see bug#42931.  I guess
we need to add another criterion for early_abort, based on depth of
recursion?

Thanks.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 17:27       ` Eli Zaretskii
@ 2020-08-24 19:15         ` Tassilo Horn
  2020-08-24 19:36           ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Tassilo Horn @ 2020-08-24 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: laszlomail, 43016, eggert

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> So basically I'd say the problem is in gnulib's compareseq.  If it
>> can't be fixed there, I see no other possibility than to stop using
>> replace-buffer/region-contents in json.el (and wherever it might also
>> be used).  That would be sad because except for the performance in
>> some cases, it's very nice. :-(
>
> Could we decide whether to use replace-* functions dynamically, based
> on the size of the region/buffer being prettified?

No, the size is just a secondary measure here.  I've successfully and
quickly prettified much larger JSON files.

I've just tried with some other sample json file which is about three
times the size of the file in this report.  That also triggered an
early_abort of compareseq but within MAX-SECS time.  And here I have
almost half a million compareseq_early_abort tests, not just 321.

I've now modified it to approximately the same size as the file of ljell
(the reporter).  Then it doesn't trigger early_abort, is fast, and
slightly below 300.000 early_abort tests are performed.

[-- Attachment #2: sample.json.gz --]
[-- Type: application/gzip, Size: 83898 bytes --]

[-- Attachment #3: Type: text/plain, Size: 436 bytes --]


So the question is why the file in this report with about the same size
and number of changes between minimized and prettified version result in
such strange numbers?

ljell is right, it seems to have to do with the non-ASCII characters.
In my sample.json.gz from above, I've just replaced every "e" with an
"Ê" (except in true/false literals).  When I prettify that, it aborts
early (fast) just after 449 early_abort_tests.

[-- Attachment #4: sample-non-ascii.json.gz --]
[-- Type: application/gzip, Size: 86152 bytes --]

[-- Attachment #5: Type: text/plain, Size: 1231 bytes --]


So just replacing "e" with an "Ê" changed "compares in time with 300.000
early_abort_tests" to "doesn't compare in time and makes only 449
early_abort_tests in that time".  The only difference to ljell's test
file is that with mine, there doesn't seem to be a big gap between the
last early_abort_test returning false and the one returning true.

> Btw, there's another problem with compareseq, see bug#42931.  I guess
> we need to add another criterion for early_abort, based on depth of
> recursion?

Ouch, I can also reproduce that.  But as said by Philipp Stephani in
that report, I'd consider protecting against too deep recursion a job of
compareseq, not of its callers.

And I think the issue in that report would also vanish if compareseq
would somehow ensure that its EARLY_ABORT expression would be evaluated
regularly, i.e., there were no long periods without check.  That's the
thing I also observe with the 18 MB file from bug#42931: gazillions of
early_abort_tests early on, then a long phase with no test, and
eventually the segfault.  (Of course it's possible that other files
would result in a quick segfault due to unbounded recursion where that
wouldn't help either...)

Bye,
Tassilo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 19:15         ` Tassilo Horn
@ 2020-08-24 19:36           ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2020-08-24 19:36 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: laszlomail, 43016, eggert

> From: Tassilo Horn <tsdh@gnu.org>
> Cc: laszlomail@protonmail.com,  43016@debbugs.gnu.org,  eggert@cs.ucla.edu
> Date: Mon, 24 Aug 2020 21:15:42 +0200
> 
> ljell is right, it seems to have to do with the non-ASCII characters.
> In my sample.json.gz from above, I've just replaced every "e" with an
> "Ê" (except in true/false literals).  When I prettify that, it aborts
> early (fast) just after 449 early_abort_tests.
> 
> So just replacing "e" with an "Ê" changed "compares in time with 300.000
> early_abort_tests" to "doesn't compare in time and makes only 449
> early_abort_tests in that time".  The only difference to ljell's test
> file is that with mine, there doesn't seem to be a big gap between the
> last early_abort_test returning false and the one returning true.

I guess it's somehow related to the fact that buffer_chars_equal is
optimized for pure-ASCII buffers, see how we compute a_unibyte and
b_unibyte members of the context struct.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 17:14     ` Tassilo Horn
  2020-08-24 17:21       ` Lars Ingebrigtsen
  2020-08-24 17:27       ` Eli Zaretskii
@ 2020-08-24 23:35       ` Paul Eggert
  2020-08-25  6:10         ` Eli Zaretskii
  2020-08-25 17:30         ` Tassilo Horn
  2 siblings, 2 replies; 15+ messages in thread
From: Paul Eggert @ 2020-08-24 23:35 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: ljell, 43016, Bruno Haible

[-- Attachment #1: Type: text/plain, Size: 1697 bytes --]

On 8/24/20 10:14 AM, Tassilo Horn wrote:
> The actual problem here is that for the specific test file, there are
> (only) 321 compareseq_early_abort tests performed and it seems that the
> first 320 are executed almost immediately (before MAX-SECS are over),
> then no test is performed for minutes, and then a last test is performed
> leading to an early_abort of compareseq followed by delete + insert.

This is due to the lineage of diffseq; if memory serves, its EARLY_ABORT macro 
was introduced for approximate string comparison in GNU gettext, where 
NOTE_INSERT and NOTE_DELETE are associated with doing something that can cause 
the comparison to fail. (It's not clear to me why EARLY_ABORT was broken out of 
NOTE_INSERT and NOTE_DELETE.)

Emacs uses diffseq differently, on much-larger vectors; and it can be CPU-bound 
within diag, which never calls EARLY_ABORT. Although we could add some 
EARLY_ABORT calls to diag Emacs doesn't need that, as it can check for CPU time 
exhaustion in XREF_YREF_EQUAL and longjmp out if there's a problem. I did that 
by installing the first attached patch into Emacs master, which fixes the 
problem (at least for me). I installed the second attached patch into Emacs 
master to fix some other minor issues I noticed while in the neighborhood.

I suppose the first attached patch might be a candidate for backporting into 
Emacs 27 if this bug is considered to be serious enough.

I plan to follow up on the diffseq stack-overflow issue in Bug#42931 in a 
separate email.

I am cc'ing this email to Bruno Haible to give him a heads-up, as he did the 
diffseq engineering for gettext. Bruno, the bug report thread is here:

https://bugs.gnu.org/43016

[-- Attachment #2: 0001-Fix-replace-region-contents-performance-bug.patch --]
[-- Type: text/x-patch, Size: 4157 bytes --]

From 08a6d14e4116c74284c12dd1319780afbcbbfd1d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 24 Aug 2020 13:12:51 -0700
Subject: [PATCH 1/2] Fix replace-region-contents performance bug

* src/editfns.c (rbc_quitcounter): Remove; the quitcounter
is now part of the context.
(EXTRA_CONTEXT_FIELDS): Remove unused member early_abort_tests.
Add jmp, quitcounter.
(Freplace_buffer_contents): Use setjmp/longjmp to recover from
a compareseq that runs too long.  Omit unnecessary rarely_quit
call.
(buffer_chars_equal): Occasionally check for early abort and
longjmp out if so (Bug#43016).
---
 src/editfns.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index 949f3825a3..a5368c59da 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1877,9 +1877,6 @@ DEFUN ("compare-buffer-substrings", Fcompare_buffer_substrings, Scompare_buffer_
 #undef EQUAL
 #define USE_HEURISTIC
 
-/* Counter used to rarely_quit in replace-buffer-contents.  */
-static unsigned short rbc_quitcounter;
-
 #define XVECREF_YVECREF_EQUAL(ctx, xoff, yoff)  \
   buffer_chars_equal ((ctx), (xoff), (yoff))
 
@@ -1900,7 +1897,8 @@ #define EXTRA_CONTEXT_FIELDS                    \
   unsigned char *deletions;                     \
   unsigned char *insertions;			\
   struct timespec time_limit;			\
-  unsigned int early_abort_tests;
+  sys_jmp_buf jmp;				\
+  unsigned short quitcounter;
 
 #define NOTE_DELETE(ctx, xoff) set_bit ((ctx)->deletions, (xoff))
 #define NOTE_INSERT(ctx, yoff) set_bit ((ctx)->insertions, (yoff))
@@ -2029,14 +2027,17 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
     .heuristic = true,
     .too_expensive = XFIXNUM (max_costs),
     .time_limit = time_limit,
-    .early_abort_tests = 0
   };
   memclear (ctx.deletions, del_bytes);
   memclear (ctx.insertions, ins_bytes);
 
   /* compareseq requires indices to be zero-based.  We add BEGV back
      later.  */
-  bool early_abort = compareseq (0, size_a, 0, size_b, false, &ctx);
+  bool early_abort;
+  if (! sys_setjmp (ctx.jmp))
+    early_abort = compareseq (0, size_a, 0, size_b, false, &ctx);
+  else
+    early_abort = true;
 
   if (early_abort)
     {
@@ -2046,8 +2047,6 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
       return Qnil;
     }
 
-  rbc_quitcounter = 0;
-
   Fundo_boundary ();
   bool modification_hooks_inhibited = false;
   record_unwind_protect_excursion ();
@@ -2071,8 +2070,7 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
      walk backwards, we don’t have to keep the positions in sync.  */
   while (i >= 0 || j >= 0)
     {
-      /* Allow the user to quit if this gets too slow.  */
-      rarely_quit (++rbc_quitcounter);
+      rarely_quit (++ctx.quitcounter);
 
       /* Check whether there is a change (insertion or deletion)
          before the current position.  */
@@ -2087,8 +2085,6 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
 	  while (j > 0 && bit_is_set (ctx.insertions, j - 1))
             --j;
 
-	  rarely_quit (rbc_quitcounter++);
-
           ptrdiff_t beg_a = min_a + i;
           ptrdiff_t beg_b = min_b + j;
           eassert (beg_a <= end_a);
@@ -2108,7 +2104,6 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
     }
 
   SAFE_FREE_UNBIND_TO (count, Qnil);
-  rbc_quitcounter = 0;
 
   if (modification_hooks_inhibited)
     {
@@ -2155,12 +2150,16 @@ bit_is_set (const unsigned char *a, ptrdiff_t i)
 buffer_chars_equal (struct context *ctx,
                     ptrdiff_t pos_a, ptrdiff_t pos_b)
 {
+  if (!++ctx->quitcounter)
+    {
+      maybe_quit ();
+      if (compareseq_early_abort (ctx))
+	sys_longjmp (ctx->jmp, 1);
+    }
+
   pos_a += ctx->beg_a;
   pos_b += ctx->beg_b;
 
-  /* Allow the user to escape out of a slow compareseq call.  */
-  rarely_quit (++rbc_quitcounter);
-
   ptrdiff_t bpos_a =
     ctx->a_unibyte ? pos_a : buf_charpos_to_bytepos (ctx->buffer_a, pos_a);
   ptrdiff_t bpos_b =
-- 
2.17.1


[-- Attachment #3: 0002-replace-buffer-contents-cleanups.patch --]
[-- Type: text/x-patch, Size: 6040 bytes --]

From e0345b4e86154f42f47a9f7bbbf458a72bf5c06d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 24 Aug 2020 13:12:51 -0700
Subject: [PATCH 2/2] replace-buffer-contents cleanups

* src/editfns.c (NOTE_DELETE, NOTE_INSERT): Avoid unnecessary parens.
(Freplace_buffer_contents): Check args before returning results.
Avoid integer overflow when computing too_expensive, and work even
if MAX-COSTS is bignum.  Call alloca and/or malloc just once, not
three times.
(set_bit, bit_is_set): Simplify micro-optimization by using eassume.
---
 src/editfns.c | 90 ++++++++++++++++++++++++++-------------------------
 1 file changed, 46 insertions(+), 44 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index a5368c59da..7e1e24ef16 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1900,8 +1900,8 @@ #define EXTRA_CONTEXT_FIELDS                    \
   sys_jmp_buf jmp;				\
   unsigned short quitcounter;
 
-#define NOTE_DELETE(ctx, xoff) set_bit ((ctx)->deletions, (xoff))
-#define NOTE_INSERT(ctx, yoff) set_bit ((ctx)->insertions, (yoff))
+#define NOTE_DELETE(ctx, xoff) set_bit ((ctx)->deletions, xoff)
+#define NOTE_INSERT(ctx, yoff) set_bit ((ctx)->insertions, yoff)
 #define EARLY_ABORT(ctx) compareseq_early_abort (ctx)
 
 struct context;
@@ -1954,6 +1954,28 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
   if (a == b)
     error ("Cannot replace a buffer with itself");
 
+  ptrdiff_t too_expensive;
+  if (NILP (max_costs))
+    too_expensive = 1000000;
+  else if (FIXNUMP (max_costs))
+    too_expensive = clip_to_bounds (0, XFIXNUM (max_costs), PTRDIFF_MAX);
+  else
+    {
+      CHECK_INTEGER (max_costs);
+      too_expensive = NILP (Fnatnump (max_costs)) ? 0 : PTRDIFF_MAX;
+    }
+
+  struct timespec time_limit = make_timespec (0, -1);
+  if (!NILP (max_secs))
+    {
+      struct timespec
+	tlim = timespec_add (current_timespec (),
+			     lisp_time_argument (max_secs)),
+	tmax = make_timespec (TYPE_MAXIMUM (time_t), TIMESPEC_HZ - 1);
+      if (timespec_cmp (tlim, tmax) < 0)
+	time_limit = tlim;
+    }
+
   ptrdiff_t min_a = BEGV;
   ptrdiff_t min_b = BUF_BEGV (b);
   ptrdiff_t size_a = ZV - min_a;
@@ -1983,36 +2005,24 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
 
   ptrdiff_t count = SPECPDL_INDEX ();
 
-  /* FIXME: It is not documented how to initialize the contents of the
-     context structure.  This code cargo-cults from the existing
-     caller in src/analyze.c of GNU Diffutils, which appears to
-     work.  */
 
   ptrdiff_t diags = size_a + size_b + 3;
+  ptrdiff_t del_bytes = size_a / CHAR_BIT + 1;
+  ptrdiff_t ins_bytes = size_b / CHAR_BIT + 1;
   ptrdiff_t *buffer;
+  ptrdiff_t bytes_needed;
+  if (INT_MULTIPLY_WRAPV (diags, 2 * sizeof *buffer, &bytes_needed)
+      || INT_ADD_WRAPV (del_bytes + ins_bytes, bytes_needed, &bytes_needed))
+    memory_full (SIZE_MAX);
   USE_SAFE_ALLOCA;
-  SAFE_NALLOCA (buffer, 2, diags);
+  buffer = SAFE_ALLOCA (bytes_needed);
+  unsigned char *deletions_insertions = memset (buffer + 2 * diags, 0,
+						del_bytes + ins_bytes);
 
-  if (NILP (max_costs))
-    XSETFASTINT (max_costs, 1000000);
-  else
-    CHECK_FIXNUM (max_costs);
-
-  struct timespec time_limit = make_timespec (0, -1);
-  if (!NILP (max_secs))
-    {
-      struct timespec
-	tlim = timespec_add (current_timespec (),
-			     lisp_time_argument (max_secs)),
-	tmax = make_timespec (TYPE_MAXIMUM (time_t), TIMESPEC_HZ - 1);
-      if (timespec_cmp (tlim, tmax) < 0)
-	time_limit = tlim;
-    }
-
-  /* Micro-optimization: Casting to size_t generates much better
-     code.  */
-  ptrdiff_t del_bytes = (size_t) size_a / CHAR_BIT + 1;
-  ptrdiff_t ins_bytes = (size_t) size_b / CHAR_BIT + 1;
+  /* FIXME: It is not documented how to initialize the contents of the
+     context structure.  This code cargo-cults from the existing
+     caller in src/analyze.c of GNU Diffutils, which appears to
+     work.  */
   struct context ctx = {
     .buffer_a = a,
     .buffer_b = b,
@@ -2020,16 +2030,14 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
     .beg_b = min_b,
     .a_unibyte = BUF_ZV (a) == BUF_ZV_BYTE (a),
     .b_unibyte = BUF_ZV (b) == BUF_ZV_BYTE (b),
-    .deletions = SAFE_ALLOCA (del_bytes),
-    .insertions = SAFE_ALLOCA (ins_bytes),
+    .deletions = deletions_insertions,
+    .insertions = deletions_insertions + del_bytes,
     .fdiag = buffer + size_b + 1,
     .bdiag = buffer + diags + size_b + 1,
     .heuristic = true,
-    .too_expensive = XFIXNUM (max_costs),
+    .too_expensive = too_expensive,
     .time_limit = time_limit,
   };
-  memclear (ctx.deletions, del_bytes);
-  memclear (ctx.insertions, ins_bytes);
 
   /* compareseq requires indices to be zero-based.  We add BEGV back
      later.  */
@@ -2074,8 +2082,8 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
 
       /* Check whether there is a change (insertion or deletion)
          before the current position.  */
-      if ((i > 0 && bit_is_set (ctx.deletions, i - 1)) ||
-          (j > 0 && bit_is_set (ctx.insertions, j - 1)))
+      if ((i > 0 && bit_is_set (ctx.deletions, i - 1))
+	  || (j > 0 && bit_is_set (ctx.insertions, j - 1)))
 	{
           ptrdiff_t end_a = min_a + i;
           ptrdiff_t end_b = min_b + j;
@@ -2117,21 +2125,15 @@ DEFUN ("replace-buffer-contents", Freplace_buffer_contents,
 static void
 set_bit (unsigned char *a, ptrdiff_t i)
 {
-  eassert (i >= 0);
-  /* Micro-optimization: Casting to size_t generates much better
-     code.  */
-  size_t j = i;
-  a[j / CHAR_BIT] |= (1 << (j % CHAR_BIT));
+  eassume (0 <= i);
+  a[i / CHAR_BIT] |= (1 << (i % CHAR_BIT));
 }
 
 static bool
 bit_is_set (const unsigned char *a, ptrdiff_t i)
 {
-  eassert (i >= 0);
-  /* Micro-optimization: Casting to size_t generates much better
-     code.  */
-  size_t j = i;
-  return a[j / CHAR_BIT] & (1 << (j % CHAR_BIT));
+  eassume (0 <= i);
+  return a[i / CHAR_BIT] & (1 << (i % CHAR_BIT));
 }
 
 /* Return true if the characters at position POS_A of buffer
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 23:35       ` Paul Eggert
@ 2020-08-25  6:10         ` Eli Zaretskii
  2020-08-25 18:26           ` Paul Eggert
  2020-08-25 17:30         ` Tassilo Horn
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2020-08-25  6:10 UTC (permalink / raw)
  To: Paul Eggert; +Cc: laszlomail, 43016, bruno, tsdh

> Cc: ljell <laszlomail@protonmail.com>, Eli Zaretskii <eliz@gnu.org>,
>  43016@debbugs.gnu.org, Bruno Haible <bruno@clisp.org>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 24 Aug 2020 16:35:32 -0700
> 
> I suppose the first attached patch might be a candidate for backporting into 
> Emacs 27 if this bug is considered to be serious enough.

Yes, please.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-24 23:35       ` Paul Eggert
  2020-08-25  6:10         ` Eli Zaretskii
@ 2020-08-25 17:30         ` Tassilo Horn
  2020-08-25 18:19           ` Paul Eggert
  1 sibling, 1 reply; 15+ messages in thread
From: Tassilo Horn @ 2020-08-25 17:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: ljell, 43016, Bruno Haible

Paul Eggert <eggert@cs.ucla.edu> writes:

> by installing the first attached patch into Emacs master, which fixes
> the problem (at least for me). I installed the second attached patch
> into Emacs master to fix some other minor issues I noticed while in
> the neighborhood.

Seems to work fine for me too with all three files of this report and
the large one from bug#42931.  It seems like the early abort will now
happen shortly after MAX-SECS bounds.

Bye,
Tassilo





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-25 17:30         ` Tassilo Horn
@ 2020-08-25 18:19           ` Paul Eggert
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Eggert @ 2020-08-25 18:19 UTC (permalink / raw)
  To: Tassilo Horn; +Cc: ljell, Bruno Haible, 43016-done

On 8/25/20 10:30 AM, Tassilo Horn wrote:

> Seems to work fine for me too with all three files of this report and
> the large one from bug#42931.  It seems like the early abort will now
> happen shortly after MAX-SECS bounds.

Thanks for checking; closing the Emacs bug report.





^ permalink raw reply	[flat|nested] 15+ messages in thread

* bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer
  2020-08-25  6:10         ` Eli Zaretskii
@ 2020-08-25 18:26           ` Paul Eggert
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Eggert @ 2020-08-25 18:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: laszlomail, 43016, bruno, tsdh

On 8/24/20 11:10 PM, Eli Zaretskii wrote:

>> I suppose the first attached patch might be a candidate for backporting into
>> Emacs 27 if this bug is considered to be serious enough.
> 
> Yes, please.

OK, I installed the backported patch into the emacs-27 branch.





^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-25 18:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-24  8:25 bug#43016: replace-region-contents takes a lot of time when called from json-pretty-print-buffer ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
2020-08-24 10:37 ` Eli Zaretskii
2020-08-24 12:13   ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
2020-08-24 12:29     ` ljell via Bug reports for GNU Emacs, the Swiss army knife of text editors
2020-08-24 17:14     ` Tassilo Horn
2020-08-24 17:21       ` Lars Ingebrigtsen
2020-08-24 17:25         ` Philipp Stephani
2020-08-24 17:27       ` Eli Zaretskii
2020-08-24 19:15         ` Tassilo Horn
2020-08-24 19:36           ` Eli Zaretskii
2020-08-24 23:35       ` Paul Eggert
2020-08-25  6:10         ` Eli Zaretskii
2020-08-25 18:26           ` Paul Eggert
2020-08-25 17:30         ` Tassilo Horn
2020-08-25 18:19           ` Paul Eggert

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.