unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: [Emacs-diffs]  master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
       [not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org>
@ 2017-03-15 11:28   ` Stefan Monnier
  2017-03-15 12:09     ` Noam Postavsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 11:28 UTC (permalink / raw)
  To: emacs-devel; +Cc: Noam Postavsky

>     Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>     whitespace again
>
>     Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>     \n as whitespace" from 2016-07-04.  Setting \n to non-whitespace
>     causes the regex engine to backtrack a lot when searching for
>     "\\s-+$" (Bug#26079).

Why do we use syntax-tables?
IOW why do we use \s- rather than something like [\s\t]?


        Stefan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-15 11:28   ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier
@ 2017-03-15 12:09     ` Noam Postavsky
  2017-03-15 14:50       ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-15 12:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs developers

On Wed, Mar 15, 2017 at 7:28 AM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>>     Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>     whitespace again
>>
>>     Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>     \n as whitespace" from 2016-07-04.  Setting \n to non-whitespace
>>     causes the regex engine to backtrack a lot when searching for
>>     "\\s-+$" (Bug#26079).
>
> Why do we use syntax-tables?
> IOW why do we use \s- rather than something like [\s\t]?

No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-15 12:09     ` Noam Postavsky
@ 2017-03-15 14:50       ` Stefan Monnier
  2017-03-15 17:05         ` Noam Postavsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 14:50 UTC (permalink / raw)
  To: emacs-devel

>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>> whitespace again
>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>> \n as whitespace" from 2016-07-04.  Setting \n to non-whitespace
>>> causes the regex engine to backtrack a lot when searching for
>>> "\\s-+$" (Bug#26079).
>> Why do we use syntax-tables?
>> IOW why do we use \s- rather than something like [\s\t]?
> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.

Slower than "\\s-+$"?


        Stefan




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-15 14:50       ` Stefan Monnier
@ 2017-03-15 17:05         ` Noam Postavsky
  2017-03-15 20:26           ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-15 17:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs developers

On Wed, Mar 15, 2017 at 10:50 AM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>>> whitespace again
>>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>>> \n as whitespace" from 2016-07-04.  Setting \n to non-whitespace
>>>> causes the regex engine to backtrack a lot when searching for
>>>> "\\s-+$" (Bug#26079).
>>> Why do we use syntax-tables?
>>> IOW why do we use \s- rather than something like [\s\t]?
>> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
>
> Slower than "\\s-+$"?

No, seems to be twice as fast actually (this is still horribly slow).
On the test file given in Bug#26079[1]

(benchmark 1 '(re-search-forward "[\s\t]+$" nil t)) ;=> Elapsed time: 9.944000s
(benchmark 1 '(save-excursion
        (let ((end-marker nil))
          (goto-char (point-min))
          (with-syntax-table (make-syntax-table (syntax-table))
            (modify-syntax-entry ?\f "_")
            (modify-syntax-entry ?\n "_")
            (re-search-forward "\\s-+$" end-marker t))))) ;=> Elapsed
time: 20.496000s

[1]: https://raw.githubusercontent.com/stakemori/e8theta_degree3/master/results/wt18_17_5/wt18_17_5.org



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-15 17:05         ` Noam Postavsky
@ 2017-03-15 20:26           ` Stefan Monnier
  2017-03-16  0:18             ` Noam Postavsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 20:26 UTC (permalink / raw)
  To: emacs-devel

>>>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>>>> whitespace again
>>>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>>>> \n as whitespace" from 2016-07-04.  Setting \n to non-whitespace
>>>>> causes the regex engine to backtrack a lot when searching for
>>>>> "\\s-+$" (Bug#26079).
>>>> Why do we use syntax-tables?
>>>> IOW why do we use \s- rather than something like [\s\t]?
>>> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
>> Slower than "\\s-+$"?
> No, seems to be twice as fast actually (this is still horribly slow).

I meant

   slower than the best performance of "\\s-+$"?

because "[\s\t]+$" shouldn't backtrack.  Using "\\s-+$" with
a syntax-table that puts \n in the whitespace syntax is indeed asking
for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this
pathological case (and hence presumably much slower than "\\s-+$" when
\n is not whitespace).



        Stefan




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-15 20:26           ` Stefan Monnier
@ 2017-03-16  0:18             ` Noam Postavsky
  2017-03-16  2:52               ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-16  0:18 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

On Wed, Mar 15, 2017 at 4:26 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> because "[\s\t]+$" shouldn't backtrack.  Using "\\s-+$" with
> a syntax-table that puts \n in the whitespace syntax is indeed asking
> for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this
> pathological case (and hence presumably much slower than "\\s-+$" when
> \n is not whitespace).

Why should one backtrack and not the other? The *2 ratio seems to hold
for both pathologically slow and normal case. Matching against a
sequence of non-trailing whitespace seens to have quadradic complexity
(see attached, timings are from an elisp buffer), does that give you
any hints?

[-- Attachment #2: trailing-ws-regexp-benchmark.el --]
[-- Type: text/x-emacs-lisp, Size: 1952 bytes --]

;; Pathological case, both seem to be quadratic in the length of a whitespace sequence.
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.090000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.258000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 0.630000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 2.260000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 9.306000s"

(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.395000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.347000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 1.282000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 5.051000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 20.144000s"

;; Normal case, linear.
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.001000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.002000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.003000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.007000s"

(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.004000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.008000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.016000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.031000s"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-16  0:18             ` Noam Postavsky
@ 2017-03-16  2:52               ` Stefan Monnier
  2017-03-16  3:04                 ` Noam Postavsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-16  2:52 UTC (permalink / raw)
  To: emacs-devel

> Why should one backtrack and not the other? The *2 ratio seems to hold
> for both pathologically slow and normal case. Matching against a
> sequence of non-trailing whitespace seens to have quadradic complexity
> (see attached, timings are from an elisp buffer), does that give you
> any hints?

Yes, it's normal: the "search" attempts a "match" from every whitespace.
So if you have N consecutive whitespace chars in the middle of line,
that gives you N attempts to "match" and every attempt takes O(N) steps
to find that the end of the whitespace is not an LF.

I don't understand how "\\s-+$" can be significantly faster than
"[\s\t]+$" in this respect.


        Stefan




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-16  2:52               ` Stefan Monnier
@ 2017-03-16  3:04                 ` Noam Postavsky
  2017-03-16  3:08                   ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-16  3:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Wed, Mar 15, 2017 at 10:52 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> Yes, it's normal: the "search" attempts a "match" from every whitespace.
> So if you have N consecutive whitespace chars in the middle of line,
> that gives you N attempts to "match" and every attempt takes O(N) steps
> to find that the end of the whitespace is not an LF.
>
> I don't understand how "\\s-+$" can be significantly faster than
> "[\s\t]+$" in this respect.

What?  I though I said "\\s-+" is *slower* (though only by a factor of 2).

Anyway, how about the attached which removes the syntax table stuff.

[-- Attachment #2: 0001-Simplify-delete-trailing-whitespace-by-dropping-synt.patch --]
[-- Type: text/x-patch, Size: 1669 bytes --]

From 2275487232b96ca782ca671d60be002e3948a764 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Wed, 15 Mar 2017 23:01:13 -0400
Subject: [PATCH] Simplify delete-trailing-whitespace by dropping syntax tables

* lisp/simple.el (delete-trailing-whitespace): Just match [[:blank:]]
independently of current syntax table.
---
 lisp/simple.el | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/lisp/simple.el b/lisp/simple.el
index 369fbf7192..2cf8c13225 100644
--- a/lisp/simple.el
+++ b/lisp/simple.el
@@ -630,14 +630,13 @@ delete-trailing-whitespace
     (save-excursion
       (let ((end-marker (and end (copy-marker end))))
         (goto-char (or start (point-min)))
-        (with-syntax-table (make-syntax-table (syntax-table))
-          ;; Don't delete formfeeds, even if they are considered whitespace.
-          (modify-syntax-entry ?\f "_")
-          (while (re-search-forward "\\s-$" end-marker t)
-            (skip-syntax-backward "-" (line-beginning-position))
-            (let ((b (point)) (e (match-end 0)))
-              (when (region-modifiable-p b e)
-                (delete-region b e)))))
+        ;; Note that trying to match all the trailing whitespace with
+        ;; just the regexp can be very slow (Bug#26079).
+        (while (re-search-forward "[[:blank:]]$" end-marker t)
+          (skip-syntax-backward "-" (line-beginning-position))
+          (let ((b (point)) (e (match-end 0)))
+            (when (region-modifiable-p b e)
+              (delete-region b e))))
         (if end
             (set-marker end-marker nil)
           ;; Delete trailing empty lines.
-- 
2.11.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
  2017-03-16  3:04                 ` Noam Postavsky
@ 2017-03-16  3:08                   ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2017-03-16  3:08 UTC (permalink / raw)
  To: emacs-devel

>> I don't understand how "\\s-+$" can be significantly faster than
>> "[\s\t]+$" in this respect.
> What?  I though I said "\\s-+" is *slower* (though only by a factor of 2).

The bug#26079 seems to indicate that the old Emacs-25 code was a lot faster.

Oh, I see.  It's that the old code used "\\s-$" rather than "\\s-+$".
Yes, that old code does avoid the quadratic effect this way.


        Stefan




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-03-16  3:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20170315023157.29463.36647@vcs0.savannah.gnu.org>
     [not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org>
2017-03-15 11:28   ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier
2017-03-15 12:09     ` Noam Postavsky
2017-03-15 14:50       ` Stefan Monnier
2017-03-15 17:05         ` Noam Postavsky
2017-03-15 20:26           ` Stefan Monnier
2017-03-16  0:18             ` Noam Postavsky
2017-03-16  2:52               ` Stefan Monnier
2017-03-16  3:04                 ` Noam Postavsky
2017-03-16  3:08                   ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).