* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
[not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org>
@ 2017-03-15 11:28 ` Stefan Monnier
2017-03-15 12:09 ` Noam Postavsky
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 11:28 UTC (permalink / raw)
To: emacs-devel; +Cc: Noam Postavsky
> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
> whitespace again
>
> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace
> causes the regex engine to backtrack a lot when searching for
> "\\s-+$" (Bug#26079).
Why do we use syntax-tables?
IOW why do we use \s- rather than something like [\s\t]?
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-15 11:28 ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier
@ 2017-03-15 12:09 ` Noam Postavsky
2017-03-15 14:50 ` Stefan Monnier
0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-15 12:09 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Emacs developers
On Wed, Mar 15, 2017 at 7:28 AM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>> whitespace again
>>
>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace
>> causes the regex engine to backtrack a lot when searching for
>> "\\s-+$" (Bug#26079).
>
> Why do we use syntax-tables?
> IOW why do we use \s- rather than something like [\s\t]?
No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-15 12:09 ` Noam Postavsky
@ 2017-03-15 14:50 ` Stefan Monnier
2017-03-15 17:05 ` Noam Postavsky
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 14:50 UTC (permalink / raw)
To: emacs-devel
>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>> whitespace again
>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace
>>> causes the regex engine to backtrack a lot when searching for
>>> "\\s-+$" (Bug#26079).
>> Why do we use syntax-tables?
>> IOW why do we use \s- rather than something like [\s\t]?
> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
Slower than "\\s-+$"?
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-15 14:50 ` Stefan Monnier
@ 2017-03-15 17:05 ` Noam Postavsky
2017-03-15 20:26 ` Stefan Monnier
0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-15 17:05 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Emacs developers
On Wed, Mar 15, 2017 at 10:50 AM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>>> whitespace again
>>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace
>>>> causes the regex engine to backtrack a lot when searching for
>>>> "\\s-+$" (Bug#26079).
>>> Why do we use syntax-tables?
>>> IOW why do we use \s- rather than something like [\s\t]?
>> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
>
> Slower than "\\s-+$"?
No, seems to be twice as fast actually (this is still horribly slow).
On the test file given in Bug#26079[1]
(benchmark 1 '(re-search-forward "[\s\t]+$" nil t)) ;=> Elapsed time: 9.944000s
(benchmark 1 '(save-excursion
(let ((end-marker nil))
(goto-char (point-min))
(with-syntax-table (make-syntax-table (syntax-table))
(modify-syntax-entry ?\f "_")
(modify-syntax-entry ?\n "_")
(re-search-forward "\\s-+$" end-marker t))))) ;=> Elapsed
time: 20.496000s
[1]: https://raw.githubusercontent.com/stakemori/e8theta_degree3/master/results/wt18_17_5/wt18_17_5.org
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-15 17:05 ` Noam Postavsky
@ 2017-03-15 20:26 ` Stefan Monnier
2017-03-16 0:18 ` Noam Postavsky
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-15 20:26 UTC (permalink / raw)
To: emacs-devel
>>>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as
>>>>> whitespace again
>>>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating
>>>>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace
>>>>> causes the regex engine to backtrack a lot when searching for
>>>>> "\\s-+$" (Bug#26079).
>>>> Why do we use syntax-tables?
>>>> IOW why do we use \s- rather than something like [\s\t]?
>>> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow.
>> Slower than "\\s-+$"?
> No, seems to be twice as fast actually (this is still horribly slow).
I meant
slower than the best performance of "\\s-+$"?
because "[\s\t]+$" shouldn't backtrack. Using "\\s-+$" with
a syntax-table that puts \n in the whitespace syntax is indeed asking
for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this
pathological case (and hence presumably much slower than "\\s-+$" when
\n is not whitespace).
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-15 20:26 ` Stefan Monnier
@ 2017-03-16 0:18 ` Noam Postavsky
2017-03-16 2:52 ` Stefan Monnier
0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-16 0:18 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 677 bytes --]
On Wed, Mar 15, 2017 at 4:26 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> because "[\s\t]+$" shouldn't backtrack. Using "\\s-+$" with
> a syntax-table that puts \n in the whitespace syntax is indeed asking
> for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this
> pathological case (and hence presumably much slower than "\\s-+$" when
> \n is not whitespace).
Why should one backtrack and not the other? The *2 ratio seems to hold
for both pathologically slow and normal case. Matching against a
sequence of non-trailing whitespace seens to have quadradic complexity
(see attached, timings are from an elisp buffer), does that give you
any hints?
[-- Attachment #2: trailing-ws-regexp-benchmark.el --]
[-- Type: text/x-emacs-lisp, Size: 1952 bytes --]
;; Pathological case, both seem to be quadratic in the length of a whitespace sequence.
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.090000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.258000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 0.630000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 2.260000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 9.306000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.395000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.347000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 1.282000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 5.051000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 20.144000s"
;; Normal case, linear.
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.001000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.002000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.003000s"
(benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.007000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.004000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.008000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.016000s"
(benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.031000s"
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-16 0:18 ` Noam Postavsky
@ 2017-03-16 2:52 ` Stefan Monnier
2017-03-16 3:04 ` Noam Postavsky
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2017-03-16 2:52 UTC (permalink / raw)
To: emacs-devel
> Why should one backtrack and not the other? The *2 ratio seems to hold
> for both pathologically slow and normal case. Matching against a
> sequence of non-trailing whitespace seens to have quadradic complexity
> (see attached, timings are from an elisp buffer), does that give you
> any hints?
Yes, it's normal: the "search" attempts a "match" from every whitespace.
So if you have N consecutive whitespace chars in the middle of line,
that gives you N attempts to "match" and every attempt takes O(N) steps
to find that the end of the whitespace is not an LF.
I don't understand how "\\s-+$" can be significantly faster than
"[\s\t]+$" in this respect.
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-16 2:52 ` Stefan Monnier
@ 2017-03-16 3:04 ` Noam Postavsky
2017-03-16 3:08 ` Stefan Monnier
0 siblings, 1 reply; 9+ messages in thread
From: Noam Postavsky @ 2017-03-16 3:04 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 603 bytes --]
On Wed, Mar 15, 2017 at 10:52 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> Yes, it's normal: the "search" attempts a "match" from every whitespace.
> So if you have N consecutive whitespace chars in the middle of line,
> that gives you N attempts to "match" and every attempt takes O(N) steps
> to find that the end of the whitespace is not an LF.
>
> I don't understand how "\\s-+$" can be significantly faster than
> "[\s\t]+$" in this respect.
What? I though I said "\\s-+" is *slower* (though only by a factor of 2).
Anyway, how about the attached which removes the syntax table stuff.
[-- Attachment #2: 0001-Simplify-delete-trailing-whitespace-by-dropping-synt.patch --]
[-- Type: text/x-patch, Size: 1669 bytes --]
From 2275487232b96ca782ca671d60be002e3948a764 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Wed, 15 Mar 2017 23:01:13 -0400
Subject: [PATCH] Simplify delete-trailing-whitespace by dropping syntax tables
* lisp/simple.el (delete-trailing-whitespace): Just match [[:blank:]]
independently of current syntax table.
---
lisp/simple.el | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/lisp/simple.el b/lisp/simple.el
index 369fbf7192..2cf8c13225 100644
--- a/lisp/simple.el
+++ b/lisp/simple.el
@@ -630,14 +630,13 @@ delete-trailing-whitespace
(save-excursion
(let ((end-marker (and end (copy-marker end))))
(goto-char (or start (point-min)))
- (with-syntax-table (make-syntax-table (syntax-table))
- ;; Don't delete formfeeds, even if they are considered whitespace.
- (modify-syntax-entry ?\f "_")
- (while (re-search-forward "\\s-$" end-marker t)
- (skip-syntax-backward "-" (line-beginning-position))
- (let ((b (point)) (e (match-end 0)))
- (when (region-modifiable-p b e)
- (delete-region b e)))))
+ ;; Note that trying to match all the trailing whitespace with
+ ;; just the regexp can be very slow (Bug#26079).
+ (while (re-search-forward "[[:blank:]]$" end-marker t)
+ (skip-syntax-backward "-" (line-beginning-position))
+ (let ((b (point)) (e (match-end 0)))
+ (when (region-modifiable-p b e)
+ (delete-region b e))))
(if end
(set-marker end-marker nil)
;; Delete trailing empty lines.
--
2.11.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again
2017-03-16 3:04 ` Noam Postavsky
@ 2017-03-16 3:08 ` Stefan Monnier
0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2017-03-16 3:08 UTC (permalink / raw)
To: emacs-devel
>> I don't understand how "\\s-+$" can be significantly faster than
>> "[\s\t]+$" in this respect.
> What? I though I said "\\s-+" is *slower* (though only by a factor of 2).
The bug#26079 seems to indicate that the old Emacs-25 code was a lot faster.
Oh, I see. It's that the old code used "\\s-$" rather than "\\s-+$".
Yes, that old code does avoid the quadratic effect this way.
Stefan
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-03-16 3:08 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20170315023157.29463.36647@vcs0.savannah.gnu.org>
[not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org>
2017-03-15 11:28 ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier
2017-03-15 12:09 ` Noam Postavsky
2017-03-15 14:50 ` Stefan Monnier
2017-03-15 17:05 ` Noam Postavsky
2017-03-15 20:26 ` Stefan Monnier
2017-03-16 0:18 ` Noam Postavsky
2017-03-16 2:52 ` Stefan Monnier
2017-03-16 3:04 ` Noam Postavsky
2017-03-16 3:08 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).