* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again [not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org> @ 2017-03-15 11:28 ` Stefan Monnier 2017-03-15 12:09 ` Noam Postavsky 0 siblings, 1 reply; 9+ messages in thread From: Stefan Monnier @ 2017-03-15 11:28 UTC (permalink / raw) To: emacs-devel; +Cc: Noam Postavsky > Recomplexify ‘delete-trailing-whitespace’ by treating \n as > whitespace again > > Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating > \n as whitespace" from 2016-07-04. Setting \n to non-whitespace > causes the regex engine to backtrack a lot when searching for > "\\s-+$" (Bug#26079). Why do we use syntax-tables? IOW why do we use \s- rather than something like [\s\t]? Stefan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-15 11:28 ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier @ 2017-03-15 12:09 ` Noam Postavsky 2017-03-15 14:50 ` Stefan Monnier 0 siblings, 1 reply; 9+ messages in thread From: Noam Postavsky @ 2017-03-15 12:09 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers On Wed, Mar 15, 2017 at 7:28 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: >> Recomplexify ‘delete-trailing-whitespace’ by treating \n as >> whitespace again >> >> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating >> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace >> causes the regex engine to backtrack a lot when searching for >> "\\s-+$" (Bug#26079). > > Why do we use syntax-tables? > IOW why do we use \s- rather than something like [\s\t]? No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-15 12:09 ` Noam Postavsky @ 2017-03-15 14:50 ` Stefan Monnier 2017-03-15 17:05 ` Noam Postavsky 0 siblings, 1 reply; 9+ messages in thread From: Stefan Monnier @ 2017-03-15 14:50 UTC (permalink / raw) To: emacs-devel >>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as >>> whitespace again >>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating >>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace >>> causes the regex engine to backtrack a lot when searching for >>> "\\s-+$" (Bug#26079). >> Why do we use syntax-tables? >> IOW why do we use \s- rather than something like [\s\t]? > No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow. Slower than "\\s-+$"? Stefan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-15 14:50 ` Stefan Monnier @ 2017-03-15 17:05 ` Noam Postavsky 2017-03-15 20:26 ` Stefan Monnier 0 siblings, 1 reply; 9+ messages in thread From: Noam Postavsky @ 2017-03-15 17:05 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers On Wed, Mar 15, 2017 at 10:50 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: >>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as >>>> whitespace again >>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating >>>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace >>>> causes the regex engine to backtrack a lot when searching for >>>> "\\s-+$" (Bug#26079). >>> Why do we use syntax-tables? >>> IOW why do we use \s- rather than something like [\s\t]? >> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow. > > Slower than "\\s-+$"? No, seems to be twice as fast actually (this is still horribly slow). On the test file given in Bug#26079[1] (benchmark 1 '(re-search-forward "[\s\t]+$" nil t)) ;=> Elapsed time: 9.944000s (benchmark 1 '(save-excursion (let ((end-marker nil)) (goto-char (point-min)) (with-syntax-table (make-syntax-table (syntax-table)) (modify-syntax-entry ?\f "_") (modify-syntax-entry ?\n "_") (re-search-forward "\\s-+$" end-marker t))))) ;=> Elapsed time: 20.496000s [1]: https://raw.githubusercontent.com/stakemori/e8theta_degree3/master/results/wt18_17_5/wt18_17_5.org ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-15 17:05 ` Noam Postavsky @ 2017-03-15 20:26 ` Stefan Monnier 2017-03-16 0:18 ` Noam Postavsky 0 siblings, 1 reply; 9+ messages in thread From: Stefan Monnier @ 2017-03-15 20:26 UTC (permalink / raw) To: emacs-devel >>>>> Recomplexify ‘delete-trailing-whitespace’ by treating \n as >>>>> whitespace again >>>>> Mostly reverts "Simplify ‘delete-trailing-whitespace’ by not treating >>>>> \n as whitespace" from 2016-07-04. Setting \n to non-whitespace >>>>> causes the regex engine to backtrack a lot when searching for >>>>> "\\s-+$" (Bug#26079). >>>> Why do we use syntax-tables? >>>> IOW why do we use \s- rather than something like [\s\t]? >>> No clue. But (re-search-forward "[\s\t]+$" nil t) is also slow. >> Slower than "\\s-+$"? > No, seems to be twice as fast actually (this is still horribly slow). I meant slower than the best performance of "\\s-+$"? because "[\s\t]+$" shouldn't backtrack. Using "\\s-+$" with a syntax-table that puts \n in the whitespace syntax is indeed asking for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this pathological case (and hence presumably much slower than "\\s-+$" when \n is not whitespace). Stefan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-15 20:26 ` Stefan Monnier @ 2017-03-16 0:18 ` Noam Postavsky 2017-03-16 2:52 ` Stefan Monnier 0 siblings, 1 reply; 9+ messages in thread From: Noam Postavsky @ 2017-03-16 0:18 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 677 bytes --] On Wed, Mar 15, 2017 at 4:26 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > because "[\s\t]+$" shouldn't backtrack. Using "\\s-+$" with > a syntax-table that puts \n in the whitespace syntax is indeed asking > for trouble, but I'm surprised "[\s\t]+$" is only twice as fast as this > pathological case (and hence presumably much slower than "\\s-+$" when > \n is not whitespace). Why should one backtrack and not the other? The *2 ratio seems to hold for both pathologically slow and normal case. Matching against a sequence of non-trailing whitespace seens to have quadradic complexity (see attached, timings are from an elisp buffer), does that give you any hints? [-- Attachment #2: trailing-ws-regexp-benchmark.el --] [-- Type: text/x-emacs-lisp, Size: 1952 bytes --] ;; Pathological case, both seem to be quadratic in the length of a whitespace sequence. (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.090000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.258000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 0.630000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 2.260000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 9.306000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 1000 ?\s) "X")))"Elapsed time: 0.395000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 2000 ?\s) "X")))"Elapsed time: 0.347000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 4000 ?\s) "X")))"Elapsed time: 1.282000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?\s) "X")))"Elapsed time: 5.051000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?\s) "X")))"Elapsed time: 20.144000s" ;; Normal case, linear. (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.001000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.002000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.003000s" (benchmark 1 `(string-match "[\s\t]+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.007000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 8000 ?_) "X")))"Elapsed time: 0.004000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 16000 ?_) "X")))"Elapsed time: 0.008000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 32000 ?_) "X")))"Elapsed time: 0.016000s" (benchmark 1 `(string-match "\\s-+$" ,(concat (make-string 64000 ?_) "X")))"Elapsed time: 0.031000s" ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-16 0:18 ` Noam Postavsky @ 2017-03-16 2:52 ` Stefan Monnier 2017-03-16 3:04 ` Noam Postavsky 0 siblings, 1 reply; 9+ messages in thread From: Stefan Monnier @ 2017-03-16 2:52 UTC (permalink / raw) To: emacs-devel > Why should one backtrack and not the other? The *2 ratio seems to hold > for both pathologically slow and normal case. Matching against a > sequence of non-trailing whitespace seens to have quadradic complexity > (see attached, timings are from an elisp buffer), does that give you > any hints? Yes, it's normal: the "search" attempts a "match" from every whitespace. So if you have N consecutive whitespace chars in the middle of line, that gives you N attempts to "match" and every attempt takes O(N) steps to find that the end of the whitespace is not an LF. I don't understand how "\\s-+$" can be significantly faster than "[\s\t]+$" in this respect. Stefan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-16 2:52 ` Stefan Monnier @ 2017-03-16 3:04 ` Noam Postavsky 2017-03-16 3:08 ` Stefan Monnier 0 siblings, 1 reply; 9+ messages in thread From: Noam Postavsky @ 2017-03-16 3:04 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers [-- Attachment #1: Type: text/plain, Size: 603 bytes --] On Wed, Mar 15, 2017 at 10:52 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > Yes, it's normal: the "search" attempts a "match" from every whitespace. > So if you have N consecutive whitespace chars in the middle of line, > that gives you N attempts to "match" and every attempt takes O(N) steps > to find that the end of the whitespace is not an LF. > > I don't understand how "\\s-+$" can be significantly faster than > "[\s\t]+$" in this respect. What? I though I said "\\s-+" is *slower* (though only by a factor of 2). Anyway, how about the attached which removes the syntax table stuff. [-- Attachment #2: 0001-Simplify-delete-trailing-whitespace-by-dropping-synt.patch --] [-- Type: text/x-patch, Size: 1669 bytes --] From 2275487232b96ca782ca671d60be002e3948a764 Mon Sep 17 00:00:00 2001 From: Noam Postavsky <npostavs@gmail.com> Date: Wed, 15 Mar 2017 23:01:13 -0400 Subject: [PATCH] Simplify delete-trailing-whitespace by dropping syntax tables * lisp/simple.el (delete-trailing-whitespace): Just match [[:blank:]] independently of current syntax table. --- lisp/simple.el | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/lisp/simple.el b/lisp/simple.el index 369fbf7192..2cf8c13225 100644 --- a/lisp/simple.el +++ b/lisp/simple.el @@ -630,14 +630,13 @@ delete-trailing-whitespace (save-excursion (let ((end-marker (and end (copy-marker end)))) (goto-char (or start (point-min))) - (with-syntax-table (make-syntax-table (syntax-table)) - ;; Don't delete formfeeds, even if they are considered whitespace. - (modify-syntax-entry ?\f "_") - (while (re-search-forward "\\s-$" end-marker t) - (skip-syntax-backward "-" (line-beginning-position)) - (let ((b (point)) (e (match-end 0))) - (when (region-modifiable-p b e) - (delete-region b e))))) + ;; Note that trying to match all the trailing whitespace with + ;; just the regexp can be very slow (Bug#26079). + (while (re-search-forward "[[:blank:]]$" end-marker t) + (skip-syntax-backward "-" (line-beginning-position)) + (let ((b (point)) (e (match-end 0))) + (when (region-modifiable-p b e) + (delete-region b e)))) (if end (set-marker end-marker nil) ;; Delete trailing empty lines. -- 2.11.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again 2017-03-16 3:04 ` Noam Postavsky @ 2017-03-16 3:08 ` Stefan Monnier 0 siblings, 0 replies; 9+ messages in thread From: Stefan Monnier @ 2017-03-16 3:08 UTC (permalink / raw) To: emacs-devel >> I don't understand how "\\s-+$" can be significantly faster than >> "[\s\t]+$" in this respect. > What? I though I said "\\s-+" is *slower* (though only by a factor of 2). The bug#26079 seems to indicate that the old Emacs-25 code was a lot faster. Oh, I see. It's that the old code used "\\s-$" rather than "\\s-+$". Yes, that old code does avoid the quadratic effect this way. Stefan ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-03-16 3:08 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20170315023157.29463.36647@vcs0.savannah.gnu.org> [not found] ` <20170315023159.30AA520CAB@vcs0.savannah.gnu.org> 2017-03-15 11:28 ` [Emacs-diffs] master c66aaa6: Recomplexify ‘delete-trailing-whitespace’ by treating \n as whitespace again Stefan Monnier 2017-03-15 12:09 ` Noam Postavsky 2017-03-15 14:50 ` Stefan Monnier 2017-03-15 17:05 ` Noam Postavsky 2017-03-15 20:26 ` Stefan Monnier 2017-03-16 0:18 ` Noam Postavsky 2017-03-16 2:52 ` Stefan Monnier 2017-03-16 3:04 ` Noam Postavsky 2017-03-16 3:08 ` Stefan Monnier
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).