From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Aleksey Cherepanov Newsgroups: gmane.emacs.bugs Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file Date: Sat, 1 Mar 2014 03:11:41 +0400 Message-ID: <20140228231141.GA20782@openwall.com> References: <20140222160217.GA15616@openwall.com> <83ios72j8b.fsf@gnu.org> <20140222185511.GA23643@openwall.com> <838ut23lo9.fsf@gnu.org> <20140223195659.GA23581@openwall.com> <20140223230251.GA30257@openwall.com> <20140224160317.GA2475@openwall.com> <20140226203202.GA23749@agmartin.aq.upm.es> <20140228114545.GA8669@agmartin.aq.upm.es> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="KsGdsel6WgEHnImy" X-Trace: ger.gmane.org 1393629135 1626 80.91.229.3 (28 Feb 2014 23:12:15 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 28 Feb 2014 23:12:15 +0000 (UTC) Cc: 16800@debbugs.gnu.org To: Agustin Martin Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sat Mar 01 00:12:22 2014 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WJWbN-0005wH-6x for geb-bug-gnu-emacs@m.gmane.org; Sat, 01 Mar 2014 00:12:21 +0100 Original-Received: from localhost ([::1]:53546 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJWbM-0000dK-O8 for geb-bug-gnu-emacs@m.gmane.org; Fri, 28 Feb 2014 18:12:20 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:43222) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJWbC-0000d3-QP for bug-gnu-emacs@gnu.org; Fri, 28 Feb 2014 18:12:18 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WJWb5-0008O4-D6 for bug-gnu-emacs@gnu.org; Fri, 28 Feb 2014 18:12:10 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:43864) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJWb5-0008Nw-8d for bug-gnu-emacs@gnu.org; Fri, 28 Feb 2014 18:12:03 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1WJWb4-00005S-Ko for bug-gnu-emacs@gnu.org; Fri, 28 Feb 2014 18:12:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Aleksey Cherepanov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 28 Feb 2014 23:12:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16800 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 16800-submit@debbugs.gnu.org id=B16800.1393629110317 (code B ref 16800); Fri, 28 Feb 2014 23:12:02 +0000 Original-Received: (at 16800) by debbugs.gnu.org; 28 Feb 2014 23:11:50 +0000 Original-Received: from localhost ([127.0.0.1]:45046 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJWar-000052-OI for submit@debbugs.gnu.org; Fri, 28 Feb 2014 18:11:50 -0500 Original-Received: from mail-lb0-f173.google.com ([209.85.217.173]:52964) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJWao-0008WV-Gr for 16800@debbugs.gnu.org; Fri, 28 Feb 2014 18:11:47 -0500 Original-Received: by mail-lb0-f173.google.com with SMTP id p9so3152115lbv.18 for <16800@debbugs.gnu.org>; Fri, 28 Feb 2014 15:11:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=IKqrCq06YDoP+9ocfuegmp1Hz7qMJLL5xdhFt70tah4=; b=rIK+aF9KDOnwqsRDYJjhur480BYqHNu/Hd3rU7xVFpc8l7MtdnEPfhN3T0h1ocLiS5 9UC86EkkGRsnxf1vAfWwMQDZDq7lMd4s85+AScDciIf3EABjTaUxA2sXMdR2hK3G8taU uJat/q2iZOQ9K+dO6i8KRvhBUeNJQo+AooIV8+jzHdz1cb0aK8yst38nnTi8vQSmGsCq AqN3xJXbWWCu0TWGDKNgokUvcrb585FqmJPe1KPuifqSTyco/rvurXnDUs9AiMxrKG9l 4dsP7o8d+hbBUIZsE5EMc3kx2670NkfANrHvkd8sk9jDkKschosfPoqD74yIWJx1HR3J jejw== X-Received: by 10.112.172.98 with SMTP id bb2mr93887lbc.69.1393629105191; Fri, 28 Feb 2014 15:11:45 -0800 (PST) Original-Received: from openwall.com ([188.123.230.115]) by mx.google.com with ESMTPSA id gi5sm5709086lbc.4.2014.02.28.15.11.43 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 28 Feb 2014 15:11:43 -0800 (PST) Content-Disposition: inline In-Reply-To: <20140228114545.GA8669@agmartin.aq.upm.es> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:86419 Archived-At: --KsGdsel6WgEHnImy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi, Agustin! Wow! Your 'if' in 'while's condition is very elegant. Nice! On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote: > On Wed, Feb 26, 2014 at 09:32:02PM +0100, Agustin Martin wrote: > > On Mon, Feb 24, 2014 at 08:03:17PM +0400, Aleksey Cherepanov wrote: > > > I played with different (maybe wrong) implementations of > > > flyspell-word-search-backward and measured time against t.txt > > > (produced by the one-liner). All implementations are attached. > > > > [ ... Tons of extensive and impressive debugging ... ] > > > > > We could avoid capturing at all. And it works faster as shown by 4 > > > last functions. > > > > Hi, > > > > Thanks a lot for the extensive debugging and for all the suggestions. I > > have been playing with something based in your last function, but trying > > to get something more compact, see below current status > [ ... ] > > I did some efficiency test and it seemed similar to those of your efficient > > functions. Need to check further for corner cases, bugs, etc ... > > Hi, Aleksey > > Please find attached my first candidate for commit. Is similar to what I > sent before, but needed to add an explicit check for word at eob in > `flyspell-word-search-forward'. > > Will try to have more testing before committing. Seems to work well with the > file generated by your one-liner, even with corner cases like new > misspellings added at bob or eob, but the wider the testing the better. I've wrote a small fuzzer. It is in attach. To run it: $ LANG=C emacs -Q --eval '(load-file "t2.el")' Then C-j to start. It modifies buffer you are in. Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat it. my-test-forward-agustin-fixed contains fix. It incorporates simplified word-end logic: we slip forward using flyspell-get-word, then we check eobp. Though I did not understand why -backward does not need a similar fix and I got the answer: my mistake with (length word) did not allow one word to be marked as duplicate. (if condition nil ...) could be replaced with (unless condition ...) but I do not know what one is more readable. (kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as duplicate. It is a problem with bound equal to (length word) in -backward function. I did not check it when I wrote it. > + (search-forward word (length word) t)))) (search-forward word (1+ (length word)) t)))) One "nd" is colored as duplicate due to -backward function after that fix. I did not touch it yet because it is a time for a break for me. > Hope no one will generate files with words containing something in > OTHERCHARS. Why? Otherchars are not rare as of ' is there for "american" dictionary. So even this email contains such words ("while's"). BTW quite interesting flyspell behaviour could be observed with "met'met'and": if you jump back and forth over this word then met'met is highlighted when you are at the beginning and met'and is highlighted when you are at the end. Also "met'met'and met'and" highlights both met'and as mis-spelled (the second met'and is not marked as duplicate). Are there any variables that could affect search like case-fold-search? My fuzzer does not set them but users could have them set. Thanks! -- Regards, Aleksey Cherepanov --KsGdsel6WgEHnImy Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="t2.el" (require 'cl) (require 'flyspell) (setq my-fuzzer-buffer-name "*temp for fuzzer*") (switch-to-buffer my-fuzzer-buffer-name) (unless (= (point-min) (point-max)) (error "Could not operate on non-empty buffer")) (flyspell-mode 1) (random t) ;; Orig (defun my-test-backward-orig (word bound &optional ignore-case) (save-excursion (let ((r '()) (inhibit-point-motion-hooks t) p) (while (and (not r) (setq p (search-backward word bound t))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-orig (word bound) (save-excursion (let ((r '()) (inhibit-point-motion-hooks t) p) (while (and (not r) (setq p (search-forward word bound t))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) ;; Agustin Martin (defun my-test-backward-agustin (word bound &optional ignore-case) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (re-search-backward word-re bound t) (progn (forward-char) (point)) ;; Check if word is at bob (goto-char (point-min)) (search-forward word (length word) t)))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-agustin (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (word-end (nth 2 (flyspell-get-word))) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (= word-end (point-max)) nil ;; Current word is at e-o-b. No forward search (if (re-search-forward word-re bound t) ;; word-re match ends one char after word (progn (backward-char) (point)) ;; Check above does not match similar word at e-o-b (goto-char (point-max)) (search-backward word (- (point-max) (length word)) t))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) ;; Fixed (defun my-test-backward-agustin-fixed (word bound &optional ignore-case) ;; (my-test-backward-agustin word bound ignore-case)) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (re-search-backward word-re bound t) (progn (forward-char) (point)) ;; Check if word is at bob (goto-char (point-min)) (search-forward word (1+ (length word)) t)))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-agustin-fixed (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (flyspell-get-word) (while (and (not r) (setq p (if (eobp) nil ;; Current word is at e-o-b. No forward search (if (re-search-forward word-re bound t) ;; word-re match ends one char after word (progn (backward-char) (point)) ;; Check above does not match similar word at e-o-b (goto-char (point-max)) (and (search-backward word (- (point-max) (length word)) t) (goto-char (point-max))))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) (defun my-make-test-macro () (let* ((good "met") (sep "SPC") (bad "nd") (oc "'") (bol "C-a") ;; not really eol but enough (eol "C-e") (parts (list good sep bad oc bol eol)) (len (length parts))) (eval `(kbd ,(mapconcat (lambda (a) (nth (random len) parts)) (make-list (1+ (random 100)) 0) " "))))) ;; nil if everythings is equal, ;; 'badtext if text is not equal, ;; position is the first position with different properties. (defun my-compare-strings-with-properties (a b) (if (string= (car a) (car b)) (let ((len (length (car a))) (pos 0) (badpos nil) (faces1 (cadr a)) (faces2 (cadr b))) (while (and (not badpos) (< pos len)) (unless (equal (nth pos faces1) (nth pos faces2)) (setq badpos pos)) ;; (message ">> %d" pos) (setq pos (1+ pos))) ;; (if badpos ;; (progn ;; (message ":>> faces1 %S" faces1) ;; (message ":>> faces2 %S" faces2))) badpos) 'badtext)) (defun my-try-macro (macro) (let ((strings ;; (message ">> count = %d, macro = %S" count macro) (mapcar (lambda (name) (delete-region (point-min) (point-max)) (letf (((symbol-function 'flyspell-word-search-forward) (intern (concat "my-test-forward-" (symbol-name name)))) ((symbol-function 'flyspell-word-search-backward) (intern (concat "my-test-backward-" (symbol-name name))))) ;; (message ">> pre %S %d" name count) (execute-kbd-macro macro) ;; (message ">> post %S %d" name count) ) (list (buffer-string) (mapcar (lambda (pos) (get-char-property pos 'face)) (number-sequence (point-min) (point-max))))) '(orig new)))) (my-compare-strings-with-properties (car strings) (cadr strings)))) ;; It may not reduce to the minimun in one run. It fails at reductions ;; if 2 or more chars should be removed at the same time. (defun my-reduce (macro) (let ((bad (my-try-macro macro)) (fails 0) newmacro) (if bad (while (< fails 100) (let ((pos (random (length macro)))) (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos)))) ;; (message ">> %S" macro) ;; (message ">> %S" newmacro) (if (my-try-macro newmacro) (progn (setq fails 0) (setq macro newmacro)) (setq fails (1+ fails))))) (message ":>> We reduce only faulty macros")) macro)) (defun my-reset-new () (defun my-test-backward-new (word bound &optional ignore-case) (my-test-backward-agustin-fixed word bound ignore-case)) (defun my-test-forward-new (word bound) (my-test-forward-agustin-fixed word bound))) (my-reset-new) (defun my-try-mixed-pairs (macro) (if (my-try-macro macro) (progn (my-reset-new) (defun my-test-backward-new (word bound &optional ignore-case) (my-test-backward-orig word bound ignore-case)) (if (my-try-macro macro) (message ":>> Difference is from -forward function")) (my-reset-new) (defun my-test-forward-new (word bound) (my-test-forward-orig word bound)) (if (my-try-macro macro) (message ":>> Difference is from -backward function"))) (message ":>> We mix pairs only for faulty macros"))) (defun my-fuzz () (interactive) (unless (string= (ispell-get-otherchars) "[']") (error "Unexpected not-casechars value")) (buffer-disable-undo) (unwind-protect (let ((more t) (count 0) (time (current-time))) (while (and more (< count (if my-macro 1 15))) (let* ((macro (or my-macro (my-make-test-macro))) (bad (my-try-macro macro))) (setq more (not bad)) (unless more (message ":>> Bad at %S running %S" bad macro) (my-try-mixed-pairs macro) (message ":>> Reduced macro: %S" (my-reduce macro)))) (setq count (1+ count))) (message ":>> Fuzzing: %d macros are finished in %S" count (subtract-time (current-time) time)) (message ":>> %s" (if more "Without differences" "There are differences"))) (buffer-enable-undo)) nil) (global-set-key (kbd "C-j") 'my-fuzz) (split-window-right) (other-window 1) (view-echo-area-messages) (other-window 1) ;; For manual debug ;; (defun flyspell-word-search-backward (word bound &optional ignore-case) ;; (my-test-backward-agustin-fixed word bound ignore-case)) ;; (defun flyspell-word-search-forward (word bound) ;; (my-test-forward-agustin-fixed word bound)) ;; Define non-nil to run only one test with this macro not randomly (setq my-macro nil) ;; (setq my-macro (kbd "nd SPC and SPC nd C-a")) ;; (setq my-macro (kbd "nd SPC nd C-a")) ;; (setq my-macro (kbd "nd SPC and C-a")) ;; (setq my-macro (kbd "n SPC n C-a")) (setq my-macro (kbd "nd C-e")) --KsGdsel6WgEHnImy--