From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Aleksey Cherepanov Newsgroups: gmane.emacs.bugs Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file Date: Sun, 2 Mar 2014 01:39:06 +0400 Message-ID: <20140301213906.GA13523@openwall.com> References: <83ios72j8b.fsf@gnu.org> <20140222185511.GA23643@openwall.com> <838ut23lo9.fsf@gnu.org> <20140223195659.GA23581@openwall.com> <20140223230251.GA30257@openwall.com> <20140224160317.GA2475@openwall.com> <20140226203202.GA23749@agmartin.aq.upm.es> <20140228114545.GA8669@agmartin.aq.upm.es> <20140228231141.GA20782@openwall.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="liOOAslEiF7prFVr" X-Trace: ger.gmane.org 1393710015 1526 80.91.229.3 (1 Mar 2014 21:40:15 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sat, 1 Mar 2014 21:40:15 +0000 (UTC) Cc: 16800@debbugs.gnu.org To: Agustin Martin Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sat Mar 01 22:40:23 2014 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WJrdu-0003f5-SW for geb-bug-gnu-emacs@m.gmane.org; Sat, 01 Mar 2014 22:40:23 +0100 Original-Received: from localhost ([::1]:60944 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJrdu-00041X-Eg for geb-bug-gnu-emacs@m.gmane.org; Sat, 01 Mar 2014 16:40:22 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:47711) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJrdk-0003lg-79 for bug-gnu-emacs@gnu.org; Sat, 01 Mar 2014 16:40:19 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WJrdc-00027G-Om for bug-gnu-emacs@gnu.org; Sat, 01 Mar 2014 16:40:12 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:45852) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WJrdc-00026U-LE for bug-gnu-emacs@gnu.org; Sat, 01 Mar 2014 16:40:04 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1WJrdb-0007d4-Fl for bug-gnu-emacs@gnu.org; Sat, 01 Mar 2014 16:40:03 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Aleksey Cherepanov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 01 Mar 2014 21:40:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16800 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 16800-submit@debbugs.gnu.org id=B16800.139370995629255 (code B ref 16800); Sat, 01 Mar 2014 21:40:02 +0000 Original-Received: (at 16800) by debbugs.gnu.org; 1 Mar 2014 21:39:16 +0000 Original-Received: from localhost ([127.0.0.1]:47034 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJrco-0007bm-IL for submit@debbugs.gnu.org; Sat, 01 Mar 2014 16:39:15 -0500 Original-Received: from mail-lb0-f176.google.com ([209.85.217.176]:61720) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WJrck-0007bb-7d for 16800@debbugs.gnu.org; Sat, 01 Mar 2014 16:39:11 -0500 Original-Received: by mail-lb0-f176.google.com with SMTP id 10so3601972lbg.7 for <16800@debbugs.gnu.org>; Sat, 01 Mar 2014 13:39:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=9m1HZPjZMlr+lhVUxZn1QmTp35QfMVwnJRS8oSz0/zY=; b=JCEY91xGZqpk+hiLKkPhOqEWC0soSg2f6BMkUFn/m8z4pJQNtV+nlVtksZqSuNb4Ac /qms56erFUYpTW+HR3t4Y/efnrUxrdUvMUerVEY3m9VLd8zqdWnJT6AVLwOD9BYaBPUz gp/nrjlV9eelLfMndk8h3Ep5BhOEsUTCFgfWfOo2aq6YnZJV1UOxGEZ/bzSK7LkEZKAd 5EMz1Wq3iHmiv5biCIk3m656AIMTRoiZcnyAWISMYgs0BLnSTHYbq1Ng9e5fFI2ZG3/U /cphHxyB6drmMC0XX7/1ELoQb3gz9YWhgJMG8sMQ74/8gJpvw0LVkc6LdIlpLzlexfew nOrA== X-Received: by 10.112.14.1 with SMTP id l1mr16290224lbc.39.1393709949000; Sat, 01 Mar 2014 13:39:09 -0800 (PST) Original-Received: from openwall.com ([188.123.230.115]) by mx.google.com with ESMTPSA id pz10sm9517213lbb.10.2014.03.01.13.39.07 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 01 Mar 2014 13:39:08 -0800 (PST) Content-Disposition: inline In-Reply-To: <20140228231141.GA20782@openwall.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:86459 Archived-At: --liOOAslEiF7prFVr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Sat, Mar 01, 2014 at 03:11:41AM +0400, Aleksey Cherepanov wrote: > I've wrote a small fuzzer. It is in attach. To run it: > $ LANG=C emacs -Q --eval '(load-file "t2.el")' > Then C-j to start. It modifies buffer you are in. New version is attached. M-j tries last macro or macro specified in my-macro variable. For manual experiments C-o and C-u C-o defines flyspell-word-search-* as my-test-*-(orig|new). Though I improved output so C-j should be enough. > > Hope no one will generate files with words containing something in > > OTHERCHARS. > > Why? > > Otherchars are not rare as of ' is there for "american" dictionary. So > even this email contains such words ("while's"). > > BTW quite interesting flyspell behaviour could be observed with > "met'met'and": if you jump back and forth over this word then met'met > is highlighted when you are at the beginning and met'and is > highlighted when you are at the end. > > Also "met'met'and met'and" highlights both met'and as mis-spelled (the > second met'and is not marked as duplicate). I think original search of "n'n" against "n'n'n'n" finds only (n'n)'(n'n) but not n'(n'n)'n. Our search marks the first word as duplicate running (kbd "n'n SPC en'n'n C-a") while original search does not. What behaviour is preferable? Should the first word of "n'n en'n'n" be marked as duplicate? > Are there any variables that could affect search like > case-fold-search? My fuzzer does not set them but users could have > them set. Also my fuzzer does not try bounds for the search. But we will be in trouble if the search bound is at word bound because we want one more char. Though we could extend bound by 1 char to solve that. Now only forward search is enabled in my fuzzer. Setup it at the end of file as you need. I've implemented a variant of forward search using regexp. It seems that forward search does not get slow from the group in regexp. I did not measured well though. The function is shorter with regexp. Maybe we should make a correct variant before fast one... %-) Also forward search works a bit faster in general. So we could try to implement backward search though forward search. I've removed (goto-char (1+ p)) to not fail on (kbd "nd SPC d'nd SPC nd SPC met C-a"). At the moment the fuzzer could pass several thousands of tests well. You need to wait for fails or improve test generator. Thanks! -- Regards, Aleksey Cherepanov --liOOAslEiF7prFVr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="t2.el" Content-Transfer-Encoding: quoted-printable (require 'cl) (require 'flyspell) (setq my-fuzzer-buffer-name "*temp for fuzzer*") (switch-to-buffer my-fuzzer-buffer-name) (unless (=3D (point-min) (point-max)) (error "Could not operate on non-empty buffer")) (flyspell-mode 1) (random t) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;= ;;;;; ;; Implementations ;; Orig (defun my-test-backward-orig (word bound &optional ignore-case) (save-excursion (let ((r '()) (inhibit-point-motion-hooks t) p) (while (and (not r) (setq p (search-backward word bound t))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-orig (word bound) (save-excursion (let ((r '()) (inhibit-point-motion-hooks t) p) (while (and (not r) (setq p (search-forward word bound t))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) ;; Agustin Martin (defun my-test-backward-agustin (word bound &optional ignore-case) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (re-search-backward word-re bound t) (progn (forward-char) (point)) ;; Check if word is at bob (goto-char (point-min)) (search-forward word (length word) t)))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-agustin (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (word-end (nth 2 (flyspell-get-word))) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (=3D word-end (point-max)) nil ;; Current word is at e-o-b. No forward search (if (re-search-forward word-re bound t) ;; word-re match ends one char after word (progn (backward-char) (point)) ;; Check above does not match similar word at e-o= -b (goto-char (point-max)) (search-backward word (- (point-max) (length word)) t))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) ;; Fixed (defun my-test-backward-agustin-fixed (word bound &optional ignore-case) ;; (my-test-backward-agustin word bound ignore-case)) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (while (and (not r) (setq p (if (re-search-backward word-re bound t) (progn (forward-char) (point)) ;; Check if word is at bob (goto-char (point-min)) (search-forward word (+ (point-min) (length word)) t)))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (if ignore-case (string-equal (downcase (car lw)) (downcase word)) (string-equal (car lw) word))) (setq r p) (goto-char p)))) r))) (defun my-test-forward-agustin-fixed (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) p) (flyspell-get-word) (while (and (not r) (setq p (if (eobp) nil ;; Current word is at e-o-b. No forward sear= ch (if (re-search-forward word-re bound t) ;; word-re match ends one char after word (progn (backward-char) (point)) ;; Check above does not match similar word at e-= o-b (goto-char (point-max)) (and (search-backward word (- (point-max) (length word)) t) (goto-char (point-max))))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) ;; We don't need to move forward due to additional char ;; before word in regexp ;; (goto-char (1+ p)) ))) r))) ;; With eob in regexp (defun my-test-forward-eob (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) "\\(?:" flyspell-not-casechars "\\|\\'\\)")) p) (while (and (not r) (setq p (and (re-search-forward word-re bound t) (if (eobp) (point) (backward-char) (point))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) ;; We don't need to move forward due to additional char ;; before word in regexp ;; (goto-char (1+ p)) ))) r))) ;; End of Implementations ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;= ;;;;; ;; Fuzzer (defun my-make-test-macro () (let* ((good "met") (sep "SPC") (bad "nd") (oc "'") (bol "C-a") ;; not really eol but enough (eol "C-e") (parts (list good sep bad oc bol eol)) (len (length parts))) (eval `(kbd ,(mapconcat (lambda (a) (nth (random len) parts)) (make-list (1+ (random 100)) 0) " "))))) ;; nil if everythings is equal, ;; 'badtext if text is not equal, ;; position is the first position with different properties. (defun my-compare-strings-with-properties (a b) (if (string=3D (car a) (car b)) (let ((len (length (car a))) (pos 0) (badpos nil) (faces1 (cadr a)) (faces2 (cadr b))) (while (and (not badpos) (< pos len)) (unless (equal (nth pos faces1) (nth pos faces2)) (setq badpos pos)) ;; (message ">> %d" pos) (setq pos (1+ pos))) (if my-show-faces (if badpos (progn (message ":>> faces1 %S" faces1) (message ":>> faces2 %S" faces2)) (message ":>> No diff"))) badpos) 'badtext)) (defun my-make-string-with-faces (a) (let ((str (car a)) (faces (cadr a))) (mapcar (lambda (pos) (set-text-properties pos (1+ pos) `(fontified t font-lock-face ,(nth pos f= aces)) str)) (number-sequence 0 (1- (length str)))) str)) (defun my-make-strings-with-faces (a b) (concat "\n:>> orig:" (my-make-string-with-faces a) "\n:>> new:" (my-make-string-with-faces b) "\n")) (setq my-show-nice-faces nil) (defun my-try-macro (macro) (let ((strings ;; (message ">> count =3D %d, macro =3D %S" count macro) (mapcar (lambda (name) (delete-region (point-min) (point-max)) (letf (((symbol-function 'flyspell-word-search-forward) (intern (concat "my-test-forward-" (symbol-name name)))) ((symbol-function 'flyspell-word-search-backward) (intern (concat "my-test-backward-" (symbol-name name))= ))) ;; (message ">> pre %S %d" name count) (execute-kbd-macro macro) ;; (message ">> post %S %d" name count) ) (list (buffer-string) (mapcar (lambda (pos) (get-char-property pos 'face)) (number-sequence (point-min) (point-max))))) '(orig new)))) (let ((bad (my-compare-strings-with-properties (car strings) (cadr stri= ngs)))) (if (and bad my-show-nice-faces) (with-current-buffer "*Messages*" (insert (my-make-strings-with-faces (car strings) (cadr strings= ))))) bad))) ;; It may not reduce to the minimun in one run. It fails at reductions ;; if 2 or more chars should be removed at the same time. (defun my-reduce (macro) (let ((bad (my-try-macro macro)) (fails 0) newmacro) (if bad (while (< fails 100) (let ((pos (random (length macro)))) (setq newmacro (concat (substring macro 0 pos) (substring macro (1+= pos)))) ;; (message ">> %S" macro) ;; (message ">> %S" newmacro) (if (my-try-macro newmacro) (progn (setq fails 0) (setq macro newmacro)) (setq fails (1+ fails))))) (message ":>> We reduce only faulty macros")) macro)) (defun my-try-mixed-pairs (macro) (unwind-protect (if (my-try-macro macro) (progn (my-reset-new) (defun my-test-backward-new (word bound &optional ignore-case) (my-test-backward-orig word bound ignore-case)) (if (my-try-macro macro) (message ":>> Difference is from -forward function")) (my-reset-new) (defun my-test-forward-new (word bound) (my-test-forward-orig word bound)) (if (my-try-macro macro) (message ":>> Difference is from -backward function"))) (message ":>> We mix pairs only for faulty macros")) (my-reset-new))) (defun my-fuzz () (interactive) (unless (string=3D (ispell-get-otherchars) "[']") (error "Unexpected not-casechars value")) (buffer-disable-undo) (unwind-protect (let ((more t) (count 0) (update-step 100) (time (current-time))) (while (and more (< count (if my-macro 1 1000))) (let* ((macro (or my-macro (my-make-test-macro))) (bad (let ((my-show-nice-faces t)) (my-try-macro macro)))) (setq more (not bad)) (unless more (if (numberp bad) (message ":>> pos :%s^" (make-string bad ? ))) (message ":>> Bad at %S running %S" bad macro) (my-try-mixed-pairs macro) (setq my-macro-last (my-reduce macro)) (message ":>> Reduced macro: %S" my-macro-last)) (if (=3D 0 (% count update-step)) (message ":>> In progress, count =3D %d (shows between ever= y %d)" count update-step))) (setq count (1+ count))) (message ":>> Fuzzing: %d macros are finished in %S" count (subtract-time (current-time) time)) (message ":>> %s" (if more "Without differences" "There are differe= nces"))) (buffer-enable-undo)) nil) (global-set-key (kbd "C-j") 'my-fuzz) ;; use -orig with prefix arg, ;; use -new without prefix arg (defun my-choose-flyspell-funcs (arg) (interactive "P") (if arg (progn (defun flyspell-word-search-backward (word bound &optional ignore-c= ase) (my-test-backward-orig word bound ignore-case)) (defun flyspell-word-search-forward (word bound) (my-test-forward-orig word bound)) (message ">> Using orig")) (defun flyspell-word-search-backward (word bound &optional ignore-case) (my-test-backward-new word bound ignore-case)) (defun flyspell-word-search-forward (word bound) (my-test-forward-new word bound)) (message ">> Using new"))) (global-set-key (kbd "C-o") 'my-choose-flyspell-funcs) (setq my-macro-last nil) (setq my-show-faces nil) (defun my-show-faces-func () (interactive) (let ((macro (or my-macro my-macro-last))) (if macro (let ((my-show-faces t)) (my-try-macro macro)) (error "No macro specified")))) (global-set-key (kbd "M-j") 'my-show-faces-func) (split-window-right) (other-window 1) (view-echo-area-messages) (other-window 1) ;; For manual debug ;; (defun flyspell-word-search-backward (word bound &optional ignore-case) ;; (my-test-backward-agustin-fixed word bound ignore-case)) ;; (defun flyspell-word-search-forward (word bound) ;; (my-test-forward-agustin-fixed word bound)) ;; Change this to use other functions instead of -agustin-fixed (defun my-reset-new () (defun my-test-backward-new (word bound &optional ignore-case) ;; (my-test-backward-agustin-fixed word bound ignore-case)) (my-test-backward-orig word bound ignore-case)) (defun my-test-forward-new (word bound) ;; (my-test-forward-agustin-fixed word bound))) ;; (my-test-forward-agustin word bound))) (my-test-forward-eob word bound))) (my-reset-new) ;; Define non-nil to run only one test with this macro not randomly (setq my-macro nil) ;; (setq my-macro (kbd "nd SPC and SPC nd C-a")) ;; (setq my-macro (kbd "nd SPC nd C-a")) ;; (setq my-macro (kbd "nd SPC and C-a")) ;; (setq my-macro (kbd "n SPC n C-a")) ;; (setq my-macro (kbd "nd C-e")) ;; (setq my-macro "'nd end'nd'nd=01nd=01") ;; (setq my-macro "'n en'n'n=01n=01") (setq my-macro ;; "'nd end'nd'nd=01nd=01" ;; "'n en'n'n=01n=01" ;; "n'n en'n'n=01" "a n'n en'n'n=01" ;; "n'n n'n'n=01" ;; "n'n n'n'n=02=02=02=02=02=02=02=02" ;; "d'nd=01ndmet =05met ndmet=01" ;; "d'n=01nd =05d nd=01" ;; "nd d'nd nd=01" ;; "nd d'nd nd met=01" ) --liOOAslEiF7prFVr--