unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
To: Agustin Martin <agustin.martin@hispalinux.es>
Cc: 16800@debbugs.gnu.org
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Sat, 1 Mar 2014 03:11:41 +0400	[thread overview]
Message-ID: <20140228231141.GA20782@openwall.com> (raw)
In-Reply-To: <20140228114545.GA8669@agmartin.aq.upm.es>

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

Hi, Agustin!

Wow! Your 'if' in 'while's condition is very elegant. Nice!

On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
> On Wed, Feb 26, 2014 at 09:32:02PM +0100, Agustin Martin wrote:
> > On Mon, Feb 24, 2014 at 08:03:17PM +0400, Aleksey Cherepanov wrote:
> > > I played with different (maybe wrong) implementations of
> > > flyspell-word-search-backward and measured time against t.txt
> > > (produced by the one-liner). All implementations are attached.
> > 
> > [ ... Tons of extensive and impressive debugging ... ]
> > 
> > > We could avoid capturing at all. And it works faster as shown by 4
> > > last functions.
> > 
> > Hi, 
> > 
> > Thanks a lot for the extensive debugging and for all the suggestions. I
> > have been playing with something based in your last function, but trying
> > to get something more compact, see below current status
> [ ... ]
> > I did some efficiency test and it seemed similar to those of your efficient
> > functions. Need to check further for corner cases, bugs, etc ...
> 
> Hi, Aleksey
> 
> Please find attached my first candidate for commit. Is similar to what I
> sent before, but needed to add an explicit check for word at eob in
> `flyspell-word-search-forward'. 
> 
> Will try to have more testing before committing. Seems to work well with the
> file generated by your one-liner, even with corner cases like new
> misspellings added at bob or eob, but the wider the testing the better.

I've wrote a small fuzzer. It is in attach. To run it:
$ LANG=C emacs -Q --eval '(load-file "t2.el")'
Then C-j to start. It modifies buffer you are in.

Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat
it. my-test-forward-agustin-fixed contains fix. It incorporates
simplified word-end logic: we slip forward using flyspell-get-word,
then we check eobp. Though I did not understand why -backward does not
need a similar fix and I got the answer: my mistake with (length word)
did not allow one word to be marked as duplicate.

(if condition nil ...) could be replaced with (unless condition ...)
but I do not know what one is more readable.


(kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as
duplicate. It is a problem with bound equal to (length word) in
-backward function. I did not check it when I wrote it.
> +			 (search-forward word (length word) t))))
                         (search-forward word (1+ (length word)) t))))

One "nd" is colored as duplicate due to -backward function after that
fix. I did not touch it yet because it is a time for a break for me.

> Hope no one will generate files with words containing something in
> OTHERCHARS.

Why?

Otherchars are not rare as of ' is there for "american" dictionary. So
even this email contains such words ("while's").

BTW quite interesting flyspell behaviour could be observed with
"met'met'and": if you jump back and forth over this word then met'met
is highlighted when you are at the beginning and met'and is
highlighted when you are at the end.

Also "met'met'and met'and" highlights both met'and as mis-spelled (the
second met'and is not marked as duplicate).


Are there any variables that could affect search like
case-fold-search? My fuzzer does not set them but users could have
them set.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t2.el --]
[-- Type: text/plain, Size: 10856 bytes --]

(require 'cl)
(require 'flyspell)

(setq my-fuzzer-buffer-name "*temp for fuzzer*")
(switch-to-buffer my-fuzzer-buffer-name)
(unless (= (point-min) (point-max))
  (error "Could not operate on non-empty buffer"))

(flyspell-mode 1)

(random t)

;; Orig
(defun my-test-backward-orig (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-backward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))
(defun my-test-forward-orig (word bound)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-forward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Agustin Martin
(defun my-test-backward-agustin (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin (word bound)
  (save-excursion
    (let* ((r '())
          (inhibit-point-motion-hooks t)
          (word-end (nth 2 (flyspell-get-word)))
          (flyspell-not-casechars (flyspell-get-not-casechars))
          (word-re (concat flyspell-not-casechars
                           (regexp-quote word)
                           flyspell-not-casechars))
          p)
      (while
         (and (not r)
              (setq p (if (= word-end (point-max))
                          nil ;; Current word is at e-o-b. No forward search
                        (if (re-search-forward word-re bound t)
                            ;; word-re match ends one char after word
                            (progn (backward-char) (point))
                          ;; Check above does not match similar word at e-o-b
                          (goto-char (point-max))
                          (search-backward word (- (point-max)
                                                   (length word)) t)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Fixed
(defun my-test-backward-agustin-fixed (word bound &optional ignore-case)
  ;; (my-test-backward-agustin word bound ignore-case))
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (1+ (length word)) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin-fixed (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            flyspell-not-casechars))
           p)
      (flyspell-get-word)
      (while
          (and (not r)
               (setq p (if (eobp)
                           nil ;; Current word is at e-o-b. No forward search
                         (if (re-search-forward word-re bound t)
                             ;; word-re match ends one char after word
                             (progn (backward-char) (point))
                           ;; Check above does not match similar word at e-o-b
                           (goto-char (point-max))
                           (and (search-backward word (- (point-max)
                                                         (length word)) t)
                                (goto-char (point-max)))))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

(defun my-make-test-macro ()
  (let* ((good "met")
         (sep "SPC")
         (bad "nd")
         (oc "'")
         (bol "C-a")
         ;; not really eol but enough
         (eol "C-e")
         (parts (list good sep bad oc bol eol))
         (len (length parts)))
    (eval `(kbd ,(mapconcat (lambda (a)
                              (nth (random len) parts))
                            (make-list (1+ (random 100)) 0)
                            " ")))))

;; nil if everythings is equal,
;; 'badtext if text is not equal,
;; position is the first position with different properties.
(defun my-compare-strings-with-properties (a b)
  (if (string= (car a) (car b))
      (let ((len (length (car a)))
            (pos 0)
            (badpos nil)
            (faces1 (cadr a))
            (faces2 (cadr b)))
        (while (and (not badpos)
                    (< pos len))
          (unless (equal (nth pos faces1)
                         (nth pos faces2))
              (setq badpos pos))
          ;; (message ">> %d" pos)
          (setq pos (1+ pos)))
        ;; (if badpos
        ;;     (progn
        ;;       (message ":>> faces1 %S" faces1)
        ;;       (message ":>> faces2 %S" faces2)))
        badpos)
    'badtext))

(defun my-try-macro (macro)
  (let ((strings
         ;; (message ">> count = %d, macro = %S" count macro)
         (mapcar
          (lambda (name)
            (delete-region (point-min) (point-max))
            (letf (((symbol-function 'flyspell-word-search-forward)
                    (intern (concat "my-test-forward-" (symbol-name name))))
                   ((symbol-function 'flyspell-word-search-backward)
                    (intern (concat "my-test-backward-" (symbol-name name)))))
              ;; (message ">> pre %S %d" name count)
              (execute-kbd-macro macro)
              ;; (message ">> post %S %d" name count)
              )
            (list (buffer-string)
                  (mapcar (lambda (pos)
                            (get-char-property pos 'face))
                          (number-sequence (point-min) (point-max)))))
          '(orig new))))
    (my-compare-strings-with-properties (car strings) (cadr strings))))

;; It may not reduce to the minimun in one run. It fails at reductions
;; if 2 or more chars should be removed at the same time.
(defun my-reduce (macro)
  (let ((bad (my-try-macro macro))
        (fails 0)
        newmacro)
    (if bad
      (while (< fails 100)
      (let ((pos (random (length macro))))
        (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos))))
        ;; (message ">> %S" macro)
        ;; (message ">> %S" newmacro)
        (if (my-try-macro newmacro)
            (progn
              (setq fails 0)
              (setq macro newmacro))
          (setq fails (1+ fails)))))
      (message ":>>   We reduce only faulty macros"))
    macro))

(defun my-reset-new ()
  (defun my-test-backward-new (word bound &optional ignore-case)
    (my-test-backward-agustin-fixed word bound ignore-case))
  (defun my-test-forward-new (word bound)
    (my-test-forward-agustin-fixed word bound)))
(my-reset-new)

(defun my-try-mixed-pairs (macro)
  (if (my-try-macro macro)
      (progn
        (my-reset-new)
        (defun my-test-backward-new (word bound &optional ignore-case)
          (my-test-backward-orig word bound ignore-case))
        (if (my-try-macro macro)
            (message ":>> Difference is from -forward function"))
        (my-reset-new)
        (defun my-test-forward-new (word bound)
          (my-test-forward-orig word bound))
        (if (my-try-macro macro)
            (message ":>> Difference is from -backward function")))
    (message ":>>   We mix pairs only for faulty macros")))

(defun my-fuzz ()
  (interactive)
  (unless (string= (ispell-get-otherchars) "[']")
    (error "Unexpected not-casechars value"))
  (buffer-disable-undo)
  (unwind-protect
      (let ((more t)
            (count 0)
            (time (current-time)))
        (while (and more
                    (< count (if my-macro 1 15)))
          (let* ((macro (or my-macro (my-make-test-macro)))
                 (bad (my-try-macro macro)))
            (setq more (not bad))
            (unless more
              (message ":>> Bad at %S running %S" bad macro)
              (my-try-mixed-pairs macro)
              (message ":>> Reduced macro: %S" (my-reduce macro))))
          (setq count (1+ count)))
        (message ":>> Fuzzing: %d macros are finished in %S"
                 count
                 (subtract-time (current-time) time))
        (message ":>> %s" (if more "Without differences" "There are differences")))
    (buffer-enable-undo))
  nil)
(global-set-key (kbd "C-j") 'my-fuzz)

(split-window-right)
(other-window 1)
(view-echo-area-messages)
(other-window 1)

;; For manual debug

;; (defun flyspell-word-search-backward (word bound &optional ignore-case)
;;   (my-test-backward-agustin-fixed word bound ignore-case))
;; (defun flyspell-word-search-forward (word bound)
;;   (my-test-forward-agustin-fixed word bound))

;; Define non-nil to run only one test with this macro not randomly
(setq my-macro nil)

;; (setq my-macro (kbd "nd SPC and SPC nd C-a"))
;; (setq my-macro (kbd "nd SPC nd C-a"))

;; (setq my-macro (kbd "nd SPC and C-a"))

;; (setq my-macro (kbd "n SPC n C-a"))

(setq my-macro (kbd "nd C-e"))


  parent reply	other threads:[~2014-02-28 23:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
2014-02-21 10:15 ` Eli Zaretskii
2014-02-21 14:38   ` Agustin Martin
2014-02-21 15:12     ` Eli Zaretskii
2014-02-21 15:21       ` Eli Zaretskii
2014-02-22 12:44       ` Aleksey Cherepanov
2014-02-22 13:10         ` Eli Zaretskii
2014-02-22 16:02           ` Aleksey Cherepanov
2014-02-22 16:41             ` Eli Zaretskii
2014-02-22 18:55               ` Aleksey Cherepanov
2014-02-22 20:16                 ` Aleksey Cherepanov
2014-02-22 21:03                 ` Eli Zaretskii
2014-02-23  1:26                   ` Agustin Martin
2014-02-23 18:36                     ` Eli Zaretskii
2014-02-23 19:56                     ` Aleksey Cherepanov
2014-02-23 23:02                       ` Aleksey Cherepanov
2014-02-24 16:03                         ` Aleksey Cherepanov
2014-02-26 20:32                           ` Agustin Martin
2014-02-28 11:45                             ` Agustin Martin
2014-02-28 11:51                               ` Eli Zaretskii
2014-03-01 21:44                                 ` Aleksey Cherepanov
2014-03-02  3:56                                   ` Eli Zaretskii
2014-03-09 17:36                                     ` Agustin Martin
2014-03-09 18:02                                       ` Aleksey Cherepanov
2014-03-09 18:24                                         ` Eli Zaretskii
2014-02-28 23:11                               ` Aleksey Cherepanov [this message]
2014-03-01 10:33                                 ` Aleksey Cherepanov
2014-03-01 15:50                                   ` Aleksey Cherepanov
2014-03-01 21:39                                 ` Aleksey Cherepanov
2014-03-09 17:25                                 ` Agustin Martin
2015-03-06 21:46                                   ` Agustin Martin
2015-03-07  8:09                                     ` Eli Zaretskii
2014-02-23 20:39                     ` Aleksey Cherepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140228231141.GA20782@openwall.com \
    --to=aleksey.4erepanov@gmail.com \
    --cc=16800@debbugs.gnu.org \
    --cc=agustin.martin@hispalinux.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).