unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
To: Agustin Martin <agustin.martin@hispalinux.es>
Cc: 16800@debbugs.gnu.org
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Sun, 2 Mar 2014 01:39:06 +0400	[thread overview]
Message-ID: <20140301213906.GA13523@openwall.com> (raw)
In-Reply-To: <20140228231141.GA20782@openwall.com>

[-- Attachment #1: Type: text/plain, Size: 2420 bytes --]

On Sat, Mar 01, 2014 at 03:11:41AM +0400, Aleksey Cherepanov wrote:
> I've wrote a small fuzzer. It is in attach. To run it:
> $ LANG=C emacs -Q --eval '(load-file "t2.el")'
> Then C-j to start. It modifies buffer you are in.

New version is attached.

M-j tries last macro or macro specified in my-macro variable.
For manual experiments C-o and C-u C-o defines flyspell-word-search-*
as my-test-*-(orig|new). Though I improved output so C-j should be
enough.

> > Hope no one will generate files with words containing something in
> > OTHERCHARS.
> 
> Why?
> 
> Otherchars are not rare as of ' is there for "american" dictionary. So
> even this email contains such words ("while's").
> 
> BTW quite interesting flyspell behaviour could be observed with
> "met'met'and": if you jump back and forth over this word then met'met
> is highlighted when you are at the beginning and met'and is
> highlighted when you are at the end.
> 
> Also "met'met'and met'and" highlights both met'and as mis-spelled (the
> second met'and is not marked as duplicate).

I think original search of "n'n" against "n'n'n'n" finds only
(n'n)'(n'n) but not n'(n'n)'n. Our search marks the first word as
duplicate running (kbd "n'n SPC en'n'n C-a") while original search
does not. What behaviour is preferable? Should the first word of "n'n
en'n'n" be marked as duplicate?

> Are there any variables that could affect search like
> case-fold-search? My fuzzer does not set them but users could have
> them set.

Also my fuzzer does not try bounds for the search. But we will be in
trouble if the search bound is at word bound because we want one more
char. Though we could extend bound by 1 char to solve that.


Now only forward search is enabled in my fuzzer. Setup it at the end
of file as you need.

I've implemented a variant of forward search using regexp. It seems
that forward search does not get slow from the group in regexp. I did
not measured well though. The function is shorter with regexp. Maybe
we should make a correct variant before fast one... %-)

Also forward search works a bit faster in general. So we could try to
implement backward search though forward search.

I've removed
	    (goto-char (1+ p))
to not fail on (kbd "nd SPC d'nd SPC nd SPC met C-a").

At the moment the fuzzer could pass several thousands of tests well.
You need to wait for fails or improve test generator.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t2.el --]
[-- Type: text/plain, Size: 15532 bytes --]

(require 'cl)
(require 'flyspell)

(setq my-fuzzer-buffer-name "*temp for fuzzer*")
(switch-to-buffer my-fuzzer-buffer-name)
(unless (= (point-min) (point-max))
  (error "Could not operate on non-empty buffer"))

(flyspell-mode 1)

(random t)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Implementations

;; Orig
(defun my-test-backward-orig (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-backward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))
(defun my-test-forward-orig (word bound)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-forward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Agustin Martin
(defun my-test-backward-agustin (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin (word bound)
  (save-excursion
    (let* ((r '())
          (inhibit-point-motion-hooks t)
          (word-end (nth 2 (flyspell-get-word)))
          (flyspell-not-casechars (flyspell-get-not-casechars))
          (word-re (concat flyspell-not-casechars
                           (regexp-quote word)
                           flyspell-not-casechars))
          p)
      (while
         (and (not r)
              (setq p (if (= word-end (point-max))
                          nil ;; Current word is at e-o-b. No forward search
                        (if (re-search-forward word-re bound t)
                            ;; word-re match ends one char after word
                            (progn (backward-char) (point))
                          ;; Check above does not match similar word at e-o-b
                          (goto-char (point-max))
                          (search-backward word (- (point-max)
                                                   (length word)) t)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Fixed
(defun my-test-backward-agustin-fixed (word bound &optional ignore-case)
  ;; (my-test-backward-agustin word bound ignore-case))
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (+ (point-min)
                                                 (length word)) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin-fixed (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            flyspell-not-casechars))
           p)
      (flyspell-get-word)
      (while
          (and (not r)
               (setq p (if (eobp)
                           nil ;; Current word is at e-o-b. No forward search
                         (if (re-search-forward word-re bound t)
                             ;; word-re match ends one char after word
                             (progn (backward-char) (point))
                           ;; Check above does not match similar word at e-o-b
                           (goto-char (point-max))
                           (and (search-backward word (- (point-max)
                                                         (length word)) t)
                                (goto-char (point-max)))))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
            ;; We don't need to move forward due to additional char
            ;; before word in regexp
	    ;; (goto-char (1+ p))
            )))
      r)))

;; With eob in regexp
(defun my-test-forward-eob (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            "\\(?:" flyspell-not-casechars "\\|\\'\\)"))
           p)
      (while
          (and (not r)
               (setq p (and
                        (re-search-forward word-re bound t)
                        (if (eobp)
                            (point)
                          (backward-char)
                          (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
            ;; We don't need to move forward due to additional char
            ;; before word in regexp
	    ;; (goto-char (1+ p))
            )))
      r)))

;; End of Implementations
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Fuzzer

(defun my-make-test-macro ()
  (let* ((good "met")
         (sep "SPC")
         (bad "nd")
         (oc "'")
         (bol "C-a")
         ;; not really eol but enough
         (eol "C-e")
         (parts (list good sep bad oc bol eol))
         (len (length parts)))
    (eval `(kbd ,(mapconcat (lambda (a)
                              (nth (random len) parts))
                            (make-list (1+ (random 100)) 0)
                            " ")))))

;; nil if everythings is equal,
;; 'badtext if text is not equal,
;; position is the first position with different properties.
(defun my-compare-strings-with-properties (a b)
  (if (string= (car a) (car b))
      (let ((len (length (car a)))
            (pos 0)
            (badpos nil)
            (faces1 (cadr a))
            (faces2 (cadr b)))
        (while (and (not badpos)
                    (< pos len))
          (unless (equal (nth pos faces1)
                         (nth pos faces2))
              (setq badpos pos))
          ;; (message ">> %d" pos)
          (setq pos (1+ pos)))
        (if my-show-faces
            (if badpos
                (progn
                  (message ":>> faces1 %S" faces1)
                  (message ":>> faces2 %S" faces2))
              (message ":>> No diff")))
        badpos)
    'badtext))

(defun my-make-string-with-faces (a)
  (let ((str (car a))
        (faces (cadr a)))
    (mapcar (lambda (pos)
              (set-text-properties pos
                                   (1+ pos)
                                   `(fontified t font-lock-face ,(nth pos faces))
                                   str))
            (number-sequence 0 (1- (length str))))
    str))
(defun my-make-strings-with-faces (a b)
  (concat "\n:>> orig:"
          (my-make-string-with-faces a)
          "\n:>>  new:"
          (my-make-string-with-faces b)
          "\n"))

(setq my-show-nice-faces nil)

(defun my-try-macro (macro)
  (let ((strings
         ;; (message ">> count = %d, macro = %S" count macro)
         (mapcar
          (lambda (name)
            (delete-region (point-min) (point-max))
            (letf (((symbol-function 'flyspell-word-search-forward)
                    (intern (concat "my-test-forward-" (symbol-name name))))
                   ((symbol-function 'flyspell-word-search-backward)
                    (intern (concat "my-test-backward-" (symbol-name name)))))
              ;; (message ">> pre %S %d" name count)
              (execute-kbd-macro macro)
              ;; (message ">> post %S %d" name count)
              )
            (list (buffer-string)
                  (mapcar (lambda (pos)
                            (get-char-property pos 'face))
                          (number-sequence (point-min) (point-max)))))
          '(orig new))))
    (let ((bad (my-compare-strings-with-properties (car strings) (cadr strings))))
      (if (and bad my-show-nice-faces)
          (with-current-buffer "*Messages*"
            (insert (my-make-strings-with-faces (car strings) (cadr strings)))))
      bad)))

;; It may not reduce to the minimun in one run. It fails at reductions
;; if 2 or more chars should be removed at the same time.
(defun my-reduce (macro)
  (let ((bad (my-try-macro macro))
        (fails 0)
        newmacro)
    (if bad
      (while (< fails 100)
      (let ((pos (random (length macro))))
        (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos))))
        ;; (message ">> %S" macro)
        ;; (message ">> %S" newmacro)
        (if (my-try-macro newmacro)
            (progn
              (setq fails 0)
              (setq macro newmacro))
          (setq fails (1+ fails)))))
      (message ":>>   We reduce only faulty macros"))
    macro))

(defun my-try-mixed-pairs (macro)
  (unwind-protect
      (if (my-try-macro macro)
          (progn
            (my-reset-new)
            (defun my-test-backward-new (word bound &optional ignore-case)
              (my-test-backward-orig word bound ignore-case))
            (if (my-try-macro macro)
                (message ":>> Difference is from -forward function"))
            (my-reset-new)
            (defun my-test-forward-new (word bound)
              (my-test-forward-orig word bound))
            (if (my-try-macro macro)
                (message ":>> Difference is from -backward function")))
        (message ":>>   We mix pairs only for faulty macros"))
    (my-reset-new)))

(defun my-fuzz ()
  (interactive)
  (unless (string= (ispell-get-otherchars) "[']")
    (error "Unexpected not-casechars value"))
  (buffer-disable-undo)
  (unwind-protect
      (let ((more t)
            (count 0)
            (update-step 100)
            (time (current-time)))
        (while (and more
                    (< count (if my-macro 1 1000)))
          (let* ((macro (or my-macro (my-make-test-macro)))
                 (bad (let ((my-show-nice-faces t))
                        (my-try-macro macro))))
            (setq more (not bad))
            (unless more
              (if (numberp bad)
                  (message ":>> pos :%s^" (make-string bad ? )))
              (message ":>> Bad at %S running %S" bad macro)
              (my-try-mixed-pairs macro)
              (setq my-macro-last (my-reduce macro))
              (message ":>> Reduced macro: %S" my-macro-last))
            (if (= 0 (% count update-step))
                (message ":>> In progress, count = %d (shows between every %d)" count update-step)))
          (setq count (1+ count)))
        (message ":>> Fuzzing: %d macros are finished in %S"
                 count
                 (subtract-time (current-time) time))
        (message ":>> %s" (if more "Without differences" "There are differences")))
    (buffer-enable-undo))
  nil)
(global-set-key (kbd "C-j") 'my-fuzz)

;; use -orig with prefix arg,
;; use -new without prefix arg
(defun my-choose-flyspell-funcs (arg)
  (interactive "P")
  (if arg
      (progn
        (defun flyspell-word-search-backward (word bound &optional ignore-case)
          (my-test-backward-orig word bound ignore-case))
        (defun flyspell-word-search-forward (word bound)
          (my-test-forward-orig word bound))
        (message ">> Using orig"))
    (defun flyspell-word-search-backward (word bound &optional ignore-case)
      (my-test-backward-new word bound ignore-case))
    (defun flyspell-word-search-forward (word bound)
      (my-test-forward-new word bound))
    (message ">> Using new")))
(global-set-key (kbd "C-o") 'my-choose-flyspell-funcs)

(setq my-macro-last nil)

(setq my-show-faces nil)
(defun my-show-faces-func ()
  (interactive)
  (let ((macro (or my-macro my-macro-last)))
    (if macro
        (let ((my-show-faces t))
          (my-try-macro macro))
      (error "No macro specified"))))
(global-set-key (kbd "M-j") 'my-show-faces-func)

(split-window-right)
(other-window 1)
(view-echo-area-messages)
(other-window 1)

;; For manual debug

;; (defun flyspell-word-search-backward (word bound &optional ignore-case)
;;   (my-test-backward-agustin-fixed word bound ignore-case))
;; (defun flyspell-word-search-forward (word bound)
;;   (my-test-forward-agustin-fixed word bound))


;; Change this to use other functions instead of -agustin-fixed
(defun my-reset-new ()
  (defun my-test-backward-new (word bound &optional ignore-case)
    ;; (my-test-backward-agustin-fixed word bound ignore-case))
    (my-test-backward-orig word bound ignore-case))
  (defun my-test-forward-new (word bound)
    ;; (my-test-forward-agustin-fixed word bound)))
    ;; (my-test-forward-agustin word bound)))
    (my-test-forward-eob word bound)))
(my-reset-new)


;; Define non-nil to run only one test with this macro not randomly
(setq my-macro nil)

;; (setq my-macro (kbd "nd SPC and SPC nd C-a"))
;; (setq my-macro (kbd "nd SPC nd C-a"))

;; (setq my-macro (kbd "nd SPC and C-a"))

;; (setq my-macro (kbd "n SPC n C-a"))

;; (setq my-macro (kbd "nd C-e"))

;; (setq my-macro "'nd end'nd'nd\x01nd\x01")
;; (setq my-macro "'n en'n'n\x01n\x01")

(setq my-macro
      ;; "'nd end'nd'nd\x01nd\x01"
      ;; "'n en'n'n\x01n\x01"
      ;; "n'n en'n'n\x01"
      "a n'n en'n'n\x01"
      ;; "n'n n'n'n\x01"
      ;; "n'n n'n'n\x02\x02\x02\x02\x02\x02\x02\x02"
      ;; "d'nd\x01ndmet \x05met ndmet\x01"
      ;; "d'n\x01nd \x05d nd\x01"
      ;; "nd d'nd nd\x01"
      ;; "nd d'nd nd met\x01"
      )


  parent reply	other threads:[~2014-03-01 21:39 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
2014-02-21 10:15 ` Eli Zaretskii
2014-02-21 14:38   ` Agustin Martin
2014-02-21 15:12     ` Eli Zaretskii
2014-02-21 15:21       ` Eli Zaretskii
2014-02-22 12:44       ` Aleksey Cherepanov
2014-02-22 13:10         ` Eli Zaretskii
2014-02-22 16:02           ` Aleksey Cherepanov
2014-02-22 16:41             ` Eli Zaretskii
2014-02-22 18:55               ` Aleksey Cherepanov
2014-02-22 20:16                 ` Aleksey Cherepanov
2014-02-22 21:03                 ` Eli Zaretskii
2014-02-23  1:26                   ` Agustin Martin
2014-02-23 18:36                     ` Eli Zaretskii
2014-02-23 19:56                     ` Aleksey Cherepanov
2014-02-23 23:02                       ` Aleksey Cherepanov
2014-02-24 16:03                         ` Aleksey Cherepanov
2014-02-26 20:32                           ` Agustin Martin
2014-02-28 11:45                             ` Agustin Martin
2014-02-28 11:51                               ` Eli Zaretskii
2014-03-01 21:44                                 ` Aleksey Cherepanov
2014-03-02  3:56                                   ` Eli Zaretskii
2014-03-09 17:36                                     ` Agustin Martin
2014-03-09 18:02                                       ` Aleksey Cherepanov
2014-03-09 18:24                                         ` Eli Zaretskii
2014-02-28 23:11                               ` Aleksey Cherepanov
2014-03-01 10:33                                 ` Aleksey Cherepanov
2014-03-01 15:50                                   ` Aleksey Cherepanov
2014-03-01 21:39                                 ` Aleksey Cherepanov [this message]
2014-03-09 17:25                                 ` Agustin Martin
2015-03-06 21:46                                   ` Agustin Martin
2015-03-07  8:09                                     ` Eli Zaretskii
2014-02-23 20:39                     ` Aleksey Cherepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140301213906.GA13523@openwall.com \
    --to=aleksey.4erepanov@gmail.com \
    --cc=16800@debbugs.gnu.org \
    --cc=agustin.martin@hispalinux.es \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).