all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Agustin Martin <agustimartin@gmail.com>
To: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>, 16800@debbugs.gnu.org
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Sun, 9 Mar 2014 18:25:46 +0100	[thread overview]
Message-ID: <CAKy3oZrvUFJYUJ5dC-rWpdF=Hk99UgXp8rsTwX9NfjgJbYWkyg@mail.gmail.com> (raw)
In-Reply-To: <20140228231141.GA20782@openwall.com>

[-- Attachment #1: Type: text/plain, Size: 5304 bytes --]

2014-03-01 0:11 GMT+01:00 Aleksey Cherepanov <aleksey.4erepanov@gmail.com>:

> On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
> >
> > Please find attached my first candidate for commit. Is similar to what I
> > sent before, but needed to add an explicit check for word at eob in
> > `flyspell-word-search-forward'.
> >
> > Will try to have more testing before committing. Seems to work well with
> the
> > file generated by your one-liner, even with corner cases like new
> > misspellings added at bob or eob, but the wider the testing the better.
>
> I've wrote a small fuzzer. It is in attach. To run it:
> $ LANG=C emacs -Q --eval '(load-file "t2.el")'
> Then C-j to start. It modifies buffer you are in.
>
> Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat
> it. my-test-forward-agustin-fixed contains fix. It incorporates
> simplified word-end logic: we slip forward using flyspell-get-word,
> then we check eobp. Though I did not understand why -backward does not
> need a similar fix and I got the answer: my mistake with (length word)
> did not allow one word to be marked as duplicate.
>

Hi,

Sorry for the delay, I am rather busy lately.

The problem is that final condition can cause an endless loop. I'd prefer
to use a flag to make sure that test is passed only once as much, see below


> (if condition nil ...) could be replaced with (unless condition ...)
> but I do not know what one is more readable.
>

While there was `nil' in the version I attached, the actual code in my
personal debug code was not exactly `nil' but something like

(progn (message "word %s is at eob" word) nil)

Together with being more readable, that is why the structure was left
there. I have to thing about this, but with the full word-re test we may
even get rid of that test and all the flyspell.-get-word call, but I need
some time to test all that. If still needed (unless ...) may be used in the
final version.


> (kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as
> duplicate. It is a problem with bound equal to (length word) in
> -backward function. I did not check it when I wrote it.
> > +                      (search-forward word (length word) t))))
>                          (search-forward word (1+ (length word)) t))))


One "nd" is colored as duplicate due to -backward function after that
> fix. I did not touch it yet because it is a time for a break for me.
>

FIxed in my copy. Later noticed that backward function has some other
glitches, like flyspell-get-word looking at wrong word. Fixed in my copy,
but I will look at it later.

> Hope no one will generate files with words containing something in
>  > OTHERCHARS.
>
> Why?
>

Sorry for not being clear, I was thinking about a really rare corner case
current code may not handle well

Write 'nd-et' some thousand times and then write 'et' in a language where
'nd' and 'et' are misspellings and '-' is part of otherchars. I'd expect to
have a lot of false positives in search.


> BTW quite interesting flyspell behaviour could be observed with
> "met'met'and": if you jump back and forth over this word then met'met
> is highlighted when you are at the beginning and met'and is
> highlighted when you are at the end.
>
> Also "met'met'and met'and" highlights both met'and as mis-spelled (the
> second met'and is not marked as duplicate).
>

Funny :-)

I do not expect to have time to check everything shortly. Also, note that
Emacs 24.4 is on the way, so trunk is frozen for anything potentially
problematic. Unless I consider the code absolutely rock solid, I'd prefer
to wait until 24.4 is released.

Sorry I could not yet play with your fuzzer, will try next week. Thanks for
all the work you are putting here.

This is what I am using to break the loop in the forward function (see
`keep'). min limit in last condition should also be adjusted to bound if
not unlimited.

;; --------------------------- 8<
----------------------------------------------------------
(defun flyspell-word-search-forward (word bound)
  (save-excursion
    (let* ((r '())
       (inhibit-point-motion-hooks t)
       (word-end (nth 2 (flyspell-get-word)))
       (flyspell-not-casechars (flyspell-get-not-casechars))
       (word-re (concat flyspell-not-casechars
                (regexp-quote word)
                flyspell-not-casechars))
       (keep t) ;; Control flag to exit loop below after check word at eob.
       p)
      (while
      (and (not r)
           keep
           (setq p (if (= word-end (point-max))
               nil ;; Current word is at e-o-b. No forward search
             (if (re-search-forward word-re bound t)
                 ;; word-re match ends one char after word
                 (progn (backward-char) (point) )
               ;; Check above does not match similar word at e-o-b
               (goto-char (point-max))
               (setq keep nil) ;; Ensure this is last iteration
               (search-backward word (- (point-max)
                            (length word)) t)))))
    (let ((lw (flyspell-get-word)))
      (if (and (consp lw) (string-equal (car lw) word))
          (setq r p)
        (goto-char (1+ p)))))
      r)))
;; --------------------------- 8<
----------------------------------------------------------

Regards,

-- 
Agustin

[-- Attachment #2: Type: text/html, Size: 7231 bytes --]

  parent reply	other threads:[~2014-03-09 17:25 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
2014-02-21 10:15 ` Eli Zaretskii
2014-02-21 14:38   ` Agustin Martin
2014-02-21 15:12     ` Eli Zaretskii
2014-02-21 15:21       ` Eli Zaretskii
2014-02-22 12:44       ` Aleksey Cherepanov
2014-02-22 13:10         ` Eli Zaretskii
2014-02-22 16:02           ` Aleksey Cherepanov
2014-02-22 16:41             ` Eli Zaretskii
2014-02-22 18:55               ` Aleksey Cherepanov
2014-02-22 20:16                 ` Aleksey Cherepanov
2014-02-22 21:03                 ` Eli Zaretskii
2014-02-23  1:26                   ` Agustin Martin
2014-02-23 18:36                     ` Eli Zaretskii
2014-02-23 19:56                     ` Aleksey Cherepanov
2014-02-23 23:02                       ` Aleksey Cherepanov
2014-02-24 16:03                         ` Aleksey Cherepanov
2014-02-26 20:32                           ` Agustin Martin
2014-02-28 11:45                             ` Agustin Martin
2014-02-28 11:51                               ` Eli Zaretskii
2014-03-01 21:44                                 ` Aleksey Cherepanov
2014-03-02  3:56                                   ` Eli Zaretskii
2014-03-09 17:36                                     ` Agustin Martin
2014-03-09 18:02                                       ` Aleksey Cherepanov
2014-03-09 18:24                                         ` Eli Zaretskii
2014-02-28 23:11                               ` Aleksey Cherepanov
2014-03-01 10:33                                 ` Aleksey Cherepanov
2014-03-01 15:50                                   ` Aleksey Cherepanov
2014-03-01 21:39                                 ` Aleksey Cherepanov
2014-03-09 17:25                                 ` Agustin Martin [this message]
2015-03-06 21:46                                   ` Agustin Martin
2015-03-07  8:09                                     ` Eli Zaretskii
2014-02-23 20:39                     ` Aleksey Cherepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKy3oZrvUFJYUJ5dC-rWpdF=Hk99UgXp8rsTwX9NfjgJbYWkyg@mail.gmail.com' \
    --to=agustimartin@gmail.com \
    --cc=16800@debbugs.gnu.org \
    --cc=aleksey.4erepanov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.