From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Agustin Martin Newsgroups: gmane.emacs.bugs Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file Date: Sun, 9 Mar 2014 18:25:46 +0100 Message-ID: References: <20140222160217.GA15616@openwall.com> <83ios72j8b.fsf@gnu.org> <20140222185511.GA23643@openwall.com> <838ut23lo9.fsf@gnu.org> <20140223195659.GA23581@openwall.com> <20140223230251.GA30257@openwall.com> <20140224160317.GA2475@openwall.com> <20140226203202.GA23749@agmartin.aq.upm.es> <20140228114545.GA8669@agmartin.aq.upm.es> <20140228231141.GA20782@openwall.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=089e0115fb16070dd304f42fc723 X-Trace: ger.gmane.org 1394385974 16941 80.91.229.3 (9 Mar 2014 17:26:14 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sun, 9 Mar 2014 17:26:14 +0000 (UTC) To: Aleksey Cherepanov , 16800@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sun Mar 09 18:26:21 2014 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WMhUT-00089P-9R for geb-bug-gnu-emacs@m.gmane.org; Sun, 09 Mar 2014 18:26:21 +0100 Original-Received: from localhost ([::1]:44848 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WMhUS-0002bR-Sg for geb-bug-gnu-emacs@m.gmane.org; Sun, 09 Mar 2014 13:26:20 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:60473) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WMhUJ-0002bG-IT for bug-gnu-emacs@gnu.org; Sun, 09 Mar 2014 13:26:17 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WMhUD-0000jI-CU for bug-gnu-emacs@gnu.org; Sun, 09 Mar 2014 13:26:11 -0400 Original-Received: from debbugs.gnu.org ([140.186.70.43]:57135) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WMhUD-0000jD-8Y for bug-gnu-emacs@gnu.org; Sun, 09 Mar 2014 13:26:05 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1WMhUC-00009Z-2w for bug-gnu-emacs@gnu.org; Sun, 09 Mar 2014 13:26:04 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Agustin Martin Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 09 Mar 2014 17:26:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16800 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 16800-submit@debbugs.gnu.org id=B16800.1394385956566 (code B ref 16800); Sun, 09 Mar 2014 17:26:03 +0000 Original-Received: (at 16800) by debbugs.gnu.org; 9 Mar 2014 17:25:56 +0000 Original-Received: from localhost ([127.0.0.1]:58317 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMhU3-000093-65 for submit@debbugs.gnu.org; Sun, 09 Mar 2014 13:25:56 -0400 Original-Received: from mail-la0-f49.google.com ([209.85.215.49]:53561) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMhTv-00008h-RK for 16800@debbugs.gnu.org; Sun, 09 Mar 2014 13:25:49 -0400 Original-Received: by mail-la0-f49.google.com with SMTP id mc6so4036956lab.36 for <16800@debbugs.gnu.org>; Sun, 09 Mar 2014 10:25:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=Y8jpy6RxV2capscZbmxEoVPijP00mii1rt4ySL05ORs=; b=NqVHd2RTN58RTte9zDSUNGxre1zPCh5D4kC3iTY4dBBtBa44pd7MV2vw90K7slmND/ ZLzIMZWS++WyiM3xAHPD2mt/FbSI1wCfWNwnne9b5xylqmQ1iOpzWYR7G+1p3OgcSvXS lTlKdWndhO704rwNtRry3C7dqPr6ZgFsv0+ZEjD25DXKE2xgNM0RfEfy3rXD5WnnWR0m MCib/DnN1Q1jpZh7OCRF0EzXlASRNTm0OJU2v6ZCoiGfPHvfJYmozVEiS7Swb4diMlqB tyXJxmIsSdPh52sfZnFkbpbpleTrWejGTaQEt+AMjkEZ4HDLq/mXKSNzuHCeaDFnC4gv ZXGQ== X-Received: by 10.112.142.161 with SMTP id rx1mr18941781lbb.33.1394385946609; Sun, 09 Mar 2014 10:25:46 -0700 (PDT) Original-Received: by 10.112.201.165 with HTTP; Sun, 9 Mar 2014 10:25:46 -0700 (PDT) In-Reply-To: <20140228231141.GA20782@openwall.com> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:86684 Archived-At: --089e0115fb16070dd304f42fc723 Content-Type: text/plain; charset=ISO-8859-1 2014-03-01 0:11 GMT+01:00 Aleksey Cherepanov : > On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote: > > > > Please find attached my first candidate for commit. Is similar to what I > > sent before, but needed to add an explicit check for word at eob in > > `flyspell-word-search-forward'. > > > > Will try to have more testing before committing. Seems to work well with > the > > file generated by your one-liner, even with corner cases like new > > misspellings added at bob or eob, but the wider the testing the better. > > I've wrote a small fuzzer. It is in attach. To run it: > $ LANG=C emacs -Q --eval '(load-file "t2.el")' > Then C-j to start. It modifies buffer you are in. > > Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat > it. my-test-forward-agustin-fixed contains fix. It incorporates > simplified word-end logic: we slip forward using flyspell-get-word, > then we check eobp. Though I did not understand why -backward does not > need a similar fix and I got the answer: my mistake with (length word) > did not allow one word to be marked as duplicate. > Hi, Sorry for the delay, I am rather busy lately. The problem is that final condition can cause an endless loop. I'd prefer to use a flag to make sure that test is passed only once as much, see below > (if condition nil ...) could be replaced with (unless condition ...) > but I do not know what one is more readable. > While there was `nil' in the version I attached, the actual code in my personal debug code was not exactly `nil' but something like (progn (message "word %s is at eob" word) nil) Together with being more readable, that is why the structure was left there. I have to thing about this, but with the full word-re test we may even get rid of that test and all the flyspell.-get-word call, but I need some time to test all that. If still needed (unless ...) may be used in the final version. > (kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as > duplicate. It is a problem with bound equal to (length word) in > -backward function. I did not check it when I wrote it. > > + (search-forward word (length word) t)))) > (search-forward word (1+ (length word)) t)))) One "nd" is colored as duplicate due to -backward function after that > fix. I did not touch it yet because it is a time for a break for me. > FIxed in my copy. Later noticed that backward function has some other glitches, like flyspell-get-word looking at wrong word. Fixed in my copy, but I will look at it later. > Hope no one will generate files with words containing something in > > OTHERCHARS. > > Why? > Sorry for not being clear, I was thinking about a really rare corner case current code may not handle well Write 'nd-et' some thousand times and then write 'et' in a language where 'nd' and 'et' are misspellings and '-' is part of otherchars. I'd expect to have a lot of false positives in search. > BTW quite interesting flyspell behaviour could be observed with > "met'met'and": if you jump back and forth over this word then met'met > is highlighted when you are at the beginning and met'and is > highlighted when you are at the end. > > Also "met'met'and met'and" highlights both met'and as mis-spelled (the > second met'and is not marked as duplicate). > Funny :-) I do not expect to have time to check everything shortly. Also, note that Emacs 24.4 is on the way, so trunk is frozen for anything potentially problematic. Unless I consider the code absolutely rock solid, I'd prefer to wait until 24.4 is released. Sorry I could not yet play with your fuzzer, will try next week. Thanks for all the work you are putting here. This is what I am using to break the loop in the forward function (see `keep'). min limit in last condition should also be adjusted to bound if not unlimited. ;; --------------------------- 8< ---------------------------------------------------------- (defun flyspell-word-search-forward (word bound) (save-excursion (let* ((r '()) (inhibit-point-motion-hooks t) (word-end (nth 2 (flyspell-get-word))) (flyspell-not-casechars (flyspell-get-not-casechars)) (word-re (concat flyspell-not-casechars (regexp-quote word) flyspell-not-casechars)) (keep t) ;; Control flag to exit loop below after check word at eob. p) (while (and (not r) keep (setq p (if (= word-end (point-max)) nil ;; Current word is at e-o-b. No forward search (if (re-search-forward word-re bound t) ;; word-re match ends one char after word (progn (backward-char) (point) ) ;; Check above does not match similar word at e-o-b (goto-char (point-max)) (setq keep nil) ;; Ensure this is last iteration (search-backward word (- (point-max) (length word)) t))))) (let ((lw (flyspell-get-word))) (if (and (consp lw) (string-equal (car lw) word)) (setq r p) (goto-char (1+ p))))) r))) ;; --------------------------- 8< ---------------------------------------------------------- Regards, -- Agustin --089e0115fb16070dd304f42fc723 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
2014= -03-01 0:11 GMT+01:00 Aleksey Cherepanov <aleksey.4erepanov@gmai= l.com>:
On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
>
> Please find attached my first candidate for commit. Is similar to what= I
> sent before, but needed to add an explicit check for word at eob in > `flyspell-word-search-forward'.
>
> Will try to have more testing before committing. Seems to work well wi= th the
> file generated by your one-liner, even with corner cases like new
> misspellings added at bob or eob, but the wider the testing the better= .

I've wrote a small fuzzer. It is in attach. To run it:
$ LANG=3DC emacs -Q --eval '(load-file "t2.el")'
Then C-j to start. It modifies buffer you are in.

Your -forward function gets stuck. (kbd "nd SPC and C-a") could r= epeat
it. my-test-forward-agustin-fixed contains fix. It incorporates
simplified word-end logic: we slip forward using flyspell-get-word,
then we check eobp. Though I did not understand why -backward does not
need a similar fix and I got the answer: my mistake with (length word)
did not allow one word to be marked as duplicate.

=
Hi,

Sorry for the delay, I am rather busy lately.

The problem is that final condition can cause an end= less loop. I'd prefer to use a flag to make sure that test is passed on= ly once as much, see below
=A0
(= if condition nil ...) could be replaced with (unless condition ...)
but I do not know what one is more readable.

While there was `nil' in the version I attached, the actual code = in my personal debug code was not exactly `nil' but something like

(progn (message "word %s is at eob" word) nil)
=
Together with being more readable, that is why the structure= was left there. I have to thing about this, but with the full word-re test= we may even get rid of that test and all the flyspell.-get-word call, but = I need some time to test all that. If still needed (unless ...) may be used= in the final version.
=A0
(kbd "nd SPC and SPC nd C-b") fails to highlight the second "= ;nd" as
duplicate. It is a problem with bound equal to (length word) in
-backward function. I did not check it when I wrote it.
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(search-forward word (len= gth word) t))))
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(search-forward word (1+= (length word)) t))))

One "nd" is colored as duplicate due to -backward function after = that
fix. I did not touch it yet because it is a time for a break for me.

FIxed in my copy. Later noticed that=20 backward function has some other glitches, like flyspell-get-word=20 looking at wrong word. Fixed in my copy, but I will look at it later.
=
> Hope no on= e will generate files with words containing something in
> OTHERCHARS.

Why?

Sorry for not being clear, I= was thinking about a really rare corner case current code may not handle w= ell

Write 'nd-et' some thousand times and then wr= ite 'et' in a language where 'nd' and 'et' are miss= pellings and '-' is part of otherchars. I'd expect to have a lo= t of false positives in search.
=A0
BTW quite interesting flyspell behaviour could be observed with
"met'met'and": if you jump back and forth over this word = then met'met
is highlighted when you are at the beginning and met'and is
highlighted when you are at the end.

Also "met'met'and met'and" highlights both met'an= d as mis-spelled (the
second met'and is not marked as duplicate).

Funny :-)

I do not expect to have time to check = everything shortly. Also, note that Emacs 24.4 is on the way, so trunk is f= rozen for anything potentially problematic. Unless I consider the code abso= lutely rock solid, I'd prefer to wait until 24.4 is released.

Sorry I could not yet play with your fuzzer, will try next w= eek. Thanks for all the work you are putting here.

=
This is what I am using to break the loop in the forward function (see= `keep'). min limit in last condition should also be adjusted to bound = if not unlimited.

;; --------------------------- 8< ----------------------------------= ------------------------
(defun flyspell-word-search-forward= (word bound)
=A0 (save-excursion
=A0=A0=A0 (let* ((r '())
=A0= =A0=A0 =A0=A0 (inhibit-point-motion-hooks t)
=A0=A0=A0 =A0=A0 (word-end (nth 2 (flyspell-get-word)))
=A0=A0=A0 =A0=A0= (flyspell-not-casechars (flyspell-get-not-casechars))
=A0=A0=A0 =A0=A0 = (word-re (concat flyspell-not-casechars
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 = =A0=A0=A0 (regexp-quote word)
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 fl= yspell-not-casechars))
=A0=A0=A0 =A0=A0 (keep t) ;; Control flag to exit loop below after check wo= rd at eob.
=A0=A0=A0 =A0=A0 p)
=A0=A0=A0=A0=A0 (while
=A0=A0=A0 = =A0 (and (not r)
=A0=A0=A0 =A0=A0=A0=A0=A0=A0 keep
=A0=A0=A0 =A0=A0= =A0=A0=A0=A0 (setq p (if (=3D word-end (point-max))
=A0=A0=A0 =A0=A0=A0 = =A0=A0=A0 =A0=A0 nil ;; Current word is at e-o-b. No forward search
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0(if (re-search-forward word-re bound t)=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0=A0 ;; word-re match ends one char = after word
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0=A0 (progn (backward-c= har) (point) )
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0 ;; Check above does = not match similar word at e-o-b
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0 (goto-char (point-max))
=A0=A0=A0 = =A0=A0=A0 =A0=A0=A0 =A0=A0 (setq keep nil) ;; Ensure this is last iteration=
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0 (search-backward word (- (point-ma= x)
=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0 =A0=A0=A0= (length word)) t)))))
=A0=A0=A0 (let ((lw (flyspell-get-word)))
=A0=A0=A0 =A0 (if (and (consp lw) (string-equal (car lw) word))
=A0=A0= =A0 =A0=A0=A0=A0=A0 (setq r p)
=A0=A0=A0 =A0=A0=A0 (goto-char (1+ p)))))=
=A0=A0=A0=A0=A0 r)))
;; --------------------------- 8< ----------= ------------------------------------------------

Regards,

--
Agustin

--089e0115fb16070dd304f42fc723--