From: Aleksey Cherepanov
Newsgroups: gmane.emacs.bugs
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Sun, 23 Feb 2014 23:56:59 +0400
Message-ID: <20140223195659.GA23581@openwall.com>
To: Agustin Martin
Cc: 16800@debbugs.gnu.org

On Sun, Feb 23, 2014 at 02:26:00AM +0100, Agustin Martin wrote:
> 2014-02-22 22:03 GMT+01:00 Eli Zaretskii:
>
> > > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > > From: Aleksey Cherepanov
> > >
> > > > > Emacs words are language sensitive too.
> > > >
> > > > But not in the same way as ispell/flyspell is. The CASECHARS,
> > > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > > taken into account by ispell/flyspell.
> > >
> > > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > > one word for emacs and two words with garbage around for flyspell. I
> > > think my solution fails in such case.
> >
> > It's more complex than that: with some languages, and at least with
> > aspell, we take these parameters from the dictionary. So they cannot
> > be known in advance in some cases.
> >
>
> Hi,
>
> Not yet sure if I am missing something important, but I am playing with a
> regexp search in flyspell-word-search-* functions based on what flyspell
> thinks is the word to spellcheck (`word') and what thinks should not be
> part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
> some intermediate matches being false positives that will be discarded once
> flyspell-word checks them.
>
> I have tested this in Alekseys's file and is apparently working well and in
> this particular case with much better efficiency. Need to think about more
> ad-hoc situations where it may fail or slow down things. Suggestions for
> possible failures are welcome.
>
> Patch is attached. I did the tests against an old and patched version of
> flyspell.el (that shipped with Debian stable) and built the patch for it.
> Should apply and work similarly in trunk's flyspell.el.
>
> --- flyspell.el.orig 2014-02-23 02:17:03.680107519 +0100
> +++ flyspell.el 2014-02-23 02:50:50.634625248 +0100
> @@ -1050,8 +1050,19 @@
>    (save-excursion
>      (let ((r '())
>            (inhibit-point-motion-hooks t)
> +          (flyspell-not-casechars (flyspell-get-not-casechars))

I'd move the concat here too, so that it is out of the inner loop.

>            p)
> -      (while (and (not r) (setq p (search-backward word bound t)))
> +      (while
> +          (and (not r)
> +               (setq p
> +                     (re-search-backward
> +                      (concat
> +                       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b here could be replaced with \` (beginning of buffer). As far
as I can see, the beginning of the buffer is the only boundary we need
that is not already covered by the "not-casechars, word" sequence.
Similarly, \' (end of buffer) could be used for the forward search.

Also, a non-capturing (shy) group ("\\(?:") could be used here, because
we do not need the match data of the first group. It should be faster,
but I don't really know. Maybe it would be even faster not to capture
the word and instead capture the one-character-or-empty boundary, but I
doubt the difference would be noticeable.

> +                       "\\(" word "\\)"

I think regexp-quote around the word is necessary here.

> +                       flyspell-not-casechars
> +                       )
> +                      bound t)))
> +        (goto-char (match-beginning 2))

s/2/1/ if the first group is not capturing.

>          (let ((lw (flyspell-get-word)))
>            (if (and (consp lw)
>                     (if ignore-case
> @@ -1068,8 +1079,19 @@
>    (save-excursion
>      (let ((r '())
>            (inhibit-point-motion-hooks t)
> +          (flyspell-not-casechars (flyspell-get-not-casechars))

Move the concat here, as above.

>            p)
> -      (while (and (not r) (setq p (search-forward word bound t)))
> +      (while
> +          (and (not r)
> +               (setq p
> +                     (re-search-forward
> +                      (concat
> +                       flyspell-not-casechars
> +                       "\\(" word "\\)"

regexp-quote as above.

> +                       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b could be replaced by \' here, as described above. The second
group could be non-capturing here too.

> +                       )
> +                      bound t)))
> +        (goto-char (match-beginning 1))

I guess match-end should be used here.

>          (let ((lw (flyspell-get-word)))
>            (if (and (consp lw) (string-equal (car lw) word))
>                (setq r p)

I guess \b would be faster than the group, so we could put an 'if'
around the whole loop: one implementation using \b for the case when
casechars are "[[:alpha:]]" and not-casechars are "[^[:alpha:]]", and
another implementation, as above, for all other cases. But that seems
cumbersome.

Thanks!

-- 
Regards,
Aleksey Cherepanov
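A minimal sketch of the regexp changes suggested in the review, not the patch itself. It assumes NOT-CASECHARS is a bracket expression like the string returned by `flyspell-get-not-casechars' (for example "[^[:alpha:]]"); all `my-flyspell-*' names are hypothetical and are not part of flyspell.el. It uses a shy group for the boundary, \` and \' instead of \b, regexp-quote around the word, and builds each regexp once outside the search loop. As in the patch, the backward regexp still requires a not-casechars character after the word, and the forward regexp requires one before it.

```elisp
;; Hypothetical helpers illustrating the suggested regexps; not flyspell.el API.
;; NOT-CASECHARS is assumed to be a bracket expression such as "[^[:alpha:]]".

(defun my-flyspell-backward-regexp (word not-casechars)
  "Regexp matching WORD as a whole flyspell word, for backward search.
Group 1 is WORD.  The leading boundary is a shy group accepting
NOT-CASECHARS or the beginning of the buffer (or string)."
  (concat "\\(?:" not-casechars "\\|\\`\\)"
          "\\(" (regexp-quote word) "\\)"
          not-casechars))

(defun my-flyspell-forward-regexp (word not-casechars)
  "Regexp matching WORD as a whole flyspell word, for forward search.
Group 1 is WORD.  The trailing boundary is a shy group accepting
NOT-CASECHARS or the end of the buffer (or string)."
  (concat not-casechars
          "\\(" (regexp-quote word) "\\)"
          "\\(?:" not-casechars "\\|\\'\\)"))

(defun my-flyspell-scan-backward (word not-casechars &optional bound)
  "Return start positions of WORD, scanning backward from `point-max'."
  (let ((re (my-flyspell-backward-regexp word not-casechars)) ; built once
        starts)
    (save-excursion
      (goto-char (point-max))
      (while (re-search-backward re bound t)
        (push (match-beginning 1) starts)
        ;; Resume at the word start, so the delimiter before the word can
        ;; still act as the trailing delimiter of an earlier occurrence.
        (goto-char (match-beginning 1))))
    starts))

(defun my-flyspell-scan-forward (word not-casechars &optional bound)
  "Return start positions of WORD, scanning forward from `point-min'."
  (let ((re (my-flyspell-forward-regexp word not-casechars)) ; built once
        starts)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward re bound t)
        (push (match-beginning 1) starts)
        ;; Resume at the word end, so the delimiter after the word can
        ;; still act as the leading delimiter of a later occurrence.
        (goto-char (match-end 1))))
    (nreverse starts)))
```

With the ("my" "[a]" "[^a]" "" ...) dictionary from the thread, scanning a buffer containing "qqaaqqaaqq" for "aa" with NOT-CASECHARS "[^a]" returns (3 7) in either direction. Resuming at the edge of group 1 rather than at the end of the whole match is a choice made for this sketch, so that a delimiter shared by two neighbouring occurrences is not consumed by the first one.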