From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Agustin Martin Newsgroups: gmane.emacs.bugs Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file Date: Sun, 23 Feb 2014 02:26:00 +0100 Message-ID: References: <85zjlo5ecy.fsf@gmail.com> <83ob204vrv.fsf@gnu.org> <20140221143855.GA6018@agmartin.aq.upm.es> <83k3co4hzd.fsf@gnu.org> <20140222124413.GA4971@openwall.com> <83vbw72t05.fsf@gnu.org> <20140222160217.GA15616@openwall.com> <83ios72j8b.fsf@gnu.org> <20140222185511.GA23643@openwall.com> <838ut23lo9.fsf@gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary=001a1136c86ce00c1004f308bc55 X-Trace: ger.gmane.org 1393118829 3404 80.91.229.3 (23 Feb 2014 01:27:09 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sun, 23 Feb 2014 01:27:09 +0000 (UTC) Cc: Aleksey Cherepanov To: 16800@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Sun Feb 23 02:27:17 2014 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WHNqe-0003mM-Po for geb-bug-gnu-emacs@m.gmane.org; Sun, 23 Feb 2014 02:27:17 +0100 Original-Received: from localhost ([::1]:51372 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WHNqd-0000iJ-T4 for geb-bug-gnu-emacs@m.gmane.org; Sat, 22 Feb 2014 20:27:15 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50891) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WHNqV-0000iC-O0 for bug-gnu-emacs@gnu.org; Sat, 22 Feb 2014 20:27:12 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WHNqQ-0003L8-Tj for bug-gnu-emacs@gnu.org; Sat, 22 Feb 2014 20:27:07 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:34972) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) 
id 1WHNqQ-0003Kx-PG for bug-gnu-emacs@gnu.org; Sat, 22 Feb 2014 20:27:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1WHNqQ-000132-CD for bug-gnu-emacs@gnu.org; Sat, 22 Feb 2014 20:27:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Agustin Martin Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 23 Feb 2014 01:27:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16800 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: Original-Received: via spool by 16800-submit@debbugs.gnu.org id=B16800.13931187703954 (code B ref 16800); Sun, 23 Feb 2014 01:27:02 +0000 Original-Received: (at 16800) by debbugs.gnu.org; 23 Feb 2014 01:26:10 +0000 Original-Received: from localhost ([127.0.0.1]:36154 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WHNpZ-00011g-Ac for submit@debbugs.gnu.org; Sat, 22 Feb 2014 20:26:09 -0500 Original-Received: from mail-la0-f46.google.com ([209.85.215.46]:64318) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WHNpW-000118-W3 for 16800@debbugs.gnu.org; Sat, 22 Feb 2014 20:26:07 -0500 Original-Received: by mail-la0-f46.google.com with SMTP id b8so3968850lan.19 for <16800@debbugs.gnu.org>; Sat, 22 Feb 2014 17:26:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=llbQGLA6310BQB2spz72BZcFGZBZAVWscyljwg65cx0=; b=Xyl2Dvc+KcQmGjtuzM9ODebZh8YnoAJdUR19eQ1laqkJnWqOhLsKgqZOj9f2K1CHJ8 hP76i+r6UD76Obl0sM8AEEN4dEhrTRUM4H79Lk2Mb0hE1e+lgtG7iCDLr77IjKCBSVHw 2nTk5Rig32ktspufQ83fw7Ucd4r8EINN56qd0+5yKoYfQu//2wtziqcbR1Osr45zmCz6 D/TxOLv3NAuYQONZw/XzOtpPG37n3vXs9/85bnUuD6Mq6q3MH0tF1CQNZcQya5bNN2R4 dXvQnZxcFCY4mesGDdWpOGtue51Sl+riKRX99f2kzrHc36AlbuLSlCs3fI/Lf5Met82f T5dw== X-Received: by 10.152.190.69 with SMTP id go5mr8031621lac.79.1393118760923; Sat, 22 
Feb 2014 17:26:00 -0800 (PST) Original-Received: by 10.112.44.163 with HTTP; Sat, 22 Feb 2014 17:26:00 -0800 (PST) In-Reply-To: <838ut23lo9.fsf@gnu.org> X-Google-Sender-Auth: k2E8KPPifoPzmrXnolMW-qhiyKo X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:86051 Archived-At:

--001a1136c86ce00c1004f308bc55
Content-Type: multipart/alternative; boundary=001a1136c86ce00c0d04f308bc53

--001a1136c86ce00c0d04f308bc53
Content-Type: text/plain; charset=ISO-8859-1

2014-02-22 22:03 GMT+01:00 Eli Zaretskii :

> > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > From: Aleksey Cherepanov
> >
> > > > Emacs words are language sensitive too.
> > >
> > > But not in the same way as ispell/flyspell is. The CASECHARS,
> > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > taken into account by ispell/flyspell.
> >
> > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > one word for emacs and two words with garbage around for flyspell. I
> > think my solution fails in such case.
>
> It's more complex than that: with some languages, and at least with
> aspell, we take these parameters from the dictionary. So they cannot
> be known in advance in some cases.
>

Hi,

I am not yet sure whether I am missing something important, but I am
playing with a regexp search in the flyspell-word-search-* functions.
The regexp is built from what flyspell thinks is the word to spellcheck
(`word') and from what it thinks cannot be part of a word
(`NOTCASECHARS'). Since OTHERCHARS is not used, some intermediate
matches may be false positives, but those are discarded once
flyspell-word checks them.

I have tested this on Aleksey's file. It appears to work well and, in
this particular case, much more efficiently. I still need to think
about more ad-hoc situations where it may fail or slow things down.
Suggestions for possible failures are welcome.

The patch is attached. I ran the tests against an old, patched version
of flyspell.el (the one shipped with Debian stable) and built the patch
against it. It should apply and work similarly with trunk's
flyspell.el.

-- 
Agustin
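
As a rough illustration of the regexp idea described in this message
(this is not the attached patch), a minimal sketch could look like the
code below. The helper names `my-bounded-word-regexp' and
`my-find-word' are hypothetical and do not exist in flyspell.el.
Quoting the word with `regexp-quote' and accepting \b as a delimiter on
both sides are assumptions made for the example. It also searches a
string rather than a buffer, to keep it self-contained.

```elisp
;; Sketch only: both helpers are hypothetical, not part of flyspell.el.

(defun my-bounded-word-regexp (word not-casechars)
  "Return a regexp matching WORD only when it is delimited on both sides.
NOT-CASECHARS is a regexp matching one character that cannot belong
to a word, for example \"[^a]\".  A word boundary is also accepted as
a delimiter, so WORD can match at the very start or end of the text."
  (concat "\\(?:" not-casechars "\\|\\b\\)"
          ;; Group 1 is the word itself; quoting is an assumption here.
          "\\(" (regexp-quote word) "\\)"
          "\\(?:" not-casechars "\\|\\b\\)"))

(defun my-find-word (word not-casechars text)
  "Return the 0-based index in TEXT of the first delimited WORD, or nil."
  (let ((case-fold-search nil))
    (when (string-match (my-bounded-word-regexp word not-casechars) text)
      (match-beginning 1))))
```

With Aleksey's example dictionary, where NOTCASECHARS is "[^a]",
(my-find-word "aa" "[^a]" "qqaaqqaaqq") returns 2, while
(my-find-word "aa" "[^a]" "qqaaaqq") returns nil, because "aaa" is a
different word. A plain substring search would stop at both. Since
OTHERCHARS is ignored, each hit still has to be confirmed, as the
message says.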
--001a1136c86ce00c0d04f308bc53-- --001a1136c86ce00c1004f308bc55 Content-Type: text/plain; charset=US-ASCII; name="flyspell.el_flyspell-word-search.2.diff" Content-Disposition: attachment; filename="flyspell.el_flyspell-word-search.2.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hrzooe670 LS0tIGZseXNwZWxsLmVsLm9yaWcJMjAxNC0wMi0yMyAwMjoxNzowMy42ODAxMDc1MTkgKzAxMDAK KysrIGZseXNwZWxsLmVsCTIwMTQtMDItMjMgMDI6NTA6NTAuNjM0NjI1MjQ4ICswMTAwCkBAIC0x MDUwLDggKzEwNTAsMTkgQEAKICAgKHNhdmUtZXhjdXJzaW9uCiAgICAgKGxldCAoKHIgJygpKQog CSAgKGluaGliaXQtcG9pbnQtbW90aW9uLWhvb2tzIHQpCisJICAoZmx5c3BlbGwtbm90LWNhc2Vj aGFycyAoZmx5c3BlbGwtZ2V0LW5vdC1jYXNlY2hhcnMpKQogCSAgcCkKLSAgICAgICh3aGlsZSAo YW5kIChub3QgcikgKHNldHEgcCAoc2VhcmNoLWJhY2t3YXJkIHdvcmQgYm91bmQgdCkpKQorICAg ICAgKHdoaWxlIAorCSAgKGFuZCAobm90IHIpIAorCSAgICAgICAoc2V0cSBwIAorCQkgICAgIChy ZS1zZWFyY2gtYmFja3dhcmQKKwkJICAgICAgKGNvbmNhdAorCQkgICAgICAgIlxcKCIgZmx5c3Bl bGwtbm90LWNhc2VjaGFycyAiXFx8XFxiXFwpIgorCQkgICAgICAgIlxcKCIgd29yZCAiXFwpIgor CQkgICAgICAgZmx5c3BlbGwtbm90LWNhc2VjaGFycworCQkgICAgICAgKQorCQkgICAgICBib3Vu ZCB0KSkpCisJKGdvdG8tY2hhciAobWF0Y2gtYmVnaW5uaW5nIDIpKQogCShsZXQgKChsdyAoZmx5 c3BlbGwtZ2V0LXdvcmQpKSkKIAkgIChpZiAoYW5kIChjb25zcCBsdykKIAkJICAgKGlmIGlnbm9y ZS1jYXNlCkBAIC0xMDY4LDggKzEwNzksMTkgQEAKICAgKHNhdmUtZXhjdXJzaW9uCiAgICAgKGxl dCAoKHIgJygpKQogCSAgKGluaGliaXQtcG9pbnQtbW90aW9uLWhvb2tzIHQpCisJICAoZmx5c3Bl bGwtbm90LWNhc2VjaGFycyAoZmx5c3BlbGwtZ2V0LW5vdC1jYXNlY2hhcnMpKQogCSAgcCkKLSAg ICAgICh3aGlsZSAoYW5kIChub3QgcikgKHNldHEgcCAoc2VhcmNoLWZvcndhcmQgd29yZCBib3Vu ZCB0KSkpCisgICAgICAod2hpbGUgCisJICAoYW5kIChub3QgcikgCisJICAgICAgIChzZXRxIHAg CisJCSAgICAgKHJlLXNlYXJjaC1mb3J3YXJkIAorCQkgICAgICAoY29uY2F0CisJCSAgICAgICBm bHlzcGVsbC1ub3QtY2FzZWNoYXJzCisJCSAgICAgICAiXFwoIiB3b3JkICJcXCkiCisJCSAgICAg ICAiXFwoIiBmbHlzcGVsbC1ub3QtY2FzZWNoYXJzICJcXHxcXGJcXCkiCisJCSAgICAgICApCisJ CSAgICAgIGJvdW5kIHQpKSkKKwkoZ290by1jaGFyIChtYXRjaC1iZWdpbm5pbmcgMSkpCiAJKGxl dCAoKGx3IChmbHlzcGVsbC1nZXQtd29yZCkpKQogCSAgKGlmIChhbmQgKGNvbnNwIGx3KSAoc3Ry 
aW5nLWVxdWFsIChjYXIgbHcpIHdvcmQpKQogCSAgICAgIChzZXRxIHIgcCkK --001a1136c86ce00c1004f308bc55--