unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
@ 2014-02-18 20:56 Aleksey Cherepanov
  2014-02-21 10:15 ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-18 20:56 UTC (permalink / raw)
  To: 16800

Package: emacs
Version: 24.3
Severity: normal

Dear Maintainers,

It is a copy of bug #739412 in Debian. Debian uses bug tracker
similar to this one. The bug on web:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=739412
Address to continue thread there:
739412@bugs.debian.org

   * What led up to the situation?

I faced a problem editing my big .org file (2mb+) with flyspell-mode
enabled. I edit it every day, regularly mistype and get words of one
or two letters that are wrong in Russian and cause flyspell work slow.

This one-liner produces "good" file to reproduce the bug.
perl -e 'print(((join " ", ("met and") x 10) . "\n") x 30000)' > t.txt

Typing "nd" at the end of file gives a huge pause even on a fast
computer. But "mw" or "md" does not give pauses because they are not
substrings in this file. It is repeatable with emacs -Q.

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

 So exact sequence is
$ emacs --version
GNU Emacs 24.3.1
$ emacs23 --version
GNU Emacs 23.4.1
$ perl -e 'print(((join " ", ("met and") x 10) . "\n") x 30000)' > t.txt
$ LANG=C emacs -Q t.txt
 Then in emacs:
M-x flyspell-mode RET
M->
nd SPC

'emacs23 -Q t.txt' works the same way. LANG=C affects regular words
because default dictionary is Russian on my system so without LANG=C
all words ("met" and "and") are considered misspelled. But it does not
affect huge pause at the end.

   * What was the outcome of this action?

Huge pause when emacs does not react on keys except C-g. Word "nd" is
colored as misspelled after the pause. C-g stops emacs internal
thinking and I could work without waiting but word "nd" is not colored
as misspelled word.

   * What outcome did you expect instead?

I expect it to work as fast as with other words like "md" or "mw" that
does not produce a pause and are colored immediately.


I tried to patch flyspell-word-search-backward and
flyspell-word-search-forward functions from flyspell.el replacing
search-backward with word-search-backward and search-forward with
word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
solved the problem but I do not know what it broke.

I expect problems with this solution because I do not know if
flyspell's meaning of word is the same as emacs' one. I think it is
described in flyspell-get-word function that is called after search-*
in the patched functions.

flyspell-duplicate-distance variable on its own could mitigate the
problem but it changes the behaviour so I do not want to use this
variable.

Thanks!

-- 
Regards,
Aleksey Cherepanov


In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.8.6)
 of 2013-12-23 on brahms, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11405000
System Description:	Debian GNU/Linux testing (jessie)

Configured using:
 `configure '--build' 'x86_64-linux-gnu' '--build' 'x86_64-linux-gnu'
 '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib'
 '--localstatedir=/var/lib' '--infodir=/usr/share/info'
 '--mandir=/usr/share/man' '--with-pop=yes'
 '--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.3/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.3/site-lisp:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/x86_64-linux-gnu' '--with-x=yes'
 '--with-x-toolkit=gtk3' '--with-toolkit-scroll-bars'
 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fstack-protector
 --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall'
 'LDFLAGS=-Wl,-z,relro' 'CPPFLAGS=-D_FORTIFY_SOURCE=2''

Important settings:
  value of $LANG: ru_RU.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
@ 2014-02-21 10:15 ` Eli Zaretskii
  2014-02-21 14:38   ` Agustin Martin
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-21 10:15 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800

> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Date: Wed, 19 Feb 2014 00:56:45 +0400
> 
> I faced a problem editing my big .org file (2mb+) with flyspell-mode
> enabled. I edit it every day, regularly mistype and get words of one
> or two letters that are wrong in Russian and cause flyspell work slow.
> 
> This one-liner produces "good" file to reproduce the bug.
> perl -e 'print(((join " ", ("met and") x 10) . "\n") x 30000)' > t.txt
> 
> Typing "nd" at the end of file gives a huge pause even on a fast
> computer. But "mw" or "md" does not give pauses because they are not
> substrings in this file. It is repeatable with emacs -Q.

This seems to be due to the Flyspell's feature of recognizing
duplicates of mis-spelled words, and, if found, highlighting such
duplicates in a different face.  If you customize the variable
flyspell-duplicate-distance to some small value (or even zero), the
delay goes away.  Evidently, with the default value of -1, Flyspell
searches all the way to the beginning of the giant buffer, looking for
a duplicate of "nd".

Interestingly, I don't see this when the speller is Ispell, but I do
see it with Hunspell.  Not sure how using Ispell avoids this problem.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-21 10:15 ` Eli Zaretskii
@ 2014-02-21 14:38   ` Agustin Martin
  2014-02-21 15:12     ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Agustin Martin @ 2014-02-21 14:38 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800

On Fri, Feb 21, 2014 at 12:15:00PM +0200, Eli Zaretskii wrote:
> > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > Date: Wed, 19 Feb 2014 00:56:45 +0400
> > 
> > I faced a problem editing my big .org file (2mb+) with flyspell-mode
> > enabled. I edit it every day, regularly mistype and get words of one
> > or two letters that are wrong in Russian and cause flyspell work slow.
> > 
> > This one-liner produces "good" file to reproduce the bug.
> > perl -e 'print(((join " ", ("met and") x 10) . "\n") x 30000)' > t.txt
> > 
> > Typing "nd" at the end of file gives a huge pause even on a fast
> > computer. But "mw" or "md" does not give pauses because they are not
> > substrings in this file. It is repeatable with emacs -Q.
> 
> This seems to be due to the Flyspell's feature of recognizing
> duplicates of mis-spelled words, and, if found, highlighting such
> duplicates in a different face.  If you customize the variable
> flyspell-duplicate-distance to some small value (or even zero), the
> delay goes away.  Evidently, with the default value of -1, Flyspell
> searches all the way to the beginning of the giant buffer, looking for
> a duplicate of "nd".
> 
> Interestingly, I don't see this when the speller is Ispell, but I do
> see it with Hunspell.  Not sure how using Ispell avoids this problem.

Hi,

On the other hand, I can reproduce this also with ispell, as well as with
aspell and hunspell.

On Wed, Feb 19, 2014 at 12:56:45AM +0400, Aleksey Cherepanov wrote:
> flyspell-duplicate-distance variable on its own could mitigate the
> problem but it changes the behaviour so I do not want to use this  
> variable.

For the records, I was playing with a customized value of 50000 for that
distance and even if there is still a minor delay it is reasonable. I am
in a fast box, do not know in other boxes.

> I tried to patch flyspell-word-search-backward and
> flyspell-word-search-forward functions from flyspell.el replacing
> search-backward with word-search-backward and search-forward with
> word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
> solved the problem but I do not know what it broke.
> 
> I expect problems with this solution because I do not know if
> flyspell's meaning of word is the same as emacs' one. I think it is
> described in flyspell-get-word function that is called after search-*
> in the patched functions.

I have never played with Emacs syntax tables, but  I'd expect differences
only if there is a mismatch between chars in OTHERCHARS and non
alphabetic chars that Emacs considers as possible parts of a word. 

Regards,

-- 
Agustin






^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-21 14:38   ` Agustin Martin
@ 2014-02-21 15:12     ` Eli Zaretskii
  2014-02-21 15:21       ` Eli Zaretskii
  2014-02-22 12:44       ` Aleksey Cherepanov
  0 siblings, 2 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-21 15:12 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800, aleksey.4erepanov

> Date: Fri, 21 Feb 2014 15:38:55 +0100
> From: Agustin Martin <agustin.martin@hispalinux.es>
> Cc: 16800@debbugs.gnu.org
> 
> On Wed, Feb 19, 2014 at 12:56:45AM +0400, Aleksey Cherepanov wrote:
> > flyspell-duplicate-distance variable on its own could mitigate the
> > problem but it changes the behaviour so I do not want to use this  
> > variable.

What behavior does it change?  Do you really care to have a
mis-spelled word be highlighted in a different face just because
there's an identical mis-spelling half a megabyte away?

> For the records, I was playing with a customized value of 50000 for that
> distance and even if there is still a minor delay it is reasonable. I am
> in a fast box, do not know in other boxes.

I would suggest to change the default to something finite, like 20000
perhaps.  Having it set to -1 by default is IMO unwieldy, since
buffers can be very large.

> > I tried to patch flyspell-word-search-backward and
> > flyspell-word-search-forward functions from flyspell.el replacing
> > search-backward with word-search-backward and search-forward with
> > word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
> > solved the problem but I do not know what it broke.

And this doesn't change behavior?  See below.

> > I expect problems with this solution because I do not know if
> > flyspell's meaning of word is the same as emacs' one. I think it is
> > described in flyspell-get-word function that is called after search-*
> > in the patched functions.
> 
> I have never played with Emacs syntax tables, but  I'd expect differences
> only if there is a mismatch between chars in OTHERCHARS and non
> alphabetic chars that Emacs considers as possible parts of a word. 

The effect depends on the language, I think.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-21 15:12     ` Eli Zaretskii
@ 2014-02-21 15:21       ` Eli Zaretskii
  2014-02-22 12:44       ` Aleksey Cherepanov
  1 sibling, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-21 15:21 UTC (permalink / raw)
  To: agustin.martin; +Cc: 16800, aleksey.4erepanov

> Date: Fri, 21 Feb 2014 17:12:54 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 16800@debbugs.gnu.org, aleksey.4erepanov@gmail.com
> 
> > > I expect problems with this solution because I do not know if
> > > flyspell's meaning of word is the same as emacs' one. I think it is
> > > described in flyspell-get-word function that is called after search-*
> > > in the patched functions.
> > 
> > I have never played with Emacs syntax tables, but  I'd expect differences
> > only if there is a mismatch between chars in OTHERCHARS and non
> > alphabetic chars that Emacs considers as possible parts of a word. 
> 
> The effect depends on the language, I think.

Actually, having looked at flyspell-word-search-backward, it is quite
clear to me that replacing that with word-search-backward will change
behavior: for example, it will disregard the current spelling
language.  In general, flyspell-word-search-backward or its
replacement absolutely must agree with flyspell-word about what is a
"word", because words is what flyspell feeds to the speller.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-21 15:12     ` Eli Zaretskii
  2014-02-21 15:21       ` Eli Zaretskii
@ 2014-02-22 12:44       ` Aleksey Cherepanov
  2014-02-22 13:10         ` Eli Zaretskii
  1 sibling, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-22 12:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 16800, Agustin Martin

On Fri, Feb 21, 2014 at 05:12:54PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 21 Feb 2014 15:38:55 +0100
> > From: Agustin Martin <agustin.martin@hispalinux.es>
> > Cc: 16800@debbugs.gnu.org
> > 
> > On Wed, Feb 19, 2014 at 12:56:45AM +0400, Aleksey Cherepanov wrote:
> > > flyspell-duplicate-distance variable on its own could mitigate the
> > > problem but it changes the behaviour so I do not want to use this  
> > > variable.
> 
> What behavior does it change?  Do you really care to have a
> mis-spelled word be highlighted in a different face just because
> there's an identical mis-spelling half a megabyte away?

Yes, as a user I really care about this and as a programmer I believe
the bug could be solved well.

Also GNU coding standards say to avoid arbitrary limits (parts 2.1
and 4.2).
http://www.gnu.org/prep/standards/standards.html

But I could accept that this bug has low severity because there is
flyspell-duplicate-distance variable that could be used as a
workaround.

> > For the records, I was playing with a customized value of 50000 for that
> > distance and even if there is still a minor delay it is reasonable. I am
> > in a fast box, do not know in other boxes.
> 
> I would suggest to change the default to something finite, like 20000
> perhaps.  Having it set to -1 by default is IMO unwieldy, since
> buffers can be very large.
> 
> > > I tried to patch flyspell-word-search-backward and
> > > flyspell-word-search-forward functions from flyspell.el replacing
> > > search-backward with word-search-backward and search-forward with
> > > word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
> > > solved the problem but I do not know what it broke.
> 
> And this doesn't change behavior?  See below.

No, it seems that my setup works the same. See below.

> > > I expect problems with this solution because I do not know if
> > > flyspell's meaning of word is the same as emacs' one. I think it is
> > > described in flyspell-get-word function that is called after search-*
> > > in the patched functions.
> > 
> > I have never played with Emacs syntax tables, but  I'd expect differences
> > only if there is a mismatch between chars in OTHERCHARS and non
> > alphabetic chars that Emacs considers as possible parts of a word. 
> 
> The effect depends on the language, I think.

If I'd believe that it is a right solution I'd send a patch.

The difference is in word bounds. We are in trouble if flyspell's word
on its ends does not have ends of emacs' word. If flyspell's word has
ends of emacs' word on its ends and even contain them inside then we
are ok (try to search "a b" over "aa bb a b aa bb"). So could ends of
flyspell's word do not match with ends of emacs' word?

So I think word search instead of regular search does not change
behaviour of my setup with EN and RU languages (both at the same time)
and excplicitly specified ispell-dictionary-alist (I used some popular
instructions as is to setup it so long time ago).
ispell-dictionary-alist contains character sets used by
flyspell-get-word.

As an alternative I think we could generate regexps on the fly with
flyspell's word boundaries around words and search them. It would be
like
(re-search-forward (concat "\\<" (regexp-quote word) "\\>") bound t)
instead of
(word-search-forward word bound t)
but with flyspell's word boundaries instead of "\\<" and "\\>".

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 12:44       ` Aleksey Cherepanov
@ 2014-02-22 13:10         ` Eli Zaretskii
  2014-02-22 16:02           ` Aleksey Cherepanov
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-22 13:10 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800, agustin.martin

> Date: Sat, 22 Feb 2014 16:44:13 +0400
> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> 
> Also GNU coding standards say to avoid arbitrary limits (parts 2.1
> and 4.2).
> http://www.gnu.org/prep/standards/standards.html

This limit is not arbitrary.

> > > > I tried to patch flyspell-word-search-backward and
> > > > flyspell-word-search-forward functions from flyspell.el replacing
> > > > search-backward with word-search-backward and search-forward with
> > > > word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
> > > > solved the problem but I do not know what it broke.
> > 
> > And this doesn't change behavior?  See below.
> 
> No, it seems that my setup works the same. See below.

Your setup _might_ work the same, especially if you don't mix
different languages in the same buffer.  But in general, your change
does affect behavior.

> The difference is in word bounds. We are in trouble if flyspell's word
> on its ends does not have ends of emacs' word. If flyspell's word has
> ends of emacs' word on its ends and even contain them inside then we
> are ok (try to search "a b" over "aa bb a b aa bb"). So could ends of
> flyspell's word do not match with ends of emacs' word?

Yes, definitely.  See what flyspell-get-word does to find where the
word begins and where it ends.  Flyspell's "words" are
language-sensitive, whereas Emacs's words are not.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 13:10         ` Eli Zaretskii
@ 2014-02-22 16:02           ` Aleksey Cherepanov
  2014-02-22 16:41             ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-22 16:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 16800, agustin.martin

On Sat, Feb 22, 2014 at 03:10:02PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Feb 2014 16:44:13 +0400
> > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> > 
> > Also GNU coding standards say to avoid arbitrary limits (parts 2.1
> > and 4.2).
> > http://www.gnu.org/prep/standards/standards.html
> 
> This limit is not arbitrary.

Anyway it is a limit that could be avoided.

> > > > > I tried to patch flyspell-word-search-backward and
> > > > > flyspell-word-search-forward functions from flyspell.el replacing
> > > > > search-backward with word-search-backward and search-forward with
> > > > > word-search-forward (perl -pe 's/\(search-/(word-search-/' ). It
> > > > > solved the problem but I do not know what it broke.
> > > 
> > > And this doesn't change behavior?  See below.
> > 
> > No, it seems that my setup works the same. See below.
> 
> Your setup _might_ work the same, especially if you don't mix
> different languages in the same buffer.  But in general, your change
> does affect behavior.

I mix languages. I am pretty sure that my setup works the same.

BTW solution around reduction of jump points does not not affect
faces: "nd" or "badnd" at the end of "good badnd good " does not call
spell check on the first "badnd".

> > The difference is in word bounds. We are in trouble if flyspell's word
> > on its ends does not have ends of emacs' word. If flyspell's word has
> > ends of emacs' word on its ends and even contain them inside then we
> > are ok (try to search "a b" over "aa bb a b aa bb"). So could ends of
> > flyspell's word do not match with ends of emacs' word?
> 
> Yes, definitely.  See what flyspell-get-word does to find where the
> word begins and where it ends.  Flyspell's "words" are
> language-sensitive, whereas Emacs's words are not.

I saw this function.

Emacs words are language sensitive too. Emacs jumps through all ends
of words from RU and EN languages even if the words are not separated.

Example:
Word of 3 parts: English "asdf", Russian "фыва" (execute-kbd-macro
(kbd "C-q 02104 RET C-q 02113 RET C-q 02062 RET C-q 02060 RET")),
English "asdf".

asdfфываasdf
^   ^   ^   ^
b   b   b
    f   f   f

M-b and C-M-r \< jumps through positions marked by b. M-f and
C-M-r \> jumps through positions marked by f. It is one word for
flyspell in my setup and in LANG=C emacs -Q.

But I do not know if it is applicable to other languages and/or other
setups. How could it be improved? Other solutions?

I'd propose as a variant to use emacs' words for flyspell and vice
versa but it would a bad idea. My emacs jumps over asdf'asdf in one
hop and similar behaviour could be done for other languages with their
respective 'otherchars' but it would be inconvenient for some users
including me (python-mode has _ as a part of word by default, both
questions how to enable it everywhere and how to disable it in
python-mode exist).

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 16:02           ` Aleksey Cherepanov
@ 2014-02-22 16:41             ` Eli Zaretskii
  2014-02-22 18:55               ` Aleksey Cherepanov
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-22 16:41 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800, agustin.martin

> Date: Sat, 22 Feb 2014 20:02:17 +0400
> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Cc: agustin.martin@hispalinux.es, 16800@debbugs.gnu.org
> 
> > Your setup _might_ work the same, especially if you don't mix
> > different languages in the same buffer.  But in general, your change
> > does affect behavior.
> 
> I mix languages. I am pretty sure that my setup works the same.

Not in general, it isn't.  See below.

> BTW solution around reduction of jump points does not not affect
> faces: "nd" or "badnd" at the end of "good badnd good " does not call
> spell check on the first "badnd".

Not sure I understand what you are saying here.  What "first badnd"?
you have only one in this example.

> Emacs words are language sensitive too.

But not in the same way as ispell/flyspell is.  The CASECHARS,
NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
taken into account by ispell/flyspell.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 16:41             ` Eli Zaretskii
@ 2014-02-22 18:55               ` Aleksey Cherepanov
  2014-02-22 20:16                 ` Aleksey Cherepanov
  2014-02-22 21:03                 ` Eli Zaretskii
  0 siblings, 2 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-22 18:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 16800, agustin.martin

On Sat, Feb 22, 2014 at 06:41:08PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Feb 2014 20:02:17 +0400
> > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > Cc: agustin.martin@hispalinux.es, 16800@debbugs.gnu.org
> > 
> > > Your setup _might_ work the same, especially if you don't mix
> > > different languages in the same buffer.  But in general, your change
> > > does affect behavior.
> > 
> > I mix languages. I am pretty sure that my setup works the same.
> 
> Not in general, it isn't.  See below.

I agree.

Oh, not even for my setup. But for my setup together with my files.
I've got an example.

> > BTW solution around reduction of jump points does not not affect
> > faces: "nd" or "badnd" at the end of "good badnd good " does not call
> > spell check on the first "badnd".
> 
> Not sure I understand what you are saying here.  What "first badnd"?
> you have only one in this example.

"nd" does not cause spell check of "badnd". Another "badnd" at the end
does not cause spell check of the first "badnd".

> > Emacs words are language sensitive too.
> 
> But not in the same way as ispell/flyspell is.  The CASECHARS,
> NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> taken into account by ispell/flyspell.

I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
one word for emacs and two words with garbage around for flyspell. I
think my solution fails in such case.

So flyspell's set should be consisted of full emacs categories to make
my solution work. Code for emacs word boundaries is in category.h,
macro WORD_BOUNDARY_P. We could use regular search for bad setups and
word search for good setups. Though it does not seem trivial to check
if flyspell's dictionary setup is good for my solution.

Russian alphabet is not a full emacs (Unicode, I guess) category. The
full category is Cyrillic script (or even wider). My solution does not
work if there is a letter from the complement (for instance, Lje
02131) right near my mis-spelling word. So I was wrong about the
behaviour: it is not the same, I just do not see differences in my
files.

We could mix: regular search for short distance and word search for
longer distance. Though it seems ugly for me.

I still think that we could make regexps with word boundaries
according to flyspell's meaning of word.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 18:55               ` Aleksey Cherepanov
@ 2014-02-22 20:16                 ` Aleksey Cherepanov
  2014-02-22 21:03                 ` Eli Zaretskii
  1 sibling, 0 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-22 20:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 16800, agustin.martin

On Sat, Feb 22, 2014 at 10:55:11PM +0400, Aleksey Cherepanov wrote:
> On Sat, Feb 22, 2014 at 06:41:08PM +0200, Eli Zaretskii wrote:
> > > Date: Sat, 22 Feb 2014 20:02:17 +0400
> > > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > > Cc: agustin.martin@hispalinux.es, 16800@debbugs.gnu.org
> > > 
> > > > Your setup _might_ work the same, especially if you don't mix
> > > > different languages in the same buffer.  But in general, your change
> > > > does affect behavior.
> > > 
> > > I mix languages. I am pretty sure that my setup works the same.
> > 
> > Not in general, it isn't.  See below.
> 
> I agree.
> 
> Oh, not even for my setup. But for my setup together with my files.
> I've got an example.
> 
> > > BTW solution around reduction of jump points does not not affect
> > > faces: "nd" or "badnd" at the end of "good badnd good " does not call
> > > spell check on the first "badnd".
> > 
> > Not sure I understand what you are saying here.  What "first badnd"?
> > you have only one in this example.
> 
> "nd" does not cause spell check of "badnd". Another "badnd" at the end
> does not cause spell check of the first "badnd".
> 
> > > Emacs words are language sensitive too.
> > 
> > But not in the same way as ispell/flyspell is.  The CASECHARS,
> > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > taken into account by ispell/flyspell.
> 
> I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> one word for emacs and two words with garbage around for flyspell. I
> think my solution fails in such case.
> 
> So flyspell's set should be consisted of full emacs categories to make
> my solution work. Code for emacs word boundaries is in category.h,
> macro WORD_BOUNDARY_P. We could use regular search for bad setups and
> word search for good setups. Though it does not seem trivial to check
> if flyspell's dictionary setup is good for my solution.
> 
> Russian alphabet is not a full emacs (Unicode, I guess) category. The
> full category is Cyrillic script (or even wider). My solution does not
> work if there is a letter from the complement (for instance, Lje
> 02131) right near my mis-spelling word. So I was wrong about the
> behaviour: it is not the same, I just do not see differences in my
> files.

Oh, my setup is wrong. Default setup uses
"[[:alpha:]]"  ; casechars
"[^[:alpha:]]" ; not-casechars
due to ispell-set-spellchecker-params function:
    ;; If Emacs flavor supports [:alpha:] use it for global dicts.  If
    ;; spellchecker also supports UTF-8 via command-line option use it
    ;; in communication.  This does not affect definitions in your
    ;; init file.

My solution should work well with such setup.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 18:55               ` Aleksey Cherepanov
  2014-02-22 20:16                 ` Aleksey Cherepanov
@ 2014-02-22 21:03                 ` Eli Zaretskii
  2014-02-23  1:26                   ` Agustin Martin
  1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-22 21:03 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800, agustin.martin

> Date: Sat, 22 Feb 2014 22:55:11 +0400
> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Cc: agustin.martin@hispalinux.es, 16800@debbugs.gnu.org
> 
> > > BTW solution around reduction of jump points does not not affect
> > > faces: "nd" or "badnd" at the end of "good badnd good " does not call
> > > spell check on the first "badnd".
> > 
> > Not sure I understand what you are saying here.  What "first badnd"?
> > you have only one in this example.
> 
> "nd" does not cause spell check of "badnd". Another "badnd" at the end
> does not cause spell check of the first "badnd".

Of course, it isn't; why should it?

> > > Emacs words are language sensitive too.
> > 
> > But not in the same way as ispell/flyspell is.  The CASECHARS,
> > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > taken into account by ispell/flyspell.
> 
> I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> one word for emacs and two words with garbage around for flyspell. I
> think my solution fails in such case.

It's more complex than that: with some languages, and at least with
aspell, we take these parameters from the dictionary.  So they cannot
be known in advance in some cases.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-22 21:03                 ` Eli Zaretskii
@ 2014-02-23  1:26                   ` Agustin Martin
  2014-02-23 18:36                     ` Eli Zaretskii
                                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Agustin Martin @ 2014-02-23  1:26 UTC (permalink / raw)
  To: 16800; +Cc: Aleksey Cherepanov


[-- Attachment #1.1: Type: text/plain, Size: 1733 bytes --]

2014-02-22 22:03 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:

> > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> >
> > > > Emacs words are language sensitive too.
> > >
> > > But not in the same way as ispell/flyspell is.  The CASECHARS,
> > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > taken into account by ispell/flyspell.
> >
> > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > one word for emacs and two words with garbage around for flyspell. I
> > think my solution fails in such case.
>
> It's more complex than that: with some languages, and at least with
> aspell, we take these parameters from the dictionary.  So they cannot
> be known in advance in some cases.
>

Hi,

Not yet sure if  I am missing something important, but I am playing with a
regexp search in flyspell-word-search-* functions based on what flyspell
thinks is the word to spellcheck (`word') and what thinks should not be
part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
some intermediate matches being false positives that will be discarded once
flyspell-word checks them.

I have tested this in Alekseys's file and is apparently working well and in
this particular case with much better efficiency. Need to think about more
ad-hoc situations where it may fail or slow down things. Suggestions for
possible failures are welcome.

Patch is attached. I did the tests against an old and patched version of
flyspell.el (that shipped with Debian stable) and built the patch for it.
Should apply and work similarly in trunk's flyspell.el.

-- 
Agustin

[-- Attachment #1.2: Type: text/html, Size: 2452 bytes --]

[-- Attachment #2: flyspell.el_flyspell-word-search.2.diff --]
[-- Type: text/plain, Size: 1299 bytes --]

--- flyspell.el.orig	2014-02-23 02:17:03.680107519 +0100
+++ flyspell.el	2014-02-23 02:50:50.634625248 +0100
@@ -1050,8 +1050,19 @@
   (save-excursion
     (let ((r '())
 	  (inhibit-point-motion-hooks t)
+	  (flyspell-not-casechars (flyspell-get-not-casechars))
 	  p)
-      (while (and (not r) (setq p (search-backward word bound t)))
+      (while 
+	  (and (not r) 
+	       (setq p 
+		     (re-search-backward
+		      (concat
+		       "\\(" flyspell-not-casechars "\\|\\b\\)"
+		       "\\(" word "\\)"
+		       flyspell-not-casechars
+		       )
+		      bound t)))
+	(goto-char (match-beginning 2))
 	(let ((lw (flyspell-get-word)))
 	  (if (and (consp lw)
 		   (if ignore-case
@@ -1068,8 +1079,19 @@
   (save-excursion
     (let ((r '())
 	  (inhibit-point-motion-hooks t)
+	  (flyspell-not-casechars (flyspell-get-not-casechars))
 	  p)
-      (while (and (not r) (setq p (search-forward word bound t)))
+      (while 
+	  (and (not r) 
+	       (setq p 
+		     (re-search-forward 
+		      (concat
+		       flyspell-not-casechars
+		       "\\(" word "\\)"
+		       "\\(" flyspell-not-casechars "\\|\\b\\)"
+		       )
+		      bound t)))
+	(goto-char (match-beginning 1))
 	(let ((lw (flyspell-get-word)))
 	  (if (and (consp lw) (string-equal (car lw) word))
 	      (setq r p)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-23  1:26                   ` Agustin Martin
@ 2014-02-23 18:36                     ` Eli Zaretskii
  2014-02-23 19:56                     ` Aleksey Cherepanov
  2014-02-23 20:39                     ` Aleksey Cherepanov
  2 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-23 18:36 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800, aleksey.4erepanov

> Date: Sun, 23 Feb 2014 02:26:00 +0100
> From: Agustin Martin <agustin.martin@hispalinux.es>
> Cc: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> 
> Not yet sure if  I am missing something important, but I am playing with a
> regexp search in flyspell-word-search-* functions based on what flyspell
> thinks is the word to spellcheck (`word') and what thinks should not be
> part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
> some intermediate matches being false positives that will be discarded once
> flyspell-word checks them.
> 
> I have tested this in Alekseys's file and is apparently working well and in
> this particular case with much better efficiency. Need to think about more
> ad-hoc situations where it may fail or slow down things. Suggestions for
> possible failures are welcome.
> 
> Patch is attached. I did the tests against an old and patched version of
> flyspell.el (that shipped with Debian stable) and built the patch for it.
> Should apply and work similarly in trunk's flyspell.el.

Thanks, it's good to know it's possible to speed up the search for
duplicate mis-spellings without sacrificing correctness.

However, for any speedup that we will be able to come up with, there
can always be a buffer large enough to make the delay annoyingly long.
Therefore, I think the default of flyspell-duplicate-distance should
not be -1, but some finite and reasonably small value.  Or maybe we
should turn off this feature by default.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-23  1:26                   ` Agustin Martin
  2014-02-23 18:36                     ` Eli Zaretskii
@ 2014-02-23 19:56                     ` Aleksey Cherepanov
  2014-02-23 23:02                       ` Aleksey Cherepanov
  2014-02-23 20:39                     ` Aleksey Cherepanov
  2 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-23 19:56 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

On Sun, Feb 23, 2014 at 02:26:00AM +0100, Agustin Martin wrote:
> 2014-02-22 22:03 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:
> 
> > > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > >
> > > > > Emacs words are language sensitive too.
> > > >
> > > > But not in the same way as ispell/flyspell is.  The CASECHARS,
> > > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > > taken into account by ispell/flyspell.
> > >
> > > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > > one word for emacs and two words with garbage around for flyspell. I
> > > think my solution fails in such case.
> >
> > It's more complex than that: with some languages, and at least with
> > aspell, we take these parameters from the dictionary.  So they cannot
> > be known in advance in some cases.
> >
> 
> Hi,
> 
> Not yet sure if  I am missing something important, but I am playing with a
> regexp search in flyspell-word-search-* functions based on what flyspell
> thinks is the word to spellcheck (`word') and what thinks should not be
> part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
> some intermediate matches being false positives that will be discarded once
> flyspell-word checks them.
> 
> I have tested this in Alekseys's file and is apparently working well and in
> this particular case with much better efficiency. Need to think about more
> ad-hoc situations where it may fail or slow down things. Suggestions for
> possible failures are welcome.
> 
> Patch is attached. I did the tests against an old and patched version of
> flyspell.el (that shipped with Debian stable) and built the patch for it.
> Should apply and work similarly in trunk's flyspell.el.
> 

> --- flyspell.el.orig	2014-02-23 02:17:03.680107519 +0100
> +++ flyspell.el	2014-02-23 02:50:50.634625248 +0100
> @@ -1050,8 +1050,19 @@
>    (save-excursion
>      (let ((r '())
>  	  (inhibit-point-motion-hooks t)
> +	  (flyspell-not-casechars (flyspell-get-not-casechars))

I'd move concat here too so it is out of inner loop.

>  	  p)
> -      (while (and (not r) (setq p (search-backward word bound t)))
> +      (while 
> +	  (and (not r) 
> +	       (setq p 
> +		     (re-search-backward
> +		      (concat
> +		       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b here could be replaced with \` (beginning of buffer). I
think it is the only boundary we need that is not described by
not-casechars, word sequence. Similarly \' (end of buffer) could be
used for forward search.

Also not capturing group ("\\(?:") could be used because we do not
need a match data of the first group. It should work faster but I
don't really know.

Maybe it would be faster to not capture word but capture one char or
void but I doubt the difference would be noticable.

> +		       "\\(" word "\\)"

I think regexp-quote around the word is necessary here.

> +		       flyspell-not-casechars
> +		       )
> +		      bound t)))
> +	(goto-char (match-beginning 2))

s/2/1/ if the first group is not capturing.

>  	(let ((lw (flyspell-get-word)))
>  	  (if (and (consp lw)
>  		   (if ignore-case
> @@ -1068,8 +1079,19 @@
>    (save-excursion
>      (let ((r '())
>  	  (inhibit-point-motion-hooks t)
> +	  (flyspell-not-casechars (flyspell-get-not-casechars))

concat here as above.

>  	  p)
> -      (while (and (not r) (setq p (search-forward word bound t)))
> +      (while 
> +	  (and (not r) 
> +	       (setq p 
> +		     (re-search-forward 
> +		      (concat
> +		       flyspell-not-casechars
> +		       "\\(" word "\\)"

regexp-quote as above.

> +		       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b could be replaced by \' here as described above.

The second group could be not capturing here.

> +		       )
> +		      bound t)))
> +	(goto-char (match-beginning 1))

I guess match-end should here.

>  	(let ((lw (flyspell-get-word)))
>  	  (if (and (consp lw) (string-equal (car lw) word))
>  	      (setq r p)


I guess that \b would work faster than the group so we could have 'if'
statement around the whole loop that has one implementation with \b
for case when casechars are "[[:alpha:]]" and not-casechars are
"[^[:alpha:]]" and another implementation as above for other cases.
But it seems cumbersome.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-23  1:26                   ` Agustin Martin
  2014-02-23 18:36                     ` Eli Zaretskii
  2014-02-23 19:56                     ` Aleksey Cherepanov
@ 2014-02-23 20:39                     ` Aleksey Cherepanov
  2 siblings, 0 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-23 20:39 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

On Sun, Feb 23, 2014 at 02:26:00AM +0100, Agustin Martin wrote:
> --- flyspell.el.orig	2014-02-23 02:17:03.680107519 +0100
> +++ flyspell.el	2014-02-23 02:50:50.634625248 +0100
> @@ -1050,8 +1050,19 @@
>    (save-excursion
>      (let ((r '())
>  	  (inhibit-point-motion-hooks t)
> +	  (flyspell-not-casechars (flyspell-get-not-casechars))
>  	  p)
> -      (while (and (not r) (setq p (search-backward word bound t)))
> +      (while 
> +	  (and (not r) 
> +	       (setq p 
> +		     (re-search-backward
> +		      (concat
> +		       "\\(" flyspell-not-casechars "\\|\\b\\)"
> +		       "\\(" word "\\)"
> +		       flyspell-not-casechars
> +		       )
> +		      bound t)))
> +	(goto-char (match-beginning 2))

I am not yet sure that it is important. But as written we store
position - 1 instead of position if we matched with
flyspell-not-casechars branch. (goto-char ...) could be placed
inside (setq p ...) to fix it:
(setq p
      (and
        (re-search ...)
        (goto-char ...)))
I do not know real difference yet though.

Respectively we store position + 1 in forward search.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-23 19:56                     ` Aleksey Cherepanov
@ 2014-02-23 23:02                       ` Aleksey Cherepanov
  2014-02-24 16:03                         ` Aleksey Cherepanov
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-23 23:02 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

I've performed some tests against my .org file (not in emacs -Q):

(insert
 (mapconcat (lambda (re)
              (save-excursion
                (let ((time (current-time))
                      (count 0))
                  (while (re-search-backward re nil t)
                    (setq count (1+ count)))
                  (format "%d: %S :: %s" count (subtract-time (current-time) time) re))))
            '("\\<[[:alpha:]]"
              "\\b[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\b\\)[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\)[[:alpha:]]"
              "[^[:alpha:]][[:alpha:]]"
              "\\(?:\\b\\|'\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)\\([[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)\\(?:[[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]+")
            "\n"))

Matches| Time              | Regexp tried
299158: (0 2 841190 614000) :: \<[[:alpha:]]
299158: (0 2 876846 547000) :: \b[[:alpha:]]
307919: (0 3 321676 163000) :: \([^[:alpha:]]\|\b\)[[:alpha:]]
307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
299518: (0 2 998895 976000) :: \(?:\b\|'\)[[:alpha:]]
307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+

I should admit that word search breaks things even for setup with
[[:alpha:]]: a0a is 1 word for emacs and 2 for flyspell. I missed it
because Russian behaves differently (there is word boundary on border
between digits and Russian letters). My bad.

307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
These two suggest that it may provide a speed up if we do not check
beginning of buffer in regexp but check it separately. But I doubt it
is worth it.

On Sun, Feb 23, 2014 at 11:56:59PM +0400, Aleksey Cherepanov wrote:
> Also not capturing group ("\\(?:") could be used because we do not
> need a match data of the first group. It should work faster but I
> don't really know.

307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
The test shows that not capturing group is faster.

> Maybe it would be faster to not capture word but capture one char or
> void but I doubt the difference would be noticable.

307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+
Unexpectedly capturing of word works a bit faster. Maybe it is not a
word but the second group and it would work differently for search
forward. Or alpha+ instead of fixed word caused it. Anyway the
difference is very small.

Capturing word allows us to make a function to wrap a word into regexp
like word-search-regexp function wraps a word for
word-search-forward/-backward functions.

> I guess that \b would work faster than the group so we could have 'if'
> statement around the whole loop that has one implementation with \b
> for case when casechars are "[[:alpha:]]" and not-casechars are
> "[^[:alpha:]]" and another implementation as above for other cases.
> But it seems cumbersome.

My guess is wrong: \b works slower than the group. Also it is
inappropriate at all.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-23 23:02                       ` Aleksey Cherepanov
@ 2014-02-24 16:03                         ` Aleksey Cherepanov
  2014-02-26 20:32                           ` Agustin Martin
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-24 16:03 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

[-- Attachment #1: Type: text/plain, Size: 3297 bytes --]

I played with different (maybe wrong) implementations of
flyspell-word-search-backward and measured time against t.txt
(produced by the one-liner). All implementations are attached.

perl -e 'print(((join " ", ("met and") x 10) . "\n") x 30000)' > t.txt

my-test-agustin                            - Implementation from Agustin Martin with regexp-quote
my-test-concat-up                          - and concat moved upper
my-test-concat-up-goto                     - and goto-char moved into setq
my-test-concat-up-goto-notcap              - and ?: added to the first group
my-test-concat-up-goto-notcap-bob          - and \b replaced by \`
my-test-concat-up-goto-notcap-bob-bobp     - and goto-char replaced with conditional forward-char (on bobp)
my-test-concat-up-goto-notcap-nobob-bobp   - and the first group is removed, this case is handled separately,
my-test-concat-up-goto-notcap-nobob-nobobp - and bobp check is replaced by progn due to separate handling
my-test-goto-notcap-nobob-nobobp           - and concat moved down (back),
my-test-concat-up-goto-notcap-nobob-bobp-fixed - fixed for correct handling of beginning of buffer.

# |String| Time           |Result| Function name
1  nd  (0 0 192227 640000)  nil   my-test-agustin
2  nd  (0 0 192569 63000)   nil   my-test-concat-up
3  nd  (0 0 193895 468000)  nil   my-test-concat-up-goto
4  nd  (0 0 194372 743000)  nil   my-test-concat-up-goto-notcap
5  nd  (0 0 151535 868000)  nil   my-test-concat-up-goto-notcap-bob
6  nd  (0 0 131831 49000)   nil   my-test-concat-up-goto-notcap-bob-bobp
7  nd  (0 0 92012 191000)   nil   my-test-concat-up-goto-notcap-nobob-bobp
8  nd  (0 0 93928 281000)   nil   my-test-concat-up-goto-notcap-nobob-nobobp
9  nd  (0 0 93796 52000)    nil   my-test-goto-notcap-nobob-nobobp
10 nd  (0 0 94061 645000)   nil   my-test-concat-up-goto-notcap-nobob-bobp-fixed

It is from Messages of (my-try "nd") in t.txt.

The last 4 functions are quite close and often mixes differently due
to fluctuations. Really they could not be measured against this file
because re-search-forward always should return nil, I think.

Functions 7, 8, 9 are not correct: they find a word if we search a
word at the beginning of buffer staying at the middle of it. Function
10 has logic to handle this case.

Other corner cases should be thought and tried too. The times could be
different for other files and other words.

On Mon, Feb 24, 2014 at 03:02:51AM +0400, Aleksey Cherepanov wrote:
> I've performed some tests against my .org file (not in emacs -Q):

> On Sun, Feb 23, 2014 at 11:56:59PM +0400, Aleksey Cherepanov wrote:
> > Maybe it would be faster to not capture word but capture one char or
> > void but I doubt the difference would be noticable.
> 
> 307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
> 307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
> 307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+
> Unexpectedly capturing of word works a bit faster. Maybe it is not a
> word but the second group and it would work differently for search
> forward. Or alpha+ instead of fixed word caused it. Anyway the
> difference is very small.

We could avoid capturing at all. And it works faster as shown by 4
last functions.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t.el --]
[-- Type: text/plain, Size: 11242 bytes --]


;; Implementation from Agustin Martin with additional regexp-quote
(defun my-test-agustin (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
          (flyspell-not-casechars (flyspell-get-not-casechars))
	  p)
      (while
          (and (not r)
               (setq p
                     (re-search-backward
                      (concat
                       "\\(" flyspell-not-casechars "\\|\\b\\)"
                       "\\(" (regexp-quote word) "\\)"
                       flyspell-not-casechars
                       )
                      bound t)))
        (goto-char (match-beginning 2))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

(defun my-test-concat-up (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     "\\(" flyspell-not-casechars "\\|\\b\\)"
                     "\\(" (regexp-quote word) "\\)"
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (re-search-backward word-re bound t)))
        (goto-char (match-beginning 2))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

(defun my-test-concat-up-goto (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     "\\(" flyspell-not-casechars "\\|\\b\\)"
                     "\\(" (regexp-quote word) "\\)"
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (goto-char (match-beginning 2)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

(defun my-test-concat-up-goto-notcap (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     "\\(?:" flyspell-not-casechars "\\|\\b\\)"
                     "\\(" (regexp-quote word) "\\)"
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (goto-char (match-beginning 2)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

(defun my-test-concat-up-goto-notcap-bob (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     "\\(?:" flyspell-not-casechars "\\|\\`\\)"
                     "\\(" (regexp-quote word) "\\)"
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (goto-char (match-beginning 2)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

(defun my-test-concat-up-goto-notcap-bob-bobp (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     "\\(?:" flyspell-not-casechars "\\|\\`\\)"
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (unless (bobp)
                        (forward-char)
                        (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))

;; Wrong
(defun my-test-concat-up-goto-notcap-nobob-bobp (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (unless (bobp)
                        (forward-char)
                        (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      (unless r
        (setq p (goto-char (point-min)))
        (let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p))))
      r)))

;; Wrong
(defun my-test-concat-up-goto-notcap-nobob-nobobp (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (progn
                        (forward-char)
                        (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      (unless r
        (setq p (goto-char (point-min)))
        (let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p))))
      r)))

;; Wrong
(defun my-test-goto-notcap-nobob-nobobp (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward (concat
                                           flyspell-not-casechars
                                           (regexp-quote word)
                                           flyspell-not-casechars)
                                          bound
                                          t)
                      (progn
                        (forward-char)
                        (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      (unless r
        (setq p (goto-char (point-min)))
        (let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p))))
      r)))

(defun my-test-concat-up-goto-notcap-nobob-bobp-fixed (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p
                     (and
                      (re-search-backward word-re bound t)
                      (unless (bobp)
                        (forward-char)
                        (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      (unless r
        (let ((pos (point)))
          (setq p (goto-char (point-min)))
          (and (search-forward word (length word) t)
               (let ((lw (flyspell-get-word)))
                 (if (and (consp lw)
                          (if ignore-case
                              (string-equal (downcase (car lw)) (downcase word))
                            (string-equal (car lw) word)))
                     (setq r p))))))
      r)))

(defun my-try (word)
  (message "%s"
           (mapconcat (lambda (func)
                        (end-of-buffer)
                        (let* ((time (current-time))
                               (res (apply func '("nd" nil))))
                          (format ":>: %s %S =%S %S"
                                  word
                                  (subtract-time (current-time) time)
                                  res
                                  func)))
                      (let ((lst '(my-test-agustin
                                   my-test-concat-up
                                   my-test-concat-up-goto
                                   my-test-concat-up-goto-notcap
                                   my-test-concat-up-goto-notcap-bob
                                   my-test-concat-up-goto-notcap-bob-bobp
                                   my-test-concat-up-goto-notcap-nobob-bobp
                                   my-test-concat-up-goto-notcap-nobob-nobobp
                                   my-test-goto-notcap-nobob-nobobp
                                   my-test-concat-up-goto-notcap-nobob-bobp-fixed)))
                        (concatenate 'list lst lst))
                      "\n")))

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-24 16:03                         ` Aleksey Cherepanov
@ 2014-02-26 20:32                           ` Agustin Martin
  2014-02-28 11:45                             ` Agustin Martin
  0 siblings, 1 reply; 33+ messages in thread
From: Agustin Martin @ 2014-02-26 20:32 UTC (permalink / raw)
  To: Aleksey Cherepanov, 16800

On Mon, Feb 24, 2014 at 08:03:17PM +0400, Aleksey Cherepanov wrote:
> I played with different (maybe wrong) implementations of
> flyspell-word-search-backward and measured time against t.txt
> (produced by the one-liner). All implementations are attached.

[ ... Tons of extensive and impressive debugging ... ]

> We could avoid capturing at all. And it works faster as shown by 4
> last functions.

Hi, 

Thanks a lot for the extensive debugging and for all the suggestions. I
have been playing with something based in your last function, but trying
to get something more compact, see below current status


;; -----------------------------------                         
(defun my-test-concat-up-goto-notcap-nobob-bobp-if (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
;; -----------------------------------                         

I did some efficiency test and it seemed similar to those of your efficient
functions. Need to check further for corner cases, bugs, etc ...

Big thanks!

-- 
Agustin





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-26 20:32                           ` Agustin Martin
@ 2014-02-28 11:45                             ` Agustin Martin
  2014-02-28 11:51                               ` Eli Zaretskii
  2014-02-28 23:11                               ` Aleksey Cherepanov
  0 siblings, 2 replies; 33+ messages in thread
From: Agustin Martin @ 2014-02-28 11:45 UTC (permalink / raw)
  To: Aleksey Cherepanov, 16800

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On Wed, Feb 26, 2014 at 09:32:02PM +0100, Agustin Martin wrote:
> On Mon, Feb 24, 2014 at 08:03:17PM +0400, Aleksey Cherepanov wrote:
> > I played with different (maybe wrong) implementations of
> > flyspell-word-search-backward and measured time against t.txt
> > (produced by the one-liner). All implementations are attached.
> 
> [ ... Tons of extensive and impressive debugging ... ]
> 
> > We could avoid capturing at all. And it works faster as shown by 4
> > last functions.
> 
> Hi, 
> 
> Thanks a lot for the extensive debugging and for all the suggestions. I
> have been playing with something based in your last function, but trying
> to get something more compact, see below current status
[ ... ]
> I did some efficiency test and it seemed similar to those of your efficient
> functions. Need to check further for corner cases, bugs, etc ...

Hi, Aleksey

Please find attached my first candidate for commit. Is similar to what I
sent before, but needed to add an explicit check for word at eob in
`flyspell-word-search-forward'. 

Will try to have more testing before committing. Seems to work well with the
file generated by your one-liner, even with corner cases like new
misspellings added at bob or eob, but the wider the testing the better.

Hope no one will generate files with words containing something in
OTHERCHARS.

Thanks for all your help

-- 
Agustin

[-- Attachment #2: flyspell.el_flyspell-word-search.3.diff --]
[-- Type: text/x-diff, Size: 2166 bytes --]

--- flyspell.el.orig	2014-02-26 19:05:29.651986038 +0100
+++ flyspell.el	2014-02-28 12:01:03.930010553 +0100
@@ -1048,10 +1048,21 @@
 ;;*---------------------------------------------------------------------*/
 (defun flyspell-word-search-backward (word bound &optional ignore-case)
   (save-excursion
-    (let ((r '())
-	  (inhibit-point-motion-hooks t)
-	  p)
-      (while (and (not r) (setq p (search-backward word bound t)))
+    (let* ((r '())
+	   (inhibit-point-motion-hooks t)
+	   (flyspell-not-casechars (flyspell-get-not-casechars))
+	   (word-re (concat flyspell-not-casechars
+			    (regexp-quote word)
+			    flyspell-not-casechars))
+	   p)
+      (while
+	  (and (not r)
+	       (setq p (if (re-search-backward word-re bound t)
+			   ;; word-re match begins one char before word
+			   (progn (forward-char) (point))
+			 ;; Check above does not match similar word at b-o-b
+			 (goto-char (point-min))
+			 (search-forward word (length word) t))))
 	(let ((lw (flyspell-get-word)))
 	  (if (and (consp lw)
 		   (if ignore-case
@@ -1066,10 +1077,25 @@
 ;;*---------------------------------------------------------------------*/
 (defun flyspell-word-search-forward (word bound)
   (save-excursion
-    (let ((r '())
-	  (inhibit-point-motion-hooks t)
-	  p)
-      (while (and (not r) (setq p (search-forward word bound t)))
+    (let* ((r '())
+	   (inhibit-point-motion-hooks t)
+	   (word-end (nth 2 (flyspell-get-word)))
+	   (flyspell-not-casechars (flyspell-get-not-casechars))
+	   (word-re (concat flyspell-not-casechars
+			    (regexp-quote word)
+			    flyspell-not-casechars))
+	   p)
+      (while
+	  (and (not r)
+	       (setq p (if (= word-end (point-max))
+			   nil ;; Current word is at e-o-b. No forward search
+			 (if (re-search-forward word-re bound t)
+			     ;; word-re match ends one char after word
+			     (progn (backward-char) (point))
+			   ;; Check above does not match similar word at e-o-b
+			   (goto-char (point-max))
+			   (search-backward word (- (point-max)
+						    (length word)) t)))))
 	(let ((lw (flyspell-get-word)))
 	  (if (and (consp lw) (string-equal (car lw) word))
 	      (setq r p)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 11:45                             ` Agustin Martin
@ 2014-02-28 11:51                               ` Eli Zaretskii
  2014-03-01 21:44                                 ` Aleksey Cherepanov
  2014-02-28 23:11                               ` Aleksey Cherepanov
  1 sibling, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-02-28 11:51 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800, aleksey.4erepanov

> Date: Fri, 28 Feb 2014 12:45:45 +0100
> From: Agustin Martin <agustin.martin@hispalinux.es>
> 
> Please find attached my first candidate for commit. Is similar to what I
> sent before, but needed to add an explicit check for word at eob in
> `flyspell-word-search-forward'. 
> 
> Will try to have more testing before committing. Seems to work well with the
> file generated by your one-liner, even with corner cases like new
> misspellings added at bob or eob, but the wider the testing the better.
> 
> Hope no one will generate files with words containing something in
> OTHERCHARS.
> 
> Thanks for all your help

Thanks to both of you, but I still think that having flyspell search
without limits for duplicate mis-spellings is not a good idea.  We
have no control on how big user buffers could be.

So I think we should limit that search by default.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 11:45                             ` Agustin Martin
  2014-02-28 11:51                               ` Eli Zaretskii
@ 2014-02-28 23:11                               ` Aleksey Cherepanov
  2014-03-01 10:33                                 ` Aleksey Cherepanov
                                                   ` (2 more replies)
  1 sibling, 3 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-02-28 23:11 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

Hi, Agustin!

Wow! Your 'if' in 'while's condition is very elegant. Nice!

On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
> On Wed, Feb 26, 2014 at 09:32:02PM +0100, Agustin Martin wrote:
> > On Mon, Feb 24, 2014 at 08:03:17PM +0400, Aleksey Cherepanov wrote:
> > > I played with different (maybe wrong) implementations of
> > > flyspell-word-search-backward and measured time against t.txt
> > > (produced by the one-liner). All implementations are attached.
> > 
> > [ ... Tons of extensive and impressive debugging ... ]
> > 
> > > We could avoid capturing at all. And it works faster as shown by 4
> > > last functions.
> > 
> > Hi, 
> > 
> > Thanks a lot for the extensive debugging and for all the suggestions. I
> > have been playing with something based in your last function, but trying
> > to get something more compact, see below current status
> [ ... ]
> > I did some efficiency test and it seemed similar to those of your efficient
> > functions. Need to check further for corner cases, bugs, etc ...
> 
> Hi, Aleksey
> 
> Please find attached my first candidate for commit. Is similar to what I
> sent before, but needed to add an explicit check for word at eob in
> `flyspell-word-search-forward'. 
> 
> Will try to have more testing before committing. Seems to work well with the
> file generated by your one-liner, even with corner cases like new
> misspellings added at bob or eob, but the wider the testing the better.

I've wrote a small fuzzer. It is in attach. To run it:
$ LANG=C emacs -Q --eval '(load-file "t2.el")'
Then C-j to start. It modifies buffer you are in.

Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat
it. my-test-forward-agustin-fixed contains fix. It incorporates
simplified word-end logic: we slip forward using flyspell-get-word,
then we check eobp. Though I did not understand why -backward does not
need a similar fix and I got the answer: my mistake with (length word)
did not allow one word to be marked as duplicate.

(if condition nil ...) could be replaced with (unless condition ...)
but I do not know what one is more readable.


(kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as
duplicate. It is a problem with bound equal to (length word) in
-backward function. I did not check it when I wrote it.
> +			 (search-forward word (length word) t))))
                         (search-forward word (1+ (length word)) t))))

One "nd" is colored as duplicate due to -backward function after that
fix. I did not touch it yet because it is a time for a break for me.

> Hope no one will generate files with words containing something in
> OTHERCHARS.

Why?

Otherchars are not rare as of ' is there for "american" dictionary. So
even this email contains such words ("while's").

BTW quite interesting flyspell behaviour could be observed with
"met'met'and": if you jump back and forth over this word then met'met
is highlighted when you are at the beginning and met'and is
highlighted when you are at the end.

Also "met'met'and met'and" highlights both met'and as mis-spelled (the
second met'and is not marked as duplicate).


Are there any variables that could affect search like
case-fold-search? My fuzzer does not set them but users could have
them set.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t2.el --]
[-- Type: text/plain, Size: 10856 bytes --]

(require 'cl)
(require 'flyspell)

(setq my-fuzzer-buffer-name "*temp for fuzzer*")
(switch-to-buffer my-fuzzer-buffer-name)
(unless (= (point-min) (point-max))
  (error "Could not operate on non-empty buffer"))

(flyspell-mode 1)

(random t)

;; Orig
(defun my-test-backward-orig (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-backward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))
(defun my-test-forward-orig (word bound)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-forward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Agustin Martin
(defun my-test-backward-agustin (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin (word bound)
  (save-excursion
    (let* ((r '())
          (inhibit-point-motion-hooks t)
          (word-end (nth 2 (flyspell-get-word)))
          (flyspell-not-casechars (flyspell-get-not-casechars))
          (word-re (concat flyspell-not-casechars
                           (regexp-quote word)
                           flyspell-not-casechars))
          p)
      (while
         (and (not r)
              (setq p (if (= word-end (point-max))
                          nil ;; Current word is at e-o-b. No forward search
                        (if (re-search-forward word-re bound t)
                            ;; word-re match ends one char after word
                            (progn (backward-char) (point))
                          ;; Check above does not match similar word at e-o-b
                          (goto-char (point-max))
                          (search-backward word (- (point-max)
                                                   (length word)) t)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Fixed
(defun my-test-backward-agustin-fixed (word bound &optional ignore-case)
  ;; (my-test-backward-agustin word bound ignore-case))
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (1+ (length word)) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin-fixed (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            flyspell-not-casechars))
           p)
      (flyspell-get-word)
      (while
          (and (not r)
               (setq p (if (eobp)
                           nil ;; Current word is at e-o-b. No forward search
                         (if (re-search-forward word-re bound t)
                             ;; word-re match ends one char after word
                             (progn (backward-char) (point))
                           ;; Check above does not match similar word at e-o-b
                           (goto-char (point-max))
                           (and (search-backward word (- (point-max)
                                                         (length word)) t)
                                (goto-char (point-max)))))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

(defun my-make-test-macro ()
  (let* ((good "met")
         (sep "SPC")
         (bad "nd")
         (oc "'")
         (bol "C-a")
         ;; not really eol but enough
         (eol "C-e")
         (parts (list good sep bad oc bol eol))
         (len (length parts)))
    (eval `(kbd ,(mapconcat (lambda (a)
                              (nth (random len) parts))
                            (make-list (1+ (random 100)) 0)
                            " ")))))

;; nil if everythings is equal,
;; 'badtext if text is not equal,
;; position is the first position with different properties.
(defun my-compare-strings-with-properties (a b)
  (if (string= (car a) (car b))
      (let ((len (length (car a)))
            (pos 0)
            (badpos nil)
            (faces1 (cadr a))
            (faces2 (cadr b)))
        (while (and (not badpos)
                    (< pos len))
          (unless (equal (nth pos faces1)
                         (nth pos faces2))
              (setq badpos pos))
          ;; (message ">> %d" pos)
          (setq pos (1+ pos)))
        ;; (if badpos
        ;;     (progn
        ;;       (message ":>> faces1 %S" faces1)
        ;;       (message ":>> faces2 %S" faces2)))
        badpos)
    'badtext))

(defun my-try-macro (macro)
  (let ((strings
         ;; (message ">> count = %d, macro = %S" count macro)
         (mapcar
          (lambda (name)
            (delete-region (point-min) (point-max))
            (letf (((symbol-function 'flyspell-word-search-forward)
                    (intern (concat "my-test-forward-" (symbol-name name))))
                   ((symbol-function 'flyspell-word-search-backward)
                    (intern (concat "my-test-backward-" (symbol-name name)))))
              ;; (message ">> pre %S %d" name count)
              (execute-kbd-macro macro)
              ;; (message ">> post %S %d" name count)
              )
            (list (buffer-string)
                  (mapcar (lambda (pos)
                            (get-char-property pos 'face))
                          (number-sequence (point-min) (point-max)))))
          '(orig new))))
    (my-compare-strings-with-properties (car strings) (cadr strings))))

;; It may not reduce to the minimun in one run. It fails at reductions
;; if 2 or more chars should be removed at the same time.
(defun my-reduce (macro)
  (let ((bad (my-try-macro macro))
        (fails 0)
        newmacro)
    (if bad
      (while (< fails 100)
      (let ((pos (random (length macro))))
        (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos))))
        ;; (message ">> %S" macro)
        ;; (message ">> %S" newmacro)
        (if (my-try-macro newmacro)
            (progn
              (setq fails 0)
              (setq macro newmacro))
          (setq fails (1+ fails)))))
      (message ":>>   We reduce only faulty macros"))
    macro))

(defun my-reset-new ()
  (defun my-test-backward-new (word bound &optional ignore-case)
    (my-test-backward-agustin-fixed word bound ignore-case))
  (defun my-test-forward-new (word bound)
    (my-test-forward-agustin-fixed word bound)))
(my-reset-new)

(defun my-try-mixed-pairs (macro)
  (if (my-try-macro macro)
      (progn
        (my-reset-new)
        (defun my-test-backward-new (word bound &optional ignore-case)
          (my-test-backward-orig word bound ignore-case))
        (if (my-try-macro macro)
            (message ":>> Difference is from -forward function"))
        (my-reset-new)
        (defun my-test-forward-new (word bound)
          (my-test-forward-orig word bound))
        (if (my-try-macro macro)
            (message ":>> Difference is from -backward function")))
    (message ":>>   We mix pairs only for faulty macros")))

(defun my-fuzz ()
  (interactive)
  (unless (string= (ispell-get-otherchars) "[']")
    (error "Unexpected not-casechars value"))
  (buffer-disable-undo)
  (unwind-protect
      (let ((more t)
            (count 0)
            (time (current-time)))
        (while (and more
                    (< count (if my-macro 1 15)))
          (let* ((macro (or my-macro (my-make-test-macro)))
                 (bad (my-try-macro macro)))
            (setq more (not bad))
            (unless more
              (message ":>> Bad at %S running %S" bad macro)
              (my-try-mixed-pairs macro)
              (message ":>> Reduced macro: %S" (my-reduce macro))))
          (setq count (1+ count)))
        (message ":>> Fuzzing: %d macros are finished in %S"
                 count
                 (subtract-time (current-time) time))
        (message ":>> %s" (if more "Without differences" "There are differences")))
    (buffer-enable-undo))
  nil)
(global-set-key (kbd "C-j") 'my-fuzz)

(split-window-right)
(other-window 1)
(view-echo-area-messages)
(other-window 1)

;; For manual debug

;; (defun flyspell-word-search-backward (word bound &optional ignore-case)
;;   (my-test-backward-agustin-fixed word bound ignore-case))
;; (defun flyspell-word-search-forward (word bound)
;;   (my-test-forward-agustin-fixed word bound))

;; Define non-nil to run only one test with this macro not randomly
(setq my-macro nil)

;; (setq my-macro (kbd "nd SPC and SPC nd C-a"))
;; (setq my-macro (kbd "nd SPC nd C-a"))

;; (setq my-macro (kbd "nd SPC and C-a"))

;; (setq my-macro (kbd "n SPC n C-a"))

(setq my-macro (kbd "nd C-e"))


^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 23:11                               ` Aleksey Cherepanov
@ 2014-03-01 10:33                                 ` Aleksey Cherepanov
  2014-03-01 15:50                                   ` Aleksey Cherepanov
  2014-03-01 21:39                                 ` Aleksey Cherepanov
  2014-03-09 17:25                                 ` Agustin Martin
  2 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-03-01 10:33 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Sat, Mar 01, 2014 at 03:11:41AM +0400, Aleksey Cherepanov wrote:
> Wow! Your 'if' in 'while's condition is very elegant. Nice!
> 
> On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
> > Please find attached my first candidate for commit. Is similar to what I
> > sent before, but needed to add an explicit check for word at eob in
> > `flyspell-word-search-forward'. 
> > 
> > Will try to have more testing before committing. Seems to work well with the
> > file generated by your one-liner, even with corner cases like new
> > misspellings added at bob or eob, but the wider the testing the better.
> 
> I've wrote a small fuzzer. It is in attach. To run it:
> $ LANG=C emacs -Q --eval '(load-file "t2.el")'
> Then C-j to start. It modifies buffer you are in.

There is a mistake in my-try-mixed-pairs, fixed version is attached.

> (kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as
> duplicate. It is a problem with bound equal to (length word) in
> -backward function. I did not check it when I wrote it.
> > +			 (search-forward word (length word) t))))
>                          (search-forward word (1+ (length word)) t))))

(1+ ...) is wrong, it should be similar to -forward: (+ (point-min)
...) because (point-min) is not always 1 (narrowing could change
this).

BTW flyspell does not escape restrictions/narrowing when it searches
for duplicate. Would not it be more convenient to widen before search?

Like
(save-restriction
  (widen)
  ... search ...

> One "nd" is colored as duplicate due to -backward function after that
> fix. I did not touch it yet because it is a time for a break for me.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t2.el --]
[-- Type: text/plain, Size: 11071 bytes --]

(require 'cl)
(require 'flyspell)

(setq my-fuzzer-buffer-name "*temp for fuzzer*")
(switch-to-buffer my-fuzzer-buffer-name)
(unless (= (point-min) (point-max))
  (error "Could not operate on non-empty buffer"))

(flyspell-mode 1)

(random t)

;; Orig
(defun my-test-backward-orig (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-backward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))
(defun my-test-forward-orig (word bound)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-forward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Agustin Martin
(defun my-test-backward-agustin (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin (word bound)
  (save-excursion
    (let* ((r '())
          (inhibit-point-motion-hooks t)
          (word-end (nth 2 (flyspell-get-word)))
          (flyspell-not-casechars (flyspell-get-not-casechars))
          (word-re (concat flyspell-not-casechars
                           (regexp-quote word)
                           flyspell-not-casechars))
          p)
      (while
         (and (not r)
              (setq p (if (= word-end (point-max))
                          nil ;; Current word is at e-o-b. No forward search
                        (if (re-search-forward word-re bound t)
                            ;; word-re match ends one char after word
                            (progn (backward-char) (point))
                          ;; Check above does not match similar word at e-o-b
                          (goto-char (point-max))
                          (search-backward word (- (point-max)
                                                   (length word)) t)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Fixed
(defun my-test-backward-agustin-fixed (word bound &optional ignore-case)
  ;; (my-test-backward-agustin word bound ignore-case))
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (+ (point-min)
                                                 (length word)) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin-fixed (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            flyspell-not-casechars))
           p)
      (flyspell-get-word)
      (while
          (and (not r)
               (setq p (if (eobp)
                           nil ;; Current word is at e-o-b. No forward search
                         (if (re-search-forward word-re bound t)
                             ;; word-re match ends one char after word
                             (progn (backward-char) (point))
                           ;; Check above does not match similar word at e-o-b
                           (goto-char (point-max))
                           (and (search-backward word (- (point-max)
                                                         (length word)) t)
                                (goto-char (point-max)))))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

(defun my-make-test-macro ()
  (let* ((good "met")
         (sep "SPC")
         (bad "nd")
         (oc "'")
         (bol "C-a")
         ;; not really eol but enough
         (eol "C-e")
         (parts (list good sep bad oc bol eol))
         (len (length parts)))
    (eval `(kbd ,(mapconcat (lambda (a)
                              (nth (random len) parts))
                            (make-list (1+ (random 100)) 0)
                            " ")))))

;; nil if everythings is equal,
;; 'badtext if text is not equal,
;; position is the first position with different properties.
(defun my-compare-strings-with-properties (a b)
  (if (string= (car a) (car b))
      (let ((len (length (car a)))
            (pos 0)
            (badpos nil)
            (faces1 (cadr a))
            (faces2 (cadr b)))
        (while (and (not badpos)
                    (< pos len))
          (unless (equal (nth pos faces1)
                         (nth pos faces2))
              (setq badpos pos))
          ;; (message ">> %d" pos)
          (setq pos (1+ pos)))
        ;; (if badpos
        ;;     (progn
        ;;       (message ":>> faces1 %S" faces1)
        ;;       (message ":>> faces2 %S" faces2)))
        badpos)
    'badtext))

(defun my-try-macro (macro)
  (let ((strings
         ;; (message ">> count = %d, macro = %S" count macro)
         (mapcar
          (lambda (name)
            (delete-region (point-min) (point-max))
            (letf (((symbol-function 'flyspell-word-search-forward)
                    (intern (concat "my-test-forward-" (symbol-name name))))
                   ((symbol-function 'flyspell-word-search-backward)
                    (intern (concat "my-test-backward-" (symbol-name name)))))
              ;; (message ">> pre %S %d" name count)
              (execute-kbd-macro macro)
              ;; (message ">> post %S %d" name count)
              )
            (list (buffer-string)
                  (mapcar (lambda (pos)
                            (get-char-property pos 'face))
                          (number-sequence (point-min) (point-max)))))
          '(orig new))))
    (my-compare-strings-with-properties (car strings) (cadr strings))))

;; It may not reduce to the minimun in one run. It fails at reductions
;; if 2 or more chars should be removed at the same time.
(defun my-reduce (macro)
  (let ((bad (my-try-macro macro))
        (fails 0)
        newmacro)
    (if bad
      (while (< fails 100)
      (let ((pos (random (length macro))))
        (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos))))
        ;; (message ">> %S" macro)
        ;; (message ">> %S" newmacro)
        (if (my-try-macro newmacro)
            (progn
              (setq fails 0)
              (setq macro newmacro))
          (setq fails (1+ fails)))))
      (message ":>>   We reduce only faulty macros"))
    macro))

;; Change this to use other functions instead of -agustin-fixed
(defun my-reset-new ()
  (defun my-test-backw+ard-new (word bound &optional ignore-case)
    (my-test-backward-agustin-fixed word bound ignore-case))
  (defun my-test-forward-new (word bound)
    (my-test-forward-agustin-fixed word bound)))
(my-reset-new)

(defun my-try-mixed-pairs (macro)
  (unwind-protect
      (if (my-try-macro macro)
          (progn
            (my-reset-new)
            (defun my-test-backward-new (word bound &optional ignore-case)
              (my-test-backward-orig word bound ignore-case))
            (if (my-try-macro macro)
                (message ":>> Difference is from -forward function"))
            (my-reset-new)
            (defun my-test-forward-new (word bound)
              (my-test-forward-orig word bound))
            (if (my-try-macro macro)
                (message ":>> Difference is from -backward function")))
        (message ":>>   We mix pairs only for faulty macros"))
    (my-reset-new)))

(defun my-fuzz ()
  (interactive)
  (unless (string= (ispell-get-otherchars) "[']")
    (error "Unexpected not-casechars value"))
  (buffer-disable-undo)
  (unwind-protect
      (let ((more t)
            (count 0)
            (time (current-time)))
        (while (and more
                    (< count (if my-macro 1 15)))
          (let* ((macro (or my-macro (my-make-test-macro)))
                 (bad (my-try-macro macro)))
            (setq more (not bad))
            (unless more
              (message ":>> Bad at %S running %S" bad macro)
              (my-try-mixed-pairs macro)
              (message ":>> Reduced macro: %S" (my-reduce macro))))
          (setq count (1+ count)))
        (message ":>> Fuzzing: %d macros are finished in %S"
                 count
                 (subtract-time (current-time) time))
        (message ":>> %s" (if more "Without differences" "There are differences")))
    (buffer-enable-undo))
  nil)
(global-set-key (kbd "C-j") 'my-fuzz)

(split-window-right)
(other-window 1)
(view-echo-area-messages)
(other-window 1)

;; For manual debug

;; (defun flyspell-word-search-backward (word bound &optional ignore-case)
;;   (my-test-backward-agustin-fixed word bound ignore-case))
;; (defun flyspell-word-search-forward (word bound)
;;   (my-test-forward-agustin-fixed word bound))

;; Define non-nil to run only one test with this macro not randomly
(setq my-macro nil)

;; (setq my-macro (kbd "nd SPC and SPC nd C-a"))
;; (setq my-macro (kbd "nd SPC nd C-a"))

;; (setq my-macro (kbd "nd SPC and C-a"))

(setq my-macro (kbd "n SPC n C-a"))

;; (setq my-macro (kbd "nd C-e"))


^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-01 10:33                                 ` Aleksey Cherepanov
@ 2014-03-01 15:50                                   ` Aleksey Cherepanov
  0 siblings, 0 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-03-01 15:50 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

On Sat, Mar 01, 2014 at 02:33:05PM +0400, Aleksey Cherepanov wrote:
> ;; Change this to use other functions instead of -agustin-fixed
> (defun my-reset-new ()
>   (defun my-test-backw+ard-new (word bound &optional ignore-case)

Sorry! '+' is here by mistake. It should be
   (defun my-test-backward-new (word bound &optional ignore-case)

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 23:11                               ` Aleksey Cherepanov
  2014-03-01 10:33                                 ` Aleksey Cherepanov
@ 2014-03-01 21:39                                 ` Aleksey Cherepanov
  2014-03-09 17:25                                 ` Agustin Martin
  2 siblings, 0 replies; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-03-01 21:39 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

[-- Attachment #1: Type: text/plain, Size: 2420 bytes --]

On Sat, Mar 01, 2014 at 03:11:41AM +0400, Aleksey Cherepanov wrote:
> I've wrote a small fuzzer. It is in attach. To run it:
> $ LANG=C emacs -Q --eval '(load-file "t2.el")'
> Then C-j to start. It modifies buffer you are in.

New version is attached.

M-j tries last macro or macro specified in my-macro variable.
For manual experiments C-o and C-u C-o defines flyspell-word-search-*
as my-test-*-(orig|new). Though I improved output so C-j should be
enough.

> > Hope no one will generate files with words containing something in
> > OTHERCHARS.
> 
> Why?
> 
> Otherchars are not rare as of ' is there for "american" dictionary. So
> even this email contains such words ("while's").
> 
> BTW quite interesting flyspell behaviour could be observed with
> "met'met'and": if you jump back and forth over this word then met'met
> is highlighted when you are at the beginning and met'and is
> highlighted when you are at the end.
> 
> Also "met'met'and met'and" highlights both met'and as mis-spelled (the
> second met'and is not marked as duplicate).

I think original search of "n'n" against "n'n'n'n" finds only
(n'n)'(n'n) but not n'(n'n)'n. Our search marks the first word as
duplicate running (kbd "n'n SPC en'n'n C-a") while original search
does not. What behaviour is preferable? Should the first word of "n'n
en'n'n" be marked as duplicate?

> Are there any variables that could affect search like
> case-fold-search? My fuzzer does not set them but users could have
> them set.

Also my fuzzer does not try bounds for the search. But we will be in
trouble if the search bound is at word bound because we want one more
char. Though we could extend bound by 1 char to solve that.


Now only forward search is enabled in my fuzzer. Setup it at the end
of file as you need.

I've implemented a variant of forward search using regexp. It seems
that forward search does not get slow from the group in regexp. I did
not measured well though. The function is shorter with regexp. Maybe
we should make a correct variant before fast one... %-)

Also forward search works a bit faster in general. So we could try to
implement backward search though forward search.

I've removed
	    (goto-char (1+ p))
to not fail on (kbd "nd SPC d'nd SPC nd SPC met C-a").

At the moment the fuzzer could pass several thousands of tests well.
You need to wait for fails or improve test generator.

Thanks!

-- 
Regards,
Aleksey Cherepanov

[-- Attachment #2: t2.el --]
[-- Type: text/plain, Size: 15532 bytes --]

(require 'cl)
(require 'flyspell)

(setq my-fuzzer-buffer-name "*temp for fuzzer*")
(switch-to-buffer my-fuzzer-buffer-name)
(unless (= (point-min) (point-max))
  (error "Could not operate on non-empty buffer"))

(flyspell-mode 1)

(random t)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Implementations

;; Orig
(defun my-test-backward-orig (word bound &optional ignore-case)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-backward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw)
		   (if ignore-case
		       (string-equal (downcase (car lw)) (downcase word))
		     (string-equal (car lw) word)))
	      (setq r p)
	    (goto-char p))))
      r)))
(defun my-test-forward-orig (word bound)
  (save-excursion
    (let ((r '())
	  (inhibit-point-motion-hooks t)
	  p)
      (while (and (not r) (setq p (search-forward word bound t)))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Agustin Martin
(defun my-test-backward-agustin (word bound &optional ignore-case)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (length word) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin (word bound)
  (save-excursion
    (let* ((r '())
          (inhibit-point-motion-hooks t)
          (word-end (nth 2 (flyspell-get-word)))
          (flyspell-not-casechars (flyspell-get-not-casechars))
          (word-re (concat flyspell-not-casechars
                           (regexp-quote word)
                           flyspell-not-casechars))
          p)
      (while
         (and (not r)
              (setq p (if (= word-end (point-max))
                          nil ;; Current word is at e-o-b. No forward search
                        (if (re-search-forward word-re bound t)
                            ;; word-re match ends one char after word
                            (progn (backward-char) (point))
                          ;; Check above does not match similar word at e-o-b
                          (goto-char (point-max))
                          (search-backward word (- (point-max)
                                                   (length word)) t)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
	    (goto-char (1+ p)))))
      r)))

;; Fixed
(defun my-test-backward-agustin-fixed (word bound &optional ignore-case)
  ;; (my-test-backward-agustin word bound ignore-case))
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat
                     flyspell-not-casechars
                     (regexp-quote word)
                     flyspell-not-casechars))
           p)
      (while
          (and (not r)
               (setq p (if (re-search-backward word-re bound t)
                           (progn (forward-char) (point))
                         ;; Check if word is at bob
                         (goto-char (point-min))
                         (search-forward word (+ (point-min)
                                                 (length word)) t))))
        (let ((lw (flyspell-get-word)))
          (if (and (consp lw)
                   (if ignore-case
                       (string-equal (downcase (car lw)) (downcase word))
                     (string-equal (car lw) word)))
              (setq r p)
            (goto-char p))))
      r)))
(defun my-test-forward-agustin-fixed (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            flyspell-not-casechars))
           p)
      (flyspell-get-word)
      (while
          (and (not r)
               (setq p (if (eobp)
                           nil ;; Current word is at e-o-b. No forward search
                         (if (re-search-forward word-re bound t)
                             ;; word-re match ends one char after word
                             (progn (backward-char) (point))
                           ;; Check above does not match similar word at e-o-b
                           (goto-char (point-max))
                           (and (search-backward word (- (point-max)
                                                         (length word)) t)
                                (goto-char (point-max)))))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
            ;; We don't need to move forward due to additional char
            ;; before word in regexp
	    ;; (goto-char (1+ p))
            )))
      r)))

;; With eob in regexp
(defun my-test-forward-eob (word bound)
  (save-excursion
    (let* ((r '())
           (inhibit-point-motion-hooks t)
           (flyspell-not-casechars (flyspell-get-not-casechars))
           (word-re (concat flyspell-not-casechars
                            (regexp-quote word)
                            "\\(?:" flyspell-not-casechars "\\|\\'\\)"))
           p)
      (while
          (and (not r)
               (setq p (and
                        (re-search-forward word-re bound t)
                        (if (eobp)
                            (point)
                          (backward-char)
                          (point)))))
	(let ((lw (flyspell-get-word)))
	  (if (and (consp lw) (string-equal (car lw) word))
	      (setq r p)
            ;; We don't need to move forward due to additional char
            ;; before word in regexp
	    ;; (goto-char (1+ p))
            )))
      r)))

;; End of Implementations
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Fuzzer

(defun my-make-test-macro ()
  (let* ((good "met")
         (sep "SPC")
         (bad "nd")
         (oc "'")
         (bol "C-a")
         ;; not really eol but enough
         (eol "C-e")
         (parts (list good sep bad oc bol eol))
         (len (length parts)))
    (eval `(kbd ,(mapconcat (lambda (a)
                              (nth (random len) parts))
                            (make-list (1+ (random 100)) 0)
                            " ")))))

;; nil if everythings is equal,
;; 'badtext if text is not equal,
;; position is the first position with different properties.
(defun my-compare-strings-with-properties (a b)
  (if (string= (car a) (car b))
      (let ((len (length (car a)))
            (pos 0)
            (badpos nil)
            (faces1 (cadr a))
            (faces2 (cadr b)))
        (while (and (not badpos)
                    (< pos len))
          (unless (equal (nth pos faces1)
                         (nth pos faces2))
              (setq badpos pos))
          ;; (message ">> %d" pos)
          (setq pos (1+ pos)))
        (if my-show-faces
            (if badpos
                (progn
                  (message ":>> faces1 %S" faces1)
                  (message ":>> faces2 %S" faces2))
              (message ":>> No diff")))
        badpos)
    'badtext))

(defun my-make-string-with-faces (a)
  (let ((str (car a))
        (faces (cadr a)))
    (mapcar (lambda (pos)
              (set-text-properties pos
                                   (1+ pos)
                                   `(fontified t font-lock-face ,(nth pos faces))
                                   str))
            (number-sequence 0 (1- (length str))))
    str))
(defun my-make-strings-with-faces (a b)
  (concat "\n:>> orig:"
          (my-make-string-with-faces a)
          "\n:>>  new:"
          (my-make-string-with-faces b)
          "\n"))

(setq my-show-nice-faces nil)

(defun my-try-macro (macro)
  (let ((strings
         ;; (message ">> count = %d, macro = %S" count macro)
         (mapcar
          (lambda (name)
            (delete-region (point-min) (point-max))
            (letf (((symbol-function 'flyspell-word-search-forward)
                    (intern (concat "my-test-forward-" (symbol-name name))))
                   ((symbol-function 'flyspell-word-search-backward)
                    (intern (concat "my-test-backward-" (symbol-name name)))))
              ;; (message ">> pre %S %d" name count)
              (execute-kbd-macro macro)
              ;; (message ">> post %S %d" name count)
              )
            (list (buffer-string)
                  (mapcar (lambda (pos)
                            (get-char-property pos 'face))
                          (number-sequence (point-min) (point-max)))))
          '(orig new))))
    (let ((bad (my-compare-strings-with-properties (car strings) (cadr strings))))
      (if (and bad my-show-nice-faces)
          (with-current-buffer "*Messages*"
            (insert (my-make-strings-with-faces (car strings) (cadr strings)))))
      bad)))

;; It may not reduce to the minimun in one run. It fails at reductions
;; if 2 or more chars should be removed at the same time.
(defun my-reduce (macro)
  (let ((bad (my-try-macro macro))
        (fails 0)
        newmacro)
    (if bad
      (while (< fails 100)
      (let ((pos (random (length macro))))
        (setq newmacro (concat (substring macro 0 pos) (substring macro (1+ pos))))
        ;; (message ">> %S" macro)
        ;; (message ">> %S" newmacro)
        (if (my-try-macro newmacro)
            (progn
              (setq fails 0)
              (setq macro newmacro))
          (setq fails (1+ fails)))))
      (message ":>>   We reduce only faulty macros"))
    macro))

(defun my-try-mixed-pairs (macro)
  (unwind-protect
      (if (my-try-macro macro)
          (progn
            (my-reset-new)
            (defun my-test-backward-new (word bound &optional ignore-case)
              (my-test-backward-orig word bound ignore-case))
            (if (my-try-macro macro)
                (message ":>> Difference is from -forward function"))
            (my-reset-new)
            (defun my-test-forward-new (word bound)
              (my-test-forward-orig word bound))
            (if (my-try-macro macro)
                (message ":>> Difference is from -backward function")))
        (message ":>>   We mix pairs only for faulty macros"))
    (my-reset-new)))

(defun my-fuzz ()
  (interactive)
  (unless (string= (ispell-get-otherchars) "[']")
    (error "Unexpected not-casechars value"))
  (buffer-disable-undo)
  (unwind-protect
      (let ((more t)
            (count 0)
            (update-step 100)
            (time (current-time)))
        (while (and more
                    (< count (if my-macro 1 1000)))
          (let* ((macro (or my-macro (my-make-test-macro)))
                 (bad (let ((my-show-nice-faces t))
                        (my-try-macro macro))))
            (setq more (not bad))
            (unless more
              (if (numberp bad)
                  (message ":>> pos :%s^" (make-string bad ? )))
              (message ":>> Bad at %S running %S" bad macro)
              (my-try-mixed-pairs macro)
              (setq my-macro-last (my-reduce macro))
              (message ":>> Reduced macro: %S" my-macro-last))
            (if (= 0 (% count update-step))
                (message ":>> In progress, count = %d (shows between every %d)" count update-step)))
          (setq count (1+ count)))
        (message ":>> Fuzzing: %d macros are finished in %S"
                 count
                 (subtract-time (current-time) time))
        (message ":>> %s" (if more "Without differences" "There are differences")))
    (buffer-enable-undo))
  nil)
(global-set-key (kbd "C-j") 'my-fuzz)

;; use -orig with prefix arg,
;; use -new without prefix arg
(defun my-choose-flyspell-funcs (arg)
  (interactive "P")
  (if arg
      (progn
        (defun flyspell-word-search-backward (word bound &optional ignore-case)
          (my-test-backward-orig word bound ignore-case))
        (defun flyspell-word-search-forward (word bound)
          (my-test-forward-orig word bound))
        (message ">> Using orig"))
    (defun flyspell-word-search-backward (word bound &optional ignore-case)
      (my-test-backward-new word bound ignore-case))
    (defun flyspell-word-search-forward (word bound)
      (my-test-forward-new word bound))
    (message ">> Using new")))
(global-set-key (kbd "C-o") 'my-choose-flyspell-funcs)

(setq my-macro-last nil)

(setq my-show-faces nil)
(defun my-show-faces-func ()
  (interactive)
  (let ((macro (or my-macro my-macro-last)))
    (if macro
        (let ((my-show-faces t))
          (my-try-macro macro))
      (error "No macro specified"))))
(global-set-key (kbd "M-j") 'my-show-faces-func)

(split-window-right)
(other-window 1)
(view-echo-area-messages)
(other-window 1)

;; For manual debug

;; (defun flyspell-word-search-backward (word bound &optional ignore-case)
;;   (my-test-backward-agustin-fixed word bound ignore-case))
;; (defun flyspell-word-search-forward (word bound)
;;   (my-test-forward-agustin-fixed word bound))


;; Change this to use other functions instead of -agustin-fixed
(defun my-reset-new ()
  (defun my-test-backward-new (word bound &optional ignore-case)
    ;; (my-test-backward-agustin-fixed word bound ignore-case))
    (my-test-backward-orig word bound ignore-case))
  (defun my-test-forward-new (word bound)
    ;; (my-test-forward-agustin-fixed word bound)))
    ;; (my-test-forward-agustin word bound)))
    (my-test-forward-eob word bound)))
(my-reset-new)


;; Define non-nil to run only one test with this macro not randomly
(setq my-macro nil)

;; (setq my-macro (kbd "nd SPC and SPC nd C-a"))
;; (setq my-macro (kbd "nd SPC nd C-a"))

;; (setq my-macro (kbd "nd SPC and C-a"))

;; (setq my-macro (kbd "n SPC n C-a"))

;; (setq my-macro (kbd "nd C-e"))

;; (setq my-macro "'nd end'nd'nd\x01nd\x01")
;; (setq my-macro "'n en'n'n\x01n\x01")

(setq my-macro
      ;; "'nd end'nd'nd\x01nd\x01"
      ;; "'n en'n'n\x01n\x01"
      ;; "n'n en'n'n\x01"
      "a n'n en'n'n\x01"
      ;; "n'n n'n'n\x01"
      ;; "n'n n'n'n\x02\x02\x02\x02\x02\x02\x02\x02"
      ;; "d'nd\x01ndmet \x05met ndmet\x01"
      ;; "d'n\x01nd \x05d nd\x01"
      ;; "nd d'nd nd\x01"
      ;; "nd d'nd nd met\x01"
      )


^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 11:51                               ` Eli Zaretskii
@ 2014-03-01 21:44                                 ` Aleksey Cherepanov
  2014-03-02  3:56                                   ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-03-01 21:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 16800, Agustin Martin

On Fri, Feb 28, 2014 at 01:51:41PM +0200, Eli Zaretskii wrote:
> Thanks to both of you, but I still think that having flyspell search
> without limits for duplicate mis-spellings is not a good idea.  We
> have no control on how big user buffers could be.
> 
> So I think we should limit that search by default.

May be.

We could try to limit execution time. Though checks for that could
slow down the search.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-01 21:44                                 ` Aleksey Cherepanov
@ 2014-03-02  3:56                                   ` Eli Zaretskii
  2014-03-09 17:36                                     ` Agustin Martin
  0 siblings, 1 reply; 33+ messages in thread
From: Eli Zaretskii @ 2014-03-02  3:56 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800, agustin.martin

> Date: Sun, 2 Mar 2014 01:44:27 +0400
> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> 
> We could try to limit execution time. Though checks for that could
> slow down the search.

Exactly.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-02-28 23:11                               ` Aleksey Cherepanov
  2014-03-01 10:33                                 ` Aleksey Cherepanov
  2014-03-01 21:39                                 ` Aleksey Cherepanov
@ 2014-03-09 17:25                                 ` Agustin Martin
  2015-03-06 21:46                                   ` Agustin Martin
  2 siblings, 1 reply; 33+ messages in thread
From: Agustin Martin @ 2014-03-09 17:25 UTC (permalink / raw)
  To: Aleksey Cherepanov, 16800

[-- Attachment #1: Type: text/plain, Size: 5304 bytes --]

2014-03-01 0:11 GMT+01:00 Aleksey Cherepanov <aleksey.4erepanov@gmail.com>:

> On Fri, Feb 28, 2014 at 12:45:45PM +0100, Agustin Martin wrote:
> >
> > Please find attached my first candidate for commit. Is similar to what I
> > sent before, but needed to add an explicit check for word at eob in
> > `flyspell-word-search-forward'.
> >
> > Will try to have more testing before committing. Seems to work well with
> the
> > file generated by your one-liner, even with corner cases like new
> > misspellings added at bob or eob, but the wider the testing the better.
>
> I've wrote a small fuzzer. It is in attach. To run it:
> $ LANG=C emacs -Q --eval '(load-file "t2.el")'
> Then C-j to start. It modifies buffer you are in.
>
> Your -forward function gets stuck. (kbd "nd SPC and C-a") could repeat
> it. my-test-forward-agustin-fixed contains fix. It incorporates
> simplified word-end logic: we slip forward using flyspell-get-word,
> then we check eobp. Though I did not understand why -backward does not
> need a similar fix and I got the answer: my mistake with (length word)
> did not allow one word to be marked as duplicate.
>

Hi,

Sorry for the delay, I am rather busy lately.

The problem is that final condition can cause an endless loop. I'd prefer
to use a flag to make sure that test is passed only once as much, see below


> (if condition nil ...) could be replaced with (unless condition ...)
> but I do not know what one is more readable.
>

While there was `nil' in the version I attached, the actual code in my
personal debug code was not exactly `nil' but something like

(progn (message "word %s is at eob" word) nil)

Together with being more readable, that is why the structure was left
there. I have to thing about this, but with the full word-re test we may
even get rid of that test and all the flyspell.-get-word call, but I need
some time to test all that. If still needed (unless ...) may be used in the
final version.


> (kbd "nd SPC and SPC nd C-b") fails to highlight the second "nd" as
> duplicate. It is a problem with bound equal to (length word) in
> -backward function. I did not check it when I wrote it.
> > +                      (search-forward word (length word) t))))
>                          (search-forward word (1+ (length word)) t))))


One "nd" is colored as duplicate due to -backward function after that
> fix. I did not touch it yet because it is a time for a break for me.
>

FIxed in my copy. Later noticed that backward function has some other
glitches, like flyspell-get-word looking at wrong word. Fixed in my copy,
but I will look at it later.

> Hope no one will generate files with words containing something in
>  > OTHERCHARS.
>
> Why?
>

Sorry for not being clear, I was thinking about a really rare corner case
current code may not handle well

Write 'nd-et' some thousand times and then write 'et' in a language where
'nd' and 'et' are misspellings and '-' is part of otherchars. I'd expect to
have a lot of false positives in search.


> BTW quite interesting flyspell behaviour could be observed with
> "met'met'and": if you jump back and forth over this word then met'met
> is highlighted when you are at the beginning and met'and is
> highlighted when you are at the end.
>
> Also "met'met'and met'and" highlights both met'and as mis-spelled (the
> second met'and is not marked as duplicate).
>

Funny :-)

I do not expect to have time to check everything shortly. Also, note that
Emacs 24.4 is on the way, so trunk is frozen for anything potentially
problematic. Unless I consider the code absolutely rock solid, I'd prefer
to wait until 24.4 is released.

Sorry I could not yet play with your fuzzer, will try next week. Thanks for
all the work you are putting here.

This is what I am using to break the loop in the forward function (see
`keep'). min limit in last condition should also be adjusted to bound if
not unlimited.

;; --------------------------- 8<
----------------------------------------------------------
(defun flyspell-word-search-forward (word bound)
  (save-excursion
    (let* ((r '())
       (inhibit-point-motion-hooks t)
       (word-end (nth 2 (flyspell-get-word)))
       (flyspell-not-casechars (flyspell-get-not-casechars))
       (word-re (concat flyspell-not-casechars
                (regexp-quote word)
                flyspell-not-casechars))
       (keep t) ;; Control flag to exit loop below after check word at eob.
       p)
      (while
      (and (not r)
           keep
           (setq p (if (= word-end (point-max))
               nil ;; Current word is at e-o-b. No forward search
             (if (re-search-forward word-re bound t)
                 ;; word-re match ends one char after word
                 (progn (backward-char) (point) )
               ;; Check above does not match similar word at e-o-b
               (goto-char (point-max))
               (setq keep nil) ;; Ensure this is last iteration
               (search-backward word (- (point-max)
                            (length word)) t)))))
    (let ((lw (flyspell-get-word)))
      (if (and (consp lw) (string-equal (car lw) word))
          (setq r p)
        (goto-char (1+ p)))))
      r)))
;; --------------------------- 8<
----------------------------------------------------------

Regards,

-- 
Agustin

[-- Attachment #2: Type: text/html, Size: 7231 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-02  3:56                                   ` Eli Zaretskii
@ 2014-03-09 17:36                                     ` Agustin Martin
  2014-03-09 18:02                                       ` Aleksey Cherepanov
  0 siblings, 1 reply; 33+ messages in thread
From: Agustin Martin @ 2014-03-09 17:36 UTC (permalink / raw)
  To: Aleksey Cherepanov, 16800

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

2014-03-02 4:56 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:

> > Date: Sun, 2 Mar 2014 01:44:27 +0400
> > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> >
> > We could try to limit execution time. Though checks for that could
> > slow down the search.
>
> Exactly.
>

I see other problem with that, it is not deterministic, since the limit
depends on system load.

I have mixed feelings about changing current default from unlimited, but I
slowly changing my mind towards having  a big but not unlimited value as
default.

On the one hand, not putting limits in default value looks nicer, but on
the other that may have a non negligible impact in performance for really
huge files, as Eli points out.

Alexey's one-liner is 30000 lines and 2.4e6 chars size. While new code
seems to work for it, I'd put the limit somewhere lower, no more than 1e6.
This should be huge enough for any practical use and for anyone to notice
the difference.

Regards,

-- 
Agustin

[-- Attachment #2: Type: text/html, Size: 1692 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-09 17:36                                     ` Agustin Martin
@ 2014-03-09 18:02                                       ` Aleksey Cherepanov
  2014-03-09 18:24                                         ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Aleksey Cherepanov @ 2014-03-09 18:02 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800

On Sun, Mar 09, 2014 at 06:36:55PM +0100, Agustin Martin wrote:
> 2014-03-02 4:56 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:
> 
> > > Date: Sun, 2 Mar 2014 01:44:27 +0400
> > > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > > Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> > >
> > > We could try to limit execution time. Though checks for that could
> > > slow down the search.
> >
> > Exactly.
> >
> 
> I see other problem with that, it is not deterministic, since the limit
> depends on system load.

We may have limit by time with additional limit for lowest radius so
the search works not less than for N chars and if it looks further
then not more than M milliseconds.

Thanks!

-- 
Regards,
Aleksey Cherepanov





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-09 18:02                                       ` Aleksey Cherepanov
@ 2014-03-09 18:24                                         ` Eli Zaretskii
  0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2014-03-09 18:24 UTC (permalink / raw)
  To: Aleksey Cherepanov; +Cc: 16800

> Date: Sun, 9 Mar 2014 22:02:44 +0400
> From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> Cc: 16800@debbugs.gnu.org
> 
> On Sun, Mar 09, 2014 at 06:36:55PM +0100, Agustin Martin wrote:
> > 2014-03-02 4:56 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:
> > 
> > > > Date: Sun, 2 Mar 2014 01:44:27 +0400
> > > > From: Aleksey Cherepanov <aleksey.4erepanov@gmail.com>
> > > > Cc: Agustin Martin <agustin.martin@hispalinux.es>, 16800@debbugs.gnu.org
> > > >
> > > > We could try to limit execution time. Though checks for that could
> > > > slow down the search.
> > >
> > > Exactly.
> > >
> > 
> > I see other problem with that, it is not deterministic, since the limit
> > depends on system load.
> 
> We may have limit by time with additional limit for lowest radius so
> the search works not less than for N chars and if it looks further
> then not more than M milliseconds.

FWIW, I think this would be over-engineering.  Emacs never does
anything like that, we always have simple, deterministic limits in
terms of characters or lines.  This makes it easy for users to
customize their sessions in a way whose effect is simple to understand
and predictable.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2014-03-09 17:25                                 ` Agustin Martin
@ 2015-03-06 21:46                                   ` Agustin Martin
  2015-03-07  8:09                                     ` Eli Zaretskii
  0 siblings, 1 reply; 33+ messages in thread
From: Agustin Martin @ 2015-03-06 21:46 UTC (permalink / raw)
  To: Aleksey Cherepanov, 16800-done

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

Version: 24.5

2014-03-09 18:25 GMT+01:00 Agustin Martin <agustimartin@gmail.com>:

>
> I do not expect to have time to check everything shortly. Also, note that
> Emacs 24.4 is on the way, so trunk is frozen for anything potentially
> problematic. Unless I consider the code absolutely rock solid, I'd prefer
> to wait until 24.4 is released.
>

Hi,

emacs24 was released, fix committed to emacs-24 branch and merged into
trunk.

It is time to close this bug report.

Thanks for your contribution to the project. Best regards,

-- 
Agustin

[-- Attachment #2: Type: text/html, Size: 1082 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* bug#16800: 24.3; flyspell works slow on very short words at the end of big file
  2015-03-06 21:46                                   ` Agustin Martin
@ 2015-03-07  8:09                                     ` Eli Zaretskii
  0 siblings, 0 replies; 33+ messages in thread
From: Eli Zaretskii @ 2015-03-07  8:09 UTC (permalink / raw)
  To: Agustin Martin; +Cc: 16800, aleksey.4erepanov, agustin6martin

> Date: Fri, 6 Mar 2015 22:46:59 +0100
> From: Agustin Martin <agustin6martin@gmail.com>
> 
> emacs24 was released, fix committed to emacs-24 branch and merged into trunk.

Thanks.  But since we are currently pretesting 24.5, please avoid
committing to the emacs-24 branch changes that do not fix clear
regressions in 24.4.





^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-03-07  8:09 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-18 20:56 bug#16800: 24.3; flyspell works slow on very short words at the end of big file Aleksey Cherepanov
2014-02-21 10:15 ` Eli Zaretskii
2014-02-21 14:38   ` Agustin Martin
2014-02-21 15:12     ` Eli Zaretskii
2014-02-21 15:21       ` Eli Zaretskii
2014-02-22 12:44       ` Aleksey Cherepanov
2014-02-22 13:10         ` Eli Zaretskii
2014-02-22 16:02           ` Aleksey Cherepanov
2014-02-22 16:41             ` Eli Zaretskii
2014-02-22 18:55               ` Aleksey Cherepanov
2014-02-22 20:16                 ` Aleksey Cherepanov
2014-02-22 21:03                 ` Eli Zaretskii
2014-02-23  1:26                   ` Agustin Martin
2014-02-23 18:36                     ` Eli Zaretskii
2014-02-23 19:56                     ` Aleksey Cherepanov
2014-02-23 23:02                       ` Aleksey Cherepanov
2014-02-24 16:03                         ` Aleksey Cherepanov
2014-02-26 20:32                           ` Agustin Martin
2014-02-28 11:45                             ` Agustin Martin
2014-02-28 11:51                               ` Eli Zaretskii
2014-03-01 21:44                                 ` Aleksey Cherepanov
2014-03-02  3:56                                   ` Eli Zaretskii
2014-03-09 17:36                                     ` Agustin Martin
2014-03-09 18:02                                       ` Aleksey Cherepanov
2014-03-09 18:24                                         ` Eli Zaretskii
2014-02-28 23:11                               ` Aleksey Cherepanov
2014-03-01 10:33                                 ` Aleksey Cherepanov
2014-03-01 15:50                                   ` Aleksey Cherepanov
2014-03-01 21:39                                 ` Aleksey Cherepanov
2014-03-09 17:25                                 ` Agustin Martin
2015-03-06 21:46                                   ` Agustin Martin
2015-03-07  8:09                                     ` Eli Zaretskii
2014-02-23 20:39                     ` Aleksey Cherepanov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).