unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
@ 2009-09-21 22:24 ` Kevin Ryde
  2009-09-22 21:40   ` Stefan Monnier
  2009-09-23 23:15   ` bug#4511: marked as done (23.1; flyspell-mode slow editing near end of big html file) Emacs bug Tracking System
  0 siblings, 2 replies; 10+ messages in thread
From: Kevin Ryde @ 2009-09-21 22:24 UTC
  To: bug-gnu-emacs

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

When flyspell-mode is enabled in a big html file, and point is somewhere
near the end of the buffer, typing text or moving point with C-f and C-b
becomes sluggish, to the point of being nearly unusable.

(This is a regression from emacs 22, where flyspell-mode was fine on
such files.)

I expect "big file" is relative to cpu speed, but 300 kbytes is bad on
my slow pc (not an outrageously huge file).  To reproduce, try this, which
makes a buffer of about 600 kbytes:

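    (progn
      (switch-to-buffer "foo")
      ;; 50000 lines of 12 bytes each, about 600 kbytes in total
      (dotimes (_ 50000) (insert "<p> abc def\n"))
      (html-mode)
      ;; explicit argument so the mode is enabled rather than toggled
      (flyspell-mode 1))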

It takes a few seconds to create the buffer, but of course that's not
the bug.  The bad bit is if you move point around with C-f / C-b near
the end of the buffer, or type some plain text there outside of a <tag>,
where it's sluggish between keystrokes.  (Try upping the 50000 on a fast
cpu if necessary.)


I track the slowness to where `sgml-mode-flyspell-verify' does

    (looking-back "<[^>\n]*")

I take it this func is asking whether point is within a <tag> or not.
Does that regexp end up asking re-search-backward to consider every "<"
in the buffer or something, before deciding no match is possible?

I find it hugely faster to do an old fashioned skip-chars-backward as
below -- assuming I'm not mistaken that the "\n" in the existing
`looking-back' is supposed to mean examining no more than the current line.

2009-09-21  Kevin Ryde  <user42@zip.com.au>

	* textmodes/flyspell.el (sgml-mode-flyspell-verify): Use
	skip-chars-backward instead of looking-back, to avoid a very slow
	regexp match when far into a big buffer with lots of "<" chars.


[-- Attachment #2: flyspell.el.sgml-verify.diff --]
[-- Type: text/x-diff, Size: 542 bytes --]

--- flyspell.el.~1.146.~	2009-09-18 08:23:13.000000000 +1000
+++ flyspell.el	2009-09-21 16:36:12.000000000 +1000
@@ -363,7 +363,9 @@
   "Function used for `flyspell-generic-check-word-predicate' in SGML mode."
   (not (save-excursion
 	 (or (looking-at "[^<\n]*>")
-	     (ispell-looking-back "<[^>\n]*")
+	     (save-excursion
+	       (skip-chars-backward "^<>\n")   ;; \n only look at current line
+	       (equal ?< (char-before)))       ;; "<" if in a tag
 	     (and (looking-at "[^&\n]*;")
 		  (ispell-looking-back "&[^;\n]*"))))))
 

[-- Attachment #3: Type: text/plain, Size: 1078 bytes --]





In GNU Emacs 23.1.1 (i486-pc-linux-gnu, GTK+ Version 2.16.5)
 of 2009-08-03 on raven, modified by Debian
configured using `configure  '--build=i486-linux-gnu' '--host=i486-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var/lib' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs23:/etc/emacs:/usr/local/share/emacs/23.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/23.1/site-lisp:/usr/share/emacs/site-lisp:/usr/share/emacs/23.1/leim' '--with-x=yes' '--with-x-toolkit=gtk' '--with-toolkit-scroll-bars' 'build_alias=i486-linux-gnu' 'host_alias=i486-linux-gnu' 'CFLAGS=-DDEBIAN -g -O2' 'LDFLAGS=-g' 'CPPFLAGS=''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_AU
  value of $XMODIFIERS: nil
  locale-coding-system: iso-latin-1-unix
  default-enable-multibyte-characters: t

* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-09-21 22:24 ` bug#4511: 23.1; flyspell-mode slow editing near end of big html file Kevin Ryde
@ 2009-09-22 21:40   ` Stefan Monnier
  2009-09-23  0:56     ` Kevin Ryde
  2009-09-23 23:15   ` bug#4511: marked as done (23.1; flyspell-mode slow editing near end of big html file) Emacs bug Tracking System
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2009-09-22 21:40 UTC
  To: Kevin Ryde; +Cc: bug-gnu-emacs, 4511

> I track the slowness to where `sgml-mode-flyspell-verify' does

>     (looking-back "<[^>\n]*")

> I take it this func is asking whether point is within a <tag> or not.
> Does that regexp end up asking re-search-backward to consider every "<"
> in the buffer or something, before deciding no match is possible?

Yes, looking-back is a dog.  You need to pass it a `limit' argument.
I think the `limit' argument should be mandatory.


        Stefan





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-09-22 21:40   ` Stefan Monnier
@ 2009-09-23  0:56     ` Kevin Ryde
  2009-09-23  3:13       ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Ryde @ 2009-09-23  0:56 UTC
  To: 4511; +Cc: Stefan Monnier

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>
> You need to pass it a `limit' argument.

I thought about that a bit.  The limit would be the immediately
preceding "<", ">", or "\n", since whichever of them is hit first
answers whether you're in a tag or not.

There'd be no need for a separate limit calculation if a regexp could be
cooked up to stop on the first of those three.  I suppose it'd be along
the lines of (untested) ...

     (and (looking-back "\\([<>\n]\\)[^<>\n]*?")
          (equal "<" (match-string 1)))

but `skip-chars-backward' seems clearer to me, and might be a couple of
nanoseconds quicker too in fact.
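
For reference, the `skip-chars-backward' idea can be written as a small
standalone predicate.  This is only a sketch: the name
`my-sgml-point-in-tag-p' is made up for illustration and is not part of
flyspell.el, and it assumes, like the existing regexp, that a tag never
spans more than one line.

```elisp
;; Hypothetical helper, for illustration only; not part of flyspell.el.
(defun my-sgml-point-in-tag-p ()
  "Return non-nil if point follows an unclosed \"<\" on the current line."
  (save-excursion
    ;; Move back over everything that is not "<", ">" or a newline.
    ;; Whichever of the three is hit first decides the answer.
    (skip-chars-backward "^<>\n")
    ;; `char-before' is nil at the beginning of the buffer; `eq' copes.
    (eq (char-before) ?<)))

;; Usage: point ends up inside the unfinished <img tag.
(with-temp-buffer
  (insert "<p> abc <img sr")
  (my-sgml-point-in-tag-p))
```

The scan never goes further back than the nearest "<", ">" or newline, so
its cost depends on the length of the current line rather than on the size
of the buffer.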





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-09-23  0:56     ` Kevin Ryde
@ 2009-09-23  3:13       ` Stefan Monnier
  2009-10-16 21:57         ` Kevin Ryde
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2009-09-23  3:13 UTC
  To: Kevin Ryde; +Cc: 4511

>> You need to pass it a `limit' argument.
> I thought about that a bit.  The limit would be the immediately
> preceding "<", ">", or "\n", since whichever of them is hit first
> answers whether you're in a tag or not.

(line-beginning-position) will do fine.

> There'd be no need for a separate limit calculation if a regexp could be
> cooked up to stop on the first of those three.

The given regexp is actually plenty, in this respect.  It's just that
looking-back is a dog and doesn't make good use of the regexp.
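
A rough way to see what the `limit' argument buys on the test buffer from
the original report is sketched below.  It is not a measurement: timings
depend on the machine, so none are quoted.  It also rests on an assumption
about subr.el, namely that `looking-back' is roughly `re-search-backward'
on the regexp with "\\=" appended, so that without a limit every earlier
"<" in the buffer is tried as a possible match start.

```elisp
(require 'benchmark)

(with-temp-buffer
  ;; Same data as the recipe in the original report.
  (dotimes (_ 50000) (insert "<p> abc def\n"))
  ;; Plain text at the very end, outside any tag.
  (insert "plain text")
  (let ((re "<[^>\n]*"))
    (list
     ;; Unbounded: the backward search may run all the way to point-min.
     ;; (Newer Emacs versions may warn about the missing LIMIT argument.)
     (benchmark-run 10 (looking-back re))
     ;; Bounded: the search gives up at the start of the current line.
     (benchmark-run 10 (looking-back re (line-beginning-position))))))
```

Each element of the result is a list whose first entry is the elapsed time
in seconds.  Since the regexp cannot match across a newline, bounding the
search at `line-beginning-position' does not change the answer, only how
far back the search goes before giving up.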


        Stefan





* bug#4511: marked as done (23.1; flyspell-mode slow editing near end of big html file)
  2009-09-21 22:24 ` bug#4511: 23.1; flyspell-mode slow editing near end of big html file Kevin Ryde
  2009-09-22 21:40   ` Stefan Monnier
@ 2009-09-23 23:15   ` Emacs bug Tracking System
  1 sibling, 0 replies; 10+ messages in thread
From: Emacs bug Tracking System @ 2009-09-23 23:15 UTC
  To: Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

Your message dated Wed, 23 Sep 2009 19:06:04 -0400
with message-id <jwvhbuttg6s.fsf-monnier+emacsbugreports@gnu.org>
and subject line Re: bug#4511: 23.1; flyspell-mode slow editing near end of big html file
has caused the Emacs bug report #4511,
regarding 23.1; flyspell-mode slow editing near end of big html file
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@emacsbugs.donarmstrong.com
immediately.)


-- 
4511: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=4511
Emacs Bug Tracking System
Contact owner@emacsbugs.donarmstrong.com with problems

[-- Attachment #3: Type: message/rfc822, Size: 2087 bytes --]

From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Kevin Ryde <user42@zip.com.au>
Subject: Re: bug#4511: 23.1; flyspell-mode slow editing near end of big html file
Date: Wed, 23 Sep 2009 19:06:04 -0400
Message-ID: <jwvhbuttg6s.fsf-monnier+emacsbugreports@gnu.org>

>>> You need to pass it a `limit' argument.
>> I thought about that a bit.  The limit would be the immediately
>> preceding "<", ">", or "\n", since whichever of them is hit first
>> answers whether you're in a tag or not.

> (line-beginning-position) will do fine.

Installed,


        Stefan

* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-09-23  3:13       ` Stefan Monnier
@ 2009-10-16 21:57         ` Kevin Ryde
  2009-10-17  2:14           ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Ryde @ 2009-10-16 21:57 UTC
  To: Stefan Monnier; +Cc: 4511

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> The given regexp is actually plenty, in this respect.  It's just that
> looking-back is a dog and doesn't make good use of the regexp.

Oh, well, I suppose a genuine reverse matcher could do the right thing,
probably if "<" was added to the exclusions like "<[^<>\n]*" -- not that
that helps since there isn't a reverse matcher :-).


But on the principle "why can't someone else do it", what about letting
`sgml-lexical-context' determine the context.  Tested only briefly:

(defun sgml-mode-flyspell-verify ()
  "Function used for `flyspell-generic-check-word-predicate' in SGML mode."
  (not (memq (car (sgml-lexical-context))
             '(tag pi))))

Seems fast enough for me, and I think it means CDATA text is checked,
which I think would be desirable, but I'm not well up on that stuff.
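
To get a feel for what `sgml-lexical-context' reports, it can be tried in
a scratch buffer.  A minimal sketch, assuming sgml-mode.el is loadable and
that the function returns a cons whose car is a symbol such as `tag' or
`text'; the buffer contents are made up for illustration.

```elisp
(require 'sgml-mode)

(with-temp-buffer
  (html-mode)
  (insert "<p class=\"x\"> abc def")
  (list
   ;; Point at the end of the buffer, in running text.
   (car (sgml-lexical-context))
   ;; Point just after "<p", inside the tag.
   (progn (goto-char 3)
          (car (sgml-lexical-context)))))
```

The predicate above then only has to test whether that symbol is `tag' or
`pi'; every other context is passed on to the spell checker.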





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-10-16 21:57         ` Kevin Ryde
@ 2009-10-17  2:14           ` Stefan Monnier
  2009-11-07  0:21             ` Kevin Ryde
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2009-10-17  2:14 UTC
  To: Kevin Ryde; +Cc: 4511

> But on the principle "why can't someone else do it", what about letting
> `sgml-lexical-context' determine the context.  Tested only briefly:

> (defun sgml-mode-flyspell-verify ()
>   "Function used for `flyspell-generic-check-word-predicate' in SGML mode."
>   (not (memq (car (sgml-lexical-context))
>              '(tag pi))))

> Seems fast enough for me, and I think it means CDATA text is checked,
> which I think would be desirable, but I'm not well up on that stuff.

If performance is good enough, then it's a very good option, indeed.


        Stefan





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-10-17  2:14           ` Stefan Monnier
@ 2009-11-07  0:21             ` Kevin Ryde
  2009-11-10 22:18               ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Ryde @ 2009-11-07  0:21 UTC
  To: Stefan Monnier; +Cc: 4511

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> If performance is good enough,

I've been using it, it seems good.  What chance putting it in for
everyone to have a go?

The only thing to note is right now sgml-lexical-context doesn't
recognise <!-- ... --> comments (bug 4781).  But the current
sgml-mode-flyspell-verify code doesn't recognise such comments either,
so nothing is lost.

I think it makes sense to spell check comments.  The net effect of
excluding just "tag" and "pi" parts is that tag and attribute names are
skipped, but basically everything else is checked.  String valued
attributes are checked, which makes sense for

    <img ... alt="Some text">

though string values which are urls might not want checking.  Maybe some
tag/attribute type info could distinguish the two cases, if it seemed
important enough ...





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-11-07  0:21             ` Kevin Ryde
@ 2009-11-10 22:18               ` Stefan Monnier
  2009-11-17  0:22                 ` Kevin Ryde
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2009-11-10 22:18 UTC
  To: Kevin Ryde; +Cc: 4511

>> If performance is good enough,
> I've been using it, it seems good.  What chance putting it in for
> everyone to have a go?

Try it.


        Stefan





* bug#4511: 23.1; flyspell-mode slow editing near end of big html file
  2009-11-10 22:18               ` Stefan Monnier
@ 2009-11-17  0:22                 ` Kevin Ryde
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Ryde @ 2009-11-17  0:22 UTC
  To: Stefan Monnier; +Cc: 4511

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
>
> Try it.

Done.  It'll benefit from bug#4781 when that's addressed.





