emacs-orgmode@gnu.org archives
* org-element checks make flyspell prohibitively slow
From: Matt Lundin @ 2014-03-17 14:32 UTC
  To: Emacs-orgmode list

The rewrite of org-mode-flyspell-verify in commit
4a27c2b4b67201e0b23f431bdaeb6460b31e1394 (Nov 21, 2013) makes navigating
org-mode files with large chunks of text very slow.

For instance, I started up a minimal emacs:

/usr/bin/emacs -Q -l ~/config/minimal.el

...where minimal.el is...


[-- Attachment #2: minimal.el --]
[-- Type: application/emacs-lisp, Size: 104 bytes --]

and (insert "\n\n=> " (emacs-version))

=> GNU Emacs 24.3.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.10.7)
 of 2014-01-28 on var-lib-archbuild-extra-x86_64-juergen

and (insert "\n\n=> " (org-version nil t))

=> Org-mode version 8.2.5h (release_8.2.5h-757-gc444e4 @ /home/matt/org-mode/lisp/)

I open a test.org file containing the following.

--8<---------------cut here---------------start------------->8---
* A headline
* Arch packages
* Another headline
--8<---------------cut here---------------end--------------->8---

After opening a line under "Arch packages" I call...

C-u M-! pacman -Ss [RET]

(Of course, this only works on Arch Linux.) This inserts a long list of
packages that look like this:

--8<---------------cut here---------------start------------->8---
core/acl 2.2.52-2 [installed]
    Access control list utilities, libraries and headers
core/archlinux-keyring 20140220-1 [installed]
    Arch Linux PGP keyring
core/attr 2.4.47-1 [installed]
    Extended attribute support library for ACL support
core/autoconf 2.69-1 (base-devel) [installed]
    A GNU tool for automatically configuring source code
core/automake 1.14.1-1 (base-devel) [installed]
    A GNU tool for automatically creating Makefiles
--8<---------------cut here---------------end--------------->8---
 
All in all, it's 12680 lines.... 

I navigate to the bottom of the file. I type... 

M-x elp-instrument-package [RET] org [RET]
M-x elp-instrument-package [RET] flyspell [RET]
M-x elp-instrument-function [RET] scroll-down-command [RET] 

Then I hit M-v three times. This takes a while.
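
The same measurement can be set up from Lisp rather than through M-x. This is a minimal sketch using the stock elp commands named above; the final elp-restore-all call is an addition for cleanup and is not part of the original report.

```elisp
(require 'elp)

;; Instrument every function whose name starts with "org" or
;; "flyspell", plus the scrolling command itself.
(elp-instrument-package "org")
(elp-instrument-package "flyspell")
(elp-instrument-function 'scroll-down-command)

;; ... now press M-v a few times in the Org buffer, then evaluate:
(elp-results)       ; display the accumulated timing table
(elp-restore-all)   ; remove the instrumentation when finished
```

Instrumenting whole packages adds overhead of its own, so the absolute times are probably best read as relative costs.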

Here are the top elp offenders (columns: function name, call count,
total elapsed time in seconds, average time per call):

--8<---------------cut here---------------start------------->8---
flyspell-post-command-hook                          6           10.753828775  1.7923047959
flyspell-word                                       6           10.752069764  1.7920116273
org-mode-flyspell-verify                            5           8.973166134   1.7946332267
org-element-context                                 5           8.364142505   1.6728285010
org-element--get-next-object-candidates             699         8.307898326   0.0118854053
org-element-latex-or-entity-successor               5           3.7592736849  0.7518547369
org-element-link-successor                          40          1.1079495280  0.0276987382
org-element-sub/superscript-successor               659         1.0986591029  0.0016671610
org-element-line-break-successor                    5           0.9729438699  0.194588774
org-element-at-point                                5           0.607910786   0.1215821572
org-element--parse-to                               5           0.606992172   0.1213984344
org-element--current-element                        10          0.4201667370  0.0420166737
org-element-paragraph-parser                        10          0.416739094   0.0416739094
org-element-inline-src-block-successor              5           0.3740871620  0.0748174324
org-element-text-markup-successor                   10          0.309006309   0.0309006308
org-element-timestamp-successor                     5           0.087275674   0.0174551348
org-element-statistics-cookie-successor             5           0.086838821   0.0173677642
org-element-footnote-reference-successor            5           0.0866179840  0.0173235968
org-element-target-successor                        5           0.086057234   0.0172114468
org-element-radio-target-successor                  5           0.083322691   0.0166645382
org-element-export-snippet-successor                5           0.083078665   0.016615733
org-element-macro-successor                         5           0.0828692849  0.0165738569
scroll-down-command                                 3           0.059660938   0.0198869793
--8<---------------cut here---------------end--------------->8---

Interestingly, after calling elp-results, just trying to navigate to the
org buffer with other-window takes some time. Here's the top of the new
elp list:

--8<---------------cut here---------------start------------->8---
flyspell-post-command-hook                          1           1.780324266   1.780324266
flyspell-word                                       1           1.780091208   1.780091208
org-mode-flyspell-verify                            1           1.779600437   1.779600437
org-element-context                                 1           1.6563819400  1.6563819400
org-element--get-next-object-candidates             137         1.6448783359  0.0120064112
org-element-latex-or-entity-successor               1           0.753972365   0.753972365
org-element-link-successor                          8           0.2225488189  0.0278186023
org-element-sub/superscript-successor               129         0.206221951   0.0015986197
org-element-line-break-successor                    1           0.19533769    0.19533769
org-element-at-point                                1           0.12299361    0.12299361
org-element--parse-to                               1           0.122834024   0.122834024
org-element--current-element                        2           0.085711871   0.0428559355
org-element-paragraph-parser                        2           0.084995934   0.042497967
org-element-inline-src-block-successor              1           0.074795629   0.074795629
org-element-text-markup-successor                   2           0.0633956     0.0316978
org-element-export-snippet-successor                1           0.020310027   0.020310027
org-element-timestamp-successor                     1           0.017533021   0.017533021
org-element-footnote-reference-successor            1           0.017371898   0.017371898
org-element-statistics-cookie-successor             1           0.017308972   0.017308972
org-element-radio-target-successor                  1           0.016797529   0.016797529
org-element-macro-successor                         1           0.016773084   0.016773084
org-element-target-successor                        1           0.016545404   0.016545404
org-element-subscript-parser                        128         0.0039024930  3.048...e-05
org-element-inline-babel-call-successor             1           0.000752604   0.000752604
org-element-link-parser                             7           0.000386358   5.5194e-05
flyspell-get-word                                   1           0.000357585   0.000357585
--8<---------------cut here---------------end--------------->8---

My apologies for the Arch-specific example. The indentation of the
output seems to cause particular problems. I'd be happy to provide a
full test file off-list if required.

But the slowdown also shows up (more or less) with other very large
chunks of text. E.g.,

C-u M-! w3m -dump http://www.gnu.org/software/emacs/manual/html_mono/emacs.html

Is it possible to speed up org-element-context here? For something
called as often as org-mode-flyspell-verify, do we need all the overhead
of the org-element parser? Or would a hack optimized for speed (which is
what the older version of org-mode-flyspell-verify represented) be
enough? I recall (though my memory may be faulty) discussions on the
list quite some time back in which we decided to prioritize
speed/efficiency over thoroughness/completeness in the checks run by
org-mode-flyspell-verify.
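
To make the "hack optimized for speed" idea concrete, here is a rough sketch of a predicate that inspects only the current line with one regexp instead of calling the parser. It is not the old org-mode-flyspell-verify, whose code does not appear in this thread. The function name my/org-flyspell-verify-fast and the choice of which lines to skip are assumptions made purely for illustration.

```elisp
(defun my/org-flyspell-verify-fast ()
  "Return non-nil if the word at point should be spell-checked.
Hypothetical sketch: skip only keyword lines (#+...) and drawer
delimiters such as :PROPERTIES: or :END:, using one regexp match."
  (save-excursion
    (beginning-of-line)
    (let ((case-fold-search t))
      ;; A match means the line is markup, so do not check it.
      (not (looking-at
            "[ \t]*\\(?:#\\+\\|:[[:alnum:]_-]+:[ \t]*$\\)")))))

;; To try it in one buffer, after flyspell-mode is already enabled:
;; (setq-local flyspell-generic-check-word-predicate
;;             #'my/org-flyspell-verify-fast)
```

A check like this costs roughly the same whatever the paragraph size, but it knows nothing about links, timestamps or source blocks, which is the thoroughness trade-off raised in the question.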

Thanks,
Matt



* Re: org-element checks make flyspell prohibitively slow
From: Nicolas Goaziou @ 2014-03-17 21:22 UTC
  To: Matt Lundin; +Cc: Emacs-orgmode list

Hello,

Matt Lundin <mdl@imapmail.org> writes:

> The rewrite of org-mode-flyspell-verify in commit
> 4a27c2b4b67201e0b23f431bdaeb6460b31e1394 (Nov 21, 2013) makes navigating
> org-mode files with large chunks of text very slow.

[...]

> => Org-mode version 8.2.5h (release_8.2.5h-757-gc444e4 @
> /home/matt/org-mode/lisp/)

Could you update and try again? Parser's cache was inadvertently
disabled. I re-enabled it.

> I open a test.org file containing the following.
>
> * A headline
> * Arch packages
> * Another headline
>
> After opening a line under "Arch Packages" I call...
>
> C-u M-! pacman -Ss [RET]
>
> (Of course, this only works with archlinux.) This inserts a long list of
> packages that look like this:
>
> core/acl 2.2.52-2 [installed]
>     Access control list utilities, libraries and headers
> core/archlinux-keyring 20140220-1 [installed]
>     Arch Linux PGP keyring
> core/attr 2.4.47-1 [installed]
>     Extended attribute support library for ACL support
> core/autoconf 2.69-1 (base-devel) [installed]
>     A GNU tool for automatically configuring source code
> core/automake 1.14.1-1 (base-devel) [installed]
>     A GNU tool for automatically creating Makefiles
>  
> All in all, it's 12680 lines.... 

Note that it is a contrived example: the whole buffer is a single
paragraph containing around 150 objects. The current algorithm for
`org-element-context' is clearly not on par with such a density of
objects per paragraph.

Also, cache cannot help here, because each time you edit a paragraph,
all objects within are removed from the cache (because, AFAIK, there is
no way to know if the edit altered a previously parsed object or not,
so, as a safety measure, all of them are wiped out) and you have to
parse them again.

Therefore, navigation should be fast but editing (with flyspell-mode
enabled) is going to be slow.
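
The difference between navigating and editing can be pictured with a toy model. This is only a sketch of the invalidation policy described above, not org-element's actual cache code; every my/toy-* name is hypothetical, and paragraphs are keyed by their start position purely for simplicity.

```elisp
(defvar my/toy-object-cache (make-hash-table :test 'eql)
  "Toy cache mapping a paragraph start position to its parsed objects.")

(defun my/toy-objects (paragraph-start parse-fn)
  "Return the objects of the paragraph at PARAGRAPH-START.
PARSE-FN is called with no arguments only when nothing is cached."
  (let ((hit (gethash paragraph-start my/toy-object-cache 'miss)))
    (if (eq hit 'miss)
        ;; Cache miss: pay for a full parse of the paragraph.
        (puthash paragraph-start (funcall parse-fn) my/toy-object-cache)
      hit)))

(defun my/toy-invalidate (paragraph-start)
  "Forget every object cached for the paragraph at PARAGRAPH-START.
Models the conservative policy: any edit wipes the whole paragraph."
  (remhash paragraph-start my/toy-object-cache))
```

Moving point without editing hits the cache on each lookup. Every edit calls the invalidation step, so the next lookup has to parse the whole paragraph again, which is where a 12680-line paragraph becomes expensive.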

> But this works (more or less) with other very large chunks of text.
> E.g.,
>
> C-u M-! w3m -dump http://www.gnu.org/software/emacs/manual/html_mono/emacs.html

This one should be reasonably fast in both cases.

> Is it possible to speed up org-element-context here?

Certainly. `org-element-context' is the least optimized part of the
parser code. There is room for improvement.

> For something called as often as org-mode-flyspell-verify, do we need
> all the overhead of the org-element parser?

Of course.

> Or would a hack optimized for speed (which is what the older version
> of org-mode-flyspell-verify represented) be enough?

IMO, the old version of this function was annoying as soon as you
switched to a non-English language. YMMV.

> I recall (though my memory may be faulty) discussions on the list
> quite some time back in which we decided to prioritize
> speed/efficiency over thoroughness/completeness in the checks run by
> org-mode-flyspell-verify.

Why prioritize when we can have both?

I agree that `org-mode-flyspell-verify' is not fast enough for the time
being, but it is quite usable anyway. Also, as a very demanding
function, it is a good benchmark for the parser.

In order to improve the current state, reports (like those in your
message) help a lot. You can also help improve the algorithms.

Thank you.


Regards,

-- 
Nicolas Goaziou


* Re: org-element checks make flyspell prohibitively slow
From: Matt Lundin @ 2014-03-21 18:56 UTC
  To: Nicolas Goaziou; +Cc: Emacs-orgmode list

Hi Nicolas,

Nicolas Goaziou <n.goaziou@gmail.com> writes:

> Matt Lundin <mdl@imapmail.org> writes:
>
>> The rewrite of org-mode-flyspell-verify in commit
>> 4a27c2b4b67201e0b23f431bdaeb6460b31e1394 (Nov 21, 2013) makes navigating
>> org-mode files with large chunks of text very slow.
>
> [...]
>
>> => Org-mode version 8.2.5h (release_8.2.5h-757-gc444e4 @
>> /home/matt/org-mode/lisp/)
>
> Could you update and try again? Parser's cache was inadvertently
> disabled. I re-enabled it.

Yes, I can confirm that it is faster now. Thanks.

>> core/acl 2.2.52-2 [installed]
>>     Access control list utilities, libraries and headers
>> core/archlinux-keyring 20140220-1 [installed]
>>     Arch Linux PGP keyring
>> core/attr 2.4.47-1 [installed]
>>     Extended attribute support library for ACL support
>> core/autoconf 2.69-1 (base-devel) [installed]
>>     A GNU tool for automatically configuring source code
>> core/automake 1.14.1-1 (base-devel) [installed]
>>     A GNU tool for automatically creating Makefiles
>>  
>> All in all, it's 12680 lines.... 
>
> Note that it is a contrived example: the whole buffer is a single
> paragraph containing around 150 objects. The current algorithm for
> `org-element-context' is clearly not on par with such a density of
> objects per paragraph.

Yes, it is indeed a contrived example. I originally thought I had a use
for it --- i.e., analyzing the packages I had installed --- but soon
realized that such a task is better accomplished in a separate text
file.

> Also, cache cannot help here, because each time you edit a paragraph,
> all objects within are removed from the cache (because, AFAIK, there
> is no way to know if the edit altered a previously parsed object or
> not, so, as a safety measure, all of them are wiped out) and you
> have to parse them again.
>
> Therefore, navigation should be fast but editing (with flyspell-mode
> enabled) is going to be slow.

Good to know.
Thanks again!

Matt


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
