emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Ihor Radchenko <yantar92@gmail.com>
To: Maxim Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Use cache in org-up-heading-safe
Date: Thu, 06 May 2021 22:34:13 +0800	[thread overview]
Message-ID: <87tung2aoq.fsf@localhost> (raw)
In-Reply-To: <s6uhpi$kek$1@ciao.gmane.io>

[-- Attachment #1: Type: text/plain, Size: 4271 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:
> I have not tested the patch due to I do not use agenda. My interest was 
> stimulated solely by my attempts to make org-refile-get-targets faster.

Thanks for the feedback!

> I consider it as an improvement. I have noticed the only thing that must 
> be fixed: comments with old variant of the function. Reaction to my 
> other comments is optional.

Oops. Fixed.

> In org-agenda.el org-up-heading-safe function is called only from 
> org-find-top-headline.

That's probably not the slowest part. org-agenda also calls
org-up-heading-safe indirectly. In particular, org-get-tags is calling
org-up-heading-safe to get inherited tags. It is org-get-tags that is
taking >50% time of agenda generation in my agendas. And
org-up-heading-safe was the largest contributor to that >50% time.

> ...  So the purpose of the dance with backward 
> searches is to get top level headings. My expectation is that scan 
> through the whole buffer and storing result in a structure that allows 
> binary search of position (array, red-black tree, etc) may be even
> faster.

Scan through the whole buffer could be faster, but it is not always
desired. Most of Org code only need information for current headline.
Re-scanning the whole buffer would be costly.

Also, I tried to compare avl-tree with hash table. However, avl-tree
does not give much benefit for my test case of an agenda with ~12000
todo items. Since avl-tree requires much more custom code, hash table
should be better.

;; No cache
;; org-up-heading-safe  180493      285.37713697  0.0015810980
;; org-up-heading-safe  134876      199.44584854  0.0014787349
;; org-up-heading-safe  134872      198.23494205  0.0014698005

;; Hash cache
;; org-up-heading-safe  180493      5.0327451709  2.788...e-05
;; org-up-heading-safe  134872      1.3568663460  1.006...e-05
;; org-up-heading-safe  134872      0.7527555679  5.581...e-06

;; AVL cache
;; org-up-heading-safe  180500      5.1044455589  2.827...e-05
;; org-up-heading-safe  134872      1.2781428280  9.476...e-06
;; org-up-heading-safe  134872      1.2866258809  9.539...e-06

> Alternatively lazily obtained position of top heading could be stored in 
> cache to reduce number of iterations on org-find-top-line.

That one is used for org-agenda-filter-by-top-headline. I do not really
use it, so I have no clue if there is much need to optimise it
specifically. Yet, caching org-up-heading-safe will indeed speed it up
as well.

>> +    (let ((level-cache (gethash (point) org--up-heading-cache)))
>> +      (if (and level-cache
>> +               (eq (buffer-chars-modified-tick) org--up-heading-cache-tick))
>
> If buffer-chars-modified-tick is faster than gethash then reordering 
> might result in very slight improvement of performance.

Sure. Done.

>> +            ;; Parent is inside accessible part of the buffer.
>> +            (progn (goto-char level-cache)
>> +                   (funcall outline-level)))
>
> I do not see any reason why outline level can not be cached in pair with 
> position.

Hmm. You are right. Benchmarking with and without additional caching
shows that it can give extra ~2x improvement:

;; Hash cache
;; org-up-heading-safe  180493      5.0327451709  2.788...e-05
;; org-up-heading-safe  134872      1.3568663460  1.006...e-05
;; org-up-heading-safe  134872      0.7527555679  5.581...e-06

;; Hash cache + outline-level cache
;; org-up-heading-safe  180507      4.3326449169  2.400...e-05
;; org-up-heading-safe  134872      0.4952472380  3.671...e-06
;; org-up-heading-safe  134879      0.5298789350  3.928...e-06

So, outline-level is also cached in the new version of the patch.

>> +          (let (result)
>> +            (setq result
>
> I am not a lisp guru, so from my point of view this can be reduced to
> (let ((result ...

Sure.

>> +                        (format "^\\*\\{1,%d\\} " level-up) nil t)
>
> \t as an alternative to " " is used in org-refile.el,
> e.g. "^\\*+[ \t]+". Unsure which variant is canonical one and I see that 
> regexp is taken from original function, so no regression is expected.

I do not know either. Actually, I wish if Org mode code base used less
literal strings and more regexp variables from org-element.el and
org.el.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Use-cache-in-org-up-heading-safe.patch --]
[-- Type: text/x-diff, Size: 3775 bytes --]

From d40b2bbb1dc5113d3085be2fae52ebe5ac1b023d Mon Sep 17 00:00:00 2001
Message-Id: <d40b2bbb1dc5113d3085be2fae52ebe5ac1b023d.1620299446.git.yantar92@gmail.com>
From: Ihor Radchenko <yantar92@gmail.com>
Date: Thu, 6 May 2021 14:13:20 +0800
Subject: [PATCH] Use cache in org-up-heading-safe

* lisp/org.el (org-up-heading-safe): Use buffer-local cache to store
positions of parent headings.  The cache is invalidated when buffer
text is changed, according to `buffer-chars-modified-tick'.
(org--up-heading-cache):  Buffer-local hash-table storing the cache.
(org--up-heading-cache-tick):  The buffer modification state for
currently active `org--up-heading-cache'.

The optimisation is critical when running agenda or org-entry-get
queries using property/tag inheritance.  In such scenarios
`org-up-heading-safe' can be called thousands of times.  For example,
building todo agenda on large number of headings lead to the following
benchmark results:

Benchmark:

1. (elp-instrument-function #'org-up-heading-safe)
2. Run agenda
3. (elp-results) ;; function, # calls, total time, avg time

Without cache:
org-up-heading-safe  27555       8.4234025759  0.0003056941

With cache, first run:
org-up-heading-safe  23227       0.5300747539  2.282...e-05

With cache, second run on unchanged buffer:
org-up-heading-safe  23227       0.1447754880  6.233...e-06

The final speedup is 1-2 orders of magnitude (~15-56 times).
---
 lisp/org.el | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 0ff13c79c..c7e87a804 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -20440,6 +20440,10 @@ (defun org-up-heading-all (arg)
 With argument, move up ARG levels."
   (outline-up-heading arg t))
 
+(defvar-local org--up-heading-cache (make-hash-table)
+  "Buffer-local `org-up-heading-safe' cache.")
+(defvar-local org--up-heading-cache-tick nil
+  "Buffer `buffer-chars-modified-tick' in `org--up-heading-cache'.")
 (defun org-up-heading-safe ()
   "Move to the heading line of which the present line is a subheading.
 This version will not throw an error.  It will return the level of the
@@ -20449,10 +20453,26 @@ (defun org-up-heading-safe ()
 because it relies on stars being the outline starters.  This can really
 make a significant difference in outlines with very many siblings."
   (when (ignore-errors (org-back-to-heading t))
-    (let ((level-up (1- (funcall outline-level))))
-      (and (> level-up 0)
-	   (re-search-backward (format "^\\*\\{1,%d\\} " level-up) nil t)
-	   (funcall outline-level)))))
+    (let (level-cache)
+      (if (and (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+               (setq level-cache (gethash (point) org--up-heading-cache)))
+          (when (<= (point-min) (car level-cache) (point-max))
+            ;; Parent is inside accessible part of the buffer.
+            (progn (goto-char (car level-cache))
+                   (cdr level-cache)))
+        ;; Buffer modified.  Invalidate cache.
+        (unless (eq (buffer-chars-modified-tick) org--up-heading-cache-tick)
+          (setq-local org--up-heading-cache-tick
+                      (buffer-chars-modified-tick))
+          (clrhash org--up-heading-cache))
+        (let* ((level-up (1- (funcall outline-level)))
+               (pos (point))
+               (result (and (> level-up 0)
+	                    (re-search-backward
+                             (format "^\\*\\{1,%d\\} " level-up) nil t)
+	                    (funcall outline-level))))
+          (when result (puthash pos (cons (point) result) org--up-heading-cache))
+          result)))))
 
 (defun org-up-heading-or-point-min ()
   "Move to the heading line of which the present is a subheading, or point-min.
-- 
2.26.3


  reply	other threads:[~2021-05-06 14:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04 15:08 [PATCH] Use cache in org-up-heading-safe Ihor Radchenko
2021-05-05 16:40 ` Maxim Nikulin
2021-05-06 14:34   ` Ihor Radchenko [this message]
2021-05-06 17:02     ` Maxim Nikulin
2021-05-07  2:08       ` Ihor Radchenko
2021-05-08 11:28         ` Maxim Nikulin
2021-05-10 15:14           ` Ihor Radchenko
2021-05-15 11:58         ` Bastien
2021-05-16  6:15 ` Bastien
2021-05-16  6:36   ` Ihor Radchenko
2021-05-16  8:53     ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tung2aoq.fsf@localhost \
    --to=yantar92@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=manikulin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).