emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] Bug: fragile org refile cache
@ 2021-04-28 16:09 Maxim Nikulin
  2021-04-29  0:50 ` Samuel Wales
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-04-28 16:09 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

Reviewing my attempt to speedup collecting of refile targets
https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/
I have realized that refile cache is unreliable.

With specific customization, cache content and thus refile targets 
depend on the function called first: `org-refile' ([C-u] C-c C-w) or 
`org-goto' (C-u C-c C-w).

I decided to try to provide minimal example and steps to reproduce in 
the form of a test marked as expected failure.

Certainly cache should be enabled
     (org-refile-use-cache t)
`org-refile-targets' should have (nil . (:maxlevel . 5)) for the 
affected buffer. The value of the following setting is intentionally set 
to alternative value in comparison to `org-goto' code
     (org-refile-use-outline-path nil)
To see the issue interactively, you likely should set
     (org-outline-path-complete-in-steps nil)

Clean cache (C-u C-u C-u C-c C-w) and try jumping using C-u C-c C-w and 
C-u C-c C-j in various order. Use TAB completion to see targets.

To make difference more apparent, define 
`org-refile-target-verify-function' to filter-out some headings.

My expectation that each command has list of targets formatted 
accordingly to user setting or `org-goto' internal overrides. Actually 
lists of target are the same since they share cache entry.

I suppose, cache keys should include values of all parameters affecting 
filtering and formatting, not only regexp for heading selection.

However I have no idea how to derive some value suitable for cache key 
from `org-refile-target-verify-function'.

[-- Attachment #2: 0001-testing-lisp-test-org.el-Refile-cache-failure.patch --]
[-- Type: text/x-patch, Size: 2431 bytes --]

From d23b10d658539a4646ef015ac2660e1f8c8e7e1b Mon Sep 17 00:00:00 2001
From: Max Nikulin <manikulin@gmail.com>
Date: Wed, 28 Apr 2021 22:30:04 +0700
Subject: [PATCH] testing/lisp/test-org.el: Refile cache failure

testing/lisp/test-org.el (test-org/org-refile-cache): Add expected
failure for cache content discrepancy when it is populated
by `org-refile' or by `org-goto' (C-u C-c C-j).

Usage of just regexp for heading filtering as cache key is unreliable
since other settings affects selected entries
(`org-refile-target-verify-function') or content of entries
(`org-refile-use-outline-path').  However most of users have no chance
to notice collision since they do not call both functions
interchangeably or have `org-refile-targets' different from
the value in `org-goto' code.
---
 testing/lisp/test-org.el | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index 9f0ede1b9..f44cd76a6 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -6394,6 +6394,30 @@ Paragraph<point>"
 	    (org-refile-targets `((nil :level . 1))))
        (member (buffer-name) (mapcar #'car (org-refile-get-targets)))))))
 
+(ert-deftest test-org/org-refile-cache ()
+  "Demostrate a cache conflict of `org-goto' and `org-refile'.
+
+Notice that `org-refile-target-verify-function' can cause more severe
+divergence of cache content generated by these functions."
+  :expected-result :failed
+  (org-test-with-temp-text "* H1\n** H2\n"
+    (let ((org-refile-use-cache t)
+          ;; Regexp serves as cache key, so targets should be the same
+          ;; as in `org-goto' code.
+          (org-refile-targets '((nil . (:maxlevel . 5))))
+          ;; This value is opposite to the one in `org-goto' code.
+          (org-refile-use-outline-path nil)
+          (targets-refile))
+      ;; If jumping using `org-refile' (C-u C-c C-w)...
+      (setq targets-refile (mapcar #'car (org-refile-get-targets)))
+      (org-refile-cache-clear)
+      ;; Tune settings to simulate `org-goto' (C-u C-c C-j).
+      (let ((org-refile-use-outline-path t))
+        ;; Value is discarded, call just to populate the cache.
+        (org-refile-get-targets))
+      ;; Targets got by `org-refile'.
+      (let ((targets-goto (mapcar #'car (org-refile-get-targets))))
+        (should (equal targets-refile targets-goto))))))
 
 \f
 ;;; Sparse trees
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-28 16:09 [PATCH] Bug: fragile org refile cache Maxim Nikulin
@ 2021-04-29  0:50 ` Samuel Wales
  2021-04-29  1:29   ` Ihor Radchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Samuel Wales @ 2021-04-29  0:50 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

long ago i used to use the refile cache.  i think it is probably not
widely used, or maybe even not at all.

the reason i stopped was behavior that sounds similar to your
description.  such as, it would produce a set of results that did not
match the calling conditions.  i cannot reach the thread you point to,
however.

what i did was use ido-hacks to speed up ido and restrict to :goto:
and :refile: tagged entries for common targets.  this worked for me;
it is fast despite my huge set of olpaths.  it would be great if the
cache were unnecessary.

that might nt work for you, but perhaps this provides confirmation or
ideas for you.


On 4/28/21, Maxim Nikulin <manikulin@gmail.com> wrote:
> Reviewing my attempt to speedup collecting of refile targets
> https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/
> I have realized that refile cache is unreliable.
>
> With specific customization, cache content and thus refile targets
> depend on the function called first: `org-refile' ([C-u] C-c C-w) or
> `org-goto' (C-u C-c C-w).
>
> I decided to try to provide minimal example and steps to reproduce in
> the form of a test marked as expected failure.
>
> Certainly cache should be enabled
>      (org-refile-use-cache t)
> `org-refile-targets' should have (nil . (:maxlevel . 5)) for the
> affected buffer. The value of the following setting is intentionally set
> to alternative value in comparison to `org-goto' code
>      (org-refile-use-outline-path nil)
> To see the issue interactively, you likely should set
>      (org-outline-path-complete-in-steps nil)
>
> Clean cache (C-u C-u C-u C-c C-w) and try jumping using C-u C-c C-w and
> C-u C-c C-j in various order. Use TAB completion to see targets.
>
> To make difference more apparent, define
> `org-refile-target-verify-function' to filter-out some headings.
>
> My expectation that each command has list of targets formatted
> accordingly to user setting or `org-goto' internal overrides. Actually
> lists of target are the same since they share cache entry.
>
> I suppose, cache keys should include values of all parameters affecting
> filtering and formatting, not only regexp for heading selection.
>
> However I have no idea how to derive some value suitable for cache key
> from `org-refile-target-verify-function'.
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29  0:50 ` Samuel Wales
@ 2021-04-29  1:29   ` Ihor Radchenko
  2021-04-29  1:34     ` Samuel Wales
  0 siblings, 1 reply; 18+ messages in thread
From: Ihor Radchenko @ 2021-04-29  1:29 UTC (permalink / raw)
  To: Samuel Wales; +Cc: Maxim Nikulin, emacs-orgmode

Samuel Wales <samologist@gmail.com> writes:

> long ago i used to use the refile cache.  i think it is probably not
> widely used, or maybe even not at all.

At least, I do use it. A lot. I rely on it.

I do not observe the breakage as described in the first message, mostly
because I use refile cache exclusively for org-refile. Yet, I often see
"Please regenerate the refile cache with C-0 C-c C-w" when I do a lot of
batch refiling. A faster, more reliable, caching would be certainly
welcome.

In general, various parts of Org mode code base implement different
types of caches in parallel. I am aware at least about org-element,
org-scan-tags, org-agenda, org-refile, and org-goto. Probably Org mode
could benefit from unified caching mechanism? A good implementation
coming to my mind is org-ql [1]. It implements tag caches, outline path
caches, and can even be used to cache results of an arbitrary function
with point at heading. Basically, all (except org-element) types of
caches Org mode uses now are already implemented in org-ql in unified
way.

WDYT?

[1] https://github.com/alphapapa/org-ql

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29  1:29   ` Ihor Radchenko
@ 2021-04-29  1:34     ` Samuel Wales
  2021-04-29 12:45       ` Maxim Nikulin
  2021-04-29 13:30       ` Ihor Radchenko
  0 siblings, 2 replies; 18+ messages in thread
From: Samuel Wales @ 2021-04-29  1:34 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Maxim Nikulin, emacs-orgmode

thanks for reporting that you use it.

would it be more useful if it automaticaly generated the cache instead
of telling you to runt he command to do so?

if a solid, perhaps unified, cache existed, would org-id use it too?

On 4/28/21, Ihor Radchenko <yantar92@gmail.com> wrote:
> Samuel Wales <samologist@gmail.com> writes:
>
>> long ago i used to use the refile cache.  i think it is probably not
>> widely used, or maybe even not at all.
>
> At least, I do use it. A lot. I rely on it.
>
> I do not observe the breakage as described in the first message, mostly
> because I use refile cache exclusively for org-refile. Yet, I often see
> "Please regenerate the refile cache with C-0 C-c C-w" when I do a lot of
> batch refiling. A faster, more reliable, caching would be certainly
> welcome.
>
> In general, various parts of Org mode code base implement different
> types of caches in parallel. I am aware at least about org-element,
> org-scan-tags, org-agenda, org-refile, and org-goto. Probably Org mode
> could benefit from unified caching mechanism? A good implementation
> coming to my mind is org-ql [1]. It implements tag caches, outline path
> caches, and can even be used to cache results of an arbitrary function
> with point at heading. Basically, all (except org-element) types of
> caches Org mode uses now are already implemented in org-ql in unified
> way.
>
> WDYT?
>
> [1] https://github.com/alphapapa/org-ql
>
> Best,
> Ihor
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29  1:34     ` Samuel Wales
@ 2021-04-29 12:45       ` Maxim Nikulin
  2021-04-29 14:12         ` Ihor Radchenko
  2021-04-29 13:30       ` Ihor Radchenko
  1 sibling, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-04-29 12:45 UTC (permalink / raw)
  To: emacs-orgmode

On 29/04/2021 08:29, Ihor Radchenko wrote:
> I do not observe the breakage as described in the first message,
> mostly because I use refile cache exclusively for org-refile.

Maybe I could avoid org-goto as well. Actual reason to use it was that 
it does not ask for file name as the first step in the case of 
(org-refile-use-outline-path 'file). It took enough time to me to 
realize how to jump/refile to non-leaf heading without such settings.

> A faster, more reliable, caching would
> be certainly welcome.

Just an idea. Is it possible to implement some specific text property 
for heading lines, namely cleaned out heading text (no cookies, tags, 
hidden parts of links), that is updated after each editing (likely 
something like font locks)? It could significantly speed up scanning 
buffer for goto/refile targets. Unfortunately it would not help for 
files that have not opened yet.

> I am aware at least about org-element, 
> org-scan-tags, org-agenda, org-refile, and org-goto.

To be clear, org-refile and org-goto share the same cache and it is the 
source of the problem.

> A good > implementation coming to my mind is org-ql [1].

I have not looked into the code yet, so I am not ready to discuss it. At 
least Adam should agree to submit patches to org and Bastien and Nicolas 
should agree to support it.

On 29/04/2021 08:34, Samuel Wales wrote:
> 
> would it be more useful if it automaticaly generated the cache instead
> of telling you to runt he command to do so?

I am surprised as well that cache is not just regenerated when 
org-refile detects that it is stale. Can it be that under certain 
circumstances it just causes delay and, even with updated cache, user 
action fails anyway?

On 29/04/2021 07:50, Samuel Wales wrote:
> i cannot reach the thread you point to, however.

Actually namely your response inspired me to look closer at the 
implementation of collection of refile targets. Is there a particular 
form of Message-Id that is convenient for you? The date of initial 
message is 2021-03-02 (in UTC).





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29  1:34     ` Samuel Wales
  2021-04-29 12:45       ` Maxim Nikulin
@ 2021-04-29 13:30       ` Ihor Radchenko
  2021-04-29 19:17         ` Tim Cross
  1 sibling, 1 reply; 18+ messages in thread
From: Ihor Radchenko @ 2021-04-29 13:30 UTC (permalink / raw)
  To: Samuel Wales; +Cc: Maxim Nikulin, emacs-orgmode

Samuel Wales <samologist@gmail.com> writes:

> would it be more useful if it automaticaly generated the cache instead
> of telling you to runt he command to do so?

I think so. To be frank, I do not understand the reason why it is not
done by default.

> if a solid, perhaps unified, cache existed, would org-id use it too?

Sure. Why not? I imagine such cache can store the following info:

org files in system -> per-file cache -> per-heading cache -> ...

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 12:45       ` Maxim Nikulin
@ 2021-04-29 14:12         ` Ihor Radchenko
  2021-04-29 15:04           ` Maxim Nikulin
  0 siblings, 1 reply; 18+ messages in thread
From: Ihor Radchenko @ 2021-04-29 14:12 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> Maybe I could avoid org-goto as well. Actual reason to use it was that 
> it does not ask for file name as the first step in the case of 
> (org-refile-use-outline-path 'file). It took enough time to me to 
> realize how to jump/refile to non-leaf heading without such settings.

Note that Org mode has integration with imenu. It is an alternative you
can use to jump around current buffer.

> Just an idea. Is it possible to implement some specific text property 
> for heading lines, namely cleaned out heading text (no cookies, tags, 
> hidden parts of links), that is updated after each editing (likely 
> something like font locks)? It could significantly speed up scanning 
> buffer for goto/refile targets. Unfortunately it would not help for 
> files that have not opened yet.

There is org-outline-path-cache used by org-get-outline-cache. It avoids
computing parent's outline path multiple times, which is already a great
improvement.

For the cleaned heading text, I do not think that re-calculating the
heading text on each change is a good idea. It may degrade typing
latency. Yet, an acceptable approach could be simply invalidating cache
for the changed headings. Then, outline paths can be re-calculated on
changed headings when needed. This should be an improvement compared to
blanket (setq org-outline-path-cache nil) in org-refile-get-targets.

> To be clear, org-refile and org-goto share the same cache and it is the 
> source of the problem.

I would say that the problem is rather org-goto (ab)using org-refile
incorrectly. I have the following piece of elisp using org-refile
mechanism for headline selection:

(let ((org-refile-history (append saved-id parent-project))
	  (org-refile-cache nil)
	  (org-refile-target-verify-function #'org-att-id-skip-function))
      (let ((prompt-ans (org-refile-get-location "Link to attachment from")))
	(prog1
	    (org-id-get (seq-find #'markerp
				  prompt-ans)
			'create))))

org-refile-cache can be simply let-bound to nil (in my case) or
alternative cache variable, if the alternative cache should persist.

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 14:12         ` Ihor Radchenko
@ 2021-04-29 15:04           ` Maxim Nikulin
  2021-04-29 16:08             ` Ihor Radchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-04-29 15:04 UTC (permalink / raw)
  To: emacs-orgmode

On 29/04/2021 21:12, Ihor Radchenko wrote:
> There is org-outline-path-cache used by org-get-outline-cache. It avoids
> computing parent's outline path multiple times, which is already a great
> improvement.

Curiously my experience is that avoiding this lazy cache with 
backtracking and maintaining custom structure during sequential scan of 
the buffer works several times faster. However it is appropriate time to 
populate the cache you mentioned. Unfortunately it is still necessary to 
cleanup heading text, and it consumes significant time.

> org-refile-cache can be simply let-bound to nil (in my case) or
> alternative cache variable, if the alternative cache should persist.

Thank you for the idea. But I still hope that both function could use 
the same cache.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 15:04           ` Maxim Nikulin
@ 2021-04-29 16:08             ` Ihor Radchenko
  2021-04-29 16:51               ` Maxim Nikulin
                                 ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ihor Radchenko @ 2021-04-29 16:08 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> Curiously my experience is that avoiding this lazy cache with 
> backtracking and maintaining custom structure during sequential scan of 
> the buffer works several times faster.

My experience is exactly opposite. Or maybe I miss something. Can you
elaborate?

> ... However it is appropriate time to 
> populate the cache you mentioned. Unfortunately it is still necessary to 
> cleanup heading text, and it consumes significant time.

Did you do any benchmarks? I just tried

Outline path without cache:

(benchmark-run 1
  (goto-char (point-min))
  (while (re-search-forward "^\\*+" nil t)
    (org-get-outline-path t nil))) => (6.051079914 1 0.2864724879999869)

Outline path with cache:

(benchmark-run 1
  (goto-char (point-min))
  (while (re-search-forward "^\\*+" nil t)
    (org-get-outline-path t nil))) => (1.658461165 0 0.0)

Just cleanup heading text:

(benchmark-run 1
  (goto-char (point-min))
  (while (re-search-forward "^\\*+" nil t)
    (let ((case-fold-search nil))
      (looking-at org-complex-heading-regexp)
      (if (not (match-end 4)) ""
	;; Remove statistics cookies.
	(org-trim
	 (org-link-display-format
	  (replace-regexp-in-string
	   "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" ""
	   (match-string-no-properties 4)))))))) => (0.013364877 0 0.0)

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 16:08             ` Ihor Radchenko
@ 2021-04-29 16:51               ` Maxim Nikulin
  2021-04-30 16:56               ` Maxim Nikulin
  2021-05-01 14:48               ` Maxim Nikulin
  2 siblings, 0 replies; 18+ messages in thread
From: Maxim Nikulin @ 2021-04-29 16:51 UTC (permalink / raw)
  To: emacs-orgmode

On 29/04/2021 23:08, Ihor Radchenko wrote:
> Maxim Nikulin writes:
> 
>> Curiously my experience is that avoiding this lazy cache with
>> backtracking and maintaining custom structure during sequential scan of
>> the buffer works several times faster.
> 
> My experience is exactly opposite. Or maybe I miss something. Can you
> elaborate?

My benchmarks may be incorrect due to for development version I did not 
compile files. I did not purged outline path cache as well.
https://orgmode.org/list/s1qola$158l$1@ciao.gmane.io/

> Outline path without cache:
> 
> (benchmark-run 1
>    (goto-char (point-min))
>    (while (re-search-forward "^\\*+" nil t)
>      (org-get-outline-path t nil))) => (6.051079914 1 0.2864724879999869)
> 
> Outline path with cache:
> 
> (benchmark-run 1
>    (goto-char (point-min))
>    (while (re-search-forward "^\\*+" nil t)
>      (org-get-outline-path t nil))) => (1.658461165 0 0.0)
                                ^ t
I suppose.

I agree with such test.

Notice however the following patch (warning: :level and :max-level a 
cached with the same key)
https://orgmode.org/list/s209r8$16en$1@ciao.gmane.io/

Avoiding call to org-get-outline-path and using custom structure during 
single pass scan through the buffer allowed to significantly improve 
performance.

> Just cleanup heading text:
> 
> (benchmark-run 1
>    (goto-char (point-min))
>    (while (re-search-forward "^\\*+" nil t)
>      (let ((case-fold-search nil))
>        (looking-at org-complex-heading-regexp)
>        (if (not (match-end 4)) ""
> 	;; Remove statistics cookies.
> 	(org-trim
> 	 (org-link-display-format
> 	  (replace-regexp-in-string
> 	   "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" ""
> 	   (match-string-no-properties 4)))))))) => (0.013364877 0 0.0)

I may be wrong with the following statement. Attempt to profile 
org-refile-get-targets could give quite different results. I have seen a 
note that Emacs use internally a cache for only 5 compiled regular 
expressions. Just one extra regexp and every matching function require 
compiling of its regexp just wiped from the cache. It is a time 
consuming procedure. I am unsure whether you added all regexps used 
(directly or through function calls) by inner loop of 
org-refile-get-targets.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 13:30       ` Ihor Radchenko
@ 2021-04-29 19:17         ` Tim Cross
  2021-04-29 22:43           ` Samuel Wales
  2021-05-02  7:03           ` Ihor Radchenko
  0 siblings, 2 replies; 18+ messages in thread
From: Tim Cross @ 2021-04-29 19:17 UTC (permalink / raw)
  To: emacs-orgmode


Ihor Radchenko <yantar92@gmail.com> writes:

> Samuel Wales <samologist@gmail.com> writes:
>
>> would it be more useful if it automaticaly generated the cache instead
>> of telling you to runt he command to do so?
>
> I think so. To be frank, I do not understand the reason why it is not
> done by default.
>
>> if a solid, perhaps unified, cache existed, would org-id use it too?
>
> Sure. Why not? I imagine such cache can store the following info:
>
> org files in system -> per-file cache -> per-heading cache -> ...
>

I suspect the reason it is not done automatically is that getting that
right for all use cases is very hard to do without adding adverse impact
to performance. A cache which is marked as 'dirty' too often results in
too frequent cache refresh operations, but at the same time, determining
what changes in an org file actually invalidate the cache can be a
process intensive operation. Allowing the user to force cache refresh
when needed is likely a reasonable compromise. 

I recall having a lot of trouble getting org-refile to work well for me.
I use it a lot, but it was so long ago, I don't recall how I got to my
final configuration (I think I may have modified my workflow to work
better with what I was able to get working reasonably reliably and
efficiently). I now tend to refile to a fairly static set of paths, so
all works OK.

Sorry I cannot provide anything more substantial. I do understand your
frustration, but not sure what the right fix is. I can see having a
cache which is automatically refreshed when necessary will be
problematic to get working for all use cases. Having the ability to turn
automatic refresh on/off and having the ability to manually force a
refresh will likely always be required.



-- 
Tim Cross


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 19:17         ` Tim Cross
@ 2021-04-29 22:43           ` Samuel Wales
  2021-05-02  7:03           ` Ihor Radchenko
  1 sibling, 0 replies; 18+ messages in thread
From: Samuel Wales @ 2021-04-29 22:43 UTC (permalink / raw)
  To: Tim Cross; +Cc: emacs-orgmode

some users of org might not know that org-refile can go to the
location without refiling anything if you pass a c-u arg.

thus, org-goto is not the only header-selecting goto mechanism; there
is also refile goto.  [and imenu, helm, etc.]

i use refile-goto and find it satisfying for my purposes.  the
selection mechanism is exactly the same as for org-refile.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 16:08             ` Ihor Radchenko
  2021-04-29 16:51               ` Maxim Nikulin
@ 2021-04-30 16:56               ` Maxim Nikulin
  2021-05-01 14:48               ` Maxim Nikulin
  2 siblings, 0 replies; 18+ messages in thread
From: Maxim Nikulin @ 2021-04-30 16:56 UTC (permalink / raw)
  To: emacs-orgmode

On 29/04/2021 23:08, Ihor Radchenko wrote:
> Did you do any benchmarks? I just tried
> 
> Outline path without cache:

I have expanded your tests to make them more close to org-get-outline-path

org-get-outline-path without cache
| 5.459114059 | 12 | 1.2053580009999987 |
org-get-outline-path with cache
| 2.166378543 |  5 | 0.5011187770000003 |

Custom code to track outline path during scan
| 0.718288205 | 4 | 0.4005034349999992 |
The same without (org-trim ...) stuff
| 0.208798040 | 0 |                0.0 |

 From my point of view avoiding org-get-outline-path allows to 
significantly improve performance. Removing cookies, etc. is far from 
being negligible.

Test file is org-manual.org with 404 headings. Since it is not so much, 
I set minimal CPU frequency.

Notice that outline cache is cleared before each scan to simulate the 
case when org-goto is invoked just after emacs start or after 
adding/removing some lines in the beginning of the file. Actually it is 
a problem of outline cache that it easily became stale due to e.g. extra 
line close to buffer start.

Details:

#+begin_example bash
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ;
do
     echo powersave >"$i" ;
done
#+end_example

#+begin_src elisp
   (byte-compile-file (expand-file-name "~/src/emacs/org-mode/lisp/org.el"))
   (defun nm-count-headings (headings)
     (let ((n 0))
       (dolist (h headings)
	(setq n (+ n (length h))))
       (list (length headings) n)))
#+end_src

#+RESULTS:
: nm-count-headings

#+begin_src elisp
   (defun ir-tst-scan (buffer use-cache)
     (with-current-buffer buffer
       (goto-char 1)
       (let ((result nil))
	(while (re-search-forward "^\\*+[ \t]" nil t)
	  (beginning-of-line)
	  (let ((case-fold-search nil))
	    (looking-at org-complex-heading-regexp))
	  (push (org-get-outline-path t use-cache) result)
	  (end-of-line))
	result)))
   (byte-compile 'ir-tst-scan)
   (mapcar
    (lambda (use-cache)
      (let ((headings))
        (append
	(list use-cache)
	(benchmark-run 10
	  (progn
	    (setq org-outline-path-cache nil)
	    (setq headings
		  (ir-tst-scan "org-manual.org" use-cache))))
	(nm-count-headings headings))))
    '(nil t))
#+end_src

#+RESULTS:
| nil | 5.459114059 | 12 | 1.2053580009999987 | 404 | 1069 |
| t   | 2.166378543 |  5 | 0.5011187770000003 | 404 | 1069 |

| use-cache | benchmark | |                   | headings | outline 
components |

#+begin_src elisp
   (defun nm-tst-scan (buffer clean-headers)
     (with-current-buffer buffer
       (goto-char 1)
       (let ((result nil))
	(let ((outline-path))
	  (while (re-search-forward "^\\*+[ \t]" nil t)
	    (beginning-of-line)
	    (let ((case-fold-search nil))
	      (looking-at org-complex-heading-regexp))
	    (let ((heading (match-string-no-properties 4))
		  (heading-level (length (match-string-no-properties 1))))
	      (while (and outline-path (<= heading-level (caar outline-path)))
		(pop outline-path))
	      (push (cons heading-level
			  (if (not heading)
			      ""
			    (if (not clean-headers)
				heading
			      (org-trim
			       (org-link-display-format
				(replace-regexp-in-string
				 "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" ""
				 heading))))))
		    outline-path))
	    (end-of-line)
	    (push (mapcar #'cdr outline-path) result)))
	result)))
   (byte-compile 'nm-tst-scan)
   (mapcar
    (lambda (clean-headers)
      (let ((headings))
        (append
	(list clean-headers)
	(benchmark-run 10
	  (progn
	    (setq org-outline-path-cache nil)
	    (setq headings
		  (nm-tst-scan "org-manual.org" clean-headers))))
	(nm-count-headings headings))))
    '(nil t))
#+end_src

#+RESULTS:
| nil | 0.20879804000000002 | 0 |                0.0 | 404 | 1069 |
| t   |         0.718288205 | 4 | 0.4005034349999992 | 404 | 1069 |

| clean-headers | benchmark |   |                    | headings | 
outline components |



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 16:08             ` Ihor Radchenko
  2021-04-29 16:51               ` Maxim Nikulin
  2021-04-30 16:56               ` Maxim Nikulin
@ 2021-05-01 14:48               ` Maxim Nikulin
  2021-05-02  6:59                 ` Ihor Radchenko
  2 siblings, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-05-01 14:48 UTC (permalink / raw)
  To: emacs-orgmode

On 29/04/2021 23:08, Ihor Radchenko wrote:
> 
> My experience is exactly opposite. Or maybe I miss something. Can you
> elaborate?

Some additions. org-outline-path-cache is used solely by 
org-refile-get-targets (maybe there are some calls in other packages) 
but it efficiency is questionable. It was not clear for me earlier that 
the cache is reset before each scan through a buffer. So if 
org-refile-use is disabled, org-outline-path-cache from previous run of 
org-refile or org-goto is not used as well. A query to 
org-outline-path-cache requires at least one backward search and hash 
lookup. During sequential scan in org-refile-get-targets it is enough to 
have previous heading path to update it when new heading is found. I 
think, org-outline-path-cache should be deprecated.

> Just cleanup heading text:

I have realized what is wrong with this benchmark. It runs so fast 
because it matches no headings, so it never spent time for cleaning them up.

> (benchmark-run 1
>    (goto-char (point-min))
>    (while (re-search-forward "^\\*+" nil t)
>      (let ((case-fold-search nil))

(beginning-of-line)
should be added before next call and (end-of-line) somewhere later in 
while body.

>        (looking-at org-complex-heading-regexp)
>        (if (not (match-end 4)) ""
> 	;; Remove statistics cookies.
> 	(org-trim
> 	 (org-link-display-format
> 	  (replace-regexp-in-string
> 	   "\\[[0-9]+%\\]\\|\\[[0-9]+/[0-9]+\\]" ""
> 	   (match-string-no-properties 4)))))))) => (0.013364877 0 0.0)

On 29/04/2021 21:12, Ihor Radchenko wrote:
> For the cleaned heading text, I do not think that re-calculating the
> heading text on each change is a good idea. It may degrade typing
> latency. Yet, an acceptable approach could be simply invalidating cache
> for the changed headings. Then, outline paths can be re-calculated on
> changed headings when needed.

I agree that it is enough to invalidate cleaned heading on edit to 
refresh it in org-refile-get-targets. On the other hand, I still prefer 
text properties since they could be fetched even if some lines have been 
added or removed before. Position-based cache is useless in such cases.

Concerning typing latency, it should be postponed and resumed when no 
new edits is performed for certain period of time (~1s). However I am 
unsure if it is possible to accurately track all affected lines since 
later changes can add/remove lines before the line scheduled for 
invalidation.

On 30/04/2021 02:17, Tim Cross wrote:
> I suspect the reason it is not done automatically is that getting that
> right for all use cases is very hard to do without adding adverse impact
> to performance. A cache which is marked as 'dirty' too often results in
> too frequent cache refresh operations, but at the same time, determining
> what changes in an org file actually invalidate the cache can be a
> process intensive operation.

I believed that idea risen in this thread was to regenerate cache 
instead of spitting
"Please regenerate the refile cache with `C-0 C-c C-w'"
leaving more tricky cases for the user.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-05-01 14:48               ` Maxim Nikulin
@ 2021-05-02  6:59                 ` Ihor Radchenko
  2021-05-04 16:55                   ` Maxim Nikulin
  0 siblings, 1 reply; 18+ messages in thread
From: Ihor Radchenko @ 2021-05-02  6:59 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> Some additions. org-outline-path-cache is used solely by 
> org-refile-get-targets (maybe there are some calls in other packages) 
> but it efficiency is questionable. It was not clear for me earlier that 
> the cache is reset before each scan through a buffer. So if 
> org-refile-use is disabled, org-outline-path-cache from previous run of 
> org-refile or org-goto is not used as well. A query to 
> org-outline-path-cache requires at least one backward search and hash 
> lookup. During sequential scan in org-refile-get-targets it is enough to 
> have previous heading path to update it when new heading is found. I 
> think, org-outline-path-cache should be deprecated.

At least helm-org-ql and helm-org are using `org-get-outline-path' with
cache. I do think it is useful. Though, indeed, not as fast as
single-pass scan. Probably, we can rewrite the code of
`org-get-outline-path' instead of deprecating it?

>> Just cleanup heading text:
>
> I have realized what is wrong with this benchmark. It runs so fast 
> because it matches no headings, so it never spent time for cleaning them up.

Oh. You are right. I tested with fixed benchmark and it is indeed much slower.

> On 29/04/2021 21:12, Ihor Radchenko wrote:
>> For the cleaned heading text, I do not think that re-calculating the
>> heading text on each change is a good idea. It may degrade typing
>> latency. Yet, an acceptable approach could be simply invalidating cache
>> for the changed headings. Then, outline paths can be re-calculated on
>> changed headings when needed.
>
> I agree that it is enough to invalidate cleaned heading on edit to 
> refresh it in org-refile-get-targets. On the other hand, I still prefer 
> text properties since they could be fetched even if some lines have been 
> added or removed before. Position-based cache is useless in such cases.

Note that `org-refresh-category-properties' is using pretty much same
idea. However, there is no automatic invalidation and one needs to run
org-refresh-category-properties to ensure that cache is up to date.

Alternatively, markers can be used here. They will not be invalidated by
changes in buffer.

> Concerning typing latency, it should be postponed and resumed when no 
> new edits is performed for certain period of time (~1s). However I am 
> unsure if it is possible to accurately track all affected lines since 
> later changes can add/remove lines before the line scheduled for 
> invalidation.

As one option, we can accumulate the changed regions in some variable
and process them on timer. As I remember, org-element-cache is already
doing something similar. In fact, org-element-cache might be modified to
handle outline-path as well. The problem is that org-element-cache has
some difficult bugs. I tried using it multiple times and ran into issues
I was not able to debug. I may work on this in future.

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-04-29 19:17         ` Tim Cross
  2021-04-29 22:43           ` Samuel Wales
@ 2021-05-02  7:03           ` Ihor Radchenko
  1 sibling, 0 replies; 18+ messages in thread
From: Ihor Radchenko @ 2021-05-02  7:03 UTC (permalink / raw)
  To: Tim Cross; +Cc: emacs-orgmode

Tim Cross <theophilusx@gmail.com> writes:

> I suspect the reason it is not done automatically is that getting that
> right for all use cases is very hard to do without adding adverse impact
> to performance. A cache which is marked as 'dirty' too often results in
> too frequent cache refresh operations, but at the same time, determining
> what changes in an org file actually invalidate the cache can be a
> process intensive operation. Allowing the user to force cache refresh
> when needed is likely a reasonable compromise. 

Not really. `helm-org-ql' uses cache that is invalidated by _any_ change
in buffer + updating cache only when results are queried. Performance is
much better compared with built-in Org solutions.

Also, there is `org-element-use-cache' doing more granular cache
invalidation. You can enable it and test the performance. It is not
really degraded (except some hard-to-catch bugs existing in the
org-element-cache code).

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-05-02  6:59                 ` Ihor Radchenko
@ 2021-05-04 16:55                   ` Maxim Nikulin
  2021-05-05  0:53                     ` Ihor Radchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Nikulin @ 2021-05-04 16:55 UTC (permalink / raw)
  To: emacs-orgmode

On 02/05/2021 13:59, Ihor Radchenko wrote:
> Maxim Nikulin writes:
> 
>> Some additions. org-outline-path-cache is used solely by
>> org-refile-get-targets (maybe there are some calls in other packages)
>> but it efficiency is questionable. It was not clear for me earlier that
>> the cache is reset before each scan through a buffer. So if
>> org-refile-use is disabled, org-outline-path-cache from previous run of
>> org-refile or org-goto is not used as well. A query to
>> org-outline-path-cache requires at least one backward search and hash
>> lookup. During sequential scan in org-refile-get-targets it is enough to
>> have previous heading path to update it when new heading is found. I
>> think, org-outline-path-cache should be deprecated.
> 
> At least helm-org-ql and helm-org are using `org-get-outline-path' with
> cache. I do think it is useful. Though, indeed, not as fast as
> single-pass scan. Probably, we can rewrite the code of
> `org-get-outline-path' instead of deprecating it?

helm-org-ql.el that is a part of org-ql does not use cache when calling 
org-get-outline-path. helm-org performs sequential scan similar to 
org-refile-get-targets. Maybe a new function should be introduced to 
address such case.

On 29/04/2021 21:12, Ihor Radchenko wrote:
> Note that Org mode has integration with imenu. It is an alternative you
> can use to jump around current buffer.

I do not see any advantages of imenu over jumping using org-goto or 
org-refile with enabled cache.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Bug: fragile org refile cache
  2021-05-04 16:55                   ` Maxim Nikulin
@ 2021-05-05  0:53                     ` Ihor Radchenko
  0 siblings, 0 replies; 18+ messages in thread
From: Ihor Radchenko @ 2021-05-05  0:53 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> helm-org-ql.el that is a part of org-ql does not use cache when calling 
> org-get-outline-path. helm-org performs sequential scan similar to 
> org-refile-get-targets.

Hmm. You are right. But they could. I myself ran into an issue with
helm-org-ql exactly because formatting displayed headings was slow. If
org-refile-use-cache is deprecated, there would be no option to use
cache at all.

Best,
Ihor


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-05-05  0:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 16:09 [PATCH] Bug: fragile org refile cache Maxim Nikulin
2021-04-29  0:50 ` Samuel Wales
2021-04-29  1:29   ` Ihor Radchenko
2021-04-29  1:34     ` Samuel Wales
2021-04-29 12:45       ` Maxim Nikulin
2021-04-29 14:12         ` Ihor Radchenko
2021-04-29 15:04           ` Maxim Nikulin
2021-04-29 16:08             ` Ihor Radchenko
2021-04-29 16:51               ` Maxim Nikulin
2021-04-30 16:56               ` Maxim Nikulin
2021-05-01 14:48               ` Maxim Nikulin
2021-05-02  6:59                 ` Ihor Radchenko
2021-05-04 16:55                   ` Maxim Nikulin
2021-05-05  0:53                     ` Ihor Radchenko
2021-04-29 13:30       ` Ihor Radchenko
2021-04-29 19:17         ` Tim Cross
2021-04-29 22:43           ` Samuel Wales
2021-05-02  7:03           ` Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).