bug#52753: 29.0.50; Printing long list-like structures fails

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

From: Ihor Radchenko <yantar92@gmail.com>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: 52753@debbugs.gnu.org
Subject: bug#52753: 29.0.50; Printing long list-like structures fails
Date: Fri, 24 Dec 2021 18:56:19 +0800	[thread overview]
Message-ID: <87o856b7ak.fsf@localhost> (raw)
In-Reply-To: <22AEBE6E-68E7-47D9-885C-AE6621AF4AEE@acm.org>

Mattias Engdegård <mattiase@acm.org> writes:

> The elisp printer isn't really geared towards printing deeply nested values since it uses recursion. Perhaps you could just print the contents of your skip list as a flat list or vector for serialization purposes, and rebuild the structure when loading it back in again.

Thanks for the suggestion.

It is probably what I have to do anyway to support older Emacs. However,
it will not solve the problem revealed in this bug report.

I understand that the current implementations of the read/prin1 are
recursive. Yet, they are able to read ordinary lists in non-recursive
way.

Skip list is conceptually very similar to ordinary list. Maybe the
existing list reader functionality can be somehow adapted to read
list-like structures?

And more generally, I suspect that M-x memory-report suffers from
similar problem with prin1 - treating list-like structure recursively.
While you showed how to circumvent the printing/reading of skip lists, I
do not see how to fix the memory-report behaviour and the Emacs crash
(which, I guess, may also be related to treating list-likes
recursively).

I would like to highlight the crash reproducer especially. I may accept
limitation for printer/reader, but crash cannot be tolerated.

<end of on-topic>
-------
<begin of off-topic to the bug report>

> (I'd use a balanced tree-like structure instead of a skip list. Performance is similar, and allows for immutability and persistence.)

This is not directly related to my bug report, but if you have a good
knowledge of using data structures in practice, I would appreciate
comments.

I know that balanced trees have the same asymptotic behaviour with skip
lists and better worst-case behaviour. However, the original Pugh's paper
introducing skip lists did a comparison between skip list and
self-balancing trees. Pugh et al [1] showed that insertion time in skip
lists is still O(log N), but with smaller constant. Moreover, he also
showed that skip lists can be more performant when the data structure is
edited locally [2].

[1] W Pugh [Springer Berlin Heidelberg] (1989) A Probabilistic
    Alternative to Balanced Trees. Univ
    https://doi.org/10.1007/3-540-51542-9_36
[2] W. Pugh (1990) A skip list cookbook
    https://www.semanticscholar.org/paper/A-skip-list-cookbook-Pugh/83ebde23871ce9f839ad82e0247a7481410f994e

The problem I am trying to solve is performance of parser cache for Org
mode files. The common operations on the cache are: (1) random element
lookups (element at point while user edits the Org buffer); (2)
iterating cache elements one-by-one to filter them (agenda views,
column views, sparse trees, etc); (3) Deleting or inserting a series of
adjacent elements upon file modification (normal user edits). (4)
Combination of cache iteration with adding/removing elements in-place
(agenda lookups when parts of the cache should be calculated as we find
that some elements are not yet in cache).

The current implementation of org-element-cache is indeed using balanced
AVL tree (built-in Emacs implementation). However, given that many
cache operations are not random, but involve groups of subsequent
elements, I wanted to give skip list a try.

The benchmarks (see code below) of skip list vs. AVL-tree give the
following results:

Skip list:
 - Insertion of 200k random numbers:                    0.80 sec
 - Lookup of 200k random numbers:                       0.73 sec
 - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.42 sec
 - Lookup of 200k subsequent numbers:                   0.32 sec

AVL tree:
 - Insertion of 200k random numbers:                    0.41 sec
 - Lookup of 200k random numbers:                       0.18 sec
 - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often triggers garbage collection)
 - Lookup of 200k subsequent numbers:                   0.17 sec

My implementation of skip list is comparable for subsequent number
queries and 2-4x worse for random queries.

A more tricky benchmark is iterating the list/tree and simultaneously
altering it (a common operation in agenda/sparse tree code):

For both Skip list and AVL tree I first create a 100k numbers (0, 2, 4,
..., 200000) and then iterate over the data structure adding the
remaining odd numbers on every iteration.

For skip list: Elapsed time: 0.509455s
For AVL tree:  Elapsed time: 1.348294s

This time, the skip list "wins" also requiring a _lot_ less code (see
below). Every insertion to AVL tree requires a O(log N) extra
comparisons to find the new element and O(log N) extra memory for new
stack.

Finally, a real-world test for a complex agenda query on a 15Mb Org
file (the data is output from elp-results):

For skip list:
;; org-element-cache-map  1986        18.579660950  0.0093553176
;; org-element-cache-map  1986        19.202719912  0.0096690432
For AVL tree:
;; org-element-cache-map  1986        21.047835249  0.0105981043
;; org-element-cache-map  1986        20.733410823  0.0104397838

AVL tree is 10% slower.

Given that most of the CPU time is devoted to actual parsing, 10%
overall difference when switching from AVL tree to skip list is very
good. In fact, my daily usage of skip-list branch when most of the file
elements are already cached feels a lot snappier (though I have no
consistent benchmark to test this and may fall into cognitive bias).

In summary, skip list appears to give benefit in terms to overall speed
(within context of grammar cache). The current implementation of skip
list + org-element-cache is slightly faster compared to AVL tree and
feels much snappier. However, the difference is indeed just a constant
and probably depends on query patterns. If built-in AVL tree is
optimised, the performance benefit may disappear (or not, if one
optimises my skip list implementation better).

The other aspect is ease of usage. AVL tree is trickier to work with
(see below code example). Though I do not have enough practical
experience with skip lists to know the possible code pitfalls years
later.

Any commends are welcome.

-----

The code:

;; Random queries
(require 'org-skip-list)
(setq skiplist (org-skip-list-create #'<))
(benchmark-run 200000 (org-skip-list-insert skiplist (random 1000000000)))
(benchmark-run 200000 (org-skip-list-find skiplist (random 1000000000)))

;; Ordered queries
(require 'org-skip-list)
(setq skiplist (org-skip-list-create #'<))
(setq i 0)
(benchmark-run 200000 (org-skip-list-insert skiplist i) (cl-incf i))
(setq i 0)
(benchmark-run 200000 (org-skip-list-find skiplist i) (cl-incf i))

;; Random queries
(require 'avl-tree)
(setq tree (avl-tree-create #'<))
(benchmark-run 200000 (avl-tree-enter tree (random 1000000000)))
(benchmark-run 200000 (avl-tree-member-p tree (random 1000000000)))

;; Ordered queries
(require 'avl-tree)
(setq tree (avl-tree-create #'<))
(setq i 0)
(benchmark-run 200000 (avl-tree-enter tree i) (cl-incf i))
(setq i 0)
(benchmark-run 200000 (avl-tree-member-p tree i) (cl-incf i))

;; Modification during iteration
(require 'org-skip-list)
(setq skiplist (org-skip-list-create #'<))
(let ((i 0))
  (while (<= i 200000)
    (when (= 0 (% i 2))
      (org-skip-list-insert skiplist i))
    (setq i (+ i 2))))
(benchmark-progn
  (iter-do (data (org-skip-list-iter skiplist))
    (when (< data 200000)
      (org-skip-list-insert skiplist (1+ data)))))

;; Modification during iteration
(require 'avl-tree)
(setq tree (avl-tree-create #'<))
(let ((i 0))
  (while (<= i 200000)
    (when (= 0 (% i 2))
      (avl-tree-enter tree i))
    (setq i (+ i 2))))
(benchmark-progn
  (let ((node (avl-tree--node-left (avl-tree--dummyroot tree)))
	data continue-flag
	(stack (list nil))
	(leftp t)
	(start 0))
    (while (and node (<= start 200000))
      (setq data (avl-tree--node-data node))
      (if (and leftp (avl-tree--node-left node) ; Left branch.
	       ;; ... or when we are before START.
	       (< start data))
          (progn (push node stack)
		 (setq node (avl-tree--node-left node)))
	(unless (< data start)
	  (if (= start data)
	      (cl-incf start)
	    (avl-tree-enter tree start)
            (setq node (avl-tree--node-left (avl-tree--dummyroot tree))
		  stack (list nil)
		  leftp t
		  continue-flag t)))
	(if continue-flag
	    (setq continue-flag nil)
	  (setq node (if (and (car stack)
			      ;; If START advanced beyond stack parent, skip the right branch.
			      (< (avl-tree--node-data (car stack)) start))
			 (progn
                           (setq leftp nil)
                           (pop stack))
		       ;; Otherwise, move ahead into the right
		       ;; branch when it exists.
		       (if (setq leftp (avl-tree--node-right node))
		           (avl-tree--node-right node)
			 (pop stack)))))))))

Best,
Ihor

next prev parent reply	other threads:[~2021-12-24 10:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-23 11:07 bug#52753: 29.0.50; Printing long list-like structures fails Ihor Radchenko
2021-12-23 16:11 ` Mattias Engdegård
2021-12-24 10:56   ` Ihor Radchenko [this message]
2021-12-24 16:54     ` Mattias Engdegård
2021-12-25 11:15       ` Mattias Engdegård
2022-01-23 12:44       ` Ihor Radchenko
2022-01-23 16:28         ` Mattias Engdegård
2022-01-30  9:16           ` Ihor Radchenko
2022-01-30 10:16             ` Mattias Engdegård
2022-05-31 10:31 ` Mattias Engdegård
2022-06-02 13:12   ` Ihor Radchenko
2022-06-02 16:10     ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o856b7ak.fsf@localhost \
    --to=yantar92@gmail.com \
    --cc=52753@debbugs.gnu.org \
    --cc=mattiase@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).