bug#52753: 29.0.50; Printing long list-like structures fails

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

From: "Mattias Engdegård" <mattiase@acm.org>
To: Ihor Radchenko <yantar92@gmail.com>
Cc: 52753@debbugs.gnu.org
Subject: bug#52753: 29.0.50; Printing long list-like structures fails
Date: Fri, 24 Dec 2021 17:54:45 +0100	[thread overview]
Message-ID: <5D2E995F-7A93-4A75-8AE9-AFCE02399D4E@acm.org> (raw)
In-Reply-To: <87o856b7ak.fsf@localhost>

24 dec. 2021 kl. 11:56 skrev Ihor Radchenko <yantar92@gmail.com>:

> I understand that the current implementations of the read/prin1 are
> recursive. Yet, they are able to read ordinary lists in non-recursive
> way.

Serves me right for not being precise enough! The Emacs lisp printer and reader are recursive in general, but no recursion is required for the standard list structure with links in the 'cdr' of each element. They do use recursion for reading and printing structures in the 'car' field, however. Consider:

(defun pearl (n)
  (let ((x 'grain-of-sand))
    (dotimes (_ n)
      (setq x (list x)))
    x))

If you try to print (pearl 10000) you probably get a bogus complaint about a possible circular list structure, and at least on my machine, evaluating

(let ((print-circle t))
  (print (pearl 100000)))

crashes Emacs, presumably because from C stack exhaustion.

This is of course a bug and while it would be nice to see it fixed (obviously), the effort required to do so may not necessarily stand in proportion to the severity of the bug -- but then again, reasonable people may disagree!

In particular, fixing it would entail a rather thorough reworking of the already quite complex reader and printer, replacing recursion with explicitly managed temporary heap-allocated stacks. Special care would be required not to make the common case more expensive, since it is so heavily used.

An alternative would be writing a special reader and printer specifically for marshalling Lisp data structures: they could be faster and simpler because they wouldn't need to recognise the full Lisp syntax, nor respect the various reader and printer options. The result could also be made more compact since it wouldn't be intended for human readers.

However, the code would still need to detect shared structure (your skip-list use these a lot!) and the basic Lisp syntax is fairly space-efficient, so it may be better to fix the stuff we already have. Maybe you want to give it a go?

> Skip list is conceptually very similar to ordinary list. Maybe the
> existing list reader functionality can be somehow adapted to read
> list-like structures?

Your skip list is nothing like an ordinary list; it is deeply recursive in the 'car' slot, and even alternates conses and vectors for each level. Inserting the numbers 1..4 results in

[org-skip-list- 16 2 0.367879441171 #3=(org-skip-list--head . [(1 . [#2=(2 . [#1=(3 . [(4 . [nil]) nil]) #1#])]) #2# nil nil nil nil nil nil nil nil nil nil nil nil nil nil]) [#1# #1# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3# #3#] <]

where the hurting part is [(1 . [#2=(2 . [#1=(3 . [(4 . [... -- a depth of 2N for N elements. There is also a lot of shared structure (all the #k#) imposing further work on the printer and reader.

> And more generally, I suspect that M-x memory-report suffers from
> similar problem with prin1 - treating list-like structure recursively.

True! In one respect this is more severe, since someone invoking memory-report may already be somewhat short on memory, so a solution that uses an explicitly managed stack may fail from heap exhaustion, or from thrashing (which is usually worse). We could perhaps use the old flip-the-pointers trick to do the counting in O(1) space (first demonstrated by Knuth, I believe). It works on general graphs but can be a bit slow.

> I know that balanced trees have the same asymptotic behaviour with skip
> lists and better worst-case behaviour. However, the original Pugh's paper
> introducing skip lists did a comparison between skip list and
> self-balancing trees.

That's true (I remember being fascinated by that paper) but it was written in 1989 and some things have changed since. In particular, memory locality matters more today. Random number generators have made strides but they are often expensive, and the one in Emacs isn't likely to be particularly fast (nor good).

Skip-lists have some properties that make them interesting for some uses, such as being fairly amenable to lock-free operation, but that's not of interest in Emacs today.

> The common operations on the cache are: (1) random element
> lookups (element at point while user edits the Org buffer); (2)
> iterating cache elements one-by-one to filter them (agenda views,
> column views, sparse trees, etc); (3) Deleting or inserting a series of
> adjacent elements upon file modification (normal user edits). (4)
> Combination of cache iteration with adding/removing elements in-place
> (agenda lookups when parts of the cache should be calculated as we find
> that some elements are not yet in cache).

(3) is the killer here since it rules out hash tables, or does it? Is maintaining element order a strict requirement? Otherwise just go with hash tables.

> The benchmarks (see code below) of skip list vs. AVL-tree give the
> following results:

First let me commend you for making actual measurements! Of course, the usual benchmarking caveats apply.

> - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often triggers garbage collection)

Most data structures allow for fast insertion of consecutive elements as a special operation, typically in more-or-less linear time. I'm sure we could add it to the standard AVL implementation if it would make a difference.

Since I'm not sure what your keys are (integers, strings, ...) nor their distribution (dense, sparse, clustered...) it's hard to recommend precise structures but some alternatives are:

- red-black-trees: typically similar to AVL trees but often easier to implement
- radix trees: very fast for dense or sparse-but-clustered integers
- B-trees: often very fast because of their good locality, many variants
- tries: many variants, typically tailored for a specific key type (eg ASCII strings).

I'm not sure if your usage benefits from persistence but it's nice to have in many cases.

Then there are implementation aspects to consider. For example, suppose you want to store three values in a tree node (value and left and right subtrees). You could use:

- vector: [L R X]
- list: (L R X)
- improper list: (L R . X)

They all have their advantages and drawbacks, and you don't know which one is best until you measure. The result is often counter-intuitive. Also remember that Elisp has efficient boolean vectors, so if you need an extra bit per node it may be more economical to gather them all in one vector instead of wasting 8-16 bytes for each bit. Even bignums can be harnessed for the occasional profit.

> In summary, skip list appears to give benefit in terms to overall speed
> (within context of grammar cache). The current implementation of skip
> list + org-element-cache is slightly faster compared to AVL tree and
> feels much snappier.

It could very well be, but it's far from a double-blind trial. Few things are without limit; one of them is Man's capacity for self-deception.

> The other aspect is ease of usage. AVL tree is trickier to work with
> (see below code example).

Looks like you are using it in a somewhat unorthodox way; those double hyphens are there for a reason. If you implement a data structure for your own use, you will of course make it handy for those specific purposes. It is not uncommon for data structures to provide cursors that help with your kind of local in-place editing.

next prev parent reply	other threads:[~2021-12-24 16:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-23 11:07 bug#52753: 29.0.50; Printing long list-like structures fails Ihor Radchenko
2021-12-23 16:11 ` Mattias Engdegård
2021-12-24 10:56   ` Ihor Radchenko
2021-12-24 16:54     ` Mattias Engdegård [this message]
2021-12-25 11:15       ` Mattias Engdegård
2022-01-23 12:44       ` Ihor Radchenko
2022-01-23 16:28         ` Mattias Engdegård
2022-01-30  9:16           ` Ihor Radchenko
2022-01-30 10:16             ` Mattias Engdegård
2022-05-31 10:31 ` Mattias Engdegård
2022-06-02 13:12   ` Ihor Radchenko
2022-06-02 16:10     ` Mattias Engdegård

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5D2E995F-7A93-4A75-8AE9-AFCE02399D4E@acm.org \
    --to=mattiase@acm.org \
    --cc=52753@debbugs.gnu.org \
    --cc=yantar92@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).