From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#52753: 29.0.50; Printing long list-like structures fails Date: Sun, 23 Jan 2022 20:44:48 +0800 Message-ID: <8735le7ha7.fsf@localhost> References: <22AEBE6E-68E7-47D9-885C-AE6621AF4AEE@acm.org> <87o856b7ak.fsf@localhost> <5D2E995F-7A93-4A75-8AE9-AFCE02399D4E@acm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="25302"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 52753@debbugs.gnu.org To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sun Jan 23 13:42:33 2022 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nBcCW-0006NY-5G for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 23 Jan 2022 13:42:32 +0100 Original-Received: from localhost ([::1]:41846 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nBcCV-0008Io-0b for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 23 Jan 2022 07:42:31 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:38324) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nBcB4-0008I4-Nv for bug-gnu-emacs@gnu.org; Sun, 23 Jan 2022 07:41:02 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]:45869) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nBcB4-0005JF-DQ for bug-gnu-emacs@gnu.org; Sun, 23 Jan 2022 07:41:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1nBcB4-0001gq-BY for bug-gnu-emacs@gnu.org; Sun, 23 Jan 2022 07:41:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 23 Jan 2022 12:41:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 52753 X-GNU-PR-Package: emacs Original-Received: via spool by 52753-submit@debbugs.gnu.org id=B52753.16429416316438 (code B ref 52753); Sun, 23 Jan 2022 12:41:02 +0000 Original-Received: (at 52753) by debbugs.gnu.org; 23 Jan 2022 12:40:31 +0000 Original-Received: from localhost ([127.0.0.1]:38772 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nBcAY-0001fk-EM for submit@debbugs.gnu.org; Sun, 23 Jan 2022 07:40:31 -0500 Original-Received: from mail-pj1-f50.google.com ([209.85.216.50]:44887) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nBcAW-0001fO-JX for 52753@debbugs.gnu.org; Sun, 23 Jan 2022 07:40:29 -0500 Original-Received: by mail-pj1-f50.google.com with SMTP id nn16-20020a17090b38d000b001b56b2bce31so2803588pjb.3 for <52753@debbugs.gnu.org>; Sun, 23 Jan 2022 04:40:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-transfer-encoding; bh=j1d4lI+LMjvqDcJpg9kwHlQbeUQe9/Jmz+9hvKihZW0=; b=Azt/K3QJphxWp93uQHg/dbSqnB3XRViS75f+LbKo6G2NgM5pFen9tyC497TnTwg4I6 uxZ4lF6luusyyDN9yUP37llyT3etNd4OJyC8a8CeizosbX4TUrMfpOp+N/aKARQYsOMS chQ1x+1PB2rRfKCYzBRJ+v5jTxbyw2r3fsnQdwO2Wf4n4+wiAG68zK5rWWcNCVM6tfnG wXUYsH98/X+KlsVbZ6P8oEBlu1y66bm8O77menuf/gDZ4m9SVf2x81QISQpYcbvoQC36 zK46n6Y+3Z14OJ+eAlwFxci+Kczba5SabDxn0DoRleXILNuTIlbKRmUcIL6t2hdN7goP YUmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=j1d4lI+LMjvqDcJpg9kwHlQbeUQe9/Jmz+9hvKihZW0=; b=wsPdLWouf1GE6rdEu67GsQLyn8NZC79wokssznVCjJyPPaALKKwRj48E6a/6OEh9fC as+NqVmIEejxIwkIl0jiyPncW+50c7b6UVZToB4pi2V7Jadx5H+I7Tg9E/aHDH3XrMHe RGZESDpUtqNtRjTQy4IzmNycFsso8UZsulLJuyKMNQJp7qoBtTLE295DbFgU1Bipa4qo MZlzWTkquFT91ItdSdS3Ss0RSujA2j52LSazQXPqP7GiS1RMiz98lblAhD5LVM1159qH cOoAQcwANYWhPH0YMVnHJjieaNyHUaTk/yZ6kGORgQtuob/ioar4NtsqmuvU/YY+BclP RKxw== X-Gm-Message-State: AOAM530gAIMw5ZD1Po3WRvwLmF0/WD0mFx7KYz0WLaiBz7CLelIyGzwA AONZ+1M3/N2BqkIRGbuNQvxR3NOu66A= X-Google-Smtp-Source: ABdhPJxp0WhWjlL0vMlCJmYYM+VXgzjP2BEp7HXDaAmIj4oe78aEWIIgEmJ2WGQNCw1sirPRMvH7jQ== X-Received: by 2002:a17:90b:4f8b:: with SMTP id qe11mr3013557pjb.118.1642941622566; Sun, 23 Jan 2022 04:40:22 -0800 (PST) Original-Received: from localhost ([103.125.234.161]) by smtp.gmail.com with ESMTPSA id n35sm8672966pgb.25.2022.01.23.04.40.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 23 Jan 2022 04:40:21 -0800 (PST) In-Reply-To: <5D2E995F-7A93-4A75-8AE9-AFCE02399D4E@acm.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:224906 Archived-At: Mattias Engdeg=C3=A5rd writes: > An alternative would be writing a special reader and printer specifically= for marshalling Lisp data structures: they could be faster and simpler bec= ause they wouldn't need to recognise the full Lisp syntax, nor respect the = various reader and printer options. The result could also be made more comp= act since it wouldn't be intended for human readers. > > However, the code would still need to detect shared structure (your skip-= list use these a lot!) and the basic Lisp syntax is fairly space-efficient,= so it may be better to fix the stuff we already have. Maybe you want to gi= ve it a go? The idea to provide a special reader/printer sounds reasonable. If there was a way to tell Emacs how to write some specific data structure, things would be easier. However, I am not sure what should be the mechanism to deal with malformed data structures and with circular objects. I tried to understand the printer code, but it appears to be too complex for me. Maybe I am just not familiar enough with the C part of Emacs. >> Skip list is conceptually very similar to ordinary list. Maybe the >> existing list reader functionality can be somehow adapted to read >> list-like structures? > > Your skip list is nothing like an ordinary list; it is deeply recursive i= n the 'car' slot, and even alternates conses and vectors for each level. In= serting the numbers 1..4 results in > > [org-skip-list- 16 2 0.367879441171 #3=3D(org-skip-list--head . [(1 . [#2= =3D(2 . [#1=3D(3 . [(4 . [nil]) nil]) #1#])]) #2# nil nil nil nil nil nil n= il nil nil nil nil nil nil nil]) [#1# #1# #3# #3# #3# #3# #3# #3# #3# #3# #= 3# #3# #3# #3# #3# #3#] <] > > where the hurting part is [(1 . [#2=3D(2 . [#1=3D(3 . [(4 . [... -- a dep= th of 2N for N elements. There is also a lot of shared structure (all the #= k#) imposing further work on the printer and reader. While I understand how the illustrated example is difficult for the reader, note that recursion is _not_ in the car slot. Each list element is a structure like (key . forward-vector) with key being the stored value and forward-vector storing forward references to the next elements. Conceptually, ordinary list is just a degenerate case of skip list with forward-vector storing a single forward-reference to the next element in the list. Of course, my remark does not help much to solve the observed bug. With better understanding of Elisp printer, I can probably construct an alternative data structure that can be printed without issues. I think that a "dummy" ordinary list storing all the keys can be used to prevent the recursion: [org-skip-list- 16 2 0.367879441171 (#1=3D1 #2=3D2 #3=3D3 #4=3D4) <- dummy = list (org-skip-list--head . [(#1 . [(#2 . [(#3 . [(#4 . [nil]) nil]) #3])]) #2 n= il ...])] >> I know that balanced trees have the same asymptotic behaviour with skip >> lists and better worst-case behaviour. However, the original Pugh's paper >> introducing skip lists did a comparison between skip list and >> self-balancing trees. > > That's true (I remember being fascinated by that paper) but it was writte= n in 1989 and some things have changed since. In particular, memory localit= y matters more today. Random number generators have made strides but they a= re often expensive, and the one in Emacs isn't likely to be particularly fa= st (nor good). > > Skip-lists have some properties that make them interesting for some uses,= such as being fairly amenable to lock-free operation, but that's not of in= terest in Emacs today. Since your last message, I have dived into the more recent literature on data structures. It looks like skip lists are not that bad - similar to even older AVL-tree they got updated with finger search facilities and auto-adjustments to non-random data [1,2]. However, a systematic comparison between AVL-trees, skip lists, and splay trees reveals that insertion into skip lists perform up to 2x worse compared to AVL-trees, especially for random insertion [3]. Ref. [3] generally agrees with my measurements. On the other hand, Ref. [3] showed that average lookup time should be 2x faster in skip lists - very different from my implementation. I may be doing something wrong. Ref. [3] also introduces some attractive behaviour of splay trees: they outperform AVL-trees in random+non-random search and non-random insertion. [1] Jonathan J. Pittard, Alan L. Tharp [Springer Berlin Heidelberg] (2010) Simplified Self-Adapting Skip Lists https://doi.org/10.1007/978-3-642-15381-5_16 [2] W. Pugh (1990) A skip list cookbook https://www.semanticscholar.org/paper/A-skip-list-cookbook-Pugh/83ebde23871= ce9f839ad82e0247a7481410f994e [3] Hassan Adelyar. Sayed (2008) The Complexity of Splay Trees and Skip Lists https://www.semanticscholar.org/paper/The-Complexity-of-Splay-Trees-and-Ski= p-Lists-Sayed/a0b7d3662afa5b8906861d33f22fdd34fee8f402 >> The common operations on the cache are: (1) random element >> lookups (element at point while user edits the Org buffer); (2) >> iterating cache elements one-by-one to filter them (agenda views, >> column views, sparse trees, etc); (3) Deleting or inserting a series of >> adjacent elements upon file modification (normal user edits). (4) >> Combination of cache iteration with adding/removing elements in-place >> (agenda lookups when parts of the cache should be calculated as we find >> that some elements are not yet in cache). > > (3) is the killer here since it rules out hash tables, or does it? Is mai= ntaining element order a strict requirement? Otherwise just go with hash ta= bles. I think I need to explain the Org element cache structure a but. The cache stores syntactic elements in Org buffers. For example: * headline 1 ** subheadline * headline 2 will be represented in cache like the following: (:begin 1 "headline 1" ...) (:begin 13 "subheadline" ...) (:begin 28 "headline 2" ...) The :begin values are buffer positions where the headlines begin. They are also used as data structure sorting keys to allow quick queries about syntax element at point. If the user edits the file: * headline 1 ** new ** subheadline * headline 2 we have to insert a new element and shift the :begin keys of all the elements below: (:begin 1 "headline 1" ...) (:begin 13 "new" ...) (:begin 13+7 "subheadline" ...) (:begin 28+7 "headline 2" ...) As you can see, cache does not just have to handle frequent lookups and modifications, but also support key modification. Hash tables cannot be used. The cache queries and element additions/deletions generally happen interchangeably. A typical pattern happens when user types in buffer: something like flyspell will query cache about syntax element at point followed by file modification by user; then again query, modification, ... >> - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often t= riggers garbage collection) > > Most data structures allow for fast insertion of consecutive elements as = a special operation, typically in more-or-less linear time. I'm sure we cou= ld add it to the standard AVL implementation if it would make a difference. Adding search fingers to built-in AVL trees will certainly make a difference. From example above, you can see that typical pattern is query+modification in certain narrow key range (in narrow region in buffer). If Emacs AVL tree supported search fingers stable against modifications, we could improve typical search time to O(1) and somewhat improve insertion time. > Since I'm not sure what your keys are (integers, strings, ...) nor their = distribution (dense, sparse, clustered...) it's hard to recommend precise s= tructures but some alternatives are: We use integers (buffer positions). Distribution is unknown a priori, but may not be uniform (at least, it is not uniform in my files). > - red-black-trees: typically similar to AVL trees but often easier to imp= lement > - radix trees: very fast for dense or sparse-but-clustered integers > - B-trees: often very fast because of their good locality, many variants > - tries: many variants, typically tailored for a specific key type (eg AS= CII strings). I am currently thinking about splay trees because of their nice performance for local queries. B-trees may be an option, but I do not see how they would be advantageous compared to AVL trees. We do everything in memory. > Then there are implementation aspects to consider. For example, suppose y= ou want to store three values in a tree node (value and left and right subt= rees). You could use: > > - vector: [L R X] > - list: (L R X) > - improper list: (L R . X) This is really counter-intuitive. I am wondering how much can be the difference in practice. At least by orders of magnitude. Do you have any examples? > They all have their advantages and drawbacks, and you don't know which on= e is best until you measure. The result is often counter-intuitive. Also re= member that Elisp has efficient boolean vectors, so if you need an extra bi= t per node it may be more economical to gather them all in one vector inste= ad of wasting 8-16 bytes for each bit. Even bignums can be harnessed for th= e occasional profit. I need to think about it. org-element.el is really not using memory effectively. A typical syntax element is structured as a list: (type (:prop1 val1 :prop2 val2 ... )) with a dozen+ properties. Accessing the element properties may take comparable time to the actual data structure queries, but changing the element data structure may be hard - there may be back-compatibility issues... Best, Ihor