From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Newsgroups: gmane.emacs.bugs Subject: bug#52753: 29.0.50; Printing long list-like structures fails Date: Fri, 24 Dec 2021 17:54:45 +0100 Message-ID: <5D2E995F-7A93-4A75-8AE9-AFCE02399D4E@acm.org> References: <22AEBE6E-68E7-47D9-885C-AE6621AF4AEE@acm.org> <87o856b7ak.fsf@localhost> Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="8192"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 52753@debbugs.gnu.org To: Ihor Radchenko Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Dec 24 17:56:11 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1n0nrW-0001wj-RA for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 24 Dec 2021 17:56:11 +0100 Original-Received: from localhost ([::1]:54500 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1n0nrV-0001ts-LE for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 24 Dec 2021 11:56:09 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:57268) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n0nqm-0001tR-GP for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 11:55:25 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]:54644) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1n0nqQ-0004qT-JS for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 11:55:17 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1n0nqQ-0002Ij-D5 for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 11:55:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 24 Dec 2021 16:55:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 52753 X-GNU-PR-Package: emacs Original-Received: via spool by 52753-submit@debbugs.gnu.org id=B52753.16403648978819 (code B ref 52753); Fri, 24 Dec 2021 16:55:02 +0000 Original-Received: (at 52753) by debbugs.gnu.org; 24 Dec 2021 16:54:57 +0000 Original-Received: from localhost ([127.0.0.1]:37957 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1n0nqK-0002I9-Ee for submit@debbugs.gnu.org; Fri, 24 Dec 2021 11:54:57 -0500 Original-Received: from mail236c50.megamailservers.eu ([91.136.10.246]:51504 helo=mail56c50.megamailservers.eu) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1n0nqF-0002Hr-53 for 52753@debbugs.gnu.org; Fri, 24 Dec 2021 11:54:54 -0500 X-Authenticated-User: mattiase@bredband.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=megamailservers.eu; s=maildub; t=1640364888; bh=kMewvVl1JDsBVgZxBRMAAtDDT28a9u3TikA1Jpwc9RQ=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=hg99JplvUj66xPQ6URJraD+aWxGrirBKlUX+0ouFH2vojxNNorjz3Ig+9WbEIp95/ BeaGcE4a+iaxPLfJWsgb2LMTSHrVpVYPtoM5sQayNpD4Kqj4eAVAh1cRjI6PPqRtZc kyM2FRp40yFgAxAWRMjzIN7uf6Xu7HFCghh6j+rE= Feedback-ID: mattiase@acm.or Original-Received: from smtpclient.apple (c188-150-171-71.bredband.tele2.se [188.150.171.71]) (authenticated bits=0) by mail56c50.megamailservers.eu (8.14.9/8.13.1) with ESMTP id 1BOGskmp014976; Fri, 24 Dec 2021 16:54:47 +0000 In-Reply-To: <87o856b7ak.fsf@localhost> X-Mailer: Apple Mail (2.3654.120.0.1.13) X-CTCH-RefID: str=0001.0A742F16.61C5FB58.004D, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-CTCH-VOD: Unknown X-CTCH-Spam: Unknown X-CTCH-Score: 0.000 X-CTCH-Flags: 0 X-CTCH-ScoreCust: 0.000 X-CSC: 0 X-CHA: v=2.4 cv=G/d/r/o5 c=1 sm=1 tr=0 ts=61c5fb58 a=SF+I6pRkHZhrawxbOkkvaA==:117 a=SF+I6pRkHZhrawxbOkkvaA==:17 a=kj9zAlcOel0A:10 a=M51BFTxLslgA:10 a=pGLkceISAAAA:8 a=3j4zgVDsEBfsLP-EzlkA:9 a=CjuIK1q_8ugA:10 X-Origin-Country: SE X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:223062 Archived-At: 24 dec. 2021 kl. 11:56 skrev Ihor Radchenko : > I understand that the current implementations of the read/prin1 are > recursive. Yet, they are able to read ordinary lists in non-recursive > way. Serves me right for not being precise enough! The Emacs lisp printer and = reader are recursive in general, but no recursion is required for the = standard list structure with links in the 'cdr' of each element. They do = use recursion for reading and printing structures in the 'car' field, = however. Consider: (defun pearl (n) (let ((x 'grain-of-sand)) (dotimes (_ n) (setq x (list x))) x)) If you try to print (pearl 10000) you probably get a bogus complaint = about a possible circular list structure, and at least on my machine, = evaluating (let ((print-circle t)) (print (pearl 100000))) crashes Emacs, presumably because from C stack exhaustion. This is of course a bug and while it would be nice to see it fixed = (obviously), the effort required to do so may not necessarily stand in = proportion to the severity of the bug -- but then again, reasonable = people may disagree! In particular, fixing it would entail a rather thorough reworking of the = already quite complex reader and printer, replacing recursion with = explicitly managed temporary heap-allocated stacks. Special care would = be required not to make the common case more expensive, since it is so = heavily used. An alternative would be writing a special reader and printer = specifically for marshalling Lisp data structures: they could be faster = and simpler because they wouldn't need to recognise the full Lisp = syntax, nor respect the various reader and printer options. The result = could also be made more compact since it wouldn't be intended for human = readers. However, the code would still need to detect shared structure (your = skip-list use these a lot!) and the basic Lisp syntax is fairly = space-efficient, so it may be better to fix the stuff we already have. = Maybe you want to give it a go? > Skip list is conceptually very similar to ordinary list. Maybe the > existing list reader functionality can be somehow adapted to read > list-like structures? Your skip list is nothing like an ordinary list; it is deeply recursive = in the 'car' slot, and even alternates conses and vectors for each = level. Inserting the numbers 1..4 results in [org-skip-list- 16 2 0.367879441171 #3=3D(org-skip-list--head . [(1 . = [#2=3D(2 . [#1=3D(3 . [(4 . [nil]) nil]) #1#])]) #2# nil nil nil nil nil = nil nil nil nil nil nil nil nil nil]) [#1# #1# #3# #3# #3# #3# #3# #3# = #3# #3# #3# #3# #3# #3# #3# #3#] <] where the hurting part is [(1 . [#2=3D(2 . [#1=3D(3 . [(4 . [... -- a = depth of 2N for N elements. There is also a lot of shared structure (all = the #k#) imposing further work on the printer and reader. > And more generally, I suspect that M-x memory-report suffers from > similar problem with prin1 - treating list-like structure recursively. True! In one respect this is more severe, since someone invoking = memory-report may already be somewhat short on memory, so a solution = that uses an explicitly managed stack may fail from heap exhaustion, or = from thrashing (which is usually worse). We could perhaps use the old = flip-the-pointers trick to do the counting in O(1) space (first = demonstrated by Knuth, I believe). It works on general graphs but can be = a bit slow. > I know that balanced trees have the same asymptotic behaviour with = skip > lists and better worst-case behaviour. However, the original Pugh's = paper > introducing skip lists did a comparison between skip list and > self-balancing trees. That's true (I remember being fascinated by that paper) but it was = written in 1989 and some things have changed since. In particular, = memory locality matters more today. Random number generators have made = strides but they are often expensive, and the one in Emacs isn't likely = to be particularly fast (nor good). Skip-lists have some properties that make them interesting for some = uses, such as being fairly amenable to lock-free operation, but that's = not of interest in Emacs today. > The common operations on the cache are: (1) random element > lookups (element at point while user edits the Org buffer); (2) > iterating cache elements one-by-one to filter them (agenda views, > column views, sparse trees, etc); (3) Deleting or inserting a series = of > adjacent elements upon file modification (normal user edits). (4) > Combination of cache iteration with adding/removing elements in-place > (agenda lookups when parts of the cache should be calculated as we = find > that some elements are not yet in cache). (3) is the killer here since it rules out hash tables, or does it? Is = maintaining element order a strict requirement? Otherwise just go with = hash tables. > The benchmarks (see code below) of skip list vs. AVL-tree give the > following results: First let me commend you for making actual measurements! Of course, the = usual benchmarking caveats apply. > - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often = triggers garbage collection) Most data structures allow for fast insertion of consecutive elements as = a special operation, typically in more-or-less linear time. I'm sure we = could add it to the standard AVL implementation if it would make a = difference. Since I'm not sure what your keys are (integers, strings, ...) nor their = distribution (dense, sparse, clustered...) it's hard to recommend = precise structures but some alternatives are: - red-black-trees: typically similar to AVL trees but often easier to = implement - radix trees: very fast for dense or sparse-but-clustered integers - B-trees: often very fast because of their good locality, many variants - tries: many variants, typically tailored for a specific key type (eg = ASCII strings). I'm not sure if your usage benefits from persistence but it's nice to = have in many cases. Then there are implementation aspects to consider. For example, suppose = you want to store three values in a tree node (value and left and right = subtrees). You could use: - vector: [L R X] - list: (L R X) - improper list: (L R . X) They all have their advantages and drawbacks, and you don't know which = one is best until you measure. The result is often counter-intuitive. = Also remember that Elisp has efficient boolean vectors, so if you need = an extra bit per node it may be more economical to gather them all in = one vector instead of wasting 8-16 bytes for each bit. Even bignums can = be harnessed for the occasional profit. > In summary, skip list appears to give benefit in terms to overall = speed > (within context of grammar cache). The current implementation of skip > list + org-element-cache is slightly faster compared to AVL tree and > feels much snappier. It could very well be, but it's far from a double-blind trial. Few = things are without limit; one of them is Man's capacity for = self-deception. > The other aspect is ease of usage. AVL tree is trickier to work with > (see below code example). Looks like you are using it in a somewhat unorthodox way; those double = hyphens are there for a reason. If you implement a data structure for = your own use, you will of course make it handy for those specific = purposes. It is not uncommon for data structures to provide cursors that = help with your kind of local in-place editing.