From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#52753: 29.0.50; Printing long list-like structures fails Date: Fri, 24 Dec 2021 18:56:19 +0800 Message-ID: <87o856b7ak.fsf@localhost> References: <22AEBE6E-68E7-47D9-885C-AE6621AF4AEE@acm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="33084"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 52753@debbugs.gnu.org To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Dec 24 11:56:14 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1n0iFC-0008Pv-8y for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 24 Dec 2021 11:56:14 +0100 Original-Received: from localhost ([::1]:51178 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1n0iFA-00005V-CM for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 24 Dec 2021 05:56:12 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:60978) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n0iF0-0008Vn-Gz for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 05:56:03 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]:52441) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1n0iEz-0001DY-KX for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 05:56:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1n0iEz-0004iZ-Ib for bug-gnu-emacs@gnu.org; Fri, 24 Dec 2021 05:56:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 24 Dec 2021 10:56:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 52753 X-GNU-PR-Package: emacs Original-Received: via spool by 52753-submit@debbugs.gnu.org id=B52753.164034330918069 (code B ref 52753); Fri, 24 Dec 2021 10:56:01 +0000 Original-Received: (at 52753) by debbugs.gnu.org; 24 Dec 2021 10:55:09 +0000 Original-Received: from localhost ([127.0.0.1]:35754 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1n0iE7-0004hL-9I for submit@debbugs.gnu.org; Fri, 24 Dec 2021 05:55:08 -0500 Original-Received: from mail-lj1-f181.google.com ([209.85.208.181]:45844) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1n0iE2-0004gj-BK for 52753@debbugs.gnu.org; Fri, 24 Dec 2021 05:55:06 -0500 Original-Received: by mail-lj1-f181.google.com with SMTP id h15so36356ljh.12 for <52753@debbugs.gnu.org>; Fri, 24 Dec 2021 02:55:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-transfer-encoding; bh=JrDW85En275qQfdA98wcn72E9JL8rjWGtoF4AbmiRvg=; b=ci+YH7M+GlPNZIR0IthXbZEy4ACYLVYqWd1w7dGnrAjHjKEilN7m010nkAZq7mj+eT lowPWF0IMGW0GQTz5hxOqQUIEhU770Ck9tShDVTkjXmxHODcVr+qBzbDZ1Eqk3p6Q/bU R4KKidaZKi2MhrgK3lWwridlwss9NaB9fd05nJ11UMS34m8OGccr9H+5+xSCUI5BsUq4 ra9ykuIJb+QRckWPrzpfg4epDHAiolIj/H9OFZSl7cA5SEjuO6ZC0ynz5fYLPFIdowvH FT6lFkuavoUFa04P8l96epI1PGzffremSJ2fTQlgJ9sJV2+84Va6YJQpR/gX+0nO5ECN ws3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=JrDW85En275qQfdA98wcn72E9JL8rjWGtoF4AbmiRvg=; b=vT0j9XL390fCzPtTpeMKewyRUX4kMVO4g0i12TRbhY22f7bRyPwWIRvPHFlBX+UiFT c2Dgh4QZlqvDKqT4Y86ab2itOTEJcFDEcKEJEmg/nVEG2YDJxHyK3NMYYLf7UqStLr6C SPiuCBkWlnCNPeOQiuZjCloZroSF9TdbNSCc+7dXGsTgojcyt9hwkrbBWg2SjxGWHru0 /LfbGTlSJVplVeDqvKHRmCH/Tzc2XMKAsfco/UDRhf+lfoh5i9fw9GeOIaJ0zZB8qyBq IBekvV0yylXS2KFlmhiQGDyANY9AAN2GkoL0LBryBb2YTGATprRey6sCvGtUcvXlLXxx dJkQ== X-Gm-Message-State: AOAM531R5tJK8rM+eLDL/VGf8dJqG2bo+30uvCWTkspMwqmXgCukKc7z R9GnL2p+KcSkWe3rnxvsu/M= X-Google-Smtp-Source: ABdhPJw/1pDaOn1fsElilKm0iSVun6cUY92bF/SXNBl6FDxtlV/g+Exd+SU9A5liLchOD4E5ilzNKw== X-Received: by 2002:a05:651c:145:: with SMTP id c5mr4416945ljd.237.1640343296119; Fri, 24 Dec 2021 02:54:56 -0800 (PST) Original-Received: from localhost ([158.255.2.9]) by smtp.gmail.com with ESMTPSA id n2sm766201ljq.30.2021.12.24.02.54.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 24 Dec 2021 02:54:55 -0800 (PST) In-Reply-To: <22AEBE6E-68E7-47D9-885C-AE6621AF4AEE@acm.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:223037 Archived-At: Mattias Engdeg=C3=A5rd writes: > The elisp printer isn't really geared towards printing deeply nested valu= es since it uses recursion. Perhaps you could just print the contents of yo= ur skip list as a flat list or vector for serialization purposes, and rebui= ld the structure when loading it back in again. Thanks for the suggestion. It is probably what I have to do anyway to support older Emacs. However, it will not solve the problem revealed in this bug report. I understand that the current implementations of the read/prin1 are recursive. Yet, they are able to read ordinary lists in non-recursive way. Skip list is conceptually very similar to ordinary list. Maybe the existing list reader functionality can be somehow adapted to read list-like structures? And more generally, I suspect that M-x memory-report suffers from similar problem with prin1 - treating list-like structure recursively. While you showed how to circumvent the printing/reading of skip lists, I do not see how to fix the memory-report behaviour and the Emacs crash (which, I guess, may also be related to treating list-likes recursively). I would like to highlight the crash reproducer especially. I may accept limitation for printer/reader, but crash cannot be tolerated. ------- > (I'd use a balanced tree-like structure instead of a skip list. Performan= ce is similar, and allows for immutability and persistence.) This is not directly related to my bug report, but if you have a good knowledge of using data structures in practice, I would appreciate comments. I know that balanced trees have the same asymptotic behaviour with skip lists and better worst-case behaviour. However, the original Pugh's paper introducing skip lists did a comparison between skip list and self-balancing trees. Pugh et al [1] showed that insertion time in skip lists is still O(log N), but with smaller constant. Moreover, he also showed that skip lists can be more performant when the data structure is edited locally [2]. [1] W Pugh [Springer Berlin Heidelberg] (1989) A Probabilistic Alternative to Balanced Trees. Univ https://doi.org/10.1007/3-540-51542-9_36 [2] W. Pugh (1990) A skip list cookbook https://www.semanticscholar.org/paper/A-skip-list-cookbook-Pugh/83ebde2= 3871ce9f839ad82e0247a7481410f994e The problem I am trying to solve is performance of parser cache for Org mode files. The common operations on the cache are: (1) random element lookups (element at point while user edits the Org buffer); (2) iterating cache elements one-by-one to filter them (agenda views, column views, sparse trees, etc); (3) Deleting or inserting a series of adjacent elements upon file modification (normal user edits). (4) Combination of cache iteration with adding/removing elements in-place (agenda lookups when parts of the cache should be calculated as we find that some elements are not yet in cache). The current implementation of org-element-cache is indeed using balanced AVL tree (built-in Emacs implementation). However, given that many cache operations are not random, but involve groups of subsequent elements, I wanted to give skip list a try. The benchmarks (see code below) of skip list vs. AVL-tree give the following results: Skip list: - Insertion of 200k random numbers: 0.80 sec - Lookup of 200k random numbers: 0.73 sec - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.42 sec - Lookup of 200k subsequent numbers: 0.32 sec AVL tree: - Insertion of 200k random numbers: 0.41 sec - Lookup of 200k random numbers: 0.18 sec - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often tri= ggers garbage collection) - Lookup of 200k subsequent numbers: 0.17 sec My implementation of skip list is comparable for subsequent number queries and 2-4x worse for random queries. A more tricky benchmark is iterating the list/tree and simultaneously altering it (a common operation in agenda/sparse tree code): For both Skip list and AVL tree I first create a 100k numbers (0, 2, 4, ..., 200000) and then iterate over the data structure adding the remaining odd numbers on every iteration. For skip list: Elapsed time: 0.509455s For AVL tree: Elapsed time: 1.348294s This time, the skip list "wins" also requiring a _lot_ less code (see below). Every insertion to AVL tree requires a O(log N) extra comparisons to find the new element and O(log N) extra memory for new stack. Finally, a real-world test for a complex agenda query on a 15Mb Org file (the data is output from elp-results): For skip list: ;; org-element-cache-map 1986 18.579660950 0.0093553176 ;; org-element-cache-map 1986 19.202719912 0.0096690432 For AVL tree: ;; org-element-cache-map 1986 21.047835249 0.0105981043 ;; org-element-cache-map 1986 20.733410823 0.0104397838 AVL tree is 10% slower. Given that most of the CPU time is devoted to actual parsing, 10% overall difference when switching from AVL tree to skip list is very good. In fact, my daily usage of skip-list branch when most of the file elements are already cached feels a lot snappier (though I have no consistent benchmark to test this and may fall into cognitive bias). In summary, skip list appears to give benefit in terms to overall speed (within context of grammar cache). The current implementation of skip list + org-element-cache is slightly faster compared to AVL tree and feels much snappier. However, the difference is indeed just a constant and probably depends on query patterns. If built-in AVL tree is optimised, the performance benefit may disappear (or not, if one optimises my skip list implementation better). The other aspect is ease of usage. AVL tree is trickier to work with (see below code example). Though I do not have enough practical experience with skip lists to know the possible code pitfalls years later. Any commends are welcome. ----- The code: ;; Random queries (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (benchmark-run 200000 (org-skip-list-insert skiplist (random 1000000000))) (benchmark-run 200000 (org-skip-list-find skiplist (random 1000000000))) ;; Ordered queries (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (setq i 0) (benchmark-run 200000 (org-skip-list-insert skiplist i) (cl-incf i)) (setq i 0) (benchmark-run 200000 (org-skip-list-find skiplist i) (cl-incf i)) ;; Random queries (require 'avl-tree) (setq tree (avl-tree-create #'<)) (benchmark-run 200000 (avl-tree-enter tree (random 1000000000))) (benchmark-run 200000 (avl-tree-member-p tree (random 1000000000))) ;; Ordered queries (require 'avl-tree) (setq tree (avl-tree-create #'<)) (setq i 0) (benchmark-run 200000 (avl-tree-enter tree i) (cl-incf i)) (setq i 0) (benchmark-run 200000 (avl-tree-member-p tree i) (cl-incf i)) ;; Modification during iteration (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (let ((i 0)) (while (<=3D i 200000) (when (=3D 0 (% i 2)) (org-skip-list-insert skiplist i)) (setq i (+ i 2)))) (benchmark-progn (iter-do (data (org-skip-list-iter skiplist)) (when (< data 200000) (org-skip-list-insert skiplist (1+ data))))) ;; Modification during iteration (require 'avl-tree) (setq tree (avl-tree-create #'<)) (let ((i 0)) (while (<=3D i 200000) (when (=3D 0 (% i 2)) (avl-tree-enter tree i)) (setq i (+ i 2)))) (benchmark-progn (let ((node (avl-tree--node-left (avl-tree--dummyroot tree))) data continue-flag (stack (list nil)) (leftp t) (start 0)) (while (and node (<=3D start 200000)) (setq data (avl-tree--node-data node)) (if (and leftp (avl-tree--node-left node) ; Left branch. ;; ... or when we are before START. (< start data)) (progn (push node stack) (setq node (avl-tree--node-left node))) (unless (< data start) (if (=3D start data) (cl-incf start) (avl-tree-enter tree start) (setq node (avl-tree--node-left (avl-tree--dummyroot tree)) stack (list nil) leftp t continue-flag t))) (if continue-flag (setq continue-flag nil) (setq node (if (and (car stack) ;; If START advanced beyond stack parent, skip the right branch. (< (avl-tree--node-data (car stack)) start)) (progn (setq leftp nil) (pop stack)) ;; Otherwise, move ahead into the right ;; branch when it exists. (if (setq leftp (avl-tree--node-right node)) (avl-tree--node-right node) (pop stack))))))))) Best, Ihor