From: Nicolas Goaziou
To: Ihor Radchenko
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
References: <87h7x9e5jo.fsf@localhost> <875zdpia5i.fsf@nicolasgoaziou.fr> <87y2qi8c8w.fsf@localhost>
 <87r1vu5qmc.fsf@nicolasgoaziou.fr> <87imh5w1zt.fsf@localhost> <87blmxjckl.fsf@localhost>
 <87y2q13tgs.fsf@nicolasgoaziou.fr> <878si1j83x.fsf@localhost> <87d07bzvhd.fsf@nicolasgoaziou.fr>
 <87imh34usq.fsf@localhost> <87pnbby49m.fsf@nicolasgoaziou.fr> <87tv0efvyd.fsf@localhost>
 <874kse1seu.fsf@localhost> <87r1vhqpja.fsf@nicolasgoaziou.fr> <87tv0d2nk7.fsf@localhost>
Mail-Followup-To: Ihor Radchenko , emacs-orgmode@gnu.org
Date: Tue, 19 May 2020 15:07:47 +0200
In-Reply-To: <87tv0d2nk7.fsf@localhost> (Ihor Radchenko's message of "Tue, 19 May 2020 00:52:08 +0800")
Message-ID: <87o8qkhy3g.fsf@nicolasgoaziou.fr>
List-Id: "General discussions about Org-mode."
Cc: emacs-orgmode@gnu.org

Hello,

Ihor Radchenko writes:

>> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
>> element is a O(N) operation by the number of elements before it in
>> a section.
>> In particular, it is not bounded, and not mitigated by
>> a cache. For large documents, it is going to be unbearably slow, too.
>
> Ouch. I thought it is faster.
> What do you mean by "not mitigated by a cache"?

Parsing starts from the closest headline, every time. So, if Org parses
the Nth element in the entry two times, it really parses 2N elements.

With a cache, assuming the buffer wasn't modified, Org would parse
N elements only. With a smarter cache, with fine-grained cache
invalidation, it could also reduce the number of subsequent parsed
elements.

> The reason I would like to utilise org-element parser to make tracking
> modifications more robust. Using details of the syntax would make the
> code fragile if any modifications are made to syntax in future.

I don't think the code would be more fragile. Also, the syntax we're
talking about is not going to be modified anytime soon. Moreover, if
folding breaks, it is usually visible, so the bug will not go
unnoticed.

This code is going to be as low-level as it can be.

> Debugging bugs in modification functions is not easy, according to my
> experience.

No, it's not. But this is not really related to whether you use Element
or not.

> One possible way to avoid performance issues during modification is
> running parser in advance. For example, folding an element may
> as well add information about the element to its text properties.
> This will not degrade performance of folding since we are already
> parsing the element during folding (at least, in
> org-hide-drawer-toggle).

We can use this information stored at fold time. But I'm not even sure
we need it.

> The problem with parsing an element during folding is that we cannot
> easily detect changes like below without re-parsing.

Of course we can. It is only necessary to focus on changes that would
break the structure of the element. This does not entail a full parsing.
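
To illustrate what such a structural check might look like, here is
a minimal sketch for the deletion case only. It assumes the boundaries
of the folded region are already known (for instance from text
properties). The function and variable names are hypothetical, and the
two regexps are simplified stand-ins for `org-drawer-regexp' and
`org-property-end-re', so that the snippet does not depend on Org.

```elisp
;; Hypothetical sketch, not part of Org.  The regexps below are
;; simplified stand-ins for `org-drawer-regexp' and
;; `org-property-end-re'.
(defconst my/drawer-start-re "^[ \t]*:\\(?:\\w\\|[-_]\\)+:[ \t]*$"
  "Simplified stand-in for `org-drawer-regexp'.")

(defconst my/drawer-end-re "^[ \t]*:END:[ \t]*$"
  "Simplified stand-in for `org-property-end-re'.")

(defun my/drawer-structure-intact-p (beg end)
  "Return non-nil if the text between BEG and END still looks like a drawer.
Only the first and the last line of the region are inspected; the
contents in between are never parsed."
  (save-excursion
    (and
     ;; First line must still be a drawer opening line.
     (progn (goto-char beg)
            (beginning-of-line)
            (looking-at-p my/drawer-start-re))
     ;; Last line must still be the closing ":END:" line.
     (progn (goto-char end)
            ;; END may sit right after the final newline; step back
            ;; to the last line that actually has text.
            (when (and (bolp) (> end beg)) (forward-line -1))
            (beginning-of-line)
            (looking-at-p my/drawer-end-re)))))
```

A function in `after-change-functions' could call something like this on
the propertized region and unfold when it returns nil. An inserted
":END:" line in the middle of a drawer is a different case and would
need its own check on the inserted text.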
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> :END: <- added line
> :ID: test
> :END:
>
> or even
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> :ID: test
> :END: <- delete this line
>
> :DRAWER:
> test
> :END:

Please have a look at the "sensitive parts" I wrote about. This takes
care of this kind of breakage.

> The re-parsing can be done via regexp, as you suggested, but I don't
> like this idea, because it will end up re-implementing
> org-element-*-parser.

You may have misunderstood my suggestion. See below.

> Would it be acceptable to run org-element-*-parser
> in after-change-functions?

I'd rather not do that. This is unnecessary consing, and matching, etc.

> If I understand correctly, it is not as easy.
> Consider the following example:
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
>
> :ID: example
> :END:
>
> <... a lot of text, maybe containing other drawers ...>
>
> Nullam rutrum.
> Pellentesque dapibus suscipit ligula.
>
> Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.
>
> If the region gets deleted, the modification hooks from chars inside
> drawer will be called as (hook-function
> ). So, there is still a need to find the drawer somehow to
> mark it as about to be modified (modification hooks are ran before
> actual modification).

If we can stick with `after-change-functions' (or local equivalent),
that's better. It is more predictable than `before-change-functions' and
the like.

If it is a deletion, here are the kinds of checks we could do, depending
on when they are performed.

Before actual changes:

  1. The deletion is happening within a folded drawer (unnecessary step
     in local functions).
  2. The change deletes the sensitive line ":END:".
  3. Conclusion: unfold.

Or, after actual changes:

  1. The deletion involves a drawer.
  2. Text properties indicate that the beginning of the propertized part
     of the buffer starts with `org-drawer-regexp', but doesn't end with
     `org-property-end-re'. A "sensitive part" disappeared!
  3.
     Conclusion: unfold.

This is far away from parsing. IMO, a few checks cover all cases. Let me
know if you have questions about it.

Also, note that the kind of change you describe will happen perhaps
0.01% of the time. Most changes are about one character, or a single
line, long.

> The only difference between using modification hooks and
> before-change-functions is that modification hooks will trigger less
> frequently.

Exactly. Much less frequently. But extra care is required, as you noted
already.

> Considering the performance of org-element-at-point, it is
> probably worth doing. Initially, I wanted to avoid it because setting a
> single before-change-functions hook sounded cleaner than setting
> modification-hooks, insert-behind-hooks, and insert-in-front-hooks.

Well, `before-change-functions' and `after-change-functions' are not
clean at all: you modify an unrelated part of the buffer, but still call
those to check if a drawer needs to be unfolded somewhere.

And, more importantly, they are not meant to be used together, i.e., you
cannot assume that a single call to `before-change-functions' always
happens before calling `after-change-functions'. This can be tricky if
you want to use the former to pass information to the latter.

But I understand that they are easier to use than their local
counterparts. If you stick with (before|after)-change-functions, the
function being called needs to bail out very quickly if the
modification is not about folding changes. Also, I strongly suggest
sticking to `after-change-functions' only, if feasible (I think it is),
per above.

> Moreover, these text properties would be copied by default if one uses
> buffer-substring. Then, the hooks will also trigger later in the yanked
> text, which may cause all kinds of bugs.

Indeed, that would be something to handle specifically. I.e.,
destructive modifications (i.e., those that unfold) could clear such
properties.

It is more work.
I don't know if it is worth the trouble if we can get out of
`after-change-functions' quickly for unrelated changes.

> It was mostly an annoyance, because they returned different results on
> the same element. Specifically, they returned different :post-blank and
> :end properties, which does not sound right.

OK. If you have a reproducible recipe, I can look into it and see what
can be done.

Regards,

--
Nicolas Goaziou