From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id kOv6H1Mb1l4jAwAA0tVLHw (envelope-from ) for ; Tue, 02 Jun 2020 09:26:43 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id 0K3jG1Mb1l7JDAAAB5/wlQ (envelope-from ) for ; Tue, 02 Jun 2020 09:26:43 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id BBD6E94066C for ; Tue, 2 Jun 2020 09:26:42 +0000 (UTC) Received: from localhost ([::1]:45940 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jg3Bx-0005RN-Mm for larch@yhetil.org; Tue, 02 Jun 2020 05:26:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:43216) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jg3BE-0005QV-WD for emacs-orgmode@gnu.org; Tue, 02 Jun 2020 05:25:57 -0400 Received: from mail-pl1-x62c.google.com ([2607:f8b0:4864:20::62c]:37667) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jg3BC-00048L-GZ for emacs-orgmode@gnu.org; Tue, 02 Jun 2020 05:25:56 -0400 Received: by mail-pl1-x62c.google.com with SMTP id y18so1124265plr.4 for ; Tue, 02 Jun 2020 02:25:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:in-reply-to:references:date:message-id :mime-version:content-transfer-encoding; bh=+QWiQtsUk0tU6ExPipLj0xctizfiVjQsywbTCAKpQcc=; b=ABI1Y9zENDrP1MjiCsVu/AotTpj7M0vgJUc3qY8T3vXb7PmZE0RnVFRuyhQPm4xmpX hBUESWzpHjP0wI1SqHKZLtVN135mGPx2D/FYMFsz7KYuN63vMhbsj1Hd4Je44trolwFy jPn6DGnYCBAN92AqmBthgln8XndVTD6HHGWgFYvj6t8QCG4Rno6B81JSX5hocvcvxK7E x7VmwiNA5bdrNRoS+rYqhsXrPeO6ciuWFwDguHAXs9nIvjMeL7fW28d/bTVmxe+EtepE bue0/oRYCC1pTUXvTlXOSWTcafVBcoY4fOHHJav/sdsiiFnmTSm0b7+wkxfnr+8dh+CZ tx8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=+QWiQtsUk0tU6ExPipLj0xctizfiVjQsywbTCAKpQcc=; b=YaveU0/6mF5piRQoQUlzI1J2DYDhaIC8rF+eg56CMtYhpUibPQO4+7fyhYLNCTVodx j9AhwKvrvVqO0nME/fKHXBBmis5+Y97wsvLJaqwrmc+NA31qDDKFwwJefy2FNEodYa/I kqTVZp3vcSsUR+O4zQrGvTqq4G13vKOk/0gQxZDB83C9DLV1MRKeCRtV2b/XfC/4l7mH 6fozHlcm1Wlk4WOgXywrvU11SK5nLBHFaQ0BbTkhqBRGBApUj9RmhLvBH5O54SWStDRu ZJpkt4TDvmOw/ilxxN9p4KqHMOQFp694R5V8/1/r+jT5E/Tt+tfY6I3Xw2lBXHmZNii5 wwVA== X-Gm-Message-State: AOAM533zNZVMPQu5c8lqejH8jnPDWutyrd9TQz1TsHTK0sve4TuPpgvz 1s1XEurLS5dWY8U7FtT5dzaGLl972wP8WzG1 X-Google-Smtp-Source: ABdhPJy69RygQ+/LrxYbuq3Kvyp+gJVoSkMFpz1C/as1vRUY/BHCABog89UPBxAx1UrVlpuFZgKl7w== X-Received: by 2002:a17:90b:4c4b:: with SMTP id np11mr4425506pjb.58.1591089952484; Tue, 02 Jun 2020 02:25:52 -0700 (PDT) Received: from localhost ([101.99.64.65]) by smtp.gmail.com with ESMTPSA id t22sm1777381pjy.32.2020.06.02.02.25.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Jun 2020 02:25:51 -0700 (PDT) From: Ihor Radchenko To: Nicolas Goaziou Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers In-Reply-To: <87sgfn6qpc.fsf@nicolasgoaziou.fr> References: <87h7x9e5jo.fsf@localhost> <875zdpia5i.fsf@nicolasgoaziou.fr> <87y2qi8c8w.fsf@localhost> <87r1vu5qmc.fsf@nicolasgoaziou.fr> <87imh5w1zt.fsf@localhost> <87blmxjckl.fsf@localhost> <87y2q13tgs.fsf@nicolasgoaziou.fr> <878si1j83x.fsf@localhost> <87d07bzvhd.fsf@nicolasgoaziou.fr> <87imh34usq.fsf@localhost> <87pnbby49m.fsf@nicolasgoaziou.fr> <87tv0efvyd.fsf@localhost> <874kse1seu.fsf@localhost> <87r1vhqpja.fsf@nicolasgoaziou.fr> <87tv0d2nk7.fsf@localhost> <87o8qkhy3g.fsf@nicolasgoaziou.fr> <87sgfqu5av.fsf@localhost> <87sgfn6qpc.fsf@nicolasgoaziou.fr> Date: Tue, 02 Jun 2020 17:21:03 +0800 Message-ID: <87367d4ydc.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=2607:f8b0:4864:20::62c; envelope-from=yantar92@gmail.com; helo=mail-pl1-x62c.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: emacs-orgmode@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: emacs-orgmode@gnu.org Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org Sender: "Emacs-orgmode" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=pass header.d=gmail.com header.s=20161025 header.b=ABI1Y9zE; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org X-Spam-Score: -1.21 X-TUID: 5CPMvEzI9/wO Hello, [The patch itself will be provided in the following email] I have three updates from the previous version of the patch: 1. I managed to implement buffer-local text properties. Now, outline folding also uses text properties without a need to give up independent folding in indirect buffers. 2. The code handling modifications in folded drawers/blocks was rewritten. The new code uses after-change-functions to re-hide text inserted in the middle of folded regions; and text properties to unfold folded drawers/blocks if one changes BEGIN/END line. 3. [experimental] Started working on improving memory and cpu footprint of the old code related to folding/unfolding. org-hide-drawer-all now works significantly faster because I can utilise simplified drawer parser, which require a lot less memory. Overall, I managed to reduce Emacs memory footprint after loading all my agenda_files twice. The loading is also noticeably faster. ----------------------------------------------------------------------- ----------------------------------------------------------------------- More details on the buffer-local text properties: I have found char-property-alias-alist variable that controls how Emacs calculates text property value if the property is not set. This variable can be buffer-local, which allows independent 'invisible states in different buffers. All the implementation stays in org--get-buffer-local-text-property-symbol, which takes care about generating unique property name and mapping it to 'invisible (or any other) text property. ----------------------------------------------------------------------- ----------------------------------------------------------------------- More details on the new implementation for tracking changes: I simplified the code as suggested, without using pairs of before- and after-change-functions. Handling text inserted into folded/invisible region is handled by a simple after-change function. After testing, it turned out that simple re-hiding text based on 'invisible property of the text before/after the inserted region works pretty well. Modifications to BEGIN/END line of the drawers and blocks is handled via 'modification-hooks + 'insert-behind-hooks text properties (there is no after-change-functions analogue for text properties in Emacs). The property is applied during folding and the modification-hook function is made aware about the drawer/block boundaries (via apply-partially passing element containing :begin :end markers for the current drawer/block). Passing the element boundary is important because the 'modification-hook will not directly know where it belongs to. Only the modified region (which can be larger than the drawer) is passed to the function. In the worst case, the region can be the whole buffer (if one runs revert-buffer). It turned out that adding 'modification-hook text property takes a significant cpu time (partially, because we need to take care about possible existing 'modification-hook value, see org--add-to-list-text-property). For now, I decided to not clear the modification hooks during unfolding because of poor performance. However, this approach would lead to partial unfolding in the following case: :asd: :drawer: lksjdfksdfjl sdfsdfsdf :end: If :asd: was inserted in front of folded :drawer:, changes in :drawer: line of the new folded :asd: drawer would reveal the text between :drawer: and :end:. Let me know what you think on this. > You shouldn't be bothered by the case you're describing here, for > multiple reasons. >=20 > First, this issue already arises in the current implementation. No one > bothered so far: this change is very unlikely to happen. If it becomes > an issue, we could make sure that `org-reveal' handles this. >=20 > But, more importantly, we actually /want it/ as a feature. Indeed, if > DRAWER is expanded every time ":BLAH:" is inserted above, then inserting > a drawer manually would unfold /all/ drawers in the section. The user is > more likely to write first ":BLAH:" (everything is unfolded) then > ":END:" than ":END:", then ":BLAH:". Agree. This allowed me to simplify the code significantly. > It seems you're getting it backwards. `before-change-functions' are the > functions being called with a possibly wide, imprecise, region to > handle: >=20 > When that happens, the arguments to =E2=80=98before-change-functions= =E2=80=99 will > enclose a region in which the individual changes are made, but won=E2= =80=99t > necessarily be the minimal such region >=20 > however, after-change-functions calls are always minimal: >=20 > and the arguments to each successive call of > =E2=80=98after-change-functions=E2=80=99 will then delimit the part o= f text being > changed exactly. >=20 > If you stick to `after-change-functions', there will be no such thing as > you describe. You are right here, I missed that before-change-functions are likely to be called on large regions. I thought that the regions are same for before/after-change-functions, but after-change-functions could be called more than 1 time. After second thought, your vision that it is mostly 0 or 1 times should be the majority of cases in practice. ----------------------------------------------------------------------- ----------------------------------------------------------------------- More details on reducing cpu and memory footprint of org buffers: My simplified implementation of element boundary parser (org--get-element-region-at-point) appears to be much faster and also uses much less memory in comparison with org-element-at-point. Moreover, not all the places where org-element-at-point is called actually need the full parsed element. For example, org-hide-drawer-all, org-hide-drawer-toggle, org-hide-block-toggle, and org--hide-wrapper-toggle only need element type and some information about the element boundaries - the information we can get from org--get-element-region-at-point. The following version of org-hide-drawer-all seems to work much faster in comparison with original: (defun org-hide-drawer-all () "Fold all drawers in the current buffer." (save-excursion (goto-char (point-min)) (while (re-search-forward org-drawer-regexp nil t) (when-let* ((drawer (org--get-element-region-at-point '(property-draw= er drawer))) (type (org-element-type drawer))) (org-hide-drawer-toggle t nil drawer) ;; Make sure to skip drawer entirely or we might flag it ;; another time when matching its ending line with ;; `org-drawer-regexp'. (goto-char (org-element-property :end drawer)))))) What do you think about the idea of making use of org--get-element-region-at-point in org code base? ----------------------------------------------------------------------- ----------------------------------------------------------------------- Further work: 1. Look into other code using overlays. Specifically, org-toggle-custom-properties, Babel hashes, and narrowed table columns. Best, Ihor Nicolas Goaziou writes: > Hello, > > Ihor Radchenko writes: > >> I have five updates from the previous version of the patch: > > Thank you. > >> 1. I implemented a simplified version of element parsing to detect >> changes in folded drawers or blocks. No computationally expensive calls >> of org-element-at-point or org-element-parse-buffer are needed now. >> >> 2. The patch is now compatible with master (commit 2e96dc639). I >> reverted the earlier change in folding drawers and blocks. Now, they are >> back to using 'org-hide-block and 'org-hide-drawer. Using 'outline would >> achieve nothing when we use text properties. >> >> 3. 'invisible text property can now be nested. This is important, for >> example, when text inside drawers contains fontified links (which also >> use 'invisible text property to hide parts of the link). Now, the old >> 'invisible spec is recovered after unfolding. > > Interesting. I'm running out of time, so I cannot properly inspect the > code right now. I'll try to do that before the end of the week. > >> 4. Some outline-* function calls in org referred to outline-flag-region >> implementation, which is not in sync with org-flag-region in this patch. >> I have implemented their org-* versions and replaced the calls >> throughout .el files. Actually, some org-* versions were already >> implemented in org, but not used for some reason (or not mentioned in >> the manual). I have updated the relevant sections of manual. These >> changes might be relevant to org independently of this feature branch. > > Yes, we certainly want to move to org-specific versions in all cases. > >> 5. I have managed to get a working version of outline folding via text >> properties. However, that approach has a big downside - folding state >> cannot be different in indirect buffer when we use text properties. I >> have seen packages relying on this feature of org and I do not see any >> obvious way to achieve different folding state in indirect buffer while >> using text properties for outline folding. > > Hmm. Good point. This is a serious issue to consider. Even if we don't > use text properties for outline, this also affects drawers and blocks. > >> For now, I still used before/after-change-functions combination. > > You shouldn't. > >> I see the following problems with using only after-change-functions:=20 >> >> 1. They are not guaranteed to be called after every single change: > > Of course they are! See below. > >> From (elisp) Change Hooks: >> "... some complex primitives call =E2=80=98before-change-functions=E2=80= =99 once before >> making changes, and then call =E2=80=98after-change-functions=E2=80=99 z= ero or more >> times" > > "zero" means there are no changes at all, so, `after-change-functions' > are not called, which is expected. > >> The consequence of it is a possibility that region passed to the >> after-change-functions is quite big (including all the singular changes, >> even if they are distant). This region may contain changed drawers as >> well and unchanged drawers and needs to be parsed to determine which >> drawers need to be re-folded. > > It seems you're getting it backwards. `before-change-functions' are the > functions being called with a possibly wide, imprecise, region to > handle: > > When that happens, the arguments to =E2=80=98before-change-functions= =E2=80=99 will > enclose a region in which the individual changes are made, but won=E2= =80=99t > necessarily be the minimal such region > > however, after-change-functions calls are always minimal: > > and the arguments to each successive call of > =E2=80=98after-change-functions=E2=80=99 will then delimit the part o= f text being > changed exactly. > > If you stick to `after-change-functions', there will be no such thing as > you describe. > >>> And, more importantly, they are not meant to be used together, i.e., you >>> cannot assume that a single call to `before-change-functions' always >>> happens before calling `after-change-functions'. This can be tricky if >>> you want to use the former to pass information to the latter. >> >> The fact that before-change-functions can be called multiple times >> before after-change-functions, is trivially solved by using buffer-local >> changes register (see org--modified-elements). > > Famous last words. Been there, done that, and it failed. > > Let me quote the manual: > > In general, we advise to use either before- or the after-change > hooks, but not both. > > So, let me insist: don't do that. If you don't agree with me, let's at > least agree with Emacs developers. > >> The register is populated by before-change-functions and cleared by >> after-change-functions. > > You cannot expect `after-change-functions' to clear what > `before-change-functions' did. This is likely to introduce pernicious > bugs. Sorry if it sounds like FUD, but bugs in those areas are just > horrible to squash. > >>> Well, `before-change-fuctions' and `after-change-functions' are not >>> clean at all: you modify an unrelated part of the buffer, but still call >>> those to check if a drawer needs to be unfolded somewhere. >> >> 2. As you pointed, instead of global before-change-functions, we can use >> modification-hooks text property on sensitive parts of the >> drawers/blocks. This would work, but I am concerned about one annoying >> special case: >> >> ------------------------------------------------------------------------- >> :BLAH: >> >> >> >> :DRAWER: >> Donec at pede. >> :END: >> ------------------------------------------------------------------------- >> In this example, the user would not be able to unfold the folder DRAWER >> because it will technically become a part of a new giant BLAH drawer. >> This may be especially annoying if is more than one screen >> long and there is no easy way to identify why unfolding does not work >> (with point at :DRAWER:). > > You shouldn't be bothered by the case you're describing here, for > multiple reasons. > > First, this issue already arises in the current implementation. No one > bothered so far: this change is very unlikely to happen. If it becomes > an issue, we could make sure that `org-reveal' handles this. > > But, more importantly, we actually /want it/ as a feature. Indeed, if > DRAWER is expanded every time ":BLAH:" is inserted above, then inserting > a drawer manually would unfold /all/ drawers in the section. The user is > more likely to write first ":BLAH:" (everything is unfolded) then > ":END:" than ":END:", then ":BLAH:". > >> Because of this scenario, limiting before-change-functions to folded >> drawers is not sufficient. Any change in text may need to trigger >> unfolding. > > after-change-functions is more appropriate than before-change-functions, > and local parsing, as explained in this thread, is more efficient than > re-inventing the parser. > >> In the patch, I always register possible modifications in the >> blocks/drawers intersecting with the modified region + a drawer/block >> right next to the region. >> >> ----------------------------------------------------------------------- >> ----------------------------------------------------------------------- >> >> More details on the nested 'invisible text property implementation. >> >> The idea is to keep 'invisible property stack push and popping from it >> as we add/remove 'invisible text property. All the work is done in >> org-flag-region. > > This sounds like a good idea. > >> This was originally intended for folding outlines via text properties. >> Since using text properties for folding outlines is not a good idea, >> nested text properties have much less use. > > AFAIU, they have. You mention link fontification, but there are other > pieces that we could switch to text properties instead of overlays, > e.g., Babel hashes, narrowed table columns=E2=80=A6 > >> 3. Multiple calls to before/after-change-functions is still a problem. I >> am looking into following ways to reduce this number: >> - reduce the number of elements registered as potentially modified >> + do not add duplicates to org--modified-elements >> + do not add unfolded elements to org--modified-elements >> + register after-change-function as post-command hook and remove it >> from global after-change-functions. This way, it will be called >> twice per command only. >> - determine common region containing org--modified-elements. if change >> is happening within that region, there is no need to parse >> drawers/blocks there again. > > This is over-engineering. Again, please focus on local changes, as > discussed before. > >> Recipe to have different (org-element-at-point) and >> (org-element-parse-buffer 'element) >> ------------------------------------------------------------------------- >> >> :PROPERTIES: >> :CREATED: [2020-05-23 Sat 02:32] >> :END: >> >> >> >> ------------------------------------------------------------------------- > > I didn't look at this situation in particular, but there are cases where > different :post-blank values are inevitable, for example at the end of > a section. > > Regards, > > --=20 > Nicolas Goaziou --=20 Ihor Radchenko, PhD, Center for Advancing Materials Performance from the Nanoscale (CAMP-nano) State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong U= niversity, Xi'an, China Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg