From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ihor Radchenko <yantar92@gmail.com>
To: Nicolas Goaziou <mail@nicolasgoaziou.fr>
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on
 huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
In-Reply-To: <87r1uuotw8.fsf@nicolasgoaziou.fr>
References: <87h7x9e5jo.fsf@localhost> <875zdpia5i.fsf@nicolasgoaziou.fr>
 <87y2qi8c8w.fsf@localhost> <87r1vu5qmc.fsf@nicolasgoaziou.fr>
 <87imh5w1zt.fsf@localhost> <87blmxjckl.fsf@localhost>
 <87y2q13tgs.fsf@nicolasgoaziou.fr> <878si1j83x.fsf@localhost>
 <87d07bzvhd.fsf@nicolasgoaziou.fr> <87imh34usq.fsf@localhost>
 <87pnbby49m.fsf@nicolasgoaziou.fr> <87tv0efvyd.fsf@localhost>
 <874kse1seu.fsf@localhost> <87r1vhqpja.fsf@nicolasgoaziou.fr>
 <87tv0d2nk7.fsf@localhost> <87o8qkhy3g.fsf@nicolasgoaziou.fr>
 <87sgfqu5av.fsf@localhost> <87sgfn6qpc.fsf@nicolasgoaziou.fr>
 <87367d4ydc.fsf@localhost> <87r1uuotw8.fsf@nicolasgoaziou.fr>
Date: Fri, 05 Jun 2020 16:18:59 +0800
Message-ID: <87mu5iq618.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
Cc: emacs-orgmode@gnu.org

> See also `gensym'. Do we really need to use it for something else than
> `invisible'? If not, the tool doesn't need to be generic.

For now, I also use it for a buffer-local 'invisible stack. The stack is
needed to preserve the folding state of drawers/blocks inside a folded
outline. However, I am thinking about replacing the stack with separate
text properties, like 'invisible-outline-buffer-local +
'invisible-drawer-buffer-local + 'invisible-block-buffer-local.
Maintaining the stack takes a noticeable percentage of CPU time in the
profiler.
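
To make the separate-properties idea concrete, here is a rough sketch.
It assumes that `invisible' is aliased to all three properties through
char-property-alias-alist, so unfolding the outline leaves the drawer
state untouched. The `my-' names and property symbols are placeholders,
not what the patch uses.

```elisp
(defconst my-fold-properties
  '((outline . my-invisible-outline)
    (drawer  . my-invisible-drawer)
    (block   . my-invisible-block))
  "Hypothetical map from fold kind to the text property recording it.")

(defun my-fold-setup ()
  "Make `invisible' fall back to the per-kind properties in this buffer."
  (setq-local char-property-alias-alist
              (cons (cons 'invisible (mapcar #'cdr my-fold-properties))
                    (assq-delete-all
                     'invisible (copy-alist char-property-alias-alist))))
  ;; The fold kind doubles as the `invisible' value, so register it.
  (dolist (entry my-fold-properties)
    (add-to-invisibility-spec (cons (car entry) t))))

(defun my-fold (beg end kind)
  "Fold the text between BEG and END as KIND (a key of `my-fold-properties')."
  (with-silent-modifications
    (put-text-property beg end (alist-get kind my-fold-properties) kind)))

(defun my-unfold (beg end kind)
  "Remove only the KIND fold between BEG and END; other kinds stay folded."
  (with-silent-modifications
    (remove-text-properties
     beg end (list (alist-get kind my-fold-properties) nil))))
```

With this layout, (my-unfold BEG END 'outline) only drops the outline
property, and the lookup of `invisible' then falls through to whatever
drawer or block property is still set on the text.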

org--get-buffer-local-text-property-symbol must handle indirect
buffers. When an indirect buffer is created from some org buffer, the
old value of char-property-alias-alist is carried over. We need to
detect this case and create a new buffer-local symbol, which is unique
to the newly created buffer (but not create it if the buffer-local
property is already there). Then, the new symbol must replace the old
alias in char-property-alias-alist, and the old folding state must be
preserved (by copying the old invisibility specs into the new
buffer-local text property). I do not see how gensym can benefit this
logic.
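
Roughly, the logic looks like the sketch below. This is a simplified
illustration, not the code from the patch: the function name and the
symbol naming scheme are made up, it only looks at the first alias of
`invisible', and it does not widen a narrowed buffer.

```elisp
(defun my-local-invisible-symbol ()
  "Return a symbol aliasing `invisible' that is unique to the current buffer.
If the alias was inherited from another buffer (e.g. by cloning into an
indirect buffer), copy the old folding state to the new symbol first."
  (let* ((own (intern (format "my-invisible-%d"
                              (sxhash-eq (current-buffer)))))
         (old (cadr (assq 'invisible char-property-alias-alist))))
    (unless (eq old own)
      (when old
        ;; Copy every stretch of the inherited property to our own symbol.
        (with-silent-modifications
          (let ((pos (point-min)))
            (while (< pos (point-max))
              (let ((next (next-single-property-change
                           pos old nil (point-max)))
                    (val (get-text-property pos old)))
                (when val (put-text-property pos next own val))
                (setq pos next))))))
      ;; Build a fresh alist so the base buffer's value is not mutated.
      (setq-local char-property-alias-alist
                  (cons (list 'invisible own)
                        (assq-delete-all
                         'invisible
                         (copy-alist char-property-alias-alist)))))
    own))
```

The symbol has to be derived from the buffer itself so that a second call
in the same buffer finds the existing alias and does nothing. A fresh
gensym on every call would not give that.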

> OK, but this may not be sufficient if we want to do slightly better than
> overlays in that area. This is not mandatory, though.

Could you elaborate on what "slightly better" would mean here?

> As discussed before, I don't think you need to use `modification-hooks'
> or `insert-behind-hooks' if you already use `after-change-functions'.
>
> `after-change-functions' are also triggered upon text properties
> changes. So, what is the use case for the other hooks?

The problem is that `after-change-functions' cannot be a text property.
Only `modification-hooks' and `insert-in-front/behind-hooks' are valid
text properties. If we use `after-change-functions', they will always be
triggered, regardless of whether the change was made inside or outside a
folded region.
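
A small self-contained illustration of the difference (the `my-' names
are invented for this example and have nothing to do with the patch). A
`modification-hooks' text property is attached to one stretch of text and
its functions get two arguments, while a buffer-local
`after-change-functions' entry gets three and runs on every change:

```elisp
(defvar my-hook-log nil
  "Hypothetical log of hook calls, newest first.")

(defun my-on-folded-edit (beg end)
  "Text-property hook: runs before the text between BEG and END changes."
  (push (list 'modification-hooks beg end) my-hook-log))

(defun my-on-any-edit (beg end _old-len)
  "Buffer-wide hook: runs after any change; BEG and END bound the new text."
  (push (list 'after-change-functions beg end) my-hook-log))

(defun my-hook-demo ()
  "Edit outside, then inside, a marked line; return the log, oldest first."
  (setq my-hook-log nil)
  (with-temp-buffer
    (insert "outside :drawer: inside :end:")
    ;; Attach the hook only to the ":drawer:" text (positions 9..16).
    (put-text-property 9 17 'modification-hooks (list #'my-on-folded-edit))
    (add-hook 'after-change-functions #'my-on-any-edit nil t)
    (goto-char 1)
    (delete-char 1)                     ; edit outside the marked text
    (goto-char 10)
    (delete-char 1))                    ; edit inside the marked text
  (reverse my-hook-log))
```

Evaluating (my-hook-demo) should show the text-property hook firing only
for the second edit, while the buffer-local hook fires for both.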

>> :asd:
>> :drawer:
>> lksjdfksdfjl
>> sdfsdfsdf
>> :end:
>>
>> If :asd: was inserted in front of folded :drawer:, changes in :drawer:
>> line of the new folded :asd: drawer would reveal the text between
>> :drawer: and :end:.
>>
>> Let me know what you think on this.

> I have first to understand the use case for `modification-hook'. But
> I think unfolding is the right thing to do in this situation, isn't it?

That situation arises because the modification-hooks from ":drawer:"
(they are set via text properties) only have information about the
:drawer:...:end: drawer before the modifications (they were set when
:drawer: was folded last time). So, they will only unfold a part of the
new :asd: drawer. I do not see a simple way to unfold everything without
re-parsing the drawer around the changed text.

Actually, I am quite unhappy with the performance of modification-hooks
set via text properties (I have been using this patch in my Emacs this
week). It appears that setting the text properties costs significant
CPU time in practice, even though running the hooks is pretty fast.
I will think about a way to handle modifications using global
after-change-functions.

> `org--get-element-region-at-point' is certainly faster, but it is also
> wrong, unfortunately.
>
> Org syntax is not context-free grammar. If you try to parse it locally,
> starting from anywhere, it will fail at some point. For example, your
> function would choke in the following case:
>
>     [fn:1] Def1
>     #+begin_something
>
>     [fn:2] Def2
>     #+end_something

I see. 

> AFAIK, the only proper way to parse it is to start from a known position
> in the buffer. If you have no information about the buffer, the headline
> above is the position you want. With cache could help to start below.
> Anyway, in this particular case, you should not use
> `org--get-element-region-at-point'.

OK

Best,
Ihor

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Hello,
>
> Ihor Radchenko <yantar92@gmail.com> writes:
>
>> [The patch itself will be provided in the following email]
>
> Thank you.
>
>> I have found char-property-alias-alist variable that controls how Emacs
>> calculates text property value if the property is not set. This variable
>> can be buffer-local, which allows independent 'invisible states in
>> different buffers.
>
> Great. I didn't know about this variable!
>
>> All the implementation stays in
>> org--get-buffer-local-text-property-symbol, which takes care about
>> generating unique property name and mapping it to 'invisible (or any
>> other) text property.
>
> See also `gensym'. Do we really need to use it for something else than
> `invisible'? If not, the tool doesn't need to be generic.
>
>> I simplified the code as suggested, without using pairs of before- and
>> after-change-functions.
>
> Great!
>
>> Handling text inserted into folded/invisible region is handled by a
>> simple after-change function. After testing, it turned out that simple
>> re-hiding text based on 'invisible property of the text before/after the
>> inserted region works pretty well.
>
> OK, but this may not be sufficient if we want to do slightly better than
> overlays in that area. This is not mandatory, though.
>
>> Modifications to BEGIN/END line of the drawers and blocks is handled via
>> 'modification-hooks + 'insert-behind-hooks text properties (there is no
>> after-change-functions analogue for text properties in Emacs). The
>> property is applied during folding and the modification-hook function is
>> made aware about the drawer/block boundaries (via apply-partially
>> passing element containing :begin :end markers for the current
>> drawer/block). Passing the element boundary is important because the
>> 'modification-hook will not directly know where it belongs to. Only the
>> modified region (which can be larger than the drawer) is passed to the
>> function. In the worst case, the region can be the whole buffer (if one
>> runs revert-buffer).
>
> As discussed before, I don't think you need to use `modification-hooks'
> or `insert-behind-hooks' if you already use `after-change-functions'.
>
> `after-change-functions' are also triggered upon text properties
> changes. So, what is the use case for the other hooks?
>
>> It turned out that adding 'modification-hook text property takes a
>> significant cpu time (partially, because we need to take care about
>> possible existing 'modification-hook value, see
>> org--add-to-list-text-property). For now, I decided to not clear the
>> modification hooks during unfolding because of poor performance.
>> However, this approach would lead to partial unfolding in the following
>> case:
>>
>> :asd:
>> :drawer:
>> lksjdfksdfjl
>> sdfsdfsdf
>> :end:
>>
>> If :asd: was inserted in front of folded :drawer:, changes in :drawer:
>> line of the new folded :asd: drawer would reveal the text between
>> :drawer: and :end:.
>>
>> Let me know what you think on this.
>
> I have first to understand the use case for `modification-hook'. But
> I think unfolding is the right thing to do in this situation, isn't it?
>
>> My simplified implementation of element boundary parser
>> (org--get-element-region-at-point) appears to be much faster and also
>> uses much less memory in comparison with org-element-at-point.
>> Moreover, not all the places where org-element-at-point is called
>> actually need the full parsed element. For example, org-hide-drawer-all,
>> org-hide-drawer-toggle, org-hide-block-toggle, and
>> org--hide-wrapper-toggle only need element type and some information
>> about the element boundaries - the information we can get from
>> org--get-element-region-at-point.
>
> [...]
>
>> What do you think about the idea of making use of
>> org--get-element-region-at-point in org code base?
>
> `org--get-element-region-at-point' is certainly faster, but it is also
> wrong, unfortunately.
>
> Org syntax is not context-free grammar. If you try to parse it locally,
> starting from anywhere, it will fail at some point. For example, your
> function would choke in the following case:
>
>     [fn:1] Def1
>     #+begin_something
>
>     [fn:2] Def2
>     #+end_something
>
> AFAIK, the only proper way to parse it is to start from a known position
> in the buffer. If you have no information about the buffer, the headline
> above is the position you want. With cache could help to start below.
> Anyway, in this particular case, you should not use
> `org--get-element-region-at-point'.
>
> Hopefully, we don't need to parse anything. In an earlier message,
> I suggested a few checks to make on the modified text in order to decide
> if something should be unfolded, or not. I suggest to start from there,
> and fix any shortcomings we might encounter. We're replacing overlays:
> low-level is good in this area.
>
> WDYT?
>
>
> Regards,
>
> -- 
> Nicolas Goaziou

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg