From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>
Received: from mp1 ([2001:41d0:2:4a6f::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by ms11 with LMTPS
	id wDaENEbaw17rFgAA0tVLHw
	(envelope-from <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Tue, 19 May 2020 13:08:22 +0000
Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	by mp1 with LMTPS
	id CL5FMEbaw17CcAAAbx9fmQ
	(envelope-from <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>)
	for <larch@yhetil.org>; Tue, 19 May 2020 13:08:22 +0000
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by aspmx1.migadu.com (Postfix) with ESMTPS id 3B54F9404E0
	for <larch@yhetil.org>; Tue, 19 May 2020 13:08:22 +0000 (UTC)
Received: from localhost ([::1]:52842 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>)
	id 1jb1yn-0007dX-6S
	for larch@yhetil.org; Tue, 19 May 2020 09:08:21 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:57300)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <mail@nicolasgoaziou.fr>)
 id 1jb1yM-0007bc-7u
 for emacs-orgmode@gnu.org; Tue, 19 May 2020 09:07:54 -0400
Received: from relay1-d.mail.gandi.net ([217.70.183.193]:17971)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <mail@nicolasgoaziou.fr>)
 id 1jb1yK-0004DP-Jn
 for emacs-orgmode@gnu.org; Tue, 19 May 2020 09:07:53 -0400
X-Originating-IP: 185.131.40.67
Received: from localhost (40-67.ipv4.commingeshautdebit.fr [185.131.40.67])
 (Authenticated sender: admin@nicolasgoaziou.fr)
 by relay1-d.mail.gandi.net (Postfix) with ESMTPSA id 5579B24000F;
 Tue, 19 May 2020 13:07:49 +0000 (UTC)
From: Nicolas Goaziou <mail@nicolasgoaziou.fr>
To: Ihor Radchenko <yantar92@gmail.com>
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge
 org files: Do not use overlays for PROPERTY and LOGBOOK drawers
References: <87h7x9e5jo.fsf@localhost> <875zdpia5i.fsf@nicolasgoaziou.fr>
 <87y2qi8c8w.fsf@localhost> <87r1vu5qmc.fsf@nicolasgoaziou.fr>
 <87imh5w1zt.fsf@localhost> <87blmxjckl.fsf@localhost>
 <87y2q13tgs.fsf@nicolasgoaziou.fr> <878si1j83x.fsf@localhost>
 <87d07bzvhd.fsf@nicolasgoaziou.fr> <87imh34usq.fsf@localhost>
 <87pnbby49m.fsf@nicolasgoaziou.fr> <87tv0efvyd.fsf@localhost>
 <874kse1seu.fsf@localhost> <87r1vhqpja.fsf@nicolasgoaziou.fr>
 <87tv0d2nk7.fsf@localhost>
Mail-Followup-To: Ihor Radchenko <yantar92@gmail.com>, emacs-orgmode@gnu.org
Date: Tue, 19 May 2020 15:07:47 +0200
In-Reply-To: <87tv0d2nk7.fsf@localhost> (Ihor Radchenko's message of "Tue, 19
 May 2020 00:52:08 +0800")
Message-ID: <87o8qkhy3g.fsf@nicolasgoaziou.fr>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Received-SPF: pass client-ip=217.70.183.193;
 envelope-from=mail@nicolasgoaziou.fr; helo=relay1-d.mail.gandi.net
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/19 08:50:42
X-ACL-Warn: Detected OS   = Linux 3.11 and newer
X-Spam_score_int: -25
X-Spam_score: -2.6
X-Spam_bar: --
X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7,
 RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001,
 SPF_PASS=-0.001 autolearn=_AUTOLEARN
X-Spam_action: no action
X-BeenThere: emacs-orgmode@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
 <mailto:emacs-orgmode-request@gnu.org?subject=subscribe>
Cc: emacs-orgmode@gnu.org
Errors-To: emacs-orgmode-bounces+larch=yhetil.org@gnu.org
Sender: "Emacs-orgmode" <emacs-orgmode-bounces+larch=yhetil.org@gnu.org>
X-Scanner: scn0
Authentication-Results: aspmx1.migadu.com;
	dkim=none;
	dmarc=none;
	spf=pass (aspmx1.migadu.com: domain of emacs-orgmode-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=emacs-orgmode-bounces@gnu.org
X-Spam-Score: 0.49
X-TUID: hE2mjhrlMXO0

Hello,

Ihor Radchenko <yantar92@gmail.com> writes:

>> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
>> element is a O(N) operation by the number of elements before it in
>> a section. In particular, it is not bounded, and not mitigated by
>> a cache. For large documents, it is going to be unbearably slow, too.
>
> Ouch. I thought it is faster.
> What do you mean by "not mitigated by a cache"?

Parsing starts from the closest headline, every time. So, if Org parses
the Nth element in the entry two times, it really parses 2N elements.

With a cache, assuming the buffer wasn't modified, Org would parse
N elements only. With a smarter cache, with fine grained cache
invalidation, it could also reduce the number of subsequent parsed
elements.

> The reason I would like to utilise org-element parser to make tracking
> modifications more robust. Using details of the syntax would make the
> code fragile if any modifications are made to syntax in future.

I don't think the code would be more fragile. Also, the syntax we're
talking about is not going to be modified anytime soon. Moreover, if
folding breaks, it is usually visible, so the bug will not be unnoticed.

This code is going to be as low-level as it can be.

> Debugging bugs in modification functions is not easy, according to my
> experience.

No, it's not. 

But this is not really related to whether you use Element or not.

> One possible way to avoid performance issues during modification is
> running parser in advance. For example, folding an element may
> as well add information about the element to its text properties.
> This will not degrade performance of folding since we are already
> parsing the element during folding (at least, in
> org-hide-drawer-toggle).

We can use this information stored at fold time. But I'm not even sure
we need it.

> The problem with parsing an element during folding is that we cannot
> easily detect changes like below without re-parsing.

Of course we can. It is only necessary to focus on changes that would
break the structure of the element. This does not entail a full parsing.

> :PROPERTIES: <folded>
> :CREATED: [2020-05-18 Mon]
> :END: <- added line
> :ID: test
> :END:
>
> or even
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> :ID: test
> :END: <- delete this line
>
> :DRAWER: <folded, cannot be unfolded if we don't re-parse after deletion>
> test
> :END:

Please have a look at the "sensitive parts" I wrote about. This takes
care of this kind of breakage.

> The re-parsing can be done via regexp, as you suggested, but I don't
> like this idea, because it will end up re-implementing
> org-element-*-parser.

You may have misunderstood my suggestion. See below.

> Would it be acceptable to run org-element-*-parser
> in after-change-functions?

I'd rather not do that. This is unnecessary consing, and matching, etc.

> If I understand correctly, it is not as easy.
> Consider the following example:
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> <region-beginning>
> :ID: example
> :END:
>
> <... a lot of text, maybe containing other drawers ...>
>
> Nullam rutrum.
> Pellentesque dapibus suscipit ligula.
> <region-end>
> Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.
>
> If the region gets deleted, the modification hooks from chars inside
> drawer will be called as (hook-function <region-beginning>
> <region-end>). So, there is still a need to find the drawer somehow to
> mark it as about to be modified (modification hooks are ran before
> actual modification).

If we can stick with `after-change-functions' (or local equivalent),
that's better. It is more predictable than `before-change-functions' and
alike.

If it is a deletion, here is the kind of checks we could do, depending
on when they are performed.

Before actual changes :

  1. The deletion is happening within a folded drawer (unnecessary step
     in local functions).
  2. The change deleted the sensitive line ":END:".
  3. Conclusion : unfold.

Or, after actual changes :

  1. The deletion involves a drawer.
  2. Text properties indicate that the beginning of the propertized part
     of the buffer start with org-drawer-regexp, but doesn't end with
     `org-property-end-re'. A "sensitive part" disappeared!
  3. Conclusion : unfold

This is far away from parsing. IMO, a few checks cover all cases. Let me
know if you have questions about it.

Also, note that the kind of change you describe will happen perhaps
0.01% of the time. Most change are about one character, or a single
line, long.

> The only difference between using modification hooks and
> before-change-functions is that modification hooks will trigger less
> frequently. 

Exactly. Much less frequently. But extra care is required, as you noted
already.

> Considering the performance of org-element-at-point, it is
> probably worth doing. Initially, I wanted to avoid it because setting a
> single before-change-functions hook sounded cleaner than setting
> modification-hooks, insert-behind-hooks, and insert-in-front-hooks.

Well, `before-change-fuctions' and `after-change-functions' are not
clean at all: you modify an unrelated part of the buffer, but still call
those to check if a drawer needs to be unfolded somewhere.

And, more importantly, they are not meant to be used together, i.e., you
cannot assume that a single call to `before-change-functions' always
happens before calling `after-change-functions'. This can be tricky if
you want to use the former to pass information to the latter.

But I understand that they are easier to use than their local
counterparts. If you stick with (before|after)-change-functions, the
function being called needs to drop the ball very quickly if the
modification is not about folding changes. Also, I very much suggest to
stick to only `after-change-functions', if feasible (I think it is), per
above.

> Moreover, these text properties would be copied by default if one uses 
> buffer-substring. Then, the hooks will also trigger later in the yanked
> text, which may cause all kinds of bugs.

Indeed, that would be something to handle specifically. I.e.,
destructive modifications (i.e., those that unfold) could clear such
properties.

It is more work. I don't know if it is worth the trouble if we can get
out quickly of `after-change-functions' for unrelated changes.

> It was mostly an annoyance, because they returned different results on
> the same element. Specifically, they returned different :post-blank and
> :end properties, which does not sound right.

OK. If you have a reproducible recipe, I can look into it and see what
can be done.

Regards,

-- 
Nicolas Goaziou