From: Ihor Radchenko
To: Nicolas Goaziou
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date: Tue, 19 May 2020 00:52:08 +0800
Message-ID: <87tv0d2nk7.fsf@localhost>
Cc: emacs-orgmode@gnu.org

> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
> element is a O(N) operation by the number of elements before it in
> a section. In particular, it is not bounded, and not mitigated by
> a cache. For large documents, it is going to be unbearably slow, too.

Ouch. I thought it was faster. What do you mean by "not mitigated by a cache"?

I would like to use the org-element parser because it makes tracking modifications more robust. Relying on details of the syntax would make the code fragile if the syntax changes in the future. In my experience, bugs in modification functions are not easy to debug.

One possible way to avoid performance issues during modification is to run the parser in advance. For example, folding an element may as well store information about the element in its text properties. This will not degrade folding performance, since we already parse the element during folding (at least in org-hide-drawer-toggle). The problem with parsing an element only at folding time is that we cannot easily detect later changes like the ones below without re-parsing:

:PROPERTIES:
:CREATED: [2020-05-18 Mon]
:END: <- added line
:ID: test
:END:

or even

:PROPERTIES:
:CREATED: [2020-05-18 Mon]
:ID: test
:END: <- delete this line
:DRAWER: test
:END:

The re-parsing can be done via regexp, as you suggested, but I don't like this idea, because it would end up re-implementing org-element-*-parser.
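For comparison, the regexp re-check could look roughly like the toy sketch below. It is written in Python rather than Emacs Lisp; DRAWER_RE and still_one_drawer are hypothetical names, and the pattern is a simplified assumption, not org-drawer-regexp or org-property-drawer-re. The idea: after an edit, re-match the text that used to be the folded drawer (assuming the fold boundaries are tracked with markers) and keep the fold only if that text is still exactly one drawer. Both examples above fail the check.

```python
import re

# Simplified drawer syntax (an assumption, NOT Org's own regexp):
# an opening ":NAME:" line, body lines that are not ":END:",
# and exactly one closing ":END:" line at the very end.
DRAWER_RE = re.compile(
    r"[ \t]*:[-\w]+:[ \t]*\n"             # opening line, e.g. ":PROPERTIES:"
    r"(?:(?![ \t]*:END:[ \t]*$).*\n)*?"   # any number of non-":END:" lines
    r"[ \t]*:END:[ \t]*\n?",              # closing line
    re.MULTILINE,
)


def still_one_drawer(text: str) -> bool:
    """Return True if TEXT, the former contents of a fold, is still one drawer."""
    return DRAWER_RE.fullmatch(text) is not None


original = ":PROPERTIES:\n:CREATED: [2020-05-18 Mon]\n:ID: test\n:END:\n"
# First example: an extra ":END:" line was typed inside the drawer.
added_end = ":PROPERTIES:\n:CREATED: [2020-05-18 Mon]\n:END:\n:ID: test\n:END:\n"
# Second example: the closing ":END:" line was deleted from the fold.
deleted_end = ":PROPERTIES:\n:CREATED: [2020-05-18 Mon]\n:ID: test\n"

print(still_one_drawer(original))     # True
print(still_one_drawer(added_end))    # False
print(still_one_drawer(deleted_end))  # False
```

Even this simplified pattern duplicates drawer syntax that the element parser already knows, which is the maintenance concern above.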
Would it be acceptable to run org-element-*-parser in after-change-functions?

> If you use modification-hooks and al., you don't need to parse anything,
> because you can store information as text properties. Therefore, once
> the modification happens, you already know where you are (or, at least
> where you were before the change).

> The ideas I suggested about sensitive parts of elements are worth
> exploring, IMO. Do you have any issue with them?

If I understand correctly, it is not that easy. Consider the following example:

:PROPERTIES:
:CREATED: [2020-05-18 Mon]
:ID: example
:END:

<... a lot of text, maybe containing other drawers ...>

Nullam rutrum. Pellentesque dapibus suscipit ligula. Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.

If the region gets deleted, the modification hooks of the characters inside the drawer will be called as (hook-function ). So we still need to find the drawer somehow, to mark it as about to be modified (modification hooks run before the actual modification).

The only difference between using modification hooks and before-change-functions is that modification hooks trigger less frequently. Considering the performance of org-element-at-point, that is probably worth it. Initially, I wanted to avoid them because setting a single before-change-functions hook sounded cleaner than setting modification-hooks, insert-behind-hooks, and insert-in-front-hooks. Moreover, these text properties are copied by default by buffer-substring, so the hooks would later trigger in the yanked text as well, which may cause all kinds of bugs.

> `org-element-at-point' is local, `org-element-parse-buffer' is global.
> They are not equivalent, but is it an issue?

It was mostly an annoyance, because they returned different results for the same element. Specifically, they returned different :post-blank and :end properties, which does not sound right.
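The lookup step from the deletion example could be sketched as follows, again in Python rather than Emacs Lisp and under stated assumptions: Drawer, DrawerIndex and touched_by are hypothetical names, drawer boundaries are assumed to have been recorded when the drawers were folded, drawers are assumed not to overlap, and the positions in the usage lines are made up. Given only the changed region that a hook receives, a binary search over the recorded start positions finds every drawer the change touches, without calling the parser.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Drawer:
    begin: int  # buffer position of the opening ":NAME:" line
    end: int    # buffer position just after the closing ":END:" line


class DrawerIndex:
    """Hypothetical record of folded drawers, a stand-in for the boundaries
    that text properties or markers would store in the buffer.

    Assumes drawers never overlap, so sorting by BEGIN also sorts by END.
    """

    def __init__(self, drawers):
        self.drawers = sorted(drawers, key=lambda d: d.begin)
        self.starts = [d.begin for d in self.drawers]

    def touched_by(self, beg, end):
        """Return the drawers overlapping the changed region [BEG, END)."""
        # Only drawers that start before END can overlap the region.
        i = bisect_left(self.starts, end) - 1
        hits = []
        # Walk left while the drawer still ends after BEG.
        while i >= 0 and self.drawers[i].end > beg:
            hits.append(self.drawers[i])
            i -= 1
        return hits[::-1]


# A property drawer at 1-60 and another drawer at 200-260 in the "a lot of
# text" part; the deletion runs from inside the first drawer (40) down to the
# final paragraph (500).
index = DrawerIndex([Drawer(1, 60), Drawer(200, 260)])
print(index.touched_by(40, 500))   # [Drawer(begin=1, end=60), Drawer(begin=200, end=260)]
print(index.touched_by(100, 150))  # []
```

The lookup costs O(log n + k) for n recorded drawers and k hits, independent of where the change falls in the section; keeping the recorded boundaries valid after each change is not addressed here.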
Best,
Ihor

Nicolas Goaziou writes:

> Hello,
>
> Ihor Radchenko writes:
>
>> Apparently my previous email was again refused by your mail server (I
>> tried to add patch as attachment this time).
>
> Ah. This is annoying, for you and for me.
>
>> The patch is in
>> https://gist.github.com/yantar92/6447754415457927293acda43a7fcaef
>
> Thank you.
>
>>> I have finished a seemingly stable implementation of handling changes
>>> inside drawer and block elements. For now, I did not bother with
>>> 'modification-hooks and 'insert-in-front/behind-hooks, but simply used
>>> before/after-change-functions.
>>>
>>> The basic idea is saving parsed org-elements before the modification
>>> (with :begin and :end replaced by markers) and comparing them with the
>>> versions of the same elements after the modification.
>>> Any valid org element can be examined in such way by an arbitrary
>>> function (see org-track-modification-elements) [1].
>
> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
> element is a O(N) operation by the number of elements before it in
> a section. In particular, it is not bounded, and not mitigated by
> a cache. For large documents, it is going to be unbearably slow, too.
>
> I don't think the solution is to use combine-after-change-calls either,
> because even a single call to `org-element-at-point' can be noticeable
> in a very large section. Such low-level code should avoid using the
> Element library altogether, except for the initial folding part, which
> is interactive.
>
> If you use modification-hooks and al., you don't need to parse anything,
> because you can store information as text properties. Therefore, once
> the modification happens, you already know where you are (or, at least
> where you were before the change).
>
> The ideas I suggested about sensitive parts of elements are worth
> exploring, IMO. Do you have any issue with them?
>
>>> For (2), I have introduced org--property-drawer-modified-re to override
>>> org-property-drawer-re in relevant *-change-function. This seems to work
>>> for property drawers. However, I am not sure if similar problem may
>>> happen in some border cases with ordinary drawers or blocks.
>
> I already specified what parts were "sensitive" in a previous message.
>
>>> 2. I have noticed that results of org-element-at-point and
>>> org-element-parse-buffer are not always consistent.
>
> `org-element-at-point' is local, `org-element-parse-buffer' is global.
> They are not equivalent, but is it an issue?
>
>
> Regards,
>
> --
> Nicolas Goaziou

--
Ihor Radchenko, PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials,
Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg