From: Ihor Radchenko
Newsgroups: gmane.emacs.devel
Subject: Re: Exposing buffer text modifications to Lisp
Date: Mon, 20 Jun 2022 19:58:31 +0800
Message-ID: <878rpr4kd4.fsf@localhost>
In-Reply-To: <83fsk2nyrm.fsf@gnu.org>
To: Eli Zaretskii
Cc: casouri@gmail.com, emacs-devel@gnu.org

Eli Zaretskii writes:

>> > We already have a TODO item for making markers more efficient; any
>> > takers?
>>
>> This is trickier than it may appear.
>> Each element in Org AST has 3-7 markers.
>> My real-life large org buffer contains ~200k Org syntax elements
>> (actually more, but not all the elements are ever queried).
>> So, we are talking about 600k-1.4M markers in buffer if Org AST were to
>> use markers.
>>
>> Now, imagine an edit somewhere near the beginning of Org buffer. Such
>> edit means that Emacs will have to shift positions of nearly all the
>> markers in the buffer. All the >1M markers. On every
>> self-insert-command.
>
> The inner loop of adjust_markers_for_insert is just 40 machine
> instructions.  (This is in unoptimized code; it could be fewer
> instruction in an optimized build.)  Assuming a 3GHz CPU clock, 40
> instructions should take just 13 nsec, and 1 million of these should
> take 13 milliseconds -- a very short time indeed.  I expect that to be
> between 5 and 7 msec in an optimized build.
>
> (Compare that with inserting the characters itself: the first
> insertion could potentially mean moving the gap, which in a large
> buffer means moving megabytes of bytes -- not a negligible feat.)

Noted. Does Emacs C code provide any generic tree structure
implementation?

> So I don't think the performance degradation due to markers is because
> the insert/delete operations on buffer text need to update many
> markers.  I think the real slowdown comes from the functions which
> convert character positions to byte positions and vice versa: these
> use markers.  There are a lot of such calls all over our code, and
> that's where the current linear-linked-list implementation of markers
> slows us down.
>
> Of course, the right method to show the bottleneck(s) is to profile
> the code with a tool like 'prof', and take it from there.  So here's
> one more interesting job for someone to volunteer.

That's what I did in https://orgmode.org/list/87y21wkdwu.fsf@localhost:

>>> The bottleneck appears to be buf_bytepos_to_charpos, called by
>>> BYTE_TO_CHAR macro, which, in turn, is used by set_search_regs
>>> buf_bytepos_to_charpos contains the following loop:
>>>
>>>   for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>>>     {
>>>       CONSIDER (tail->bytepos, tail->charpos);
>>>
>>>       /* If we are down to a range of 50 chars,
>>>          don't bother checking any other markers;
>>>          scan the intervening chars directly now.  */
>>>       if (best_above - bytepos < distance
>>>           || bytepos - best_below < distance)
>>>         break;
>>>       else
>>>         distance += BYTECHAR_DISTANCE_INCREMENT;
>>>     }
>>>
>>> I am not sure if I understand the code correctly, but that loop is
>>> clearly scaling performance with the number of markers

>> Org parser goes around this issue by updating AST positions on idle and
>> maintaining asynchronous request queue. This works relatively well
>> because AST queries are skewed to be near the buffer region being
>> edited.
>> I am not sure if similar approach (not trivial to start with)
>> can be efficiently utilized by Emacs. IDK the typical marker access
>> pattern in Emacs core.
>
> If you already have a workaround for marker-related problems, then why
> do you need to hook into insertion and deletion on the lowest level?

Because the workaround relies on before/after-change-functions that may
be suppressed by bad third-party code.

Also, markers will not solve all the needs of the Org parser even when
they become more efficient. As I mentioned earlier, we also need to keep
track of whether terminal symbols appear in the changed text before/after
modification. It boils down to matching regexps around the changed region
in the buffer before/after each modification. Suppressed
before/after-change-functions ruin this logic as well.

> And that is my long-standing gripe aimed at developers of 3rd party
> packages: they should come here (or bug-gnu-emacs@gnu.org) and present
> the cases where they needed some missing infrastructure, instead of
> trying to jump through hoops to work around what they perceive as
> Emacs restrictions that (they think) cannot be possibly lifted.  Doing
> the former will have at least two benefits: (a) it will facilitate
> Emacs development into a better platform, and (b) it will avoid giving
> birth to some of the horrible kludges out there, which eventually
> don't work well enough, and thus make Emacs seem less professional
> than it should be.
>
> And if that is my expectation from developers of 3rd party packages, I
> definitely expect that from packages that are bundled, such as Org.
> Since Org is basically part of the core Emacs, it makes little sense
> to me to realize that it goes to such lengths trying to work around
> the limitations, instead of asking the core team to improve the
> existing implementation or add some missing ones.
> I could perhaps
> understand if the request existed, but no one volunteered to work on
> it, but not having the requests in the first place I cannot
> understand.

I think I need to clarify my position here.

The important thing you need to know about Org is that it does not only
support the Emacs version Org is bundled with. We currently support
Emacs >=26. See
https://orgmode.org/worg/org-maintenance.html#emacs-compatibility

So, any major feature implemented in the development version of Emacs
cannot be easily used. The new feature will mean doubling the relevant
code on the Org side: (1) supporting the new feature; (2) a compatibility
layer to support older Emacs versions. Which means extra maintenance.

When I am also asked to implement the patch for this new feature for
Emacs, I get triple the work.

Moreover, my previous attempt to propose a patch required for Org was
sunk in the depths of emacs-devel threads. (It was a patch for
isearch.el and it no longer applies onto master. I plan to re-submit it
when I get more time and interest. Just FYI)

Having said that, I do know that it is better to reach out to Emacs when
a new feature is really beneficial. But I hope that my previous
explanation clarifies why there is friction (at least, it is the case
for me personally) in contributing to Emacs. Emacs core-related items
tend to go down towards the end of todo lists.

Best,
Ihor
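
The two marker costs discussed in this thread can be illustrated with a small C sketch. It is a simplified model under stated assumptions, not Emacs code: `struct marker`, `shift_markers_for_insert`, `bracket_bytepos`, and `estimate_shift_ms` are hypothetical names, insertion-type markers and the early-exit heuristic of the quoted loop are omitted, and the timing estimate assumes one instruction per clock cycle, as the back-of-envelope figure above does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified marker; NOT the real struct Lisp_Marker. */
struct marker {
    ptrdiff_t charpos;    /* position in characters */
    ptrdiff_t bytepos;    /* position in bytes */
    struct marker *next;  /* unordered singly linked list */
};

/* Shift every marker strictly after CHARPOS by the inserted amounts.
   Returns how many markers moved.  Every marker is visited, so the
   cost is linear in the number of markers.  */
size_t shift_markers_for_insert(struct marker *head, ptrdiff_t charpos,
                                ptrdiff_t nchars, ptrdiff_t nbytes)
{
    /* Each character occupies at least one byte. */
    assert(nchars >= 0 && nbytes >= nchars);
    size_t moved = 0;
    for (struct marker *m = head; m; m = m->next) {
        if (m->charpos > charpos) {
            m->charpos += nchars;
            m->bytepos += nbytes;
            moved++;
        }
    }
    return moved;
}

/* Closest known (bytepos, charpos) pairs on either side of a target. */
struct bracket {
    ptrdiff_t below_byte, below_char;
    ptrdiff_t above_byte, above_char;
};

/* Narrow the initial bracket B around BYTEPOS using marker positions,
   in the spirit of the quoted CONSIDER loop.  Without an early exit
   this also walks the whole list.  */
struct bracket bracket_bytepos(const struct marker *head, ptrdiff_t bytepos,
                               struct bracket b)
{
    for (const struct marker *m = head; m; m = m->next) {
        if (m->bytepos <= bytepos && m->bytepos > b.below_byte) {
            b.below_byte = m->bytepos;
            b.below_char = m->charpos;
        }
        if (m->bytepos >= bytepos && m->bytepos < b.above_byte) {
            b.above_byte = m->bytepos;
            b.above_char = m->charpos;
        }
    }
    return b;
}

/* Estimated time in milliseconds to touch N_MARKERS markers, assuming
   one instruction per clock cycle (an assumption, not a measurement). */
double estimate_shift_ms(double n_markers, double instr_per_marker,
                         double clock_ghz)
{
    assert(clock_ghz > 0.0);
    double ns_per_marker = instr_per_marker / clock_ghz; /* GHz = cycles/ns */
    return n_markers * ns_per_marker / 1e6;              /* ns -> ms */
}
```

With 1 million markers, 40 instructions per marker, and a 3 GHz clock, `estimate_shift_ms (1e6, 40.0, 3.0)` gives about 13.3 ms, matching the figure quoted above. The sketch also shows why a per-edit shift and a per-lookup scan are different problems: the shift runs once per insertion, while a scan like `bracket_bytepos` runs on every byte/char position conversion.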