From: Ihor Radchenko
Newsgroups: gmane.emacs.devel
Subject: Re: Exposing buffer text modifications to Lisp (was: Tree-sitter integration on feature/tree-sitter)
Date: Sat, 18 Jun 2022 13:52:59 +0800
Message-ID: <878rpuwm9w.fsf@localhost>
References: <2c2746e5f2558a87e8eab6f0914264a020173a9d.camel@pm.me> <27630AA3-8026-4E24-8852-ACCD9325B99D@gmail.com> <0E9E702B-B07C-4794-8498-29B9320E14CC@gmail.com> <871qvorqvv.fsf@localhost> <83tu8jq2vl.fsf@gnu.org> <87sfo37etn.fsf@localhost> <834k0jplcm.fsf@gnu.org>
To: Eli Zaretskii
Cc: casouri@gmail.com, emacs-devel@gnu.org
In-Reply-To: <834k0jplcm.fsf@gnu.org>

Eli Zaretskii writes:

> [I've changed the Subject, since this is not longer about tree-sitter.]

Well, I had hoped that we could generalize the tree-sitter interface to
allow Elisp-based parsers, but that is just a wish.

> OK, but that still doesn't tell what you need from the Emacs core.
> Can you describe those needs? I presume that modification hooks (of
> any kind) are just the means; the real need is something else. What
> is it? If (as I presume) you need to know about changes to the
> buffer, then can you enumerate the changes that are of interest? For
> example, are changes in text properties and overlays of interest, and
> if so, what kind of properties/overlays? (But please don't limit your
> answers to just text properties and overlays, because I asked about
> them explicitly.)

Valid question. I am a bit too familiar with the Org parser code and
tend to assume that some things are "obvious" when they are not.

I will first answer the question about the AST.

> Next, what kind of ASTs do you want to build, and how do you
> represent text as AST? In particular, is the AST defined by regexps
> or some other Lisp data structures?

The Org AST represents semantic objects as nested lists.
Similar to tree-sitter (AFAIU), each object in the tree is represented
as

  (object-type (object-plist) object-children ...)

For example, the headline

  * test headline :tag:

is represented as

  (headline
   (:raw-value "test headline"
    :begin 292
    :end 314
    ...
    :tags ("tag")
    ...
    :parent (...))
   ;; no children
   )

When text inside the headline is modified, we need to update the
:begin/:end properties to reflect the new headline boundaries in the
buffer, and possibly update other headline properties (e.g. :tags).
The same must be done for all the elements containing the headline.

Updating the elements requires the following information:

1. Whether the modified text contained terminal symbols or text
   contributing to object-plist _before_ the modification.
2. The boundaries of the edited text in the buffer and the change in
   text length.
3. Whether the modified text contains terminal symbols or text
   contributing to object-plist _after_ the modification.

Org does not care about changes in text properties or overlays. We just
perform a series of regexp searches over the changed parts of the
buffer (possibly with extended boundaries) before and after the
modification. We also need to know which region of text has been
modified (its beginning, end, and change in length).

Missing any significant change will make the AST invalid. A
significant change is one that involves terminal symbols or changes
the length of a region.

I hope this clarifies the needs.

Best,
Ihor
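The three pieces of information listed above map fairly directly onto the standard `before-change-functions` and `after-change-functions` hooks. What follows is a minimal Elisp sketch under stated assumptions. A crude regexp (`my-ast-terminal-re`, matching headline stars or a `:tag:` string) stands in for "terminal symbols". Nodes use the `(TYPE PLIST . CHILDREN)` shape shown above. All `my-ast-*` names are hypothetical. This is not how Org's parser or its element cache actually works.

```elisp
;;; Hypothetical sketch, not Org's actual org-element cache code.
;; A node has the shape (TYPE PLIST . CHILDREN), e.g.
;; (headline (:raw-value "test headline" :begin 292 :end 314 :tags ("tag"))).

(defconst my-ast-terminal-re "^\\*+ \\|:[[:alnum:]_@#%]+:"
  "Assumed regexp for text that can change the AST: headline stars or tags.")

(defvar-local my-ast--terminal-before nil
  "Non-nil if the text about to be changed matched `my-ast-terminal-re'.")

(defvar-local my-ast-changes nil
  "Recorded changes, newest first.
Each entry is (BEG END OLD-LEN TERMINAL-BEFORE TERMINAL-AFTER).")

(defun my-ast--region-has-terminal-p (beg end)
  "Return t if `my-ast-terminal-re' matches between BEG and END.
The region is extended to whole lines before searching."
  (save-excursion
    (save-match-data
      (let ((limit (progn (goto-char end) (line-end-position))))
        (goto-char beg)
        (forward-line 0)                ; move to the start of BEG's line
        (and (re-search-forward my-ast-terminal-re limit t) t)))))

(defun my-ast--before-change (beg end)
  "Remember whether the text between BEG and END holds terminal text (item 1)."
  (setq my-ast--terminal-before (my-ast--region-has-terminal-p beg end)))

(defun my-ast--after-change (beg end old-len)
  "Record the change bounds and length (item 2) and the new text's status (item 3)."
  (push (list beg end old-len my-ast--terminal-before
              (my-ast--region-has-terminal-p beg end))
        my-ast-changes))

(define-minor-mode my-ast-tracking-mode
  "Record buffer changes that may invalidate a hypothetical AST."
  :lighter nil
  (if my-ast-tracking-mode
      (progn
        (add-hook 'before-change-functions #'my-ast--before-change nil t)
        (add-hook 'after-change-functions #'my-ast--after-change nil t))
    (remove-hook 'before-change-functions #'my-ast--before-change t)
    (remove-hook 'after-change-functions #'my-ast--after-change t)))

(defun my-ast-shift-node (node beg old-end delta)
  "Update NODE's :begin/:end for a change.
BEG and OLD-END delimit the replaced text in pre-change positions;
DELTA is the change in length.  Return `unchanged', `shifted', or
`affected' (the change touches NODE, so NODE must be re-parsed)."
  (let* ((plist (nth 1 node))
         (b (plist-get plist :begin))
         (e (plist-get plist :end)))
    (cond
     ((<= e beg) 'unchanged)            ; node ends before the change
     ((>= b old-end)                    ; node starts after the change
      (setcar (cdr node)
              (plist-put (plist-put plist :begin (+ b delta)) :end (+ e delta)))
      'shifted)
     (t                                 ; change is inside or overlaps NODE
      (setcar (cdr node) (plist-put plist :end (+ e delta)))
      'affected))))
```

In terms of the `after-change-functions` arguments, OLD-END is BEG + OLD-LEN and DELTA is (END - BEG) - OLD-LEN. For example, inserting 4 characters at position 300 inside the headline above (:begin 292, :end 314) marks it `affected` and moves its :end to 318. A following sibling starting at 314 is `shifted` by 4. A fuller version would also apply this to every containing element, and would re-run the parser wherever TERMINAL-BEFORE or TERMINAL-AFTER is non-nil.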