From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Thu, 02 Apr 2020 01:24:48 -0800
Message-ID: <86lfnegrz3.fsf@stephe-leake.org>
In-Reply-To: Tuấn-Anh Nguyễn's message of "Thu, 2 Apr 2020 12:21:49 +0700"
To: emacs-devel

Tuấn-Anh Nguyễn writes:

>> > My suggestion is first to figure out how to do this stuff efficiently
>> > from within Emacs itself, as if the module interface were not part of
>> > the equation. We can add that aspect back later.
>>
>> There are two times the wisi code that wraps the parser needs access to
>> the buffer; first to copy the text, second to add text properties
>> (faces, indent values, navigation markers). There are usually many text
>> properties output by each parse.
>>
>> The positions and values of the text properties are computed by
>> functions that run after the complete syntax tree has been produced. In
>> wisi, those functions are added directly in the grammar source file
>> (where they are called "post-parse grammar actions"). In tree-sitter, I
>> assume they are called from some mode-author-written code that traverses
>> the syntax tree (wisi provides that internally). Except I see below that
>> the emacs tree-sitter package stores the syntax tree in the buffer.
>>
>
> The preferred approach with tree-sitter is querying:
> https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries

For access to the syntax tree, yes. There must be code somewhere that
computes face, indent and navigation (and code completion, ...).
That code will build on top of the syntax tree access; it could be in
Rust (in the module) or in elisp (using the module functions). Or in C
linked directly into Emacs, as Eli suggests; but I don't think he meant
that as an actual implementation approach, just as a design approach.

The wisi Ada code that computes the text properties accesses the syntax
tree more directly, but that's just an implementation detail.

I think it makes sense at this point to try to merge wisi and
emacs-tree-sitter. There are several approaches:

1. Rewrite the wisi grammar actions in elisp, using the
   emacs-tree-sitter module functions to access the syntax tree.

2. Rewrite the wisi grammar actions in Rust, using Rust functions to
   access the syntax tree.

3. Rewrite the emacs-tree-sitter module in Ada, using an Ada binding to
   the Tree-Sitter C API. The Emacs module would then provide the
   current wisi Ada code, modified to work with a Tree-Sitter parser.

4. There would also be value in doing an independent design and
   implementation of code to compute face, indent and navigation using
   the tree-sitter syntax tree; there might be a better approach than
   what wisi does.

1 is probably the quickest path to getting something working, but 2 or
3 will probably execute faster. Ideally we'd do all three (or four) and
get some good metrics.

After doing one of the above, we must still write the calls to the
grammar actions for each language of interest. In wisi, this is done by
adding grammar actions to the grammar source code; for example, here is
the indent action for the Ada 'if then end if' statement:

    if_statement
      : IF expression_opt THEN sequence_of_statements_opt END IF SEMICOLON
        %((wisi-indent-action [nil
                               [(wisi-hanging% ada-indent-broken (* 2 ada-indent-broken))
                                ada-indent-broken]
                               nil
                               [ada-indent ada-indent]
                               nil nil nil]))%

There is one lisp form for each token in the grammar production. IF is
not indented by this action; it is indented by the enclosing Ada
statement.
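To make the "one form per token" structure concrete, here is a small Rust sketch in the spirit of approach 2. It is a hypothetical, simplified model, not wisi's implementation and not the emacs-tree-sitter API: the names `Delta`, `TokenIndent`, `LineStart`, `indent_lines` and `render` are made up for illustration. It assumes a vector element is either a single form (applying to both code and comments) or a `[code-indent comment-indent]` pair, and `Delta::Hanging` is only a rough stand-in for `wisi-hanging%`. Instead of walking a real syntax tree, each source line is tagged by hand with the token it starts.

```rust
// Hypothetical, simplified model of a wisi-style indent action: one entry per
// token in the grammar production. Not wisi's or emacs-tree-sitter's code.

/// Indentation delta for a token, relative to the statement's first token.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Delta {
    /// `nil`: this action adds no indentation.
    Zero,
    /// A fixed offset, such as `ada-indent` or `ada-indent-broken`.
    Fixed(usize),
    /// Rough stand-in for `wisi-hanging%`: the token's first line gets
    /// `first`, its continuation lines get `rest`.
    Hanging { first: usize, rest: usize },
}

/// One vector element: delta for the token's code lines, and for comment
/// lines that follow it (a single form in the grammar sets both).
#[derive(Clone, Copy, Debug)]
struct TokenIndent {
    code: Delta,
    comment: Delta,
}

/// What begins a physical line, as an index into the production's tokens.
#[derive(Clone, Copy, Debug)]
enum LineStart {
    TokenFirst(usize),   // first line of token i
    TokenRest(usize),    // continuation line within token i
    CommentAfter(usize), // comment line following token i
}

fn delta_value(d: Delta, first_line: bool) -> usize {
    match d {
        Delta::Zero => 0,
        Delta::Fixed(n) => n,
        Delta::Hanging { first, rest } => {
            if first_line { first } else { rest }
        }
    }
}

/// Column of each line, given `base`, the column of the production's first
/// token (which the enclosing statement decides).
fn indent_lines(base: usize, action: &[TokenIndent], lines: &[LineStart]) -> Vec<usize> {
    lines
        .iter()
        .map(|line| {
            let delta = match *line {
                LineStart::TokenFirst(i) => delta_value(action[i].code, true),
                LineStart::TokenRest(i) => delta_value(action[i].code, false),
                LineStart::CommentAfter(i) => delta_value(action[i].comment, true),
            };
            base + delta
        })
        .collect()
}

/// if_statement : IF expression_opt THEN sequence_of_statements_opt END IF SEMICOLON
fn if_statement_action(ada_indent: usize, ada_indent_broken: usize) -> [TokenIndent; 7] {
    let none = TokenIndent { code: Delta::Zero, comment: Delta::Zero };
    [
        none, // IF: indented by the enclosing statement, not here
        TokenIndent {
            // expression_opt
            code: Delta::Hanging { first: ada_indent_broken, rest: 2 * ada_indent_broken },
            comment: Delta::Fixed(ada_indent_broken),
        },
        none, // THEN
        TokenIndent { code: Delta::Fixed(ada_indent), comment: Delta::Fixed(ada_indent) }, // statements
        none, // END
        none, // IF
        none, // SEMICOLON
    ]
}

/// Example statement: (what starts each line, line text without indentation).
const EXAMPLE: [(LineStart, &str); 7] = [
    (LineStart::TokenFirst(0), "if a or b"),
    (LineStart::CommentAfter(1), "-- a comment"),
    (LineStart::TokenFirst(2), "then"),
    (LineStart::TokenFirst(3), "statement_1;"),
    (LineStart::TokenRest(3), "statement_2;"), // another line of the statement list
    (LineStart::CommentAfter(3), "-- another comment"),
    (LineStart::TokenFirst(4), "end if;"),
];

/// Indent every example line and return the resulting text lines.
fn render(base: usize, ada_indent: usize, ada_indent_broken: usize) -> Vec<String> {
    let action = if_statement_action(ada_indent, ada_indent_broken);
    let starts: Vec<LineStart> = EXAMPLE.iter().map(|(s, _)| *s).collect();
    indent_lines(base, &action, &starts)
        .into_iter()
        .zip(EXAMPLE.iter())
        .map(|(col, (_, text))| format!("{}{}", " ".repeat(col), text))
        .collect()
}

fn main() {
    for line in render(0, 3, 2) {
        println!("{line}");
    }
}
```

With ada-indent = 3 and ada-indent-broken = 2, `main` prints `if a or b`, then `-- a comment` at column 2, `then` at column 0, both statements and `-- another comment` at column 3, and `end if;` at column 0. Keeping the action as data indexed by token position mirrors how the grammar file attaches one form per token; a tree-walking version would look up the action by node kind instead of using a hand-tagged line list.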
The conditional expression is indented by wisi-hanging%; comments
within the expression (assuming it is multi-line) are indented by
ada-indent-broken. wisi-hanging% takes care of indenting the second line
in a long expression. THEN, END IF and SEMICOLON are not indented. The
statements in the true branch are indented by ada-indent.

If ada-indent is 3 and ada-indent-broken is 2, this produces:

    if a or b
      -- a comment
    then
       statement_1;
       statement_2;
       -- another comment
    end if;

In the upstream development repository for the wisi package
(https://savannah.nongnu.org/projects/ada-mode/), there is a user guide
to the grammar actions. I can provide an HTML or info version on request
(or get my act together and do another wisi/ada-mode release).

In Tree-Sitter, the calls to the grammar actions are written in code
that traverses the syntax tree; for approaches 1 and 2 above, this would
be a higher-level elisp or Rust function.

-- 
-- Stephe