From: Stephen Leake
Newsgroups: gmane.emacs.devel
To: emacs-devel
Subject: Re: Using incremental parsing in Emacs
Date: Fri, 03 Jan 2020 11:39:50 -0800
Message-ID: <86zhf4gwhl.fsf@stephe-leake.org>
References: <83blrkj1o1.fsf@gnu.org>
In-Reply-To: <83blrkj1o1.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 03 Jan 2020 12:05:02 +0200")

Eli Zaretskii writes:

> Would someone like to try to figure out how we could use the
> incremental parsing technology in Emacs for making our
> programming-language support more accurate and efficient?

GNU ELPA ada-mode is an existing example; it has a full language parser
(error-correcting generalized LR), that supports some advanced
navigation. It could be extended to do some code completion.

Instead of "incremental parsing" (which updates an existing syntax tree
given source changes) it uses "partial parsing" (parsing only part of a
file) and very robust error handling. It works very well on very large
Ada files (it is in production use by Eurocontrol and others).

Error correction is critical, since buffers are normally not
syntactically correct during editing.

I've tried using the same parser generator on Java and Python; the
results are not as good as for Ada (apparently Ada lends itself to LR
parsing better than those languages). That might be improved by
massaging the grammar, but that risks implementing not-quite-Java,
not-quite-Python.

Others mentioned LSP (https://langserver.org/); that method supports
incremental parsing, since it is centered on sending source edits from
the editor to the language server (after sending the full text once). It
also supports algorithms that require more than one source file, since
all files involved in a project can be loaded into the same language
server instance (the ada-mode parser is strictly one file).
That allows providing completion on parameters for functions declared
in other files, for example.

Many editors are moving to support LSP; that allows them to take
advantage of any parser technology developed independently.

ada-mode has its own protocol between elisp and the external parser,
provided by the GNU ELPA wisi package (the ada-mode parser was started
before LSP). The parser in ada-mode could be used in an LSP language
server.

So I think the short answer to your post is "GNU ELPA eglot", with
possibly some work importing some of that into core to make it more
efficient.

eglot is currently listed as "incompat" in *Packages* (in both emacs 27
and 26); I don't know why. I have not tried eglot; I don't know how
complete it is. There is also https://github.com/emacs-lsp/lsp-mode.

The syntax used for expressing the grammar is usually fairly tightly
tied to the language and/or the parser generator; trying to generalize
that for all languages supported by Emacs is a huge task, not worth
doing. With LSP, building a grammar for a language is done once for
each language server.

Whether the language server is implemented as an external process, or
as a loadable module, is an implementation detail. ada-mode uses an
external process, mostly because it was started before modules were
stabilized.

The communication between the language server and elisp (whether
ada-mode style or LSP) involves sending text, not binary data (and _not_
pointers into the emacs buffer!). Doing that via the module interface vs
pipes to a process is a wash for speed. Using a process fully isolates
the server code from emacs, eliminating any possible third-party
library version conflicts.

It could be possible to implement an LSP language server in elisp,
running in a separate thread (or even the same thread; it can be used
synchronously). That might be an interesting exercise, and would
eliminate other language dependencies.
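
To illustrate the text-based exchange between editor and language
server described above, here is a minimal Python sketch of LSP-style
traffic: the full text is sent once with `textDocument/didOpen`, then
only an edit is sent with `textDocument/didChange`, each framed with a
`Content-Length` header as in the LSP base protocol. The helper names
(`frame`, `unframe`, `apply_change`), the file URI, and the sample Ada
text are hypothetical, chosen only for this example; it is not how
ada-mode, wisi, or eglot are implemented. It also assumes ASCII text
(LSP character offsets count UTF-16 code units by default) and a server
that accepts ranged, incremental changes.

```python
import json


def frame(payload):
    """Encode one JSON-RPC message with the LSP Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def unframe(data):
    """Decode the first framed message; return (message, remaining bytes)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


def offset(text, pos):
    """Convert a zero-based line/character position to a string offset."""
    lines = text.split("\n")
    return sum(len(l) + 1 for l in lines[: pos["line"]]) + pos["character"]


def apply_change(text, change):
    """Apply one ranged content change, as a server-side buffer copy would."""
    start = offset(text, change["range"]["start"])
    end = offset(text, change["range"]["end"])
    return text[:start] + change["text"] + text[end:]


URI = "file:///tmp/hello.adb"  # hypothetical path

# Sent once: the whole buffer text.
did_open = {
    "jsonrpc": "2.0",
    "method": "textDocument/didOpen",
    "params": {"textDocument": {
        "uri": URI, "languageId": "ada", "version": 1,
        "text": "procedure Hello is\nbegin\n   null;\nend Hello;\n"}},
}

# Sent per edit: only the changed range (replace "null" on line 2).
did_change = {
    "jsonrpc": "2.0",
    "method": "textDocument/didChange",
    "params": {
        "textDocument": {"uri": URI, "version": 2},
        "contentChanges": [{
            "range": {"start": {"line": 2, "character": 3},
                      "end": {"line": 2, "character": 7}},
            "text": 'Put_Line ("hi")'}],
    },
}

# Simulate the pipe: the editor writes bytes, the server reads them back.
stream = frame(did_open) + frame(did_change)
first, rest = unframe(stream)
second, rest = unframe(rest)

buffer_text = first["params"]["textDocument"]["text"]
for change in second["params"]["contentChanges"]:
    buffer_text = apply_change(buffer_text, change)
print(buffer_text)
```

Everything crossing the pipe here is plain text (a header plus JSON), so
the same bytes could just as well be handed over through a module
interface; only the transport differs.
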
ada-mode used to support an elisp parser generated from the same
grammar, but that never supported error correction; implementing very
complex algorithms is just easier in a more advanced language (and
certainly faster at run time; critical for error correction).

-- 
-- Stephe