From: Matt Armstrong
To: Harald Jörg, Emacs Developer List
Newsgroups: gmane.emacs.devel
Subject: Re: Handling extensions of programming languages
Date: Sat, 20 Mar 2021 10:02:45 -0700
Message-ID: <87im5lhi6i.fsf@rfc20.org>
In-Reply-To: <87o8ff560t.fsf@hajtower>

haj@posteo.de (Harald Jörg) writes:

> today I'm looking for advice or hints how to deal with a task for
> CPerl mode which might have been solved for other programming
> languages: How to handle extensions of the language.
[...]
> Background: In Perl, adding new syntax to the language is easy enough
> so that many developers have done this
[...]
> My first approach was to keep all the code in one place and evaluate
> all the font-lock and indenting variables at runtime, as buffer-local
> variables, for the different versions.  This works to some extent for
> highlighting, but fails if an extension needs different logic for
> indentation.

I'm not an expert in this topic as it pertains to Emacs itself, but
I've always found editors and development tools interesting and so
have paid attention to these issues over the years.

Very good Emacs support for languages with flexible syntax, meaning
support with a high level of faithfulness to the language, or even
"perfect" faithfulness, always seems to rely on tools native to the
language and external to Emacs, usually by way of some sort of
external server.

Examples: SLIME and Sly for Common Lisp, https://www.racket-mode.com/
for Racket, and, to a lesser degree of functionality, every language
with LSP support, especially C++ (which is known to be effectively
impossible to parse faithfully without what amounts to an entire
compiler frontend).  Formatting source code, indentation included, is
part of the LSP protocol.

The common theme seems to be using the interpreter/compiler itself to
parse, without relying on the editor to understand the code deeply.

For a different approach, you have examples of complete or nearly
complete parsers written in Emacs Lisp.  There is at least one parser
for JavaScript that was at one time fully compliant with the language
standard to the point of providing a full parse tree to Lisp
(https://elpa.gnu.org/packages/js2-mode.html).  The CEDET package has
some complex parser technology.  cc-mode for the C family of languages
is surprisingly good.

The drawback here is that, by design, any syntax extensions, local
mini-DSLs, etc. must also have parsers written in Emacs Lisp.  You see
this issue with js2-mode, where it lags the current language standard
a bit.  (info "(ccmode)Custom Macros") is an example of how cc-mode
supports a limited form of syntax extension.

I think most modes in Emacs Lisp take a pragmatic approach, using
heuristics that get the job done most of the time without being too
computationally expensive.  The SMIE package is a generalization of
this idea, see (info "(elisp)SMIE").  I am not aware of anything like
SMIE that allows for language extensions to be "plugged in" in a
general way.

In languages that support 'embedding' other languages in sub-sections
of code (e.g. CSS or PHP in HTML), the kinds of approaches seen at
https://www.emacswiki.org/emacs/MultipleModes have been tried.
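
To make the LSP point above a bit more concrete, here is a minimal
sketch of what "the editor does not need to understand the code" can
look like on the wire.  It is written in Python only because the
protocol is editor-agnostic; it is not how any Emacs LSP client is
implemented.  The method name textDocument/formatting, the
tabSize/insertSpaces options and the range/newText shape of a text
edit come from the LSP specification.  The function names, the file
URI and the sample Perl text are hypothetical, and the edit
application assumes plain ASCII text (LSP character offsets are UTF-16
code units by default, which this sketch ignores).

```python
import json


def frame(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in LSP's Content-Length framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def formatting_request(request_id: int, uri: str,
                       tab_size: int = 4,
                       insert_spaces: bool = True) -> dict:
    """Build a textDocument/formatting request for one document."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/formatting",
        "params": {
            "textDocument": {"uri": uri},
            "options": {"tabSize": tab_size, "insertSpaces": insert_spaces},
        },
    }


def apply_text_edits(text: str, edits: list) -> str:
    """Apply the server's text edits without parsing the language.

    Assumption: ASCII text, so a character offset equals a string index.
    """
    # Absolute offset of the start of every line (plus end of text).
    starts = [0]
    for line in text.splitlines(keepends=True):
        starts.append(starts[-1] + len(line))

    def offset(pos: dict) -> int:
        return starts[pos["line"]] + pos["character"]

    # Apply from the end of the buffer backwards so that earlier
    # offsets stay valid while later regions are rewritten.
    ordered = sorted(edits, key=lambda e: offset(e["range"]["start"]),
                     reverse=True)
    for edit in ordered:
        start = offset(edit["range"]["start"])
        end = offset(edit["range"]["end"])
        text = text[:start] + edit["newText"] + text[end:]
    return text


if __name__ == "__main__":
    # Hypothetical document URI, for illustration only.
    print(frame(formatting_request(1, "file:///tmp/example.pl")))
```

The editor side only ships the request and splices in whatever edits
come back; all knowledge of the language, including any syntax
extensions the server happens to support, stays in the server.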