From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: "Raw" string literals for elisp
Date: Wed, 8 Sep 2021 11:30:35 +0000
To: Anna Glasgall
Cc: emacs-devel@gnu.org
In-Reply-To: <4209edd83cfee7c84b2d75ebfcd38784fa21b23c.camel@crossproduct.net>

Hello, Anna.

Just as a matter of context, I implemented C++ raw strings, and recently
enhanced the code also to handle other CC Mode derived languages such as
C# and Vala.

On Tue, Sep 07, 2021 at 21:49:33 -0400, Anna Glasgall wrote:
> [My previous message appears to have been eaten, or at least it's not
> showing up in the archive; resending from a different From: address.
> Apologies for any duplication]
>
> Hello Emacs developers,
>
> I've long been annoyed by the number of backslashes needed when using
> string literals in elisp for certain things (regexes, UNC paths, etc),
> so I started work on a patch (WIP attached) to implement support for
> "raw" string literals, a la Python r-strings. These are string literals
> that work exactly like normal string literals, with the exception that
> backslash escapes (except for \") are not processed; \ may freely
> appear in the string without need to escape. I've made good progress,
> but unfortunately I've run into a roadblock and am not sure what to do
> next.

One not so small point.  How do you put a backslash as the _last_
character in a raw string?  If this is difficult, it may well be worth
comparing other languages with raw strings.
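
As a rough illustration of the trailing-backslash question, here is how
Python's r-strings behave.  This is only a comparison sketch and says
nothing about how the Elisp patch does or should behave; the helper name
`parses` is made up for the example.

```python
import ast

# Inside a Python raw string a backslash is kept literally, so a regexp
# written in Emacs style needs no doubling.
pattern = r"\(foo\|bar\)"
assert pattern == "\\(foo\\|bar\\)"

# A backslash before a quote stops the quote from ending the literal,
# but the backslash itself stays in the resulting string.
quoted = r"a\""
assert len(quoted) == 3          # 'a', backslash, double quote


def parses(source: str) -> bool:
    """Return True if SOURCE is accepted by the Python parser."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Consequently a raw literal cannot end in a single backslash: the
# source text r"abc\" is rejected as an unterminated string.
trailing_ok = parses('r"abc\\"')

# The usual workaround is to concatenate an ordinary literal.
workaround = r"abc" "\\"
```

Here `trailing_ok` ends up false, which is exactly the awkwardness the
question above is about.
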
C++ Mode has a complicated system of identifiers at each end of the raw
string (I'm sure you know this).  C# represents a " inside a multi-line
string as "".  Vala (and, I believe, Python) have triple quote
delimiters """ and cannot represent three quotes in a row inside the
multi-line string.

It is probably worthwhile stating explicitly that Elisp raw strings can
be continued across line breaks without having to escape the \n.

> I've successfully taught the elisp reader (read1 in lread.c) how to
> read r-strings. I thought I had managed to make lisp-mode/elisp-mode
> happy by allowing "r" to be a prefix character (C-x C-e and the
> underlying forward-sexp/backward-sexp seemed to work fine at first),
> but realized that I ran into trouble with strings containing the
> sequence of characters '\\"'.
>
> The reader correctly reads r"a\\"" as a string containing the sequence
> of characters 'a', '\', '"', and M-: works. Unfortunately, if I try
> sexp-based navigation or e.g. C-x C-e, it falls apart. The parser in
> syntax.c, which afaict is what lisp-mode is using to try and find sexps
> in buffer text, doesn't seem to know what to do with this expression.
> I've spent some time staring at syntax.c, but I must confess that I'm
> entirely defeated in terms of what changes need to be made here to
> teach this other parser about prefixed strings in where the prefix has
> meaning that affects the interpretation of the characters between
> string fences.

You probably want to use syntax-table text properties.  See the page
"Syntax Properties" in the Elisp manual.  In short, you would put, say,
a "punctuation" property on most backslashes to nullify their normal
action.  Possibly, you might want such a property on a double quote
inside the string.  You might also want a property on the linefeeds
inside a raw string.

With these properties, C-M-n and friends will work properly.
Bear in mind that you will also need to apply and remove these
properties as the user changes the Lisp text, for example by removing a
\ before a ".  There is an established mechanism in Emacs for this sort
of action (which CC Mode doesn't use) which I would advise you to use.

> I've attached a copy of my WIP patch; it's definitely not near final
> code quality and doesn't have documentation yet, all of which I would
> take care of before submitting for inclusion. I also haven't filled out
> the copyright assignment paperwork yet, but should this work reach a
> point where it was likely to be accepted, I'd be happy to do that.

Thanks!

> I'd very much appreciate some pointers on what to try next here, or
> some explanation of how syntax.c/syntax.el works beyond what's in the
> reference manual. If this is a fool's errand I'm tilting at here, I'd
> also appreciate being told that before I sink more time into it :)

It is definitely NOT a fool's errand.  There may be some resistance to
the idea of raw strings from traditionalists, but I hope not.

It would be worth your while really to understand the section in the
Elisp manual on syntax and all the things it can (and can't) do.  Help
is always available on emacs-devel.

You're going to have quite a bit of Lisp programming to do.  For
example, font-lock needs to be taught how to fontify a raw string.  But
at the end of the exercise, you will have learnt so much about Emacs
that you will qualify as a fully fledged contributor.  :-)

> thanks,
> Anna Glasgall

--
Alan Mackenzie (Nuremberg, Germany).