From: Alan Mackenzie <acm@muc.de>
To: Anna Glasgall <anna@crossproduct.net>
Cc: emacs-devel@gnu.org
Subject: Re: "Raw" string literals for elisp
Date: Wed, 8 Sep 2021 11:30:35 +0000
Message-ID: <YTie26fKdA+2rWP7@ACM>
In-Reply-To: <4209edd83cfee7c84b2d75ebfcd38784fa21b23c.camel@crossproduct.net>

Hello, Anna.

Just for context: I implemented the handling of C++ raw strings in CC
Mode, and recently extended that code to other CC Mode-derived languages
such as C# and Vala.

On Tue, Sep 07, 2021 at 21:49:33 -0400, Anna Glasgall wrote:
> [My previous message appears to have been eaten, or at least it's not
> showing up in the archive; resending from a different From: address.
> Apologies for any duplication]

> Hello Emacs developers,

> I've long been annoyed by the number of backslashes needed when using
> string literals in elisp for certain things (regexes, UNC paths, etc.),
> so I started work on a patch (WIP attached) to implement support for
> "raw" string literals, à la Python r-strings. These are string literals
> that work exactly like normal string literals, with the exception that
> backslash escapes (except for \") are not processed; \ may freely
> appear in the string without need to escape. I've made good progress,
> but unfortunately I've run into a roadblock and am not sure what to do
> next.

One not-so-small point: how do you put a backslash as the _last_
character in a raw string?
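
To make the question concrete, here is a minimal Python sketch of the
reading rule as Anna describes it: backslashes are literal, except that
a backslash directly before a double quote yields the quote.  Python is
used only for illustration, and the function name read_raw_string is
hypothetical; this is not the code in the WIP patch.

```python
def read_raw_string(src: str, start: int = 0) -> tuple[str, int]:
    """Read an r"..." literal beginning at src[start].

    Returns (value, index just past the closing quote).  Only a backslash
    followed by a double quote is treated as an escape; everything else,
    including other backslashes and newlines, is copied literally."""
    if not src.startswith('r"', start):
        raise ValueError("not a raw string literal")
    i = start + 2
    out = []
    while i < len(src):
        c = src[i]
        if c == "\\" and i + 1 < len(src) and src[i + 1] == '"':
            out.append('"')              # the one escape that is processed
            i += 2
        elif c == '"':
            return "".join(out), i + 1   # an unescaped quote ends the string
        else:
            out.append(c)                # literal character, newline included
            i += 1
    raise ValueError("unterminated raw string")
```

Under this rule a backslash directly before a quote always turns that
quote into string content, so no raw string read this way can end with
a backslash: both r"a\" and r"a\\" are unterminated.  Newlines, on the
other hand, are simply copied, so such a string continues across line
breaks.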

If this is difficult, it may well be worth comparing other languages
with raw strings.  C++ Mode has a complicated system of identifiers at
each end of the raw string (I'm sure you know this).  C# represents a "
inside a multi-line (verbatim) string as "".  Vala (and, I believe,
Python) use triple-quote delimiters """ and cannot represent three
quotes in a row inside the multi-line string.
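
For comparison, here is a simplified Python sketch of two of the schemes
mentioned above.  The function names are hypothetical; the C++ version
ignores encoding prefixes such as u8R, and the C# version handles only
verbatim @"..." strings.  Both can express a trailing backslash, at the
cost of a more elaborate closing delimiter.

```python
def read_cpp_raw(src: str) -> str:
    # Contents of a C++11-style raw string R"d( ... )d" at the start of SRC,
    # where d is an optional user-chosen delimiter.
    if not src.startswith('R"'):
        raise ValueError("not a C++ raw string")
    paren = src.index("(", 2)          # raises ValueError if there is no "("
    delim = src[2:paren]
    # The delimiter: at most 16 chars, no parentheses, backslash or whitespace.
    if len(delim) > 16 or any(c in ' ()\\\t\v\f\n' for c in delim):
        raise ValueError("invalid delimiter")
    end = src.find(")" + delim + '"', paren + 1)
    if end < 0:
        raise ValueError("unterminated raw string")
    return src[paren + 1:end]          # nothing inside is an escape


def read_csharp_verbatim(src: str) -> str:
    # Contents of a C# verbatim string @"..." at the start of SRC;
    # inside it, a doubled quote "" stands for a single one.
    if not src.startswith('@"'):
        raise ValueError("not a C# verbatim string")
    out = []
    i = 2
    while i < len(src):
        if src[i] == '"':
            if src[i + 1:i + 2] == '"':
                out.append('"')        # "" -> one literal quote
                i += 2
                continue
            return "".join(out)        # a lone quote ends the string
        out.append(src[i])             # backslashes are ordinary characters
        i += 1
    raise ValueError("unterminated verbatim string")
```

For example, R"(a\)" and @"a\" both denote the two characters a and \,
and R"x()")x" contains )" without being terminated by it.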

It is probably worthwhile stating explicitly that Elisp raw strings can
be continued across line breaks without having to escape the newline.

> I've successfully taught the elisp reader (read1 in lread.c) how to
> read r-strings. I thought I had managed to make lisp-mode/elisp-mode
> happy by allowing "r" to be a prefix character (C-x C-e and the
> underlying forward-sexp/backward-sexp seemed to work fine at first),
> but realized that I ran into trouble with strings containing the
> sequence of characters '\\"'.

> The reader correctly reads r"a\\"" as a string containing the sequence
> of characters 'a', '\', '"', and M-: works. Unfortunately, if I try
> sexp-based navigation or e.g. C-x C-e, it falls apart. The parser in
> syntax.c, which afaict is what lisp-mode is using to try and find sexps
> in buffer text, doesn't seem to know what to do with this expression.
> I've spent some time staring at syntax.c, but I must confess that I'm
> entirely defeated in terms of what changes need to be made here to
> teach this other parser about prefixed strings where the prefix has a
> meaning that affects the interpretation of the characters between
> string fences.
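
A simplified Python model of why sexp motion goes wrong (an illustration,
not the actual syntax.c code; the function name is hypothetical): in the
Emacs Lisp syntax table \ has escape syntax, so a scanner inside a string
skips whatever character follows a backslash.  Given r"a\\"" it takes \\
as one escaped backslash, ends the string at the next quote, and leaves
the final quote over to open a new string.

```python
def scan_string_with_escapes(src: str, quote_pos: int) -> int:
    # Model of scanning an ordinary string in Emacs Lisp syntax:
    # src[quote_pos] is the opening quote, and a backslash (escape syntax)
    # makes the next character inert.  Returns the index just past the
    # closing quote.
    i = quote_pos + 1
    while i < len(src):
        c = src[i]
        if c == "\\":
            i += 2          # skip the backslash and the character it escapes
        elif c == '"':
            return i + 1    # first unescaped quote closes the string
        else:
            i += 1
    raise ValueError("unterminated string")


text = 'r"a\\\\""'          # the buffer text r"a\\"" (the r is only a prefix)
end = scan_string_with_escapes(text, 1)
print(end, repr(text[end:]))  # 6 '"' : one quote is left over
```

The reader, by contrast, treats the second backslash as escaping the
quote and stops one character later, so the two disagree about where the
sexp ends, and the stray quote makes the following text look like the
inside of a string.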

You probably want to use syntax-table text properties.  See the page
"Syntax Properties" in the Elisp manual.  In short, you would put, say,
a "punctuation" property on most backslashes to nullify their normal
action.  Possibly, you might want such a property on a double quote
inside the string.  You might also want a property on the linefeeds
inside a raw string.  With these properties, C-M-n and friends will work
properly.
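
Continuing the simplified model: a syntax-table text property overrides
the buffer's syntax table at a single position.  In Emacs one would put
a syntax-table property whose value is (string-to-syntax ".") on such a
backslash; below, a dict from positions to "." stands in for that, and
propertize_raw_string and scan_string are hypothetical names.
Neutralizing every backslash in the raw string except one directly
before a double quote makes the escape-aware scan stop where the reader
stops.

```python
PUNCT = "."  # stand-in for the punctuation syntax class in this model


def propertize_raw_string(src: str, quote_pos: int) -> dict[int, str]:
    # Return {position: PUNCT} for the backslashes inside the raw string
    # opening at src[quote_pos] that must lose their escape meaning.
    props = {}
    i = quote_pos + 1
    while i < len(src):
        c = src[i]
        if c == "\\" and src[i + 1:i + 2] == '"':
            i += 2                  # keep \" as a real escape
        elif c == "\\":
            props[i] = PUNCT        # neutralize this backslash
            i += 1
        elif c == '"':
            break                   # closing quote
        else:
            i += 1
    return props


def scan_string(src: str, quote_pos: int, props: dict[int, str]) -> int:
    # Escape-aware string scan that consults per-position overrides first,
    # the way a syntax-table property takes precedence over the table.
    i = quote_pos + 1
    while i < len(src):
        c = src[i]
        if props.get(i) == PUNCT:
            i += 1                  # overridden: just an ordinary character
        elif c == "\\":
            i += 2                  # escape: skip the next character
        elif c == '"':
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string")


text = 'r"a\\\\""'                  # r"a\\""
props = propertize_raw_string(text, 1)
print(props, scan_string(text, 1, props))  # {3: '.'} 7
```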

Bear in mind that you will also need to apply and remove these
properties as the user edits the Lisp text, for example by removing a
\ before a ".  Emacs has an established mechanism for this sort of thing
(which CC Mode doesn't use), and I would advise you to use it.
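
The mechanism meant here is presumably syntax-propertize-function,
documented on the same "Syntax Properties" page.  The toy Python model
below imitates only its general idea, not its code: properties are
computed lazily up to a "done" position, and an edit moves that position
back so the affected text is re-propertized before it is next scanned.
The class RawStringBuffer and its methods are hypothetical.

```python
class RawStringBuffer:
    """Toy buffer whose backslash overrides inside r"..." strings are kept
    up to date lazily.  Hypothetical; it models the idea, not Emacs code."""

    def __init__(self, text: str):
        self.text = text
        self.spans = []   # (start, end) of closed raw strings already handled
        self.props = {}   # position -> "." for a neutralized backslash
        self.done = 0     # props are valid for text[:done]

    def edit(self, beg: int, end: int, new: str) -> None:
        """Replace text[beg:end] with NEW and invalidate everything after it."""
        self.text = self.text[:beg] + new + self.text[end:]
        # Keep only raw strings ending at or before BEG and restart after the
        # last of them, a position known to lie outside any string.
        self.spans = [s for s in self.spans if s[1] <= beg]
        self.done = self.spans[-1][1] if self.spans else 0
        self.props = {p: v for p, v in self.props.items() if p < self.done}

    def ensure(self, pos: int) -> None:
        """Compute overrides for the text up to at least POS, on demand."""
        i = self.done
        while i < pos:
            start = self.text.find('r"', i)
            if start < 0 or start >= pos:
                i = pos                  # no raw string begins before POS
                break
            j = start + 2
            closed = False
            while j < len(self.text):
                c = self.text[j]
                if c == "\\" and self.text[j + 1:j + 2] == '"':
                    j += 2               # \" keeps its escape meaning
                elif c == "\\":
                    self.props[j] = "."  # any other backslash: punctuation
                    j += 1
                elif c == '"':
                    j += 1
                    closed = True
                    break
                else:
                    j += 1
            if closed:
                self.spans.append((start, j))
            i = j
        self.done = max(self.done, i)

    def is_escape(self, pos: int) -> bool:
        """Should text[pos] act as an escape character when scanning?"""
        self.ensure(pos + 1)
        return self.text[pos] == "\\" and self.props.get(pos) != "."
```

For instance, in (x r"a\b") the backslash is neutralized; after the b is
replaced by a quote, giving (x r"a\""), the next query recomputes the
string and the backslash is an escape again.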

> I've attached a copy of my WIP patch; it's definitely not near final
> code quality and doesn't have documentation yet, all of which I would
> take care of before submitting for inclusion. I also haven't filled out
> the copyright assignment paperwork yet, but should this work reach a
> point where it was likely to be accepted, I'd be happy to do that.

Thanks!

> I'd very much appreciate some pointers on what to try next here, or
> some explanation of how syntax.c/syntax.el works beyond what's in the
> reference manual. If this is a fool's errand I'm tilting at here, I'd
> also appreciate being told that before I sink more time into it :)

It is definitely NOT a fool's errand.  There may be some resistance to
the idea of raw strings from traditionalists, but I hope not.  It would
really be worth your while to understand the chapter of the Elisp manual
on syntax tables and all the things they can (and can't) do.

Help is always available on emacs-devel.

You're going to have quite a bit of Lisp programming to do.  For
example, font-lock needs to be taught how to fontify a raw string.

But at the end of the exercise, you will have learnt so much about Emacs
that you will qualify as a fully fledged contributor.  :-)

> thanks,

> Anna Glasgall

-- 
Alan Mackenzie (Nuremberg, Germany).
