From: Anna Glasgall <anna@crossproduct.net>
To: Alan Mackenzie <acm@muc.de>
Cc: emacs-devel@gnu.org
Subject: Re: "Raw" string literals for elisp
Date: Wed, 08 Sep 2021 10:27:17 -0400
Message-ID: <8ac544527b7f8767cf562fba86fbf19d3414d720.camel@crossproduct.net>
In-Reply-To: <YTie26fKdA+2rWP7@ACM>

On Wed, 2021-09-08 at 11:30 +0000, Alan Mackenzie wrote:
> Hello, Anna.
> 
> Just as a matter of context, I implemented C++ raw strings, and
> recently enhanced the code also to handle other CC Mode derived
> languages such as C# and Vala.
> 

Great, I'll definitely take a look at that.

> On Tue, Sep 07, 2021 at 21:49:33 -0400, Anna Glasgall wrote:
> > [My previous message appears to have been eaten, or at least it's
> > not showing up in the archive; resending from a different From:
> > address. Apologies for any duplication]
> 
> > Hello Emacs developers,
> 
> > I've long been annoyed by the number of backslashes needed when
> > using string literals in elisp for certain things (regexes, UNC
> > paths, etc), so I started work on a patch (WIP attached) to
> > implement support for "raw" string literals, a la Python r-strings.
> > These are string literals that work exactly like normal string
> > literals, with the exception that backslash escapes (except for \")
> > are not processed; \ may freely appear in the string without need
> > to escape. I've made good progress, but unfortunately I've run into
> > a roadblock and am not sure what to do next.
> 
> One not so small point.  How do you put a backslash as the _last_
> character in a raw string?

That is an excellent question. I'll need to take a look at how some
other languages handle that :/

Thanks for giving me another test case!
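
For reference, here is a small sketch of how Python, the model for the
r-string idea, treats this case. In Python a backslash before the closing
quote still keeps the quote from ending the literal, but both characters
stay in the string, so a raw literal cannot end in a single backslash.
Note that this is not quite the rule described for the WIP patch in this
thread, where \" appears to yield just the quote. The variable names
below are illustrative only.

```python
# A raw literal ending in a single backslash does not parse: the backslash
# keeps the quote from closing the literal, so the string is unterminated.
try:
    compile(r'''s = r"C:\dir\"''', "<example>", "exec")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False

# Inside a Python raw literal, \" keeps BOTH characters (backslash and quote).
escaped_quote = r"a\"b"

# The usual Python workaround: supply the final backslash from an ordinary
# literal and rely on adjacent-literal concatenation.
path = r"C:\dir" "\\"
```

Whether Elisp raw strings should copy this restriction or pick another
convention is exactly the open question.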

> 
> If this is difficult, it may well be worth comparing other languages
> with raw strings.  C++ Mode has a complicated system of identifiers
> at each end of the raw string (I'm sure you know this).  C#
> represents a " inside a multi-line string as "".  Vala (and, I
> believe, Python) have triple quote delimiters """ and cannot
> represent three quotes in a row inside the multi-line string.
> 
> It is probably worth while stating explicitly that Elisp raw strings
> can be continued across line breaks without having to escape the \n.
> 
> > I've successfully taught the elisp reader (read1 in lread.c) how to
> > read r-strings. I thought I had managed to make
> > lisp-mode/elisp-mode happy by allowing "r" to be a prefix character
> > (C-x C-e and the underlying forward-sexp/backward-sexp seemed to
> > work fine at first), but realized that I ran into trouble with
> > strings containing the sequence of characters '\\"'.
> 
> > The reader correctly reads r"a\\"" as a string containing the
> > sequence of characters 'a', '\', '"', and M-: works. Unfortunately,
> > if I try sexp-based navigation or e.g. C-x C-e, it falls apart. The
> > parser in syntax.c, which afaict is what lisp-mode is using to try
> > and find sexps in buffer text, doesn't seem to know what to do with
> > this expression. I've spent some time staring at syntax.c, but I
> > must confess that I'm entirely defeated in terms of what changes
> > need to be made here to teach this other parser about prefixed
> > strings where the prefix has meaning that affects the
> > interpretation of the characters between string fences.
> 
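
To make the mismatch concrete, here is a toy model in Python of the two
ways of finding the end of that string. It is only a sketch: neither
function is taken from lread.c or syntax.c, and both names are
hypothetical. The first follows the r-string rule as described above
(only \" is an escape), the second treats every backslash as escaping the
next character, as for ordinary strings.

```python
def raw_string_end(text, start):
    # Index just past the closing quote under the r-string rule described
    # in this thread: only \" is an escape, other backslashes are literal.
    i = start + 1  # skip the opening quote
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            i += 2  # \" stays inside the string
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    return None  # unterminated


def escape_string_end(text, start):
    # Index just past the closing quote when every backslash escapes the
    # following character, as in an ordinary string.
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    return None


source = r'r"a\\""'  # the buffer text  r"a\\""

raw_end = raw_string_end(source, 1)        # consumes the whole text
naive_end = escape_string_end(source, 1)   # stops one character early
```

In this model the escape-style scan sees \\ as an escaped backslash, ends
the string at the first quote, and leaves a stray " behind, which would
explain sexp navigation getting lost.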
> You probably want to use syntax-table text properties.  See the page
> "Syntax Properties" in the Elisp manual.  In short, you would put,
> say, a "punctuation" property on most backslashes to nullify their
> normal action.  Possibly, you might want such a property on a double
> quote inside the string.  You might also want a property on the
> linefeeds inside a raw string.  With these properties, C-M-n and
> friends will work properly.
> 
> Bear in mind that you will also need to apply and remove these
> properties as the user changes the Lisp text, for example by removing
> a \ before a ".  There is an established mechanism in Emacs for this
> sort of action (which CC Mode doesn't use) which I would advise you
> to use.
> 
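
As a rough illustration of which characters such properties would have
to cover, here is a Python model (not Elisp, and not the actual
mechanism). It assumes the r-string rule described earlier in this
thread, where only \" keeps its escape meaning, and both function names
are hypothetical. It computes the offsets of the backslashes that would
be neutralised and checks that an ordinary escape-aware scan then finds
the right end of the string.

```python
def backslashes_to_neutralise(text, start):
    # Offsets of backslashes inside the raw string opening at `start` that
    # are NOT part of a \" pair; these would get the overriding property.
    marked = set()
    i = start + 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == '"':
            i += 2          # \" keeps its escape meaning
        elif text[i] == "\\":
            marked.add(i)
            i += 1
        elif text[i] == '"':
            break           # closing quote
        else:
            i += 1
    return marked


def string_end(text, start, neutralised=frozenset()):
    # Ordinary escape-aware scan, except that neutralised backslashes are
    # treated as plain characters.
    i = start + 1
    while i < len(text):
        if text[i] == "\\" and i not in neutralised:
            i += 2
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    return None


source = r'r"a\\""'  # the buffer text  r"a\\""
marked = backslashes_to_neutralise(source, 1)
end_without = string_end(source, 1)          # stops one character early
end_with = string_end(source, 1, marked)     # reaches the real closing quote
```

The marked set has to be recomputed whenever the text changes, which is
the bookkeeping the quoted paragraph warns about.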

It was unclear to me how much additional processing during typing would
be acceptable here as opposed to just running the existing C code.
Hopefully native compilation support will to some extent nullify any
penalty from adding additional logic in Lisp here?

> > I've attached a copy of my WIP patch; it's definitely not near
> > final code quality and doesn't have documentation yet, all of which
> > I would take care of before submitting for inclusion. I also
> > haven't filled out the copyright assignment paperwork yet, but
> > should this work reach a point where it was likely to be accepted,
> > I'd be happy to do that.
> 
> Thanks!
> 
> > I'd very much appreciate some pointers on what to try next here, or
> > some explanation of how syntax.c/syntax.el works beyond what's in
> > the reference manual. If this is a fool's errand I'm tilting at
> > here, I'd also appreciate being told that before I sink more time
> > into it :)
> 
> It is definitely NOT a fool's errand.  There may be some resistance
> to the idea of raw strings from traditionalists, but I hope not.  It
> would be worth your while really to understand the section in the
> Elisp manual on syntax and all the things it can (and can't) do.
> 
> Help is always available on emacs-devel.
> 
> You're going to have quite a bit of Lisp programming to do.  For
> example, font-lock needs to be taught how to fontify a raw string.
> 

I am already moderately familiar with writing elisp at this point, but
yes, I still have a lot to learn :)

> But at the end of the exercise, you will have learnt so much about
> Emacs that you will qualify as a fully fledged contributor.  :-)
> 

thanks,

Anna


> > thanks,
> 
> > Anna Glasgall
> 




