From: "Aurélien Aptel" <aurelien.aptel+emacs@gmail.com>
To: Emacs development discussions <emacs-devel@gnu.org>
Subject: Raw strings (experimental patches inside)
Date: Fri, 3 Aug 2012 04:02:58 +0200
Message-ID: <CA+5B0FN_EFq6CO0RnGDL9m0Bjb7G=eznaurF2ZTicesT0hUC4w@mail.gmail.com>
Hi all,
When I type a regex I'm always annoyed by the amount of escaping I have to do.
I've always wished Emacs Lisp had raw strings, i.e. a syntax for typing
literal text without any interpretation.
I've made two patches for the reader (src/lread.c). They are proofs of
concept: they should work on correct input, but don't expect much.
raw-string-python.diff uses a syntax similar to Python's:
$ ./emacs -Q -batch --eval '(message #r"""ha"\nha""")'
ha"\nha
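As a rough, standalone illustration of the scanning idea (not the patch
itself): the sketch below counts consecutive double quotes and stops at
the third one, copying everything else verbatim. The function name
scan_triple_quoted, its fixed-size output buffer and its -1 error
return are assumptions made for this example; the real reader works on
READCHAR and a growable read_buffer. Unlike the proof of concept, the
sketch reports an error when the input ends before the closing """.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs: scan a #r"""...""" literal
   starting at SRC and copy its body into OUT (capacity CAP bytes,
   including the terminating NUL).  Returns the body length, or -1 on
   a bad prefix, a missing terminator, or insufficient space.  */
long
scan_triple_quoted (const char *src, char *out, size_t cap)
{
  assert (src != NULL && out != NULL && cap > 0);

  /* The literal must start with #r followed by three double quotes.  */
  if (strncmp (src, "#r\"\"\"", 5) != 0)
    return -1;
  src += 5;

  size_t n = 0;     /* bytes copied so far */
  int ndelim = 0;   /* consecutive '"' seen so far */

  for (; *src != '\0'; src++)
    {
      if (n + 1 >= cap)
        return -1;
      out[n++] = *src;

      if (*src == '"')
        {
          if (++ndelim == 3)
            {
              n -= 3;          /* drop the closing delimiter */
              out[n] = '\0';
              return (long) n;
            }
        }
      else
        ndelim = 0;            /* any other byte resets the run */
    }

  return -1;  /* end of input before the closing delimiter */
}
```

Backslashes and lone double quotes pass through untouched, which is the
whole point: the only sequence with a special meaning is a run of three
double quotes.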
raw-string-sed.diff uses a syntax similar to sed or Perl quotes. You
can choose any delimiter.
$ ./emacs -Q -batch --eval '(message #r,ha"\nha,)'
ha"\nha
$ ./emacs -Q -batch --eval '(message #r~ha"\nha~)'
ha"\nha
You get the idea.
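The delimiter variant is even simpler to sketch: the character right
after #r becomes the terminator, and everything up to its next
occurrence is copied verbatim. As before, scan_delimited, its
fixed-size buffer and its -1 error return are assumptions made for
this standalone example, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs: scan a #rXbodyX literal,
   where X is whatever character follows "#r".  Copies the body into
   OUT (capacity CAP bytes, including the terminating NUL) and returns
   its length, or -1 on a bad prefix, a missing closing delimiter, or
   insufficient space.  */
long
scan_delimited (const char *src, char *out, size_t cap)
{
  assert (src != NULL && out != NULL && cap > 0);

  if (src[0] != '#' || src[1] != 'r' || src[2] == '\0')
    return -1;

  char delimiter = src[2];   /* the same character opens and closes */
  size_t n = 0;

  for (src += 3; *src != '\0'; src++)
    {
      if (*src == delimiter)
        {
          out[n] = '\0';
          return (long) n;
        }
      if (n + 1 >= cap)
        return -1;
      out[n++] = *src;       /* copied as-is, no escape processing */
    }

  return -1;  /* end of input before the closing delimiter */
}
```

Because the closing character is the one that opened the literal, a
paired form such as #r(foo) only terminates at a second "(" in this
sketch; accepting ")" there would need an explicit mapping from opening
to closing characters. The body also cannot contain the delimiter
itself, which is why being able to pick it freely matters.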
Although the reader works, this breaks several things: C-x C-e doesn't
work well, sexp navigation is broken, etc. More work is needed to make
the rest of Emacs aware of raw strings.
[-- Attachment #2: raw-string-python.diff --]
[-- Type: application/octet-stream, Size: 2819 bytes --]
=== modified file 'src/lread.c'
--- src/lread.c 2012-08-01 20:51:44 +0000
+++ src/lread.c 2012-08-03 01:45:01 +0000
@@ -2384,6 +2384,84 @@
case '#':
c = READCHAR;
+
+ /* raw string with """ delimiters
+ #r"""foo bar""" */
+ if (c == 'r')
+ {
+ int old;
+ int ndelim = 0;
+ char *p = read_buffer;
+ char *end = read_buffer + read_buffer_size;
+ register int ch;
+ /* Nonzero if we saw an escape sequence specifying
+ a multibyte character. */
+ int force_multibyte = 0;
+ /* Nonzero if we saw an escape sequence specifying
+ a single-byte character. */
+ int force_singlebyte = 0;
+ int cancel = 0;
+ ptrdiff_t nchars = 0;
+
+ /* read 3 first delim */
+ if (READCHAR != '"' || READCHAR != '"' || READCHAR != '"')
+ invalid_syntax ("#r\"\"\"...\"\"\"");
+
+ while (1)
+ {
+ ch = READCHAR;
+ if (ch < 0)
+ break;
+
+ if (ch == '"')
+ {
+ ndelim++;
+ if (ndelim == 3)
+ break;
+ }
+ else
+ {
+ ndelim = 0;
+ }
+
+ if (end - p < MAX_MULTIBYTE_LENGTH)
+ {
+ ptrdiff_t offset = p - read_buffer;
+ if (min (PTRDIFF_MAX, SIZE_MAX) / 2 < read_buffer_size)
+ memory_full (SIZE_MAX);
+ read_buffer = xrealloc (read_buffer, read_buffer_size * 2);
+ read_buffer_size *= 2;
+ p = read_buffer + offset;
+ end = read_buffer + read_buffer_size;
+ }
+
+ p += CHAR_STRING (ch, (unsigned char *) p);
+ if (CHAR_BYTE8_P (ch))
+ force_singlebyte = 1;
+ else if (! ASCII_CHAR_P (ch))
+ force_multibyte = 1;
+
+ nchars++;
+ }
+
+ /* last " was not added, only remove 2 */
+ p -= 2;
+ nchars -= 2;
+
+ if (! force_multibyte && force_singlebyte)
+ {
+ /* READ_BUFFER contains raw 8-bit bytes and no multibyte
+ forms. Convert it to unibyte. */
+ nchars = str_as_unibyte ((unsigned char *) read_buffer,
+ p - read_buffer);
+ p = read_buffer + nchars;
+ }
+
+ return make_specified_string (read_buffer, nchars, p - read_buffer,
+ (force_multibyte
+ || (p - read_buffer != nchars)));
+ }
+
if (c == 's')
{
c = READCHAR;
[-- Attachment #3: raw-string-sed.diff --]
[-- Type: application/octet-stream, Size: 2306 bytes --]
=== modified file 'src/lread.c'
--- src/lread.c 2012-08-01 20:51:44 +0000
+++ src/lread.c 2012-08-03 00:27:01 +0000
@@ -2384,6 +2384,60 @@
case '#':
c = READCHAR;
+
+ /* raw string with custom delimiter
+ #r(foo) #r,foo, etc */
+ if (c == 'r')
+ {
+ int delimiter = READCHAR;
+ char *p = read_buffer;
+ char *end = read_buffer + read_buffer_size;
+ register int ch;
+ /* Nonzero if we saw an escape sequence specifying
+ a multibyte character. */
+ int force_multibyte = 0;
+ /* Nonzero if we saw an escape sequence specifying
+ a single-byte character. */
+ int force_singlebyte = 0;
+ int cancel = 0;
+ ptrdiff_t nchars = 0;
+
+ while ((ch = READCHAR) >= 0
+ && ch != delimiter)
+ {
+ if (end - p < MAX_MULTIBYTE_LENGTH)
+ {
+ ptrdiff_t offset = p - read_buffer;
+ if (min (PTRDIFF_MAX, SIZE_MAX) / 2 < read_buffer_size)
+ memory_full (SIZE_MAX);
+ read_buffer = xrealloc (read_buffer, read_buffer_size * 2);
+ read_buffer_size *= 2;
+ p = read_buffer + offset;
+ end = read_buffer + read_buffer_size;
+ }
+
+ p += CHAR_STRING (ch, (unsigned char *) p);
+ if (CHAR_BYTE8_P (ch))
+ force_singlebyte = 1;
+ else if (! ASCII_CHAR_P (ch))
+ force_multibyte = 1;
+
+ nchars++;
+ }
+ if (! force_multibyte && force_singlebyte)
+ {
+ /* READ_BUFFER contains raw 8-bit bytes and no multibyte
+ forms. Convert it to unibyte. */
+ nchars = str_as_unibyte ((unsigned char *) read_buffer,
+ p - read_buffer);
+ p = read_buffer + nchars;
+ }
+
+ return make_specified_string (read_buffer, nchars, p - read_buffer,
+ (force_multibyte
+ || (p - read_buffer != nchars)));
+ }
+
if (c == 's')
{
c = READCHAR;