unofficial mirror of emacs-devel@gnu.org 
* Re: Raw strings (experimental patches inside)
@ 2012-08-10 22:33 Dmitry Gutov
  2012-08-11  7:49 ` Stephen J. Turnbull
  2012-08-11 13:30 ` Stefan Monnier
  0 siblings, 2 replies; 34+ messages in thread
From: Dmitry Gutov @ 2012-08-10 22:33 UTC
  To: monnier; +Cc: stephen, vrrm00, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
 >> If insults won't persuade you, here are some real benefits.
 >> raw-strings are a huge convenience when writing.[1]  They correspond
 >> to the way you enter a regexp to isearch, and to the documentation.
 >
 > And here we're back at regexps.  I already agreed that they're
 > convenient for regexps, but pointed out that a better solution would
 > be to fix the regexp syntax so it doesn't backslash-escape every
 > special character.

This would be a backward-incompatible change to the regexp engine,
wouldn't it?

--Dmitry



* Raw strings (experimental patches inside)
@ 2012-08-03  2:02 Aurélien Aptel
  2012-08-03  9:45 ` Pascal J. Bourguignon
  2012-08-03 22:43 ` Stefan Monnier
  0 siblings, 2 replies; 34+ messages in thread
From: Aurélien Aptel @ 2012-08-03  2:02 UTC
  To: Emacs development discussions

[-- Attachment #1: Type: text/plain, Size: 864 bytes --]

Hi all,

When I type a regex I'm always annoyed by the amount of escaping I have to do.
I've always wished Emacs Lisp had raw strings, i.e. a syntax for typing
literal text without interpretation.

I've made two patches for the reader (src/lread.c). They are proofs of
concept: they should work on correct input, but don't expect much.

raw-string-python.diff uses a syntax similar to Python's:

$ ./emacs -Q -batch --eval '(message #r"""ha"\nha""")'
ha"\nha

raw-string-sed.diff uses a syntax similar to sed or Perl quotes. You
can choose any delimiter.

$ ./emacs -Q -batch --eval '(message #r,ha"\nha,)'
ha"\nha
$ ./emacs -Q -batch --eval '(message #r~ha"\nha~)'
ha"\nha

You get the idea.

Although the reader works, this breaks several things. C-x C-e doesn't
work well, sexp navigation is broken, etc. There is work to do to make
the rest of Emacs aware of raw strings.

[-- Attachment #2: raw-string-python.diff --]
[-- Type: application/octet-stream, Size: 2819 bytes --]

=== modified file 'src/lread.c'
--- src/lread.c	2012-08-01 20:51:44 +0000
+++ src/lread.c	2012-08-03 01:45:01 +0000
@@ -2384,6 +2384,84 @@
 
     case '#':
       c = READCHAR;
+
+      /* raw string with """ delimiters
+         #r"""foo bar""" */
+      if (c == 'r')
+        {
+          int old;
+          int ndelim = 0;
+          char *p = read_buffer;
+          char *end = read_buffer + read_buffer_size;
+          register int ch;
+          /* Nonzero if we saw a non-ASCII character that requires
+             a multibyte string.  */
+          int force_multibyte = 0;
+          /* Nonzero if we saw a raw 8-bit byte, i.e. a single-byte
+             character.  */
+          int force_singlebyte = 0;
+          int cancel = 0;
+          ptrdiff_t nchars = 0;
+
+          /* read 3 first delim */
+          if (READCHAR != '"' || READCHAR != '"' || READCHAR != '"')
+            invalid_syntax ("#r\"\"\"...\"\"\"");
+
+          while (1)
+            {
+              ch = READCHAR;
+              if (ch < 0)
+                end_of_file_error ();
+
+              if (ch == '"')
+                {
+                  ndelim++;
+                  if (ndelim == 3)
+                    break;
+                }
+              else
+                {
+                  ndelim = 0;
+                }
+
+              if (end - p < MAX_MULTIBYTE_LENGTH)
+                {
+                  ptrdiff_t offset = p - read_buffer;
+                  if (min (PTRDIFF_MAX, SIZE_MAX) / 2 < read_buffer_size)
+                    memory_full (SIZE_MAX);
+                  read_buffer = xrealloc (read_buffer, read_buffer_size * 2);
+                  read_buffer_size *= 2;
+                  p = read_buffer + offset;
+                  end = read_buffer + read_buffer_size;
+                }
+
+              p += CHAR_STRING (ch, (unsigned char *) p);
+              if (CHAR_BYTE8_P (ch))
+                force_singlebyte = 1;
+              else if (! ASCII_CHAR_P (ch))
+                force_multibyte = 1;
+
+              nchars++;
+            }
+
+          /* last " was not added, only remove 2 */
+          p -= 2;
+          nchars -= 2;
+
+          if (! force_multibyte && force_singlebyte)
+            {
+              /* READ_BUFFER contains raw 8-bit bytes and no multibyte
+                 forms.  Convert it to unibyte.  */
+              nchars = str_as_unibyte ((unsigned char *) read_buffer,
+                                       p - read_buffer);
+              p = read_buffer + nchars;
+            }
+
+          return make_specified_string (read_buffer, nchars, p - read_buffer,
+                                        (force_multibyte
+                                         || (p - read_buffer != nchars)));
+        }
+
       if (c == 's')
 	{
 	  c = READCHAR;
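
For readers who want to try the delimiter-counting logic outside of Emacs, here is a minimal standalone sketch of the same idea. It is not part of the patch: `scan_triple_quoted` is a hypothetical helper name, it reads from a NUL-terminated C string instead of `READCHAR`, it uses a fixed-size output buffer instead of growing `read_buffer`, and it ignores the unibyte/multibyte handling entirely. Unlike the loop above, it reports an error when the input ends before the closing `"""`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Emacs function.  SRC points just past
   the opening #r""" delimiter.  Copy the raw body into OUT (at most
   OUT_SIZE bytes including the terminating NUL) and return its length.
   Return -1 if the closing """ is never found or OUT is too small.  */
static ptrdiff_t
scan_triple_quoted (const char *src, char *out, size_t out_size)
{
  size_t n = 0;    /* bytes copied so far */
  int ndelim = 0;  /* consecutive '"' characters seen */

  assert (out_size > 0);

  for (; *src != '\0'; src++)
    {
      if (*src == '"')
        {
          if (++ndelim == 3)
            {
              /* The first two quotes of the closing delimiter were
                 already copied; the third was not.  Drop those two.  */
              n -= 2;
              out[n] = '\0';
              return (ptrdiff_t) n;
            }
        }
      else
        ndelim = 0;

      if (n + 1 >= out_size)
        return -1;              /* output buffer too small */
      out[n++] = *src;
    }

  return -1;                    /* input ended before closing """ */
}
```

Called on the text that follows `#r"""` in the example from the message body, that is `ha"\nha"""`, the sketch yields the 7-byte body `ha"\nha`. Backslashes are copied through untouched, which is the whole point of a raw string.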


[-- Attachment #3: raw-string-sed.diff --]
[-- Type: application/octet-stream, Size: 2306 bytes --]

=== modified file 'src/lread.c'
--- src/lread.c	2012-08-01 20:51:44 +0000
+++ src/lread.c	2012-08-03 00:27:01 +0000
@@ -2384,6 +2384,60 @@
 
     case '#':
       c = READCHAR;
+
+      /* raw string with custom delimiter; the same character opens
+         and closes the string: #r,foo, #r~foo~ etc */
+      if (c == 'r')
+        {
+          int delimiter = READCHAR;
+          char *p = read_buffer;
+          char *end = read_buffer + read_buffer_size;
+          register int ch;
+          /* Nonzero if we saw a non-ASCII character that requires
+             a multibyte string.  */
+          int force_multibyte = 0;
+          /* Nonzero if we saw a raw 8-bit byte, i.e. a single-byte
+             character.  */
+          int force_singlebyte = 0;
+          int cancel = 0;
+          ptrdiff_t nchars = 0;
+
+          while ((ch = READCHAR) >= 0
+                 && ch != delimiter)
+            {
+              if (end - p < MAX_MULTIBYTE_LENGTH)
+                {
+                  ptrdiff_t offset = p - read_buffer;
+                  if (min (PTRDIFF_MAX, SIZE_MAX) / 2 < read_buffer_size)
+                    memory_full (SIZE_MAX);
+                  read_buffer = xrealloc (read_buffer, read_buffer_size * 2);
+                  read_buffer_size *= 2;
+                  p = read_buffer + offset;
+                  end = read_buffer + read_buffer_size;
+                }
+              
+              p += CHAR_STRING (ch, (unsigned char *) p);
+              if (CHAR_BYTE8_P (ch))
+                force_singlebyte = 1;
+              else if (! ASCII_CHAR_P (ch))
+                force_multibyte = 1;
+              
+              nchars++;
+            }
+          if (! force_multibyte && force_singlebyte)
+            {
+              /* READ_BUFFER contains raw 8-bit bytes and no multibyte
+                 forms.  Convert it to unibyte.  */
+              nchars = str_as_unibyte ((unsigned char *) read_buffer,
+                                       p - read_buffer);
+              p = read_buffer + nchars;
+            }
+
+          return make_specified_string (read_buffer, nchars, p - read_buffer,
+                                        (force_multibyte
+                                         || (p - read_buffer != nchars)));
+        }
+
       if (c == 's')
 	{
 	  c = READCHAR;
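
The custom-delimiter variant can be sketched in the same standalone fashion. Again, this is only an illustration under simplifying assumptions and not part of the patch: `scan_delimited` is a hypothetical helper name, the input is a NUL-terminated C string rather than the reader's `READCHAR` stream, the output buffer has a fixed size, and multibyte handling is left out. The first character after `#r` is taken as the delimiter, and the same character ends the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Emacs function.  SRC points just past
   "#r", so its first character is the delimiter.  Copy everything up
   to the next occurrence of that delimiter into OUT (at most OUT_SIZE
   bytes including the terminating NUL) and return the body length.
   Return -1 if the delimiter is missing or never repeated, or if OUT
   is too small.  */
static ptrdiff_t
scan_delimited (const char *src, char *out, size_t out_size)
{
  size_t n = 0;  /* bytes copied so far */

  assert (out_size > 0);

  if (*src == '\0')
    return -1;                  /* no delimiter at all */
  char delimiter = *src++;

  for (; *src != '\0'; src++)
    {
      if (*src == delimiter)
        {
          out[n] = '\0';
          return (ptrdiff_t) n;
        }
      if (n + 1 >= out_size)
        return -1;              /* output buffer too small */
      out[n++] = *src;
    }

  return -1;                    /* input ended before closing delimiter */
}
```

With the inputs `,ha"\nha,` and `~ha"\nha~` from the message body, both calls produce the same 7-byte body `ha"\nha`. One consequence of this design is that the chosen delimiter cannot appear inside the string, so the writer has to pick a character the text does not contain.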



end of thread, other threads:[~2012-08-12  0:29 UTC | newest]

Thread overview: 34+ messages
2012-08-10 22:33 Raw strings (experimental patches inside) Dmitry Gutov
2012-08-11  7:49 ` Stephen J. Turnbull
2012-08-11 17:05   ` Dmitry Gutov
2012-08-11 17:57     ` Andreas Schwab
2012-08-11 18:22       ` Dmitry Gutov
2012-08-12  0:23     ` Stephen J. Turnbull
2012-08-11 13:30 ` Stefan Monnier
  -- strict thread matches above, loose matches on Subject: below --
2012-08-03  2:02 Aurélien Aptel
2012-08-03  9:45 ` Pascal J. Bourguignon
2012-08-03 17:45   ` Aurélien Aptel
2012-08-04 19:41     ` Pascal J. Bourguignon
2012-08-05  0:16       ` Aurélien Aptel
2012-08-05 11:36         ` Pascal J. Bourguignon
2012-08-05  7:13       ` Lars Brinkhoff
2012-08-06  1:55       ` Stefan Monnier
2012-08-06 10:55         ` Pascal J. Bourguignon
2012-08-06 16:16           ` Stefan Monnier
2012-08-06 16:40             ` Pascal J. Bourguignon
2012-08-03 22:43 ` Stefan Monnier
2012-08-04 14:38   ` Ivan Andrus
2012-08-04 23:47     ` Stefan Monnier
2012-08-05  0:13       ` Aurélien Aptel
2012-08-06 16:17         ` Stefan Monnier
2012-08-10  1:33       ` Vr Rm
2012-08-10  5:08         ` Stephen J. Turnbull
2012-08-10 15:13           ` Stefan Monnier
2012-08-10 17:28             ` Stephen J. Turnbull
2012-08-10 18:50               ` Stefan Monnier
2012-08-11  7:27                 ` Stephen J. Turnbull
2012-08-11 11:05                 ` Dmitri Paduchikh
2012-08-12  0:29                   ` Stephen J. Turnbull
2012-08-10 21:11           ` Vr Rm
2012-08-10 23:03             ` Davis Herring
2012-08-11  7:39             ` Stephen J. Turnbull
