unofficial mirror of emacs-devel@gnu.org 
* Raw string literals in Emacs lisp.
@ 2014-07-25 19:47 Matthew Plant
  2014-07-25 19:56 ` Tassilo Horn
                   ` (4 more replies)
  0 siblings, 5 replies; 51+ messages in thread
From: Matthew Plant @ 2014-07-25 19:47 UTC (permalink / raw)
  To: emacs-devel

I think that raw string literals would be a really nice thing to add to
Emacs Lisp. The most immediate benefit is that writing regexps would be
much easier. And since most of the work that goes into major modes is
writing regexps, writing major modes would become a lot faster.

Obviously it can't be done in any way that's really consistent with the
language (it'd be super nice if ``string'' could be used, but alas).
However, perhaps I have found a reasonable approach.

What if we assume that any string immediately surrounded by parentheses
is a raw string literal? I'm pretty sure every instance of ("...") is
currently illegal, and it would almost certainly be trivial to extend
the Emacs lexer/parser to support it. I can do it myself if everyone
thinks this is a good idea.

Please let me know what your thoughts are on this.

-Matt


* Re: Raw string literals in Emacs lisp.
  2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
@ 2014-07-25 19:56 ` Tassilo Horn
  2014-07-25 20:06   ` Matthew Plant
  2014-07-25 20:33 ` Tom Tromey
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Tassilo Horn @ 2014-07-25 19:56 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel

Matthew Plant <maplant2@illinois.edu> writes:

Hi Matthew,

> I think that raw string literals would be a really nice thing to add
> to Emacs lisp.

Yes, indeed.

> What if we assume that any string surrounded immediately by
> parenthesis is a raw string literal?  I'm pretty sure every instance
> of ("...") is currently illegal,...

Nope, inside a `cond', ("default") is a short alternative for (t
"default").
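
A minimal sketch of that `cond' behaviour: a clause that holds only a test
form returns the value of the test, so ("default") is already meaningful
code today. The function name `my-describe' is made up for illustration.

```emacs-lisp
;; Hypothetical helper: the last clause contains only a test form,
;; and a non-nil string is returned as the value of the whole `cond'.
(defun my-describe (n)
  (cond ((< n 0) "negative")
        ((= n 0) "zero")
        ("default")))        ; behaves like (t "default")

(my-describe -1)  ; => "negative"
(my-describe 5)   ; => "default"
```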

Bye,
Tassilo




* Re: Raw string literals in Emacs lisp.
  2014-07-25 19:56 ` Tassilo Horn
@ 2014-07-25 20:06   ` Matthew Plant
  2014-07-25 20:15     ` Tassilo Horn
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-25 20:06 UTC (permalink / raw)
  To: Matthew Plant, emacs-devel

I would argue that is still workable, through various hacks. In the cond
case, if you wanted to specify a raw string literal you would write
(("default")), which I think is still illegal.


On Fri, Jul 25, 2014 at 12:56 PM, Tassilo Horn <tsdh@gnu.org> wrote:

> Matthew Plant <maplant2@illinois.edu> writes:
>
> Hi Matthew,
>
> > I think that raw string literals would be a really nice thing to add
> > to Emacs lisp.
>
> Yes, indeed.
>
> > What if we assume that any string surrounded immediately by
> > parenthesis is a raw string literal?  I'm pretty sure every instance
> > of ("...") is currently illegal,...
>
> Nope, inside a `cond', ("default") is a short alternative for (t
> "default").
>
> Bye,
> Tassilo
>


* Re: Raw string literals in Emacs lisp.
  2014-07-25 20:06   ` Matthew Plant
@ 2014-07-25 20:15     ` Tassilo Horn
  2014-07-25 20:24       ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: Tassilo Horn @ 2014-07-25 20:15 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel

Matthew Plant <maplant2@illinois.edu> writes:

> I would argue that is still workable, through various hacks. In the cond
> case if you wanted to specify I raw string literal you would do
> (("default")), which I think is still illegal.

Yes, that's illegal.  But why not #"foo" (like in Clojure regexps)?  Or
SXEmacs version of raw strings #r"foo"?  To me, that reads much better
than ("foo") and is much less ambiguous.

Bye,
Tassilo

>> > What if we assume that any string surrounded immediately by
>> > parenthesis is a raw string literal?  I'm pretty sure every instance
>> > of ("...") is currently illegal,...
>>
>> Nope, inside a `cond', ("default") is a short alternative for (t
>> "default").




* Re: Raw string literals in Emacs lisp.
  2014-07-25 20:15     ` Tassilo Horn
@ 2014-07-25 20:24       ` Matthew Plant
  0 siblings, 0 replies; 51+ messages in thread
From: Matthew Plant @ 2014-07-25 20:24 UTC (permalink / raw)
  To: Matthew Plant, emacs-devel

I was under the impression that any ASCII character (with a few
exceptions, but not including "#") could be used to define a variable. I
see now that was a mistake. I also support #"foo", although #r"foo" seems
unnecessarily verbose.


On Fri, Jul 25, 2014 at 1:15 PM, Tassilo Horn <tsdh@gnu.org> wrote:

> Matthew Plant <maplant2@illinois.edu> writes:
>
> > I would argue that is still workable, through various hacks. In the cond
> > case if you wanted to specify I raw string literal you would do
> > (("default")), which I think is still illegal.
>
> Yes, that's illegal.  But why not #"foo" (like in Clojure regexps)?  Or
> SXEmacs version of raw strings #r"foo"?  To me, that reads much better
> than ("foo") and is much less ambiguous.
>
> Bye,
> Tassilo
>
> >> > What if we assume that any string surrounded immediately by
> >> > parenthesis is a raw string literal?  I'm pretty sure every instance
> >> > of ("...") is currently illegal,...
> >>
> >> Nope, inside a `cond', ("default") is a short alternative for (t
> >> "default").
>


* Re: Raw string literals in Emacs lisp.
  2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
  2014-07-25 19:56 ` Tassilo Horn
@ 2014-07-25 20:33 ` Tom Tromey
  2014-07-25 21:40   ` Matthew Plant
  2014-07-26  1:19 ` Stephen J. Turnbull
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Tom Tromey @ 2014-07-25 20:33 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel

Matthew> What if we assume that any string surrounded immediately by
Matthew> parenthesis is a raw string literal? I'm pretty sure every
Matthew> instance of ("...") is currently illegal, and it would be
Matthew> almost certainly trivial to extend the Emacs' lexer/parser to
Matthew> support it. I can do it myself if everyone thinks this is a
Matthew> good idea.

That kind of thing is valid in quoted contexts though.

(defvar whatever '("hi"))
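
To spell that out with a small sketch: under `quote', ("hi") is read as a
one-element list, so a reader that turned every ("...") into a raw string
would change the value below. The variable name `my-greetings' is made up.

```emacs-lisp
(defvar my-greetings '("hi"))   ; a list holding one string, not a string

(listp my-greetings)            ; => t
(car my-greetings)              ; => "hi"
(stringp my-greetings)          ; => nil
```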

FWIW there was a previous discussion about raw strings:

http://comments.gmane.org/gmane.emacs.devel/152132

I think this killed the idea:

http://permalink.gmane.org/gmane.emacs.devel/152155

Tom




* Re: Raw string literals in Emacs lisp.
  2014-07-25 20:33 ` Tom Tromey
@ 2014-07-25 21:40   ` Matthew Plant
  0 siblings, 0 replies; 51+ messages in thread
From: Matthew Plant @ 2014-07-25 21:40 UTC (permalink / raw)
  To: Tom Tromey; +Cc: emacs-devel

Although I no longer think the idea is useful, quoted contexts would be
very easy to detect and avoid.
The impression I got from skimming that thread is that the arguments
against raw string literals are:
1. they would only really be used for regexps
2. it would be better to have a function that adds escapes to regexp
strings, rather than make it easier to add them.
3. it would require more work than just updating the reader.

This reasoning is solid, but I think it still falls short of justifying
avoiding the feature.
I can think of a lot of common cases which show number one isn't correct. I
think the best case would be for doc strings. It would be a lot nicer to
write \[func] each time instead of \\[func].
Number two I disagree with on the grounds that I just don't think it's the
case. For example, I do not think escaped parens appear significantly more
than non-escaped parens, especially when it comes to writing major modes.
Additionally, because number one isn't really the case, this reasoning
doesn't work out entirely either.
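
A short sketch of the doc string point, showing why the backslash has to be
doubled in an ordinary string literal. `my-demo-command' is a made-up
command used only to carry a doc string.

```emacs-lisp
;; A lone backslash before `[' is dropped by the string reader, so the
;; key-substitution marker is lost unless the backslash is doubled.
(equal "\[forward-char]" "[forward-char]")   ; => t
(aref "\\[forward-char]" 0)                  ; => 92, i.e. ?\\

;; Hypothetical command: the doc string needs \\[...] in the source so
;; that the stored text contains \[...].
(defun my-demo-command ()
  "Move point forward.  Run it with \\[my-demo-command] once it is bound."
  (interactive)
  (forward-char 1))
```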

Number three is just hogwash, because the "it would require more work than
that" argument only works if the additional work is to make the current
code handle old cases that have somehow become harder to handle. And this
isn't the case. Sure, there will be some cases where old code cannot handle
raw strings properly. But people can just file a bug report.

-Matt


On Fri, Jul 25, 2014 at 1:33 PM, Tom Tromey <tromey@redhat.com> wrote:

> Matthew> What if we assume that any string surrounded immediately by
> Matthew> parenthesis is a raw string literal? I'm pretty sure every
> Matthew> instance of ("...") is currently illegal, and it would be
> Matthew> almost certainly trivial to extend the Emacs' lexer/parser to
> Matthew> support it. I can do it myself if everyone thinks this is a
> Matthew> good idea.
>
> That kind of thing is valid in quoted contexts though.
>
> (defvar whatever '("hi"))
>
> FWIW there was a previous discussion about raw strings:
>
> http://comments.gmane.org/gmane.emacs.devel/152132
>
> I think this killed the idea:
>
> http://permalink.gmane.org/gmane.emacs.devel/152155
>
> Tom
>


* Raw string literals in Emacs lisp.
  2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
  2014-07-25 19:56 ` Tassilo Horn
  2014-07-25 20:33 ` Tom Tromey
@ 2014-07-26  1:19 ` Stephen J. Turnbull
  2014-07-26  5:28   ` Matthew Plant
  2014-07-26 21:37 ` Thorsten Jolitz
  2014-07-29  6:32 ` William Xu
  4 siblings, 1 reply; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-07-26  1:19 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel

Matthew Plant writes:

 > What if we assume that any string surrounded immediately by
 > parenthesis is a raw string literal?

Please don't.  SXEmacs and XEmacs have had rawstring literals for many
years using the syntax #r"...".  It may not be the best way to do
this, but it's (Common) Lisp-y, the prefix notation is familiar from
at least one popular non-Lisp language (Python), and it's about as
short a notation as you can imagine (I don't recall why #"..." was
out, though).

Use of parens for this purpose is likely to have wide-ranging
implications, as it means that they no longer have unambiguous
semantics, but require lookahead to interpret.







* Re: Raw string literals in Emacs lisp.
  2014-07-26  1:19 ` Stephen J. Turnbull
@ 2014-07-26  5:28   ` Matthew Plant
  2014-07-26  5:45     ` chad
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-26  5:28 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: emacs-devel@gnu.org

Yes, I agree. I suggested it due to a misinterpretation of Lisp's rules. I
would still like to implement #"..." but I'm not sure it would be accepted.

On Friday, July 25, 2014, Stephen J. Turnbull <stephen@xemacs.org> wrote:

> Matthew Plant writes:
>
>  > What if we assume that any string surrounded immediately by
>  > parenthesis is a raw string literal?
>
> Please don't.  SXEmacs and XEmacs have had rawstring literals for many
> years using the syntax #r"...".  It may not be the best way to do
> this, but it's (Common) Lisp-y, the prefix notation is familiar from
> at least one popular non-Lisp language (Python), and it's about as
> short a notation as you can imagine (I don't recall why #"..." was
> out, though).
>
> Use of parens for this purpose is likely to have wide-ranging
> implications, as it means that they no longer have unambiguous
> semantics, but require lookahead to interpret.
>
>
>
>


* Re: Raw string literals in Emacs lisp.
  2014-07-26  5:28   ` Matthew Plant
@ 2014-07-26  5:45     ` chad
  2014-07-26 19:39       ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: chad @ 2014-07-26  5:45 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel@gnu.org

It might be helpful to canvass the use of #r"string" in [S]XEmacs
and see if anything especially nifty shows up. I think Stefan's
reservations mostly come from a feeling that the obvious problem
has a better solution elsewhere, but they have some actual experience
which might shed a different light on the topic.

~Chad




* Re: Raw string literals in Emacs lisp.
  2014-07-26  5:45     ` chad
@ 2014-07-26 19:39       ` Matthew Plant
  2014-07-27 12:27         ` Stephen J. Turnbull
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-26 19:39 UTC (permalink / raw)
  To: chad; +Cc: emacs-devel@gnu.org

I did a grep on the latest xemacs code base I could find. Funnily enough,
almost half of the instances of #r appeared in test cases. Not in
facilitating them, I might add; they were the test cases. All of the
non-test cases were regexps.

Although this data is convincing in some respects, I would like to note
that xemacs is dead. The download off their main page did not even have any
raw string literals.

I will still contend that it is a useful feature to have. The cost of
adding it (very minimal) is outweighed by the benefit of having it. And why
not? Emacs is also a language, unfortunately. We could all switch to guile
and be done with it, but it appears the consensus is that elisp is finely
tuned to do text processing. Elisp is a text processing language, and it
should have as many features to facilitate the processing of text as
possible, this included.

-Matt

On Friday, July 25, 2014, chad <yandros@gmail.com> wrote:

> It might be helpful to canvas the use of #r"string" in [S]XEmacs
> and see if anything especially nifty shows up. I think Stefan's
> reservations mostly come from a feeling that the obvious problem
> has a better solution elsewhere, but they have some actual experience
> which might shed a different light on the topic.
>
> ~Chad
>


* Re: Raw string literals in Emacs lisp.
  2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
                   ` (2 preceding siblings ...)
  2014-07-26  1:19 ` Stephen J. Turnbull
@ 2014-07-26 21:37 ` Thorsten Jolitz
  2014-07-29  6:32 ` William Xu
  4 siblings, 0 replies; 51+ messages in thread
From: Thorsten Jolitz @ 2014-07-26 21:37 UTC (permalink / raw)
  To: emacs-devel

Matthew Plant <maplant2@illinois.edu> writes:

> I think that raw string literals would be a really nice thing to add
> to Emacs
> lisp. The most immediate benefit is that writing regexps would be much
> easier.
> And since most of the work that goes into major modes is writing
> regexp, writing
> major modes would become a lot faster.

BTW, I recently wrote a little library called

,----
| drx.el --- declarative dynamic regular expressions
`----

available on github (https://github.com/tj64/drx). 

Its main purpose was enabling one more level of abstraction when writing
(org-mode) regexps, i.e. replace the hardcoded 

,----
| "^" (BOL)
| "$" (EOL)
| "\*" (Org STAR)
`----

in regexp strings like

,----
| "^\\* foo$"
`----

with variables

,----
|  (defvar drx-BOL "^")
|  (defvar drx-EOL "$")
|  (defvar drx-STAR (regexp-quote "*"))
`----

and build regexps with function calls like
 
,----
| (drx " foo" t t t)
`----
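
The underlying idea can be sketched without drx.el, using only `concat' and
`regexp-quote'. This is a simplified stand-in, not the library's
implementation, and every `my-' name is made up for illustration.

```emacs-lisp
(defvar my-bol "^")
(defvar my-eol "$")
(defvar my-star (regexp-quote "*"))   ; => "\\*"

(defun my-headline-regexp (text)
  "Return a regexp for a headline whose body matches the regexp TEXT."
  (concat my-bol my-star text my-eol))

(my-headline-regexp " foo")            ; => "^\\* foo$"

;; Rebinding the dynamically scoped variables adapts the regexp, here
;; to ';;;' headers inside Lisp comments.
(let ((my-bol "^;;")
      (my-star ";"))
  (my-headline-regexp " foo"))         ; => "^;;; foo$"
```

Because the pieces are declared with `defvar', a `let' around the call is
enough to switch the comment syntax without touching the function.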

The idea was based on an analysis of what would be needed for a true Org
Minor Mode, i.e. the application of Org's core functionality outside of
the Org major-mode. At the lowest level, the core obstacle is in the
hard-coded regexp snippets spread all over the Org sources that don't
match anymore when the org elements are in comment sections of
programming major-modes.

E.g. this would match 'old-school' headers in emacs-lisp-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;;; foo$"

and this 'outshine' (outcommented org-mode) headers:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;; "))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;; \\* foo$"

and this 'outshine' headers in css-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^/\\* ")
        (drx-EOL "\\*/$"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^/\\* \\* foo\\*/$"

The idea was rejected by the Org maintainers, but the library does
exist now, and the reason I mention it here is that it makes writing
regexps much faster and easier (with a different approach than rx.el:
the regexps themselves are still written as strings, only the plumbing
is done declaratively).

Here are a few more complex examples from the drx.el test section:

#+begin_src emacs-lisp
(format "%S"
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (drx " foo" t '(2 2) nil)))
#+end_src

#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" t t t t))
#+end_src

#+results:
: "^\\*\\(foo\\)$"

#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src

#+results:
: "\\(foo\\|bar\\)"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src

#+results:
: "\\(?:foo\\)\\(?:bar\\)"


#+begin_src emacs-lisp
 (format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src

#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"

#+begin_src emacs-lisp
(format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src

#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"

so even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting 'drx calls one can create really
complex regexps that contain only a few and simple string literals. 

I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the 'drx
calls appear in low level functions that are called all the time. 

PS
For the sake of completeness, here is the docstring of `drx':

,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
| 
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
| 
| Make regexp combining RGXP and optional RGXPS.
| 
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
| 
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
| 
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
| 
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
| 
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
| 
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
| 
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
| 
|   - nil :: do nothing (no repeater, no enclosing)
| 
|   - t :: (and any other symbol w/o special meaning) repeat once
| 
|   - n :: (number) repeat n times {n}
| 
|   - "n" :: (number-as-string) repeat n times {n}
| 
|   - "n," :: (string) repeat >= n times {n,}
| 
|   - ",m" :: (string) repeat <= m times {,m}
| 
|   - "n,m" :: (string) repeat n to m times {n,m}
|        
|   - "?" :: (string) repeat with ?
| 
|   - "*" :: (string) repeat with *
| 
|   - "+" :: (string) repeat with +
| 
|   - "??" :: (string) repeat with ??
| 
|   - "*?" :: (string) repeat with *?
| 
|   - "+?" :: (string) repeat with +?
| 
|   - "xyz" :: (any other string) repeat once
| 
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
| 
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case
| of argument STAR, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
| 
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(nil t (2)) t '(t nil (2))
|   |      "bar" "loo")
|   | "^\(\*\)\(\*\)
| Uses keymap `2\', which is not currently defined.
| \(foobar\(loo\)
| Uses keymap `2\', which is not currently defined.
| \)$"
|   `------------------------------------------------------------
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(t nil (2)) t '(nil t (2))
|   |       "bar" "loo")
|   | "^\(\*\(\*\)
| Uses keymap `2\', which is not currently defined.
| \)foo\(bar\)\(loo\)
| Uses keymap `2\', which is not currently defined.
| $"
|   `------------------------------------------------------------

Oops, a bug in boxquote.el?
It should look like this:

  ,------------------------------------------------------------
  | (drx \"foo\" t '(nil t (2)) t '(t nil (2))
  |      \"bar\" \"loo\")
  | \"^\\(\\*\\)\\(\\*\\)\\{2\\}\\(foobar\\(loo\\)\\{2\\}\\)$\"
  `------------------------------------------------------------

  ,------------------------------------------------------------
  | (drx \"foo\" t '(t nil (2)) t '(nil t (2))
  |       \"bar\" \"loo\")
  | \"^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$\"
  `------------------------------------------------------------

| 
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted in
| doubt.
| 
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|  
|   - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
|            Eventually add drx-BOL/STARS and drx-EOL before
|            first/after last alternative.
| 
|   - grp :: Concat and enclose RGXP and RGXPS. Eventually add
|              drx-BOL, STARS and drx-EOL as first/second/last group.
| 
|   - shy :: Concat and enclose RGXP and RGXPS as shy regexp
|            groups. Eventually add drx-BOL, STARS and drx-EOL as
|            first/second/last group.
| 
|   - app :: like 'grp', but rather append RGXP and RGXPS instead
|               of enclosing them if they are already regexp groups
|               themselves.
| 
| They create regexp groups but don't apply repeaters to them.
| 
| [back]
`----

-- 
cheers,
Thorsten





* Re: Raw string literals in Emacs lisp.
  2014-07-26 19:39       ` Matthew Plant
@ 2014-07-27 12:27         ` Stephen J. Turnbull
  2014-07-27 13:03           ` David Kastrup
  0 siblings, 1 reply; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-07-27 12:27 UTC (permalink / raw)
  To: Matthew Plant; +Cc: chad, emacs-devel@gnu.org

Matthew Plant writes:

 > Although this data is convincing in some respects, I would like to note
 > that xemacs is dead.

The reports of the death of XEmacs are premature.

 > The download off their main page did not even have any
 > raw string literals.

XEmacs 21.4 will never have them.

Of course almost all of the uses of raw strings are for regexps.  Most
non-regexp strings don't use string escapes, except for the occasional
TAB or LF.  Format strings use an alternative operator character %, so
don't have the problem of string escape colliding with the operator
character.
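
A small illustration of that collision, as it stands in Emacs Lisp without
raw strings: the regexp and the string reader both use the backslash, so
each regexp backslash is written twice in the source.

```emacs-lisp
;; The regexp \(foo\|bar\) is 12 characters long; the literal below
;; doubles every backslash so that the reader hands those 12 to the
;; regexp engine.
(let ((re "\\(foo\\|bar\\)"))
  (list (length re)                    ; => 12
        (string-match re "xbar")       ; => 1
        (match-string 1 "xbar")))      ; => "bar"

;; Matching one literal backslash takes four in the source text.
(string-match "\\\\" "a\\b")           ; => 1
```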

Sure, you can do a lot for readability as PCRE or Python regexps have
done, but regexps are unreadable almost by design, and those regexp
syntaxes benefit from rawstrings, too.  Almost anything (that doesn't
involve changing the meaning of existing legal programs) that improves
readability of regexps is worthwhile.

Rawstrings are cheap and effective.






* Re: Raw string literals in Emacs lisp.
  2014-07-27 12:27         ` Stephen J. Turnbull
@ 2014-07-27 13:03           ` David Kastrup
  2014-07-27 20:58             ` David Caldwell
  2014-07-28  1:29             ` Stephen J. Turnbull
  0 siblings, 2 replies; 51+ messages in thread
From: David Kastrup @ 2014-07-27 13:03 UTC (permalink / raw)
  To: emacs-devel

"Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> writes:

> Matthew Plant writes:
>
>  > Although this data is convincing in some respects, I would like to note
>  > that xemacs is dead.
>
> The reports of the death of XEmacs are premature.
>
>  > The download off their main page did not even have any
>  > raw string literals.
>
> XEmacs 21.4 will never have them.

Drawing the following excerpts from
<URL:http://www.xemacs.org/Releases/index.html>:

Arguably, having the last "stable release" made in 2009, having no
"Gamma release" described as

    Note: XEmacs 21.4 has been promoted to stable, and there currently
    is no gamma series. Plans for the next release are in the works.

    The gamma series of releases is satisfactorily stable for most
    sophisticated users. Most Linux or *BSD users should get the best
    results from the gamma series, and we strongly recommend it to the
    ``tester'' distributions like NetBSD current, Debian sid, Mandrake
    Cooker, Red Hat Rawhide, and so on. XEmacs will be ready when they
    are!

    The gamma series of releases is the candidate for promotion to a
    stable series. Although we do not promote the code base to gamma
    while there are known critical bugs in the code base, to attempt to
    meet schedules we also do promote fairly quickly once we've fixed
    the last known critical bug. Everybody does this, and everybody
    knows that despite the best efforts of the developers, ``point oh''
    releases typically still have bugs in them. The gamma concept simply
    acknowledges this.

at all slated to become stable, and having the current "Beta release"
branch 21.5 started in 2001 with the description

    The beta series of releases is for testers. Users should read the
    XEmacs Beta mailing list, <xemacs-beta@xemacs.org>. Users should
    prepare themselves for crashes, data loss, freezes, and other
    unpleasant events. The beta series contains much experimental code,
    and fairly large changes may be introduced directly into the code
    base. These are announced as they happen on xemacs-beta. Wannabe
    developers may also want to follow the XEmacs Patches
    <xemacs-patches@xemacs.org> and XEmacs CVS Commits
    <xemacs-cvs@xemacs.org> mailing lists for up-to-the-minute details
    about the state of the code base.

is making Debian look like a fast-paced project.  Reports of XEmacs
being dead may be exaggerated, but it does look a lot like suspended
animation.

> Sure, you can do a lot for readability as PCRE or Python regexps have
> done, but regexps are unreadable almost by design, and those regexp
> syntaxes benefit from rawstrings, too.  Almost anything (that doesn't
> involve changing the meaning of existing legal programs) that improves
> readability of regexps is worthwhile.
>
> Rawstrings are cheap and effective.

When rawstrings are supported, it becomes more expedient to recognize
things like \n and \t, probably also \f in regexps (\b is already
taken).  At the current point of time, they just evaluate to n and t.
That makes input of tabs and newlines in raw strings a nuisance and a
potential source of errors.

It's not actually an issue with rawstrings as such, but rather of their
use within regexps.
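
For illustration, this is how the regexp engine treats those sequences
today; the only reason "\n" works in a pattern is that the string reader
converts it before the regexp engine sees it.

```emacs-lisp
(string-match "\\n" "n")      ; => 0, the regexp \n matches the letter n
(string-match "\\n" "\n")     ; => nil, it does not match a newline
(string-match "\n" "a\nb")    ; => 1, the string escape is a real newline
(string-match "\\t" "tab")    ; => 0, likewise \t matches the letter t
```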

-- 
David Kastrup





* Re: Raw string literals in Emacs lisp.
  2014-07-27 13:03           ` David Kastrup
@ 2014-07-27 20:58             ` David Caldwell
  2014-07-27 23:17               ` Matthew Plant
                                 ` (2 more replies)
  2014-07-28  1:29             ` Stephen J. Turnbull
  1 sibling, 3 replies; 51+ messages in thread
From: David Caldwell @ 2014-07-27 20:58 UTC (permalink / raw)
  To: emacs-devel

On 7/27/14 6:03 AM, David Kastrup wrote:
> "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> writes:
>
>> Sure, you can do a lot for readability as PCRE or Python regexps have
>> done, but regexps are unreadable almost by design, and those regexp
>> syntaxes benefit from rawstrings, too.  Almost anything (that doesn't
>> involve changing the meaning of existing legal programs) that improves
>> readability of regexps is worthwhile.
>>
>> Rawstrings are cheap and effective.
> 
> When rawstrings are supported, it becomes more expedient to recognize
> things like \n and \t, probably also \f in regexps (\b is already
> taken).  At the current point of time, they just evaluate to n and t.
> That makes input of tabs and newlines in raw strings a nuisance and a
> potential source of errors.
> 
> It's not actually an issue with rawstrings as such, but rather of their
> use within regexps.

Why not, then, skip rawstrings completely and go directly to a regular
expression reader: #r// (or even just #//) instead of #r""?

Then you can add whatever semantics are needed for good regexp reading
(i.e., let '\n', '\t', and others get escaped in the string reading, but
allow '\(' to go through unescaped). This will be just as easy to
implement as raw strings.

Languages like Javascript, Perl, Ruby, Bash, and Groovy have shown that
having a special support for regexps at a language level is a very
effective way of dealing with them. Plus it opens the door to
extensions: #r//p for PCRE/Perl syntax[1] or #r//x for more readable
regexps[2], etc.
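
A rough sketch of the semantics described above, written as an ordinary
function instead of a reader extension. The name `my-regexp-expand-escapes'
is hypothetical, and its argument stands in for the raw text that a #r//
reader would see; nothing like this exists in Emacs under that name.

```emacs-lisp
(defun my-regexp-expand-escapes (raw)
  "Return RAW with \\n, \\t and \\f turned into the characters they name.
Every other backslash pair, such as \\( or \\\\, is passed through as is.
Hypothetical helper, not part of Emacs."
  (replace-regexp-in-string
   "\\\\."                              ; a backslash and the next character
   (lambda (pair)
     (or (cdr (assq (aref pair 1)
                    '((?n . "\n") (?t . "\t") (?f . "\f"))))
         pair))
   raw t t))                            ; LITERAL so backslashes survive

;; The argument stands in for the raw text a\n\(b\).
(my-regexp-expand-escapes "a\\n\\(b\\)")   ; => "a\n\\(b\\)"
```

Scanning backslash pairs from left to right keeps an escaped backslash
followed by n from being mistaken for a newline escape.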

I think using rawstrings is too generic an answer to the problem. Given
that so much of Emacs's functionality is reliant on regular expressions,
it makes sense to design something specifically for them. Doing that
means they can be tailored and tweaked for maximum functionality without
worrying about possible other usages that people might come up with (which
will undoubtedly happen with rawstrings).

-David

[1] And practically every other language on the planet. Really, it seems
like only Emacs is left in the dark ages of basic POSIX regexps where
'(' means literal paren and not matching.

[2] Another Perl feature, it allows whitespace and comments in regexps,
for much improved readability. See http://perldoc.perl.org/perlre.html#/x
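
To make footnote [1] concrete, this is the current Emacs behaviour for
parentheses in regexps:

```emacs-lisp
(string-match "(a)" "x(a)")        ; => 1, bare parens match literally
(string-match "\\(a\\)" "xa")      ; => 1, \( and \) form a group
(match-string 1 "xa")              ; => "a"
```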



* Re: Raw string literals in Emacs lisp.
  2014-07-27 20:58             ` David Caldwell
@ 2014-07-27 23:17               ` Matthew Plant
  2014-07-28 18:27                 ` Richard Stallman
  2014-07-28  2:16               ` Stephen J. Turnbull
  2014-07-30 20:28               ` Ted Zlatanov
  2 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-27 23:17 UTC (permalink / raw)
  To: David Caldwell; +Cc: emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 2740 bytes --]

I think this is a very good idea. However, agreeing upon which semantics
are needed may prove problematic. Do you have any suggestions on this
point? The easiest method would probably be to just adopt an existing set
of rules, along the lines of Perl's (but definitely not Perl's itself).

-Matt

[-- Attachment #2: Type: text/html, Size: 3426 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-27 13:03           ` David Kastrup
  2014-07-27 20:58             ` David Caldwell
@ 2014-07-28  1:29             ` Stephen J. Turnbull
  1 sibling, 0 replies; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-07-28  1:29 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup writes:

 > Reports of XEmacs being dead may be exaggerated, but it does look a
 > lot like suspended animation.

I'm OK with the latter description.

 > When rawstrings are supported, it becomes more expedient to recognize
 > things like \n and \t, probably also \f in regexps (\b is already
 > taken).

Sure.  AFAIK most of the modern regexp syntaxes do.  I guess that it's
possible that there are regexps out in the real world that contain
"\n" and work because "n" would work there too, so that is a change in
semantics.  (It's a shame that Emacs doesn't consider that kind of
thing an error, because it's almost certainly a bug.)
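
A minimal sketch of the behaviour under discussion, assuming a stock
Emacs regexp engine: a backslash followed by the letter n inside a
regexp currently matches a plain "n", and a newline only reaches the
regexp because the Lisp reader converts \n in an ordinary string
literal beforehand.

```elisp
;; The regexp made of a backslash and the letter n (written "\\n" in
;; Lisp source) matches a literal "n", not a newline.
(string-match "\\n" "n")     ; => 0
(string-match "\\n" "\n")    ; => nil

;; A newline gets into a regexp via the Lisp reader, which converts \n
;; inside an ordinary string literal before the regexp engine sees it.
(string-match "\n" "a\nb")   ; => 1
```

With raw strings the reader would no longer do that conversion, which
is why the regexp engine itself would have to learn about \n and \t.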

I don't really see a point in \f, though.  Emacs users (at least
old-timers) are used to seeing "^L" in their code, and I haven't seen
an Emacs configured to display that as an actual form feed in at least
20 years.  \t is useful because it displays the same as a number
(nondeterministic) of spaces, and \n is useful because embedding an
actual newline in a string messes up your indentation, often leaving a
lone double quote on the next line (when newline terminates the
string).







^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-27 20:58             ` David Caldwell
  2014-07-27 23:17               ` Matthew Plant
@ 2014-07-28  2:16               ` Stephen J. Turnbull
  2014-07-28  7:43                 ` Andreas Schwab
  2014-07-30 20:28               ` Ted Zlatanov
  2 siblings, 1 reply; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-07-28  2:16 UTC (permalink / raw)
  To: David Caldwell; +Cc: emacs-devel

David Caldwell writes:

 > Why not, then, skip rawstrings completely and go directly to a regular
 > expression reader: #r// (or even just #//) instead of #r""?

It's unlispy.  Regular expressions *are* strings and can be
manipulated as strings; (almost) any string can be used as a regular
expression.  Therefore (in Lisp) we normally define separate functions
to deal with "string" use cases and "regexp" use cases for the same
object.  And they mix and match well:

(defvar xft-xlfd-font-regexp
  (concat
   ;; XLFD specifies ISO 8859-1 encoding, but we can't handle non-ASCII
   ;; in Mule when this function is called.  So use HPC.
   ;; (xe_xlfd_prefix "\\(\\+[\040-\176\240-\377]*\\)?-")
   ;; (xe_xlfd_opt_text "\\([\040-\044\046-\176\240-\377]*\\)")
   ;; (xe_xlfd_text "\\([\040-\044\046-\176\240-\377]+\\)")
   "\\`"
   "\\(\\+[\040-\176]*\\)?-"		; prefix
   "\\([^-]+\\)"			; foundry
   "-"
   "\\([^-]+\\)"			; family
   "-"
   "\\([^-]+\\)"			; weight
   "-"
   "\\([0-9ior?*][iot]?\\)"		; slant
   "-"
   "\\([^-]+\\)"			; swidth
   "-"
   "\\([^-]*\\)"			; adstyle
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pixelsize
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pointsize
   "-"
   "\\([0-9?*]+\\)"			; resx
   "-"
   "\\([0-9?*]+\\)"			; resy
   "-"
   "\\([cmp?*]\\)"			; spacing
   "-"
   "~?"					; avgwidth
   "\\([0-9?*]+\\)"
   "-"
   "\\([^-]+\\)"			; registry
   "-"
   "\\([^-]+\\)"			; encoding
   "\\'")
  "The regular expression used to match XLFD font names.")

Of course that would be more readable with rawstrings (not used
because this code is shared with XEmacs 21.4), and even more readable
with PCRE, but it shows we don't really need /x to build regexps
readably.  If #r"..." generated something other than strings, you'd
have to write code to deal with issues like building regexps using
concat.  I think format would be a huge can of worms.
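
As a smaller illustration of the same idiom, here is a sketch of a
regexp assembled from ordinary strings with concat and regexp-quote.
The names my-key-value-regexp and my-parse-key-value are made up for
this example and do not come from any existing package.

```elisp
;; Hypothetical example names; nothing here is taken from Emacs itself
;; apart from concat, regexp-quote, string-match and match-string.
(defconst my-key-value-regexp
  (concat "\\`"
          "\\([a-z]+\\)"          ; key
          (regexp-quote "=")      ; literal separator
          "\\([0-9]+\\)"          ; value
          "\\'")
  "Regexp matching strings such as \"width=80\".")

(defun my-parse-key-value (s)
  "Return (KEY . VALUE) parsed from S, or nil if S does not match."
  (when (string-match my-key-value-regexp s)
    (cons (match-string 1 s)
          (string-to-number (match-string 2 s)))))

(my-parse-key-value "width=80")   ; => ("width" . 80)
```

Because the pieces are plain strings, every string function keeps
working on them; that is the property a separate regexp type would
have to re-create.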

 > This will be just as easy to implement as raw strings.

No, it won't.  Raw strings are just a different read syntax for
strings, and have exactly the same internal representation.  At
present we don't have a regular expression type (although we do have a
compiled regular expression type internally).  If you're not proposing
to define a regular expression type (good luck getting that past
RMS!), then you're just proposing a rawstring syntax tuned for regexp
use.

But there's no reason that couldn't be used for other purposes.  For
example, some people (Python programmers) would probably appreciate a
#r"..."/x rawstring syntax that automatically dedents -- for use in
docstrings.

 > Languages like Javascript, Perl, Ruby, Bash, and Groovy have shown that
 > having a special support for regexps at a language level is a very
 > effective way of dealing with them.

Lisp is not those languages, and in fact it is very unlike those
languages.

 > Plus it opens the door to extensions: #r//p for PCRE/Perl syntax[1]
 > or #r//x for more readable regexps[2], etc.

(defun emacsify-pcre (s)
  "Convert a PCRE to Emacs notation, properly ;-) ignoring unknown backslash."
  ;; exercise for the reader
  )

or

(require 'pcre)                         ; SXEmacs may have implemented this.
(let ((cre (pcre-compile "...")))
  (while (pcre-search-forward cre)
    (do-something)))

and as shown above /x isn't really necessary.  Like it or not, that's
the way these things are done in the Emacs Lisp world.  If you don't
like it, there are languages like Javascript, Perl, Ruby, Bash, and
Groovy.  (Python is too much like Lisp for you, I suspect. ;-)

 > I think using rawstrings is too generic an answer to the problem.

I think using rawstrings is the only sane answer to the problem.  You
can call them "regular expressions" as suggested by the #r notation
and their most prominent application, but in Emacs Lisp representing
them internally as a type other than string would be way too much work
given the idioms we have for constructing regexps that would need to
be reimplemented.  Given that internally they are (Just String), why
specialize to regular expressions?  Would you error on #r/*.*/, which
is invalid syntax for a regular expression?

 > [1] And practically every other language on the planet. Really, it seems
 > like only Emacs is left in the dark ages of basic POSIX regexps where
 > '(' means literal paren and not matching.

Sure, but that's a different problem easily solved if anyone wants to
do it.  GNU grep shows how: use egrep.  (POSIX grep with its default
to basic REs and an argument -E to indicate modern syntax is a bad
example for Lisp, I think.)  The analog for Emacs is a suite of
"pcre-" functions.
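
A very partial sketch of what such a conversion could look like,
offered only as an illustration: it swaps the escaped and unescaped
forms of ( ) | { } and nothing else. The function name is hypothetical,
and the sketch deliberately ignores bracket expressions, \d, \s and
every other PCRE feature, so it is nowhere near a full converter.

```elisp
(defun my-pcre-swap-group-escapes (s)
  "Return S with escaped and unescaped ( ) | { } swapped.
Hypothetical helper: bracket expressions and all other PCRE syntax
are passed through untouched, which is wrong in general."
  (let ((i 0)
        (n (length s))
        (special '(?\( ?\) ?| ?{ ?}))
        (out '()))
    (while (< i n)
      (let ((c (aref s i)))
        (cond
         ;; Backslash plus one more character: drop the backslash if the
         ;; next character is one of the swapped operators, else keep both.
         ((and (eq c ?\\) (< (1+ i) n))
          (let ((d (aref s (1+ i))))
            (push (if (memq d special) (string d) (string c d)) out))
          (setq i (+ i 2)))
         ;; Bare operator: Emacs wants it backslash-escaped.
         ((memq c special)
          (push (string ?\\ c) out)
          (setq i (1+ i)))
         (t
          (push (string c) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))

(my-pcre-swap-group-escapes "(a|b)\\.c")   ; => "\\(a\\|b\\)\\.c"
```

The result is still an ordinary string, so it can be passed straight
to string-match or re-search-forward.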




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-28  2:16               ` Stephen J. Turnbull
@ 2014-07-28  7:43                 ` Andreas Schwab
  0 siblings, 0 replies; 51+ messages in thread
From: Andreas Schwab @ 2014-07-28  7:43 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: emacs-devel, David Caldwell

There is also the rx macro.
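
For readers who have not used it, a short sketch of rx, assuming the
rx library that ships with Emacs; the variable name my-range-regexp is
made up for the example.

```elisp
(require 'rx)

;; Roughly the same regexp as "\\`\\([0-9]+\\)-\\([0-9]+\\)\\'",
;; written without any backslashes.
(defconst my-range-regexp
  (rx string-start
      (group (one-or-more digit))
      "-"
      (group (one-or-more digit))
      string-end)
  "Regexp matching strings such as \"10-20\".")

(let ((s "10-20"))
  (when (string-match my-range-regexp s)
    (list (match-string 1 s) (match-string 2 s))))   ; => ("10" "20")
```

The macro expands to an ordinary regexp string, so it combines with
concat and the existing search functions.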

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-27 23:17               ` Matthew Plant
@ 2014-07-28 18:27                 ` Richard Stallman
  2014-07-28 19:32                   ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Stallman @ 2014-07-28 18:27 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel, david

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

If we introduce a new syntax for regexps, we need to make the
sexp parsing code handle it both forwards and backwards.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-28 18:27                 ` Richard Stallman
@ 2014-07-28 19:32                   ` Matthew Plant
  2014-07-29 19:15                     ` Richard Stallman
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-28 19:32 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, David Caldwell

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

>
> If we introduce a new syntax for regexps, we need to make the
> sexp parsing code handle it both forwards and backwards.
>

I don't think this is especially difficult, but it is important to note.


[-- Attachment #2: Type: text/html, Size: 1617 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
                   ` (3 preceding siblings ...)
  2014-07-26 21:37 ` Thorsten Jolitz
@ 2014-07-29  6:32 ` William Xu
  2014-07-29  7:40   ` Andreas Schwab
  4 siblings, 1 reply; 51+ messages in thread
From: William Xu @ 2014-07-29  6:32 UTC (permalink / raw)
  To: emacs-devel

Matthew Plant <maplant2@illinois.edu> writes:

> I think that raw string literals would be a really nice thing to add to Emacs
> lisp. The most immediate benefit is that writing regexps would be much easier.
> And since most of the work that goes into major modes is writing regexp, writing
> major modes would become a lot faster.

Would love to have this!

Here is one of my recent use cases: "quote backslash in a shell command".
http://permalink.gmane.org/gmane.emacs.help/98550

    The shell command is:
      echo foo.bar | sed -e 's/\..*//'
    
    which will produce "foo" on bash.
    
    If i try to pass it to shell-command-to-string:
      (shell-command-to-string "echo foo.bar | sed -e 's/\..*//'")
    	=> "\n"
    
    Then i find i need to quote the backslash in emacs once more:
      (shell-command-to-string "echo foo.bar | sed -e 's/\\..*//'")
    	=> "foo\n"
    
    Is there a function or other way that can handle this kind of backslash
    quoting automatically?

-- 
William

http://xwl.appspot.com




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-29  6:32 ` William Xu
@ 2014-07-29  7:40   ` Andreas Schwab
  0 siblings, 0 replies; 51+ messages in thread
From: Andreas Schwab @ 2014-07-29  7:40 UTC (permalink / raw)
  To: William Xu; +Cc: emacs-devel

William Xu <william.xwl@gmail.com> writes:

> Here is one of my recent use cases: "quote backslash in a shell command".

This is not about shell command syntax, but about Lisp syntax.

> http://permalink.gmane.org/gmane.emacs.help/98550
>
>     The shell command is:
>       echo foo.bar | sed -e 's/\..*//'
>     
>     which will produce "foo" on bash.
>     
>     If i try to pass it to shell-command-to-string:
>       (shell-command-to-string "echo foo.bar | sed -e 's/\..*//'")
>     	=> "\n"
>     
>     Then i find i need to quote the backslash in emacs once more:
>       (shell-command-to-string "echo foo.bar | sed -e 's/\\..*//'")
>     	=> "foo\n"
>     
>     Is there a function or other way that can handle this kind of backslash
>     quoting automatically?

Since this is part of the Lisp syntax, there is no way to solve that
programmatically.
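
A small sketch of what the Lisp reader does with the two spellings
from the quoted post, which is why the first call hands sed a
different script than intended:

```elisp
;; The reader drops a backslash that precedes an ordinary character,
;; so "\." is read as the one-character string ".".
(string= "\." ".")    ; => t
(length "\\.")        ; => 2, a backslash followed by a dot

;; The single-backslash command is therefore identical to one with no
;; backslash at all: sed receives 's/..*//' and deletes the whole line.
(string= "echo foo.bar | sed -e 's/\..*//'"
         "echo foo.bar | sed -e 's/..*//'")   ; => t
```

By the time shell-command-to-string runs, the backslash is already
gone, so no function called on the string can bring it back.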

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-28 19:32                   ` Matthew Plant
@ 2014-07-29 19:15                     ` Richard Stallman
  2014-07-30  0:26                       ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Stallman @ 2014-07-29 19:15 UTC (permalink / raw)
  To: Matthew Plant; +Cc: emacs-devel, david

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > If we introduce a new syntax for regexps, we need to make the
    > sexp parsing code handle it both forwards and backwards.
    >

    I don't think this is especially difficult, but it is important to note.

The point is we would need to design the syntax to make it possible.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-29 19:15                     ` Richard Stallman
@ 2014-07-30  0:26                       ` Matthew Plant
  2014-07-30  4:28                         ` Richard Stallman
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-30  0:26 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel@gnu.org, David Caldwell

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

If it can be parsed forwards it can be parsed backwards, although it might
not be immediately possible to do so. The parser might have to check to see
if it needs to re-parse a section, but I don't think there's any syntax we
could introduce that is impossible to parse backwards. It's a difficulty
thing.

However, a regex syntax would be much more difficult to parse backwards; I
think this is a convincing enough argument that only simple raw strings
should be implemented.

Pretty much every modern language that has regex has raw string literals.
Heck, when regex was added to the C++ standard, raw string literals were
added in the same spec.

If raw string literals were added, should they allow custom delimiters?
This would probably make the strings just as hard to parse backwards as
regexps, so I say no.

[-- Attachment #2: Type: text/html, Size: 943 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30  0:26                       ` Matthew Plant
@ 2014-07-30  4:28                         ` Richard Stallman
  2014-07-30 18:54                           ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Stallman @ 2014-07-30  4:28 UTC (permalink / raw)
  To: Matthew Plant; +Cc: david, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    If it can be parsed forwards it can be parsed backwards,

Perhaps you mean it is possible in some theoretical sense.
That's not the issue here.  The Emacs sexp-scanning functions
scan backwards in a simple way, and the syntax has to be
suitable for them to handle.

							     although it might
    not be immediately possible to do so.

We need the backward scanning to work when the syntax is installed.
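
To make the constraint concrete, here is a sketch that runs the
existing backward scanner over an ordinary string, assuming the
standard emacs-lisp-mode-syntax-table. The helper name
my-last-sexp-start is hypothetical.

```elisp
(defun my-last-sexp-start (text)
  "Insert TEXT in a temp buffer and scan backward over its last sexp.
Return the buffer position where that sexp starts."
  (with-temp-buffer
    (insert text)
    (with-syntax-table emacs-lisp-mode-syntax-table
      (goto-char (point-max))
      (backward-sexp)
      (point))))

;; Buffer text is:  foo "a\"b"
;; Scanning backward, the quote preceded by a backslash is treated as
;; escaped, so the string is found to start at position 5.
(my-last-sexp-start "foo \"a\\\"b\"")   ; => 5
```

The scanner decides whether a quote ends the string by looking at the
character just before it. A raw string in which a backslash is an
ordinary character would need a different rule, and the scanner would
have to know it is inside a raw string before it has reached the
opening marker.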

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30  4:28                         ` Richard Stallman
@ 2014-07-30 18:54                           ` Matthew Plant
  0 siblings, 0 replies; 51+ messages in thread
From: Matthew Plant @ 2014-07-30 18:54 UTC (permalink / raw)
  To: rms; +Cc: David Caldwell, emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

Indeed I do not think that backward scanning will be possible with any kind
of string that has different delimiters than double quotes.


[-- Attachment #2: Type: text/html, Size: 1788 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-27 20:58             ` David Caldwell
  2014-07-27 23:17               ` Matthew Plant
  2014-07-28  2:16               ` Stephen J. Turnbull
@ 2014-07-30 20:28               ` Ted Zlatanov
  2014-07-30 20:41                 ` David Caldwell
  2 siblings, 1 reply; 51+ messages in thread
From: Ted Zlatanov @ 2014-07-30 20:28 UTC (permalink / raw)
  To: emacs-devel

On Sun, 27 Jul 2014 13:58:37 -0700 David Caldwell <david@porkrind.org> wrote: 

DC> Why not, then, skip rawstrings completely and go directly to a regular
DC> expression reader: #r// (or even just #//) instead of #r""?

For shell commands, for instance, it would be convenient to have
rawstrings because they often have internal backslash escapes.

Ted




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 20:28               ` Ted Zlatanov
@ 2014-07-30 20:41                 ` David Caldwell
  2014-07-30 20:54                   ` Ted Zlatanov
  0 siblings, 1 reply; 51+ messages in thread
From: David Caldwell @ 2014-07-30 20:41 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

On 7/30/14 1:28 PM, Ted Zlatanov wrote:
> On Sun, 27 Jul 2014 13:58:37 -0700 David Caldwell <david@porkrind.org> wrote: 
> 
> DC> Why not, then, skip rawstrings completely and go directly to a regular
> DC> expression reader: #r// (or even just #//) instead of #r""?
> 
> For shell commands, for instance, it would be convenient to have
> rawstrings because they often have internal backslash escapes.

That's precisely the point I made later in my email: rawstrings used in
shell don't want things like \n escaped, but regexps do (otherwise you
have to add "\n" literal support to the regexp engine). There are two
usages with competing semantics trying to use one generic interface. I
still posit that having a syntax directly for regexps would be
beneficial. And I think focusing on regexps is more important in Emacs,
as they come up more often than complicated shell commands.

Sadly it sounds like the #r// would be a no-go due to the Emacs
requirement of parsing it in reverse (I assume because '/' is a valid
lisp symbol character).

-David



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4219 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 20:41                 ` David Caldwell
@ 2014-07-30 20:54                   ` Ted Zlatanov
  2014-07-30 21:01                     ` Matthew Plant
                                       ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Ted Zlatanov @ 2014-07-30 20:54 UTC (permalink / raw)
  To: emacs-devel

On Wed, 30 Jul 2014 13:41:19 -0700 David Caldwell <david@porkrind.org> wrote: 

DC> On 7/30/14 1:28 PM, Ted Zlatanov wrote:
>> On Sun, 27 Jul 2014 13:58:37 -0700 David Caldwell <david@porkrind.org> wrote: 
>> 
DC> Why not, then, skip rawstrings completely and go directly to a regular
DC> expression reader: #r// (or even just #//) instead of #r""?
>> 
>> For shell commands, for instance, it would be convenient to have
>> rawstrings because they often have internal backslash escapes.

DC> That's precisely the point I made later in my email

Sorry I didn't see it.

DC> rawstrings used in shell don't want things like \n escaped, but
DC> regexps do (otherwise you have to add "\n" literal support to the
DC> regexp engine). There's 2 usages with competing semantics trying to
DC> use one generic interface. I still posit that having a syntax
DC> directly for regexps would be beneficial. And I think focusing on
DC> regexps is more important in Emacs as it happens more than
DC> complicated shell commands.

Heredocs are generally useful and popular and would also be supported by
this syntax.  But please don't take that as a knock against regexp
literal support, it's just not something I have needed.

DC> Sadly it sounds like the #r// would be a no-go due to the Emacs
DC> requirements of parsing it in reverse (I assume because '/' is a valid
DC> lisp symbol character).

I have no opinion on that, I just want a simple syntax for literal data :)

How about using a Unicode character as the marker? (prepares for stoning)

Ted




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 20:54                   ` Ted Zlatanov
@ 2014-07-30 21:01                     ` Matthew Plant
  2014-07-30 21:16                       ` Ted Zlatanov
  2014-08-02  8:47                     ` Alan Mackenzie
  2014-08-02  9:17                     ` Andreas Schwab
  2 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-30 21:01 UTC (permalink / raw)
  To: emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 2153 bytes --]

> How about using a Unicode character as the marker? (prepares for stoning)
I'm on the fence about this idea. It certainly would make parsing in
reverse possible (assuming the reverse parsing functions do not operate on
char *) and easy, but it would also possibly mess up formatting. It also
might not add much convenience because frankly typing non-ASCII
characters is _hard_.

 One suggestion would be to use the unicode left and right double/single
quotation marks.


[-- Attachment #2: Type: text/html, Size: 2922 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 21:01                     ` Matthew Plant
@ 2014-07-30 21:16                       ` Ted Zlatanov
  2014-07-30 21:19                         ` Matthew Plant
  0 siblings, 1 reply; 51+ messages in thread
From: Ted Zlatanov @ 2014-07-30 21:16 UTC (permalink / raw)
  To: emacs-devel

On Wed, 30 Jul 2014 14:01:52 -0700 Matthew Plant <maplant2@illinois.edu> wrote: 

>> How about using a Unicode character as the marker? (prepares for stoning)
MP> I'm on the fence about this idea. It certainly would make parsing in
MP> reverse possible (assuming the reverse parsing functions do not operate on
MP> char *) and easy, but it would also possibly mess up formatting. It also
MP> might not add much convenience because frankly typing non-ASCII
MP> characters is _hard_.

Eh, it's really not hard. If it was the only problem with this approach,
it could be enabled in the default keybindings.

MP>  One suggestion would be to use the unicode left and right double/single
MP> quotation marks.

Oh, no. The markers shouldn't look like existing ASCII characters or
there will be lynchings. Maybe LEFT DOUBLE ANGLE BRACKET and RIGHT
DOUBLE ANGLE BRACKET would work (they look like << and >>).

Ted




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 21:16                       ` Ted Zlatanov
@ 2014-07-30 21:19                         ` Matthew Plant
  2014-07-31 10:13                           ` Ted Zlatanov
  0 siblings, 1 reply; 51+ messages in thread
From: Matthew Plant @ 2014-07-30 21:19 UTC (permalink / raw)
  To: emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

> Maybe LEFT DOUBLE ANGLE BRACKET and RIGHT
> DOUBLE ANGLE BRACKET would work (they look like << and >>).

I'm pretty sure that those look like existing ASCII characters as well ;-)


[-- Attachment #2: Type: text/html, Size: 1966 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 21:19                         ` Matthew Plant
@ 2014-07-31 10:13                           ` Ted Zlatanov
  0 siblings, 0 replies; 51+ messages in thread
From: Ted Zlatanov @ 2014-07-31 10:13 UTC (permalink / raw)
  To: emacs-devel

On Wed, 30 Jul 2014 14:19:07 -0700 Matthew Plant <maplant2@illinois.edu> wrote: 

TZ> Maybe LEFT DOUBLE ANGLE BRACKET and RIGHT
TZ> DOUBLE ANGLE BRACKET would work (they look like << and >>).

MP> I'm pretty sure that those look like existing ASCII characters as well ;-)

(Your quoting is strange, could you please use a better MUA?)

I'm honestly happy with anything that works. I actually had the
French-style (used in many countries) guillemets in mind, like « for
LEFT-POINTING DOUBLE ANGLE QUOTATION MARK. Those:

* do not look too much like << in most fonts...
* ...but are close enough to be familiar
* are used in many locales to quote text already
* already have keyboard shortcuts on many platforms (see http://en.wikipedia.org/wiki/Guillemet#Typing_.22.C2.AB.22_and_.22.C2.BB.22_on_computers)
* are just extended ASCII, so they are already universally supported, both at the font and at the platform level

So, two questions:

- can anyone think of better markers for rawstring / raw regex literals?

- are the markers proposed here going to resolve the backwards-scanning
  issue?  And is that all that's blocking the proposal?

Ted




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 20:54                   ` Ted Zlatanov
  2014-07-30 21:01                     ` Matthew Plant
@ 2014-08-02  8:47                     ` Alan Mackenzie
  2014-08-02  9:14                       ` David Kastrup
  2014-08-02  9:17                     ` Andreas Schwab
  2 siblings, 1 reply; 51+ messages in thread
From: Alan Mackenzie @ 2014-08-02  8:47 UTC (permalink / raw)
  To: emacs-devel

Hello, Ted.

On Wed, Jul 30, 2014 at 04:54:43PM -0400, Ted Zlatanov wrote:

> ....., I just want a simple syntax for literal data :)

> How about using a Unicode character as the marker? (prepares for stoning)

OK, it's taken time, and nobody else looks like they're about to do it,
so I will cast the first stone.

NO, NO, NO, NO!  The only Unicode characters to be used in Emacs are
those that are also ASCII characters, with a tiny number of essential
exceptions (for example, the non-European characters in the sentence-end
regexp, and, of course, people's names in comments).

A Non-ASCII character is difficult to type for most people.  Not all
setups can display it.  Adopting such a character would mean a lot of
work for a lot of people.

And using such characters as delimiters would introduce yet one more
incompatibility with XEmacs which, Stephen informs us, uses #r"..." for
raw strings.  Why not just adopt that convention?  Easy to type, easy to
read, easy to parse.

> Ted

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-02  8:47                     ` Alan Mackenzie
@ 2014-08-02  9:14                       ` David Kastrup
  2014-08-02 10:23                         ` Alan Mackenzie
  2014-08-03  6:50                         ` Stephen J. Turnbull
  0 siblings, 2 replies; 51+ messages in thread
From: David Kastrup @ 2014-08-02  9:14 UTC (permalink / raw)
  To: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> Hello, Ted.
>
> On Wed, Jul 30, 2014 at 04:54:43PM -0400, Ted Zlatanov wrote:
>
>> ....., I just want a simple syntax for literal data :)
>
>> How about using a Unicode character as the marker? (prepares for stoning)
>
> OK, it's taken time, and nobody else looks like they're about to do it,
> so I will cast the first stone.
>
> NO, NO, NO, NO!  The only Unicode characters to be used in Emacs are
> those that are also ASCII characters, with a tiny number of essential
> exceptions (for example, the non-European characters in the sentence-end
> regexp, and, of course, people's names in comments).
>
> A Non-ASCII character is difficult to type for most people.  Not all
> setups can display it.  Adopting such a character would mean a lot of
> work for a lot of people.
>
> And using such characters as delimiters would introduce yet one more
> incompatibility with XEmacs which, Stephen informs us, uses #r"..." for
> raw strings.  Why not just adopt that convention?  Easy to type, easy to
> read, easy to parse.

Easy to parse?

r#"?\" is a complete string.  How do you parse it backwards?

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-07-30 20:54                   ` Ted Zlatanov
  2014-07-30 21:01                     ` Matthew Plant
  2014-08-02  8:47                     ` Alan Mackenzie
@ 2014-08-02  9:17                     ` Andreas Schwab
  2 siblings, 0 replies; 51+ messages in thread
From: Andreas Schwab @ 2014-08-02  9:17 UTC (permalink / raw)
  To: emacs-devel

Ted Zlatanov <tzz@lifelogs.com> writes:

> How about using a Unicode character as the marker? (prepares for stoning)

That's not possible, since they are valid symbol characters.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-02  9:14                       ` David Kastrup
@ 2014-08-02 10:23                         ` Alan Mackenzie
  2014-08-02 15:51                           ` Richard Stallman
  2014-08-03  6:50                         ` Stephen J. Turnbull
  1 sibling, 1 reply; 51+ messages in thread
From: Alan Mackenzie @ 2014-08-02 10:23 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

'afternoon, David.

On Sat, Aug 02, 2014 at 11:14:08AM +0200, David Kastrup wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > Hello, Ted.

> > On Wed, Jul 30, 2014 at 04:54:43PM -0400, Ted Zlatanov wrote:

> >> ....., I just want a simple syntax for literal data :)

> >> How about using a Unicode character as the marker? (prepares for stoning)

> > OK, it's taken time, and nobody else looks like they're about to do it,
> > so I will cast the first stone.

> > NO, NO, NO, NO!  The only Unicode characters to be used in Emacs are
> > those that are also ASCII characters, with a tiny number of essential
> > exceptions (for example, the non-European characters in the sentence-end
> > regexp, and, of course, people's names in comments).

> > A Non-ASCII character is difficult to type for most people.  Not all
> > setups can display it.  Adopting such a character would mean a lot of
> > work for a lot of people.

> > And using such characters as delimiters would introduce yet one more
> > incompatibility with XEmacs which, Stephen informs us, uses #r"..." for
> > raw strings.  Why not just adopt that convention?  Easy to type, easy to
> > read, easy to parse.

> Easy to parse?

> r#"?\" is a complete string.  How do you parse it backwards?

Parsing practically _anything_ backwards (especially comments) is
difficult.  There's nothing particularly difficult about #r"?\" that
isn't shared by, e.g. /*  /*  */.  Heuristics will be needed for strings,
should raw strings come to exist, just as they are for comments.

> -- 
> David Kastrup

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-02 10:23                         ` Alan Mackenzie
@ 2014-08-02 15:51                           ` Richard Stallman
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Stallman @ 2014-08-02 15:51 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: dak, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

      There's nothing particularly difficult about #r"?\" that
    isn't shared by, e.g. /*  /*  */.

C parsing seems to understand that construction, but I am not sure how
it does so.  (Want to take a look?)

In any case, there is no such problem with comments in Lisp.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-02  9:14                       ` David Kastrup
  2014-08-02 10:23                         ` Alan Mackenzie
@ 2014-08-03  6:50                         ` Stephen J. Turnbull
  2014-08-03  7:29                           ` David Kastrup
  2014-08-04  1:55                           ` Richard Stallman
  1 sibling, 2 replies; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-08-03  6:50 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup writes:

 > r#"?\" is a complete string.  How do you parse it backwards?

By catching the parse error when parsing it as a (normal) string, then
reparsing it as a raw string (ie, running backwards over the
characters until you hit the second ?"), and check for a leading #r
(two tokens of lookahead).

Thanks for the example, David, XEmacs is buggy here (or maybe
terminating a rawstring with \ will be declared illegal ;-).



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03  6:50                         ` Stephen J. Turnbull
@ 2014-08-03  7:29                           ` David Kastrup
  2014-08-03 13:12                             ` Stephen J. Turnbull
  2014-08-04  1:55                           ` Richard Stallman
  1 sibling, 1 reply; 51+ messages in thread
From: David Kastrup @ 2014-08-03  7:29 UTC (permalink / raw)
  To: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> David Kastrup writes:
>
>  > r#"?\" is a complete string.  How do you parse it backwards?
>
> By catching the parse error when parsing it as a (normal) string, then
> reparsing it as a raw string (ie, running backwards over the
> characters until you hit the second ?"), and check for a leading #r
> (two tokens of lookahead).
>
> Thanks for the example, David, XEmacs is buggy here (or maybe
> terminating a rawstring with \ will be declared illegal ;-).

Uh, I wasn't planning to trip up XEmacs.  At any rate, things can get
more complex, like with

(format "%s%c\n""r#"?\") => "r#\"
"

which is valid Elisp right now and would remain so, but which would look
like containing a valid string r#"?\" after the syntax change when
scanning backwards.  It's not like syntax highlighting etc don't revert
to heuristics (like with "(" in first column), but it's still obvious
that this choice is not conflict-free.  And I don't see how one could
reasonably get around that without also changing the ending delimiter.
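
As an aside for readers of the archive, the conflict can be sketched in
Python (chosen only for illustration; none of this is Emacs code).  The
sketch assumes a heavily simplified subset of Elisp read syntax, and the
helper names tokenize_forward and raw_string_ending_at are hypothetical.
The r# marker follows the spelling used in the example above; XEmacs
spells it #r.  A forward scan splits the text into the tokens today's
reader would see, while a naive backward scan from just before the
closing paren picks out something that looks like a raw string.

```python
# The example from the message above, kept byte-for-byte via a Python raw string.
SRC = r'(format "%s%c\n""r#"?\")'


def tokenize_forward(src):
    """Left-to-right scan over a simplified subset of Elisp read syntax.

    Assumes well-formed input; this is an illustration, not the Emacs reader.
    """
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c in "()":
            tokens.append(c)
            i += 1
        elif c == '"':
            # Ordinary string: a backslash escapes the following character.
            j = i + 1
            while src[j] != '"':
                j += 2 if src[j] == "\\" else 1
            tokens.append(src[i:j + 1])
            i = j + 1
        elif c == "?":
            # Character literal such as ?a or ?\" (backslash form is 3 chars).
            j = i + (3 if src[i + 1] == "\\" else 2)
            tokens.append(src[i:j])
            i = j
        else:
            # Symbol: runs until whitespace, a paren, or a double quote.
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in '()"':
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def raw_string_ending_at(src, end, marker="r#"):
    """Naive backward scan with no knowledge of the text before the match.

    If src[end - 1] is a double quote, find the nearest earlier double quote
    and report a raw string when that quote is preceded by `marker`.
    """
    if end == 0 or src[end - 1] != '"':
        return None
    start = src.rfind('"', 0, end - 1)
    if start >= len(marker) and src[start - len(marker):start] == marker:
        return src[start - len(marker):end]
    return None


print(tokenize_forward(SRC))
print(raw_string_ending_at(SRC, len(SRC) - 1))
```

Under these assumptions the forward scan yields six tokens, with "r#" as
an ordinary string followed by the character literal ?\", whereas the
backward scan returns r#"?\" as if it were a single raw string.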

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03  7:29                           ` David Kastrup
@ 2014-08-03 13:12                             ` Stephen J. Turnbull
  2014-08-03 13:27                               ` David Kastrup
  2014-08-03 13:40                               ` David Kastrup
  0 siblings, 2 replies; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-08-03 13:12 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup writes:

 > At any rate, things can get more complex, like with
 > 
 > (format "%s%c\n""r#"?\") => "r#\"
 > "

I'm not really worried about more complex.  I am concerned about
whether there's an unambiguous answer to "what is the value -- or
error -- of eval-print-last-sexp at point?"

In the case of

    (format "%s%c\n""r#"?\")-!-

it's "r#\"\n".  But for

    (format "%s%c\n""r#"?\"-!-)

you could argue that it's ?\" (that's XEmacs's opinion) or "?\\".  I
guess for XEmacs (which already has this syntax in the wild) the rule
should be "longest match wins" (because otherwise there's no way to
evaluate r#"?\" in an interactive buffer), but for Emacs that looks
like a deal-killer, and it's already present with just r#"?\".

*sigh*





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03 13:12                             ` Stephen J. Turnbull
@ 2014-08-03 13:27                               ` David Kastrup
  2014-08-03 15:01                                 ` Stephen J. Turnbull
  2014-08-03 13:40                               ` David Kastrup
  1 sibling, 1 reply; 51+ messages in thread
From: David Kastrup @ 2014-08-03 13:27 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> I'm not really worried about more complex.  I am concerned about
> whether there's an unambiguous answer to "what is the value -- or
> error -- of eval-print-last-sexp at point?"
>
> In the case of
>
>     (format "%s%c\n""r#"?\")-!-
>
> it's "r#\"\n".  But for
>
>     (format "%s%c\n""r#"?\"-!-)
>
> you could argue that it's ?\" (that's XEmacs's opinion) or "?\\".  I
> guess for XEmacs (which already has this syntax in the wild) the rule
> should be "longest match wins" (because otherwise there's no way to
> evaluate r#"?\" in an interactive buffer), but for Emacs that looks
> like a deal-killer, and it's already present with just r#"?\".

I don't understand the reason why this should be a deal-killer for Emacs
but not for XEmacs.  Is this because of different syntax infrastructure?
Or a different tolerance level for conceivable but unlikely problems?
Is XEmacs going to run into the same problems when ingesting some of
Emacs' highlighting/parsing stuff?

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03 13:12                             ` Stephen J. Turnbull
  2014-08-03 13:27                               ` David Kastrup
@ 2014-08-03 13:40                               ` David Kastrup
  2014-08-03 15:06                                 ` Stephen J. Turnbull
  1 sibling, 1 reply; 51+ messages in thread
From: David Kastrup @ 2014-08-03 13:40 UTC (permalink / raw)
  To: emacs-devel

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> In the case of
>
>     (format "%s%c\n""r#"?\")-!-
>
> it's "r#\"\n".  But for
>
>     (format "%s%c\n""r#"?\"-!-)
>
> you could argue that it's ?\" (that's XEmacs's opinion)

Which is correct according to the surrounding syntax.

> or "?\\".  I guess for XEmacs (which already has this syntax in the
> wild) the rule should be "longest match wins" (because otherwise
> there's no way to evaluate r#"?\" in an interactive buffer),

The longest single-sexp match would be r#"?\", since the correct
interpretation, "r#"?\", is actually _two_ sexps.  So the "correct"
single-sexp match in _this_ example would indeed be the shortest match,
?\".  Obviously, depending on what transpires before, it is equally
easy to have the longer match be correct.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03 13:27                               ` David Kastrup
@ 2014-08-03 15:01                                 ` Stephen J. Turnbull
  2014-08-04  1:55                                   ` Richard Stallman
  0 siblings, 1 reply; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-08-03 15:01 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup writes:

 > I don't understand the reason why this should be a deal-killer for Emacs
 > but not for XEmacs.

XEmacs is stuck with it (backward compatibility with user code, as in
practice a lot of users are dependent on 21.5 features that we've
refused to backport, we can't really get away with saying "you knew it
was a beta").  If there's a better alternative (which I'm not sure
there is), Emacs has no backwards compatibility problem, and no XEmacs
compatibility problem either.

 > Is XEmacs going to run into the same problems when ingesting some of
 > Emacs' highlighting/parsing stuff?

Technically, yes.  I don't expect to see a lot of real-world code that
uses rawstrings that end in "\", though, so we can just document this
wart (or document that #r rawstrings that end in "\" have undefined
behavior).  But why should Emacs put up with such a wart?



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03 13:40                               ` David Kastrup
@ 2014-08-03 15:06                                 ` Stephen J. Turnbull
  0 siblings, 0 replies; 51+ messages in thread
From: Stephen J. Turnbull @ 2014-08-03 15:06 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

David Kastrup writes:

 > Obviously, depending on what transpires before, it is equally easy
 > to have the longer match be correct.

eval-print-last-sexp doesn't (and practically speaking, can't) depend
on the buffer or line being syntactically correct in any way --
except for the sexp preceding point.  So I don't think that argument
holds.

Regards,




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03  6:50                         ` Stephen J. Turnbull
  2014-08-03  7:29                           ` David Kastrup
@ 2014-08-04  1:55                           ` Richard Stallman
  1 sibling, 0 replies; 51+ messages in thread
From: Richard Stallman @ 2014-08-04  1:55 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: dak, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    By catching the parse error when parsing it as a (normal) string, then
    reparsing it as a raw string (ie, running backwards over the
    characters until you hit the second ?"), and check for a leading #r
    (two tokens of lookahead).

That is easier said than done.  People can give it a try.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-03 15:01                                 ` Stephen J. Turnbull
@ 2014-08-04  1:55                                   ` Richard Stallman
  2014-08-04  6:38                                     ` David Kastrup
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Stallman @ 2014-08-04  1:55 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: dak, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I don't want a wart like this in Emacs.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-04  1:55                                   ` Richard Stallman
@ 2014-08-04  6:38                                     ` David Kastrup
  2014-08-05  1:41                                       ` Richard Stallman
  0 siblings, 1 reply; 51+ messages in thread
From: David Kastrup @ 2014-08-04  6:38 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Stephen J. Turnbull, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> I don't want a wart like this in Emacs.

Well, I am not sure about the size of the wart in practice.  It has not
apparently caused much of a disturbance for XEmacs.  It certainly seems
less relevant in practice than our traditional wart

(info "(emacs) Left Margin Paren")

with regard to reliable detection of strings out of context.  The Elisp
solution of providing a manual "\(" escape sequence does not work for
languages such as Scheme/Guile and various others.

I definitely see a use case for raw strings.  It's also worth noting
that python-mode appears to do a pretty good job finding and
highlighting the various Python raw strings, and those should have
similar problems.  There will probably be outliers like those I
constructed, but I have to admit that I have not run into them yet.

I most certainly have run into the "Left Margin Paren" problem numerous
times, in contrast.
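
For reference, Python's own rule can be checked directly.  In Python a
backslash inside a raw string literal still prevents the following quote
from ending the literal, so a raw literal cannot end in a single
backslash.  The short sketch below uses ast.literal_eval from the
standard library; the helper name parses_as_literal and the two
constants are hypothetical, introduced only for this illustration.

```python
import ast

# Source text r"?\" : the counterpart of the example discussed in this thread.
BACKSLASH_AT_END = 'r"?\\"'
# Source text r"\"" : a raw string holding a backslash followed by a quote.
ESCAPED_QUOTE = 'r"\\""'


def parses_as_literal(source):
    """Return True if `source` is a complete Python literal, False otherwise."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False


print(parses_as_literal(BACKSLASH_AT_END))
print(repr(ast.literal_eval(ESCAPED_QUOTE)))
```

The first literal is rejected as unterminated, and the second evaluates
to a two-character string (a backslash and a double quote).  So Python
avoids this particular ambiguity by not allowing such a literal at all,
which is close to the "declare it illegal" option mentioned earlier in
the thread.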

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-04  6:38                                     ` David Kastrup
@ 2014-08-05  1:41                                       ` Richard Stallman
  2014-08-05  6:15                                         ` David Kastrup
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Stallman @ 2014-08-05  1:41 UTC (permalink / raw)
  To: David Kastrup; +Cc: stephen, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    Well, I am not sure about the size of the wart in practice.  It has not
    apparently caused much of a disturbance for XEmacs.  It certainly seems
    less relevant in practice than our traditional wart

    (info "(emacs) Left Margin Paren")

    with regard to reliable detection of strings out of context.

That problem is in a different feature (finding the start of a
function), and we recommend a preventive measure to avoid it.  So it
is not a real problem.  In Elisp, it is a solved problem.

But even if it were a real problem, this argument is invalid in form.
The existence of one problem we can't fix does not make it good
to create another.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Raw string literals in Emacs lisp.
  2014-08-05  1:41                                       ` Richard Stallman
@ 2014-08-05  6:15                                         ` David Kastrup
  0 siblings, 0 replies; 51+ messages in thread
From: David Kastrup @ 2014-08-05  6:15 UTC (permalink / raw)
  To: Richard Stallman; +Cc: stephen, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>     Well, I am not sure about the size of the wart in practice.  It has not
>     apparently caused much of a disturbance for XEmacs.  It certainly seems
>     less relevant in practice than our traditional wart
>
>     (info "(emacs) Left Margin Paren")
>
>     with regard to reliable detection of strings out of context.
>
> That problem is in a different feature (finding the start of a
> function), and we recommend a preventive measure to avoid it.

The preventive measure does not work in source buffers other than Elisp,
and it requires manual intervention.  M-q seems to avoid _moving_ an
opening paren to the front of the line in strings: that is already a
big help in keeping them from creeping in when reformatting code.
auto-fill-mode, however, doesn't, so you get no help against
accidentally introducing them.

> So it is not a real problem.  In Elisp, it is a solved problem.

More like a "problem with known manual workarounds".

> But even if it were a real problem, this argument is invalid in form.
> The existence of one problem we can't fix does not make it good
> to create another.

Sure.  I was just putting it in perspective: in practice the ambiguity
of r#"?\" without leading context is not going to cause anywhere near
the pain users already have to deal with.  I am not saying that this is
a non-problem.  But in contrast to the paren problem, it is a fringe
problem not likely to occur in practice.  So I consider it likely to be
less annoying in its effects to users than a raw string syntax diverging
from that of XEmacs which would basically imply that any portable code
has to forego raw strings completely.

Of course, if Emacs can come up with a significantly better proposal,
there is some likelihood that it will eventually _also_ be adopted by
XEmacs.

But as long as strings and raw strings share the same ending delimiter
and/or the ending delimiter of a raw string has a valid other syntactic
interpretation on its own, the ambiguity will be there.  ASCII does not
offer a wealth of delimiter candidates, and having to write something
like #r"fa\fa d\fd \fd safa"#r would likely be more annoying than the
problem it is supposed to cure.

I am not saying that #r"..." is what we should ultimately take, just
that I don't see the counterargument as weighing all that strongly.  I
actually would likely prefer something like #"..." as input but that's
even more likely to trip up backward parsing.

-- 
David Kastrup



^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2014-08-05  6:15 UTC | newest]

Thread overview: 51+ messages
2014-07-25 19:47 Raw string literals in Emacs lisp Matthew Plant
2014-07-25 19:56 ` Tassilo Horn
2014-07-25 20:06   ` Matthew Plant
2014-07-25 20:15     ` Tassilo Horn
2014-07-25 20:24       ` Matthew Plant
2014-07-25 20:33 ` Tom Tromey
2014-07-25 21:40   ` Matthew Plant
2014-07-26  1:19 ` Stephen J. Turnbull
2014-07-26  5:28   ` Matthew Plant
2014-07-26  5:45     ` chad
2014-07-26 19:39       ` Matthew Plant
2014-07-27 12:27         ` Stephen J. Turnbull
2014-07-27 13:03           ` David Kastrup
2014-07-27 20:58             ` David Caldwell
2014-07-27 23:17               ` Matthew Plant
2014-07-28 18:27                 ` Richard Stallman
2014-07-28 19:32                   ` Matthew Plant
2014-07-29 19:15                     ` Richard Stallman
2014-07-30  0:26                       ` Matthew Plant
2014-07-30  4:28                         ` Richard Stallman
2014-07-30 18:54                           ` Matthew Plant
2014-07-28  2:16               ` Stephen J. Turnbull
2014-07-28  7:43                 ` Andreas Schwab
2014-07-30 20:28               ` Ted Zlatanov
2014-07-30 20:41                 ` David Caldwell
2014-07-30 20:54                   ` Ted Zlatanov
2014-07-30 21:01                     ` Matthew Plant
2014-07-30 21:16                       ` Ted Zlatanov
2014-07-30 21:19                         ` Matthew Plant
2014-07-31 10:13                           ` Ted Zlatanov
2014-08-02  8:47                     ` Alan Mackenzie
2014-08-02  9:14                       ` David Kastrup
2014-08-02 10:23                         ` Alan Mackenzie
2014-08-02 15:51                           ` Richard Stallman
2014-08-03  6:50                         ` Stephen J. Turnbull
2014-08-03  7:29                           ` David Kastrup
2014-08-03 13:12                             ` Stephen J. Turnbull
2014-08-03 13:27                               ` David Kastrup
2014-08-03 15:01                                 ` Stephen J. Turnbull
2014-08-04  1:55                                   ` Richard Stallman
2014-08-04  6:38                                     ` David Kastrup
2014-08-05  1:41                                       ` Richard Stallman
2014-08-05  6:15                                         ` David Kastrup
2014-08-03 13:40                               ` David Kastrup
2014-08-03 15:06                                 ` Stephen J. Turnbull
2014-08-04  1:55                           ` Richard Stallman
2014-08-02  9:17                     ` Andreas Schwab
2014-07-28  1:29             ` Stephen J. Turnbull
2014-07-26 21:37 ` Thorsten Jolitz
2014-07-29  6:32 ` William Xu
2014-07-29  7:40   ` Andreas Schwab
