From: Thorsten Jolitz <tjolitz@gmail.com>
To: emacs-devel@gnu.org
Subject: Re: Raw string literals in Emacs lisp.
Date: Sat, 26 Jul 2014 23:37:42 +0200
Message-ID: <871tt7hkd5.fsf@gmail.com>
In-Reply-To: CAMbiG3_eorJe+71ZGaM33w+BqS12izYex4NdD_bMtORqb+x+Vg@mail.gmail.com

Matthew Plant <maplant2@illinois.edu> writes:

> I think that raw string literals would be a really nice thing to add
> to Emacs
> lisp. The most immediate benefit is that writing regexps would be much
> easier.
> And since most of the work that goes into major modes is writing
> regexp, writing
> major modes would become a lot faster.

BTW, I recently wrote a little library called

,----
| drx.el --- declarative dynamic regular expressions
`----

available on GitHub (https://github.com/tj64/drx).

Its main purpose is to add one more level of abstraction when writing
(Org-mode) regexps, i.e. to replace the hardcoded

,----
| "^" (BOL)
| "$" (EOL)
| "\*" (Org STAR)
`----

in regexp strings like

,----
| "^\\* foo$"
`----

with variables

,----
|  (defvar drx-BOL "^")
|  (defvar drx-EOL "$")
|  (defvar drx-STAR (regexp-quote "*"))
`----

and build regexps with function calls like
 
,----
| (drx " foo" t t t)
`----
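To illustrate the mechanism only (this is not the drx.el implementation,
just a minimal sketch with hypothetical `my-' names): because the building
blocks are special variables, a function that concatenates them picks up
whatever a surrounding `let' binds.

#+begin_src emacs-lisp
  ;; Hypothetical stand-ins for drx-BOL, drx-EOL and drx-STAR.
  (defvar my-bol "^" "Regexp fragment for beginning of line.")
  (defvar my-eol "$" "Regexp fragment for end of line.")
  (defvar my-star (regexp-quote "*") "Regexp fragment for one headline star.")

  (defun my-headline-rx (text)
    "Return a regexp for a one-star headline followed by TEXT.
  TEXT is inserted as-is, so it is treated as a regexp itself."
    (concat my-bol my-star text my-eol))

  ;; Default bindings: a plain Org headline.
  (my-headline-rx " foo")      ; => "^\\* foo$"

  ;; `defvar' makes the variables special, so `let' rebinds them
  ;; dynamically for the duration of the call.
  (let ((my-bol "^;; "))
    (my-headline-rx " foo"))   ; => "^;; \\* foo$"
#+end_src

The function body never changes; only the bindings around the call do.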

The idea was based on an analysis of what would be needed for a true Org
Minor Mode, i.e. the application of Org's core functionality outside of
the Org major-mode. At the lowest level, the core obstacle is the
hard-coded regexp snippets spread all over the Org sources, which no
longer match when the Org elements sit in comment sections of
programming major-modes.

E.g. this would match 'old-school' headers in emacs-lisp-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;;; foo$"

and this matches 'outshine' (commented-out Org-mode) headers:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^;; "))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;; \\* foo$"

and this matches 'outshine' headers in css-mode:

#+begin_src emacs-lisp 
  (let ((drx-BOL "^/\\* ")
        (drx-EOL "\\*/$"))
    (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^/\\* \\* foo\\*/$"

The idea was rejected by the Org maintainers, but the library exists
now, and the reason I mention it here is that it makes writing regexps
much faster and easier. The approach differs from rx.el: the regexps
themselves are still written as strings, only the plumbing is done
declaratively.
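For comparison, this is roughly how the simple headline regexp from
above would be written with rx.el, where the whole regexp is an
s-expression and literal strings are quoted automatically (a small
sketch, not taken from drx.el):

#+begin_src emacs-lisp
  (require 'rx)

  ;; `bol' and `eol' are rx symbols; the string is matched literally,
  ;; so the star does not need to be escaped by hand.
  (rx bol "* foo" eol)   ; => "^\\* foo$"
#+end_src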

Here are a few more complex examples from the drx.el test section:

#+begin_src emacs-lisp
(format "%S"
  (let ((drx-BOL "^;;")
        (drx-STAR ";"))
    (drx " foo" t '(2 2) nil)))
#+end_src

#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" t t t t))
#+end_src

#+results:
: "^\\*\\(foo\\)$"

#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src

#+results:
: "\\(foo\\|bar\\)"


#+begin_src emacs-lisp
  (format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src

#+results:
: "\\(?:foo\\)\\(?:bar\\)"


#+begin_src emacs-lisp
 (format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src

#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"

#+begin_src emacs-lisp
(format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src

#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"

So even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting `drx' calls one can create really
complex regexps that contain only a few simple string literals.

I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the `drx'
calls appear in low-level functions that are called all the time.
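One way to get a first impression would be `benchmark-run' from
benchmark.el. The following is only a rough sketch: `my-build-rx' is a
hypothetical, much simpler stand-in for a real `drx' call (which may
well cost more), and the repetition count is arbitrary.

#+begin_src emacs-lisp
  (require 'benchmark)

  ;; Hypothetical stand-ins, not part of drx.el.
  (defvar my-bol "^")
  (defvar my-eol "$")

  (defun my-build-rx (text)
    "Wrap the regexp TEXT in `my-bol' and `my-eol'."
    (concat my-bol text my-eol))

  ;; Built once at load time.  Later `let' rebinding of `my-bol' or
  ;; `my-eol' does not affect this value.
  (defconst my-cached-rx (my-build-rx " foo"))

  ;; Each form returns a list (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
  ;; Rebuild the regexp on every iteration:
  (benchmark-run 100000
    (string-match-p (my-build-rx " foo") "* foo"))

  ;; Reuse the precomputed regexp:
  (benchmark-run 100000
    (string-match-p my-cached-rx "* foo"))
#+end_src

Caching in a constant trades away the dynamic rebinding that is the
point of the approach, so it would only suit regexps that never need
to change. `benchmark-run-compiled' can be used instead to take the
interpreter overhead out of the measurement.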

PS
For the sake of completeness, here is the docstring of `drx':

,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
| 
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
| 
| Make regexp combining RGXP and optional RGXPS.
| 
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
| 
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
| 
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
| 
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
| 
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
| 
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
| 
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
| 
|   - nil :: do nothing (no repeater, no enclosing)
| 
|   - t :: (and any other symbol w/o special meaning) repeat once
| 
|   - n :: (number) repeat n times {n}
| 
|   - "n" :: (number-as-string) repeat n times {n}
| 
|   - "n," :: (string) repeat >= n times {n,}
| 
|   - ",m" :: (string) repeat <= m times {,m}
| 
|   - "n,m" :: (string) repeat n to m times {n,m}
|        
|   - "?" :: (string) repeat with ?
| 
|   - "*" :: (string) repeat with *
| 
|   - "+" :: (string) repeat with +
| 
|   - "??" :: (string) repeat with ??
| 
|   - "*?" :: (string) repeat with *?
| 
|   - "+?" :: (string) repeat with +?
| 
|   - "xyz" :: (any other string) repeat once
| 
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
| 
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case
| of argument STARS, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
| 
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(nil t (2)) t '(t nil (2))
|   |      "bar" "loo")
|   | "^\(\*\)\(\*\)\{2\}\(foobar\(loo\)\{2\}\)$"
|   `------------------------------------------------------------
| 
|   ,------------------------------------------------------------
|   | (drx "foo" t '(t nil (2)) t '(nil t (2))
|   |       "bar" "loo")
|   | "^\(\*\(\*\)\{2\}\)foo\(bar\)\(loo\)\{2\}$"
|   `------------------------------------------------------------
| 
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted when
| in doubt.
| 
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|  
|   - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
|            Optionally add drx-BOL/STARS and drx-EOL before
|            first/after last alternative.
| 
|   - grp :: Concat and enclose RGXP and RGXPS. Optionally add
|            drx-BOL, STARS and drx-EOL as first/second/last group.
| 
|   - shy :: Concat and enclose RGXP and RGXPS as shy regexp
|            groups. Optionally add drx-BOL, STARS and drx-EOL as
|            first/second/last group.
| 
|   - app :: Like 'grp', but append RGXP and RGXPS instead of
|            enclosing them if they are already regexp groups
|            themselves.
| 
| They create regexp groups but don't apply repeaters to them.
| 
`----

-- 
cheers,
Thorsten



