From: Thorsten Jolitz <tjolitz@gmail.com>
To: emacs-devel@gnu.org
Subject: Re: Raw string literals in Emacs lisp.
Date: Sat, 26 Jul 2014 23:37:42 +0200
Message-ID: <871tt7hkd5.fsf@gmail.com>
In-Reply-To: CAMbiG3_eorJe+71ZGaM33w+BqS12izYex4NdD_bMtORqb+x+Vg@mail.gmail.com
Matthew Plant <maplant2@illinois.edu> writes:
> I think that raw string literals would be a really nice thing to add
> to Emacs
> lisp. The most immediate benefit is that writing regexps would be much
> easier.
> And since most of the work that goes into major modes is writing
> regexp, writing
> major modes would become a lot faster.
BTW, I recently wrote a little library called
,----
| drx.el --- declarative dynamic regular expressions
`----
available on github (https://github.com/tj64/drx).
Its main purpose was to enable one more level of abstraction when writing
(org-mode) regexps, i.e. replacing the hard-coded
,----
| "^" (BOL)
| "$" (EOL)
| "\\*" (Org STAR)
`----
in regexp strings like
,----
| "^\\* foo$"
`----
with variables
,----
| (defvar drx-BOL "^")
| (defvar drx-EOL "$")
| (defvar drx-STAR (regexp-quote "*"))
`----
and building regexps with function calls like
,----
| (drx " foo" t t t)
`----
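The mechanism can be illustrated with a minimal sketch. This is not the
actual drx.el code: the names `sketch-bol', `sketch-eol', `sketch-star'
and `sketch-drx' are hypothetical, and only the three boolean switches
are handled. It relies on the snippets being special variables (declared
with defvar), so a surrounding `let' rebinds them dynamically:

#+begin_src emacs-lisp
;; Hypothetical sketch, NOT the implementation found in drx.el.
(defvar sketch-bol "^" "Regexp snippet for beginning of line.")
(defvar sketch-eol "$" "Regexp snippet for end of line.")
(defvar sketch-star (regexp-quote "*") "Regexp snippet for one star.")

(defun sketch-drx (rgxp &optional bolp starp eolp)
  "Return RGXP, optionally wrapped in the sketch-* snippets.
BOLP, STARP and EOLP are plain switches in this sketch."
  (concat (if bolp sketch-bol "")
          (if starp sketch-star "")
          rgxp
          (if eolp sketch-eol "")))

;; Default snippets give an Org-style headline regexp.
(sketch-drx " foo" t t t)            ; => "^\\* foo$"

;; Rebinding the snippets changes the result without touching the call.
(let ((sketch-bol "^;;")
      (sketch-star ";"))
  (sketch-drx " foo" t t t))         ; => "^;;; foo$"
#+end_src

The call site stays the same in both cases; only the dynamic bindings
around it differ.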
The idea was based on an analysis of what would be needed for a true Org
Minor Mode, i.e. the application of Org's core functionality outside of
the Org major-mode. At the lowest level, the core obstacle is in the
hard-coded regexp snippets spread all over the Org sources that don't
match anymore when the org elements are in comment sections of
programming major-modes.
E.g. this would match 'old-school' headers in emacs-lisp-mode:
#+begin_src emacs-lisp
(let ((drx-BOL "^;;")
      (drx-STAR ";"))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^;;; foo$"
and this would match 'outshine' (commented-out org-mode) headers:
#+begin_src emacs-lisp
(let ((drx-BOL "^;; "))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^;; \\* foo$"
and this would match 'outshine' headers in css-mode:
#+begin_src emacs-lisp
(let ((drx-BOL "^/\\* ")
      (drx-EOL "\\*/$"))
  (format "%S" (drx " foo" t t t)))
#+end_src
#+results:
: "^/\\* \\* foo\\*/$"
The idea was rejected by the Org maintainers, but the library does
exist now, and the reason I mention it here is that it makes writing
regexps much faster and easier (with a different approach than rx.el:
the regexps themselves are still written as strings, only the plumbing
is done declaratively).
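For contrast, here is roughly how the first headline regexp looks with
rx.el, where the whole regexp is an s-expression and literal strings are
quoted automatically. This only uses the standard `rx' macro with its
`bol' and `eol' forms; nothing from drx.el is involved:

#+begin_src emacs-lisp
(require 'rx)

;; rx quotes the literal "*" itself, so no backslashes are typed.
(rx bol "* foo" eol)                 ; => "^\\* foo$"
#+end_src

With drx the string " foo" stays an ordinary regexp string, so regexp
syntax can still be used inside it.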
Here are a few more complex examples from the drx.el test section:
#+begin_src emacs-lisp
(format "%S"
        (let ((drx-BOL "^;;")
              (drx-STAR ";"))
          (drx " foo" t '(2 2) nil)))
#+end_src
#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"
#+begin_src emacs-lisp
(format "%S" (drx "foo" t t t t))
#+end_src
#+results:
: "^\\*\\(foo\\)$"
#+begin_src emacs-lisp
(format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src
#+results:
: "\\(foo\\|bar\\)"
#+begin_src emacs-lisp
(format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src
#+results:
: "\\(?:foo\\)\\(?:bar\\)"
#+begin_src emacs-lisp
(format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src
#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"
#+begin_src emacs-lisp
(format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src
#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"
So even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting 'drx calls one can create really
complex regexps that contain only a few simple string literals.
I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the 'drx
calls appear in low-level functions that are called all the time.
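One way to get a rough answer is to time it with `benchmark-run'. The
sketch below does not use drx.el; `bench-bol', `bench-build' and
`bench-cached' are hypothetical stand-ins that compare building a regexp
on every call against building it once at load time:

#+begin_src emacs-lisp
(require 'benchmark)

;; Hypothetical stand-in for `drx-BOL'; not part of drx.el.
(defvar bench-bol "^" "Regexp snippet for beginning of line.")

(defun bench-build ()
  "Build the headline regexp at run time, on every call."
  (concat bench-bol (regexp-quote "*") " foo" "$"))

(defconst bench-cached (bench-build)
  "The same regexp, built only once when this form is evaluated.")

;; Each call returns a list (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(benchmark-run 100000 (string-match-p (bench-build) "* foo"))
(benchmark-run 100000 (string-match-p bench-cached "* foo"))
#+end_src

Note that a cached regexp no longer reacts to a `let' around the call,
so caching trades away exactly the dynamic rebinding shown earlier.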
PS
For the sake of completeness, here is the docstring of `drx':
,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
|
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
|
| Make regexp combining RGXP and optional RGXPS.
|
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
|
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
|
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
|
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
|
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
|
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
|
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
|
| - nil :: do nothing (no repeater, no enclosing)
|
| - t :: (and any other symbol w/o special meaning) repeat once
|
| - n :: (number) repeat n times {n}
|
| - "n" :: (number-as-string) repeat n times {n}
|
| - "n," :: (string) repeat >= n times {n,}
|
| - ",m" :: (string) repeat <= m times {,m}
|
| - "n,m" :: (string) repeat n to m times {n,m}
|
| - "?" :: (string) repeat with ?
|
| - "*" :: (string) repeat with *
|
| - "+" :: (string) repeat with +
|
| - "??" :: (string) repeat with ??
|
| - "*?" :: (string) repeat with *?
|
| - "+?" :: (string) repeat with +?
|
| - "xyz" :: (any other string) repeat once
|
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
|
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case
| of argument STARS, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
|
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
|
| ,------------------------------------------------------------
| | (drx "foo" t '(nil t (2)) t '(t nil (2))
| | "bar" "loo")
| | "^\(\*\)\(\*\)
| Uses keymap `2\', which is not currently defined.
| \(foobar\(loo\)
| Uses keymap `2\', which is not currently defined.
| \)$"
| `------------------------------------------------------------
|
| ,------------------------------------------------------------
| | (drx "foo" t '(t nil (2)) t '(nil t (2))
| | "bar" "loo")
| | "^\(\*\(\*\)
| Uses keymap `2\', which is not currently defined.
| \)foo\(bar\)\(loo\)
| Uses keymap `2\', which is not currently defined.
| $"
| `------------------------------------------------------------
Oops, a bug in boxquote.el? The two examples above should look like this:
,------------------------------------------------------------
| (drx "foo" t '(nil t (2)) t '(t nil (2))
|      "bar" "loo")
| "^\\(\\*\\)\\(\\*\\)\\{2\\}\\(foobar\\(loo\\)\\{2\\}\\)$"
`------------------------------------------------------------
,------------------------------------------------------------
| (drx "foo" t '(t nil (2)) t '(nil t (2))
|      "bar" "loo")
| "^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$"
`------------------------------------------------------------
|
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted when
| in doubt.
|
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|
| - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
| Optionally add drx-BOL/STARS and drx-EOL before
| first/after last alternative.
|
| - grp :: Concat and enclose RGXP and RGXPS. Optionally add
| drx-BOL, STARS and drx-EOL as first/second/last group.
|
| - shy :: Concat and enclose RGXP and RGXPS as shy regexp
| groups. Optionally add drx-BOL, STARS and drx-EOL as
| first/second/last group.
|
| - app :: like 'grp', but rather append RGXP and RGXPS instead
| of enclosing them if they are already regexp groups
| themselves.
|
| They create regexp groups but don't apply repeaters to them.
`----
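To make the list of atomic values more concrete, here is a minimal
sketch of how such a value could be turned into a regexp repeater
suffix. `sketch-repeater' is a hypothetical helper written for this
illustration only; it is not taken from drx.el, and it ignores
enclosing and list arguments:

#+begin_src emacs-lisp
(defun sketch-repeater (spec)
  "Return a regexp repeater suffix for the atomic value SPEC.
Hypothetical helper, NOT part of drx.el."
  (cond
   ;; n :: (number) repeat n times
   ((numberp spec) (format "\\{%d\\}" spec))
   ;; "n", "n,", ",m", "n,m" :: interval given as a string
   ((and (stringp spec)
         (string-match-p "\\`\\(?:[0-9]+,?[0-9]*\\|,[0-9]+\\)\\'" spec))
    (format "\\{%s\\}" spec))
   ;; "?", "*", "+" and their non-greedy variants are used as they are
   ((member spec '("?" "*" "+" "??" "*?" "+?")) spec)
   ;; nil, t, other symbols and other strings: no repeater
   (t "")))

(sketch-repeater 2)                              ; => "\\{2\\}"
(sketch-repeater "2,")                           ; => "\\{2,\\}"
(sketch-repeater "+?")                           ; => "+?"
(concat (regexp-quote "*") (sketch-repeater 2))  ; => "\\*\\{2\\}"
#+end_src

The last form gives the same star prefix as the STARS value 2 in the
'app example further up.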
--
cheers,
Thorsten