From: Thorsten Jolitz
Newsgroups: gmane.emacs.devel
Subject: Re: Raw string literals in Emacs lisp.
Date: Sat, 26 Jul 2014 23:37:42 +0200
Message-ID: <871tt7hkd5.fsf@gmail.com>
To: emacs-devel@gnu.org

Matthew Plant writes:

> I think that raw string literals would be a really nice thing to add
> to Emacs lisp. The most immediate benefit is that writing regexps
> would be much easier. And since most of the work that goes into major
> modes is writing regexp, writing major modes would become a lot
> faster.

BTW, I recently wrote a little library called

,----
| drx.el --- declarative dynamic regular expressions
`----

available on github (https://github.com/tj64/drx).

Its main purpose was to enable one more level of abstraction when
writing (org-mode) regexps, i.e. to replace the hardcoded

,----
| "^"  (BOL)
| "$"  (EOL)
| "\*" (Org STAR)
`----

in regexp strings like

,----
| "^\\* foo$"
`----

with variables

,----
| (defvar drx-BOL "^")
| (defvar drx-EOL "$")
| (defvar drx-STAR (regexp-quote "*"))
`----

and to build regexps with function calls like

,----
| (drx " foo" t t t)
`----

The idea was based on an analysis of what would be needed for a true
Org Minor Mode, i.e. the application of Org's core functionality
outside of the Org major-mode.
At the lowest level, the core obstacle is the hard-coded regexp
snippets spread all over the Org sources, which don't match anymore
when the Org elements are in comment sections of programming
major-modes.

E.g. this would match 'old-school' headers in emacs-lisp-mode:

#+begin_src emacs-lisp
 (let ((drx-BOL "^;;")
       (drx-STAR ";"))
   (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;;; foo$"

and these 'outshine' (outcommented org-mode) headers:

#+begin_src emacs-lisp
 (let ((drx-BOL "^;; "))
   (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^;; \\* foo$"

and these 'outshine' headers in css-mode:

#+begin_src emacs-lisp
 (let ((drx-BOL "^/\\* ")
       (drx-EOL "\\*/$"))
   (format "%S" (drx " foo" t t t)))
#+end_src

#+results:
: "^/\\* \\* foo\\*/$"

The idea was rejected by the Org maintainers, but the library does
exist now, and the reason I mention it here is that it makes writing
regexps much faster and easier (with a different approach than rx.el:
the regexps themselves are still written as strings, only the plumbing
is done declaratively).
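
To make the mechanism behind these examples concrete, here is a
minimal sketch of the dynamic-binding idea. It is not the actual
drx.el code: the names `my-drx-BOL', `my-drx-EOL', `my-drx-STAR' and
`my-drx-headline' are hypothetical, and the function only covers the
simple BOL/STAR/EOL case with none of the STARS or ENCLOSING
handling. Since `defvar' makes the variables special, a `let' around
the call rebinds them for the duration of that call:

#+begin_src emacs-lisp
 ;; Hypothetical names for illustration only, not part of drx.el.
 (defvar my-drx-BOL "^"
   "Regexp fragment for the beginning of a headline.")
 (defvar my-drx-EOL "$"
   "Regexp fragment for the end of a headline.")
 (defvar my-drx-STAR (regexp-quote "*")
   "Regexp fragment for one headline star.")

 (defun my-drx-headline (rgxp)
   "Return RGXP wrapped in the current BOL, STAR and EOL fragments."
   (concat my-drx-BOL my-drx-STAR rgxp my-drx-EOL))

 ;; Default bindings, as in Org buffers:
 (my-drx-headline " foo")        ; => "^\\* foo$"

 ;; Rebinding the fragments around the call changes the result:
 (let ((my-drx-BOL "^;; "))
   (my-drx-headline " foo"))     ; => "^;; \\* foo$"

 (let ((my-drx-BOL "^;;")
       (my-drx-STAR ";"))
   (my-drx-headline " foo"))     ; => "^;;; foo$"
#+end_src

The regexp body " foo" stays an ordinary string in all three calls;
only the surrounding fragments depend on the bindings in effect.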

Here are a few more complex examples from the drx.el test section:

#+begin_src emacs-lisp
 (format "%S"
         (let ((drx-BOL "^;;")
               (drx-STAR ";"))
           (drx " foo" t '(2 2) nil)))
#+end_src

#+results:
: "^;;\\(;\\{2\\}\\)\\{2\\} foo"

#+begin_src emacs-lisp
 (format "%S" (drx "foo" t t t t))
#+end_src

#+results:
: "^\\*\\(foo\\)$"

#+begin_src emacs-lisp
 (format "%S" (drx "foo" nil nil nil 'alt "bar"))
#+end_src

#+results:
: "\\(foo\\|bar\\)"

#+begin_src emacs-lisp
 (format "%S" (drx "foo" nil nil nil 'shy "bar"))
#+end_src

#+results:
: "\\(?:foo\\)\\(?:bar\\)"

#+begin_src emacs-lisp
 (format "%S" (drx "foo" t 2 t 'app "\\(bar\\)" "loo"))
#+end_src

#+results:
: "^\\*\\{2\\}\\(foo\\)\\(bar\\)\\(loo\\)$"

#+begin_src emacs-lisp
 (format "%S" (drx "foo" t '(t t t) t '(t t t) "bar" "loo"))
#+end_src

#+results:
: "^\\(\\(\\*\\)\\(\\*\\)\\)\\(foo\\(bar\\)\\(loo\\)\\)$"

So even without raw strings, this helps to avoid typing all these
parens and backslashes. By nesting `drx' calls one can create really
complex regexps that contain only a few simple string literals.

I don't know (but would be curious to know) how writing regexps this
way would affect a library's execution speed, especially if the `drx'
calls appear in low-level functions that are called all the time.

PS
For the sake of completeness, here is the docstring of `drx':

,----[ C-h f drx RET ]
| drx is a Lisp function in `drx.el'.
|
| (drx RGXP &optional BOLP STARS EOLP ENCLOSING &rest RGXPS)
|
| Make regexp combining RGXP and optional RGXPS.
|
| With BOLP non-nil, add 'drx-BOL' at beginning of regexp, with EOLP
| non-nil add 'drx-EOL' at end of regexp.
|
| STARS, when non-nil, uses 'drx-STAR' and encloses and repeats it.
|
| ENCLOSING, when non-nil, takes RGXP and optional RGXPS and combines,
| encloses and repeats them.
|
| While BOLP and EOLP are switches that do nothing when nil and
| insert whatever value 'drx-BOL' and 'drx-EOL' are set to when
| non-nil, both arguments STARS and ENCLOSING take either symbols,
| numbers, strings or (nested) lists as values and act conditional on
| the type.
|
| All the following 'atomic' argument values are valid for both STARS
| and ENCLOSING but with a slightly different meaning:
|
| STARS: repeat 'drx-STAR' (without enclosing) conditional on argument
| value
|
| ENCLOSING: repeat enclosed combination of RGXP and RGXPS conditional
| on argument value
|
|  - nil :: do nothing (no repeater, no enclosing)
|
|  - t :: (and any other symbol w/o special meaning) repeat once
|
|  - n :: (number) repeat n times {n}
|
|  - "n" :: (number-as-string) repeat n times {n}
|
|  - "n," :: (string) repeat >= n times {n,}
|
|  - ",m" :: (string) repeat <= m times {,m}
|
|  - "n,m" :: (string) repeat n to m times {n,m}
|
|  - "?" :: (string) repeat with ?
|
|  - "*" :: (string) repeat with *
|
|  - "+" :: (string) repeat with +
|
|  - "??" :: (string) repeat with ??
|
|  - "*?" :: (string) repeat with *?
|
|  - "+?" :: (string) repeat with +?
|
|  - "xyz" :: (any other string) repeat once
|
| Note that, when used with STARS and ENCLOSING, t almost always
| means 'enclose and repeat once', while 1 and "1" stand for
| 'do not enclose, repeat once' - depending on the context.
|
| These atomic values can be wrapped in a list and change their
| meaning then. In a list of length 1 they specify 'enclose element
| first, apply repeater then'. In a list of length > 1 the specifier
| in the car applies to the combination of all elements, while each of
| the specifiers in the cdr applies to one element only. In the case
| of argument STARS, an element is always 'drx-STAR'. In the case of
| argument ENCLOSING, a non-nil optional argument RGXPS represents the
| list of elements, each of them being a regexp string.
|
| Here are two calls of 'drx' with interchanged list arguments to
| STARS and ENCLOSING and their return values, demonstrating the
| above:
|
| ,------------------------------------------------------------
| | (drx "foo" t '(nil t (2)) t '(t nil (2))
| |      "bar" "loo")
| | "^\(\*\)\(\*\)
| Uses keymap `2\', which is not currently defined.
| \(foobar\(loo\)
| Uses keymap `2\', which is not currently defined.
| \)$"
| `------------------------------------------------------------
|
| ,------------------------------------------------------------
| | (drx "foo" t '(t nil (2)) t '(nil t (2))
| |      "bar" "loo")
| | "^\(\*\(\*\)
| Uses keymap `2\', which is not currently defined.
| \)foo\(bar\)\(loo\)
| Uses keymap `2\', which is not currently defined.
| $"
| `------------------------------------------------------------

Oops, bug in boxquote.el? It should look like this:

,------------------------------------------------------------
| (drx \"foo\" t '(nil t (2)) t '(t nil (2))
|      \"bar\" \"loo\")
| \"^\\(\\*\\)\\(\\*\\)\\{2\\}\\(foobar\\(loo\\)\\{2\\}\\)$\"
`------------------------------------------------------------

,------------------------------------------------------------
| (drx \"foo\" t '(t nil (2)) t '(nil t (2))
|      \"bar\" \"loo\")
| \"^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$\"
`------------------------------------------------------------

|
| Many more usage examples with their expected outcome can be found as
| ERT tests in the test-section of drx.el and should be consulted when
| in doubt.
|
| There are a few symbols with special meaning as values of the
| ENCLOSING argument (when used as atomic argument or as car of a list
| argument), namely:
|
|  - alt :: Concat and enclose RGXP and RGXPS as regexp alternatives.
|           If applicable, add drx-BOL/STARS and drx-EOL before
|           first/after last alternative.
|
|  - grp :: Concat and enclose RGXP and RGXPS. If applicable, add
|           drx-BOL, STARS and drx-EOL as first/second/last group.
|
|  - shy :: Concat and enclose RGXP and RGXPS as shy regexp
|           groups. If applicable, add drx-BOL, STARS and drx-EOL as
|           first/second/last group.
|
|  - app :: like 'grp', but rather append RGXP and RGXPS instead
|           of enclosing them if they are already regexp groups
|           themselves.
|
| They create regexp groups but don't apply repeaters to them.
`----

-- 
cheers,
Thorsten