From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Raw string literals in Emacs lisp.
Date: Mon, 28 Jul 2014 11:16:17 +0900
Message-ID: <87zjfumdn2.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <878ungor1v.fsf@uwakimon.sk.tsukuba.ac.jp> <8761ijng08.fsf@uwakimon.sk.tsukuba.ac.jp> <871tt7lzro.fsf@fencepost.gnu.org> <53D567FD.4030708@porkrind.org>
To: David Caldwell
Cc: emacs-devel@gnu.org
In-Reply-To: <53D567FD.4030708@porkrind.org>

David Caldwell writes:

> Why not, then, skip rawstrings completely and go directly to a regular
> expression reader: #r// (or even just #//) instead of #r""?

It's unlispy. Regular expressions *are* strings and can be manipulated as strings; (almost) any string can be used as a regular expression. Therefore (in Lisp) we normally define separate functions to deal with "string" use cases and "regexp" use cases for the same object. And they mix and match well:

(defvar xft-xlfd-font-regexp
  (concat
   ;; XLFD specifies ISO 8859-1 encoding, but we can't handle non-ASCII
   ;; in Mule when this function is called. So use HPC.
   ;; (xe_xlfd_prefix "\\(\\+[\040-\176\240-\377]*\\)?-")
   ;; (xe_xlfd_opt_text "\\([\040-\044\046-\176\240-\377]*\\)")
   ;; (xe_xlfd_text "\\([\040-\044\046-\176\240-\377]+\\)")
   "\\`"
   "\\(\\+[\040-\176]*\\)?-"                ; prefix
   "\\([^-]+\\)"                            ; foundry
   "-"
   "\\([^-]+\\)"                            ; family
   "-"
   "\\([^-]+\\)"                            ; weight
   "-"
   "\\([0-9ior?*][iot]?\\)"                 ; slant
   "-"
   "\\([^-]+\\)"                            ; swidth
   "-"
   "\\([^-]*\\)"                            ; adstyle
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"   ; pixelsize
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"   ; pointsize
   "-"
   "\\([0-9?*]+\\)"                         ; resx
   "-"
   "\\([0-9?*]+\\)"                         ; resy
   "-"
   "\\([cmp?*]\\)"                          ; spacing
   "-"
   "~?"                                     ; avgwidth
   "\\([0-9?*]+\\)"
   "-"
   "\\([^-]+\\)"                            ; registry
   "-"
   "\\([^-]+\\)"                            ; encoding
   "\\'")
  "The regular expression used to match XLFD font names.")

Of course that would be more readable with rawstrings (not used because this code is shared with XEmacs 21.4), and even more readable with PCRE, but it shows we don't really need /x to build regexps readably.

If #r"..." generated something other than strings, you'd have to write code to deal with issues like building regexps using concat. I think format would be a huge can of worms.

> This will be just as easy to implement as raw strings.

No, it won't. Raw strings are just a different read syntax for strings, and have exactly the same internal representation. At present we don't have a regular expression type (although we do have a compiled regular expression type internally). If you're not proposing to define a regular expression type (good luck getting that past RMS!), then you're just proposing a rawstring syntax tuned for regexp use. But there's no reason that couldn't be used for other purposes. For example, some people (Python programmers) would probably appreciate a #r"..."/x rawstring syntax that automatically dedents -- for use in docstrings.

> Languages like Javascript, Perl, Ruby, Bash, and Groovy have shown that
> having a special support for regexps at a language level is a very
> effective way of dealing with them.

Lisp is not those languages, and in fact it is very unlike those languages.

> Plus it opens the door to extensions: #r//p for PCRE/Perl syntax[1]
> or #r//x for more readable regexps[2], etc.

(defun emacsify-pcre (s)
  "Convert a PCRE to Emacs notation, properly ;-) ignoring unknown backslash."
  ;; exercise for the reader
  )

or

(require 'pcre)                 ; SXEmacs may have implemented this.
(let ((cre (pcre-compile "...")))
  (while (pcre-search-forward cre)
    (do-something)))

and as shown above /x isn't really necessary.

Like it or not, that's the way these things are done in the Emacs Lisp world. If you don't like it, there are languages like Javascript, Perl, Ruby, Bash, and Groovy. (Python is too much like Lisp for you, I suspect. ;-)

> I think using rawstrings is too generic an answer to the problem.

I think using rawstrings is the only sane answer to the problem. You can call them "regular expressions" as suggested by the #r notation and their most prominent application, but in Emacs Lisp representing them internally as a type other than string would be way too much work, given the idioms we have for constructing regexps that would all need to be reimplemented. Given that internally they are (Just String), why specialize to regular expressions? Would you error on #r/*.*/, which is invalid syntax for a regular expression?

> [1] And practically every other language on the planet. Really, it seems
> like only Emacs is left in the dark ages of basic POSIX regexps where
> '(' means literal paren and not matching.

Sure, but that's a different problem, and one easily solved if anyone wants to do it. GNU grep shows how: use egrep. (POSIX grep with its default to basic REs and an argument -E to indicate modern syntax is a bad example for Lisp, I think.) The analog for Emacs is a suite of "pcre-" functions.
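The emacsify-pcre body above is left as an exercise. One possible starting point, sketched here under loud assumptions, handles only the difference the footnote complains about: in PCRE a bare ( ) | { } is an operator and a backslashed one is literal, while in Emacs it is the other way around. The name my-pcre-to-emacs-sketch is hypothetical and is not part of Emacs or of any pcre library; bracket expressions, \d-style classes, lookarounds and a literal bare { (as in "a{b") are not handled and may be mistranslated. Because the result is an ordinary string, it composes with concat and regexp-quote like any hand-written Emacs regexp.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: translate a SMALL subset of PCRE syntax to
;; Emacs regexp syntax.  Only ( ) | { } are handled; bracket
;; expressions, \d, lookarounds, etc. are copied through unchanged
;; and may not mean the same thing in Emacs.

(defun my-pcre-to-emacs-sketch (pcre)
  "Return an Emacs regexp string for the PCRE string PCRE (subset only).
In PCRE, bare ( ) | { } are special and backslashed ones are literal;
in Emacs it is the other way around, so this swaps the escaping."
  (let ((i 0)
        (n (length pcre))
        (out '()))
    (while (< i n)
      (let ((c (aref pcre i)))
        (cond
         ;; Backslash followed by another character.
         ((and (eq c ?\\) (< (1+ i) n))
          (let ((next (aref pcre (1+ i))))
            (if (memq next '(?\( ?\) ?| ?{ ?}))
                ;; PCRE \( is a literal paren: a plain ( in Emacs.
                (push (string next) out)
              ;; Any other escape is copied through unchanged.
              (push (string ?\\ next) out))
            (setq i (+ i 2))))
         ;; Bare ( ) | { } are operators in PCRE: backslash them.
         ((memq c '(?\( ?\) ?| ?{ ?}))
          (push (string ?\\ c) out)
          (setq i (1+ i)))
         ;; Everything else (including a trailing lone backslash)
         ;; is copied as-is.
         (t
          (push (string c) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))

;; Usage: the result is an ordinary string, so it mixes with concat
;; and regexp-quote exactly like a hand-written Emacs regexp.
(let ((re (concat (regexp-quote "v1.")
                  (my-pcre-to-emacs-sketch "(foo|bar){2}"))))
  ;; re is "v1\\.\\(foo\\|bar\\)\\{2\\}"
  (string-match re "v1.foobar"))
```

Keeping the output a plain string, rather than a new regexp object, is the point made above: nothing downstream (concat, string-match, re-search-forward) needs to change.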