From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Raw string literals in Emacs lisp.
Date: Mon, 28 Jul 2014 11:16:17 +0900
Message-ID: <87zjfumdn2.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAMbiG3_eorJe+71ZGaM33w+BqS12izYex4NdD_bMtORqb+x+Vg@mail.gmail.com>
	<878ungor1v.fsf@uwakimon.sk.tsukuba.ac.jp>
	<CAMbiG39qUuq3daUqMbKjDRaakSceU1FhsyhOvNvNqv0wErX1BQ@mail.gmail.com>
	<AECFD120-4664-485C-89AB-B1D367013BB0@gmail.com>
	<CAMbiG3_As8YQpLQec9oMwR4vOytLdG8jff2M5Yy3kitD9zQ5Rw@mail.gmail.com>
	<8761ijng08.fsf@uwakimon.sk.tsukuba.ac.jp>
	<871tt7lzro.fsf@fencepost.gnu.org> <53D567FD.4030708@porkrind.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Trace: ger.gmane.org 1406513806 9791 80.91.229.3 (28 Jul 2014 02:16:46 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 28 Jul 2014 02:16:46 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: David Caldwell <david@porkrind.org>
In-Reply-To: <53D567FD.4030708@porkrind.org>
X-Mailer: VM undefined under 21.5  (beta34) "kale" acf1c26e3019 XEmacs Lucid
	(x86_64-unknown-linux)
Xref: news.gmane.org gmane.emacs.devel:173194
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/173194>

David Caldwell writes:

 > Why not, then, skip rawstrings completely and go directly to a regular
 > expression reader: #r// (or even just #//) instead of #r""?

It's unlispy.  Regular expressions *are* strings and can be
manipulated as strings; (almost) any string can be used as a regular
expression.  Therefore (in Lisp) we normally define separate functions
to deal with "string" use cases and "regexp" use cases for the same
object.  And they mix and match well:

(defvar xft-xlfd-font-regexp
  (concat
   ;; XLFD specifies ISO 8859-1 encoding, but we can't handle non-ASCII
   ;; in Mule when this function is called.  So use HPC.
   ;; (xe_xlfd_prefix "\\(\\+[\040-\176\240-\377]*\\)?-")
   ;; (xe_xlfd_opt_text "\\([\040-\044\046-\176\240-\377]*\\)")
   ;; (xe_xlfd_text "\\([\040-\044\046-\176\240-\377]+\\)")
   "\\`"
   "\\(\\+[\040-\176]*\\)?-"		; prefix
   "\\([^-]+\\)"			; foundry
   "-"
   "\\([^-]+\\)"			; family
   "-"
   "\\([^-]+\\)"			; weight
   "-"
   "\\([0-9ior?*][iot]?\\)"		; slant
   "-"
   "\\([^-]+\\)"			; swidth
   "-"
   "\\([^-]*\\)"			; adstyle
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pixelsize
   "-"
   "\\([0-9?*]+\\|\\[[ 0-9+~.e?*]+\\]\\)"    ; pointsize
   "-"
   "\\([0-9?*]+\\)"			; resx
   "-"
   "\\([0-9?*]+\\)"			; resy
   "-"
   "\\([cmp?*]\\)"			; spacing
   "-"
   "~?"					; avgwidth
   "\\([0-9?*]+\\)"
   "-"
   "\\([^-]+\\)"			; registry
   "-"
   "\\([^-]+\\)"			; encoding
   "\\'")
  "The regular expression used to match XLFD font names.")

Of course that would be more readable with rawstrings (not used
because this code is shared with XEmacs 21.4), and even more readable
with PCRE, but it shows we don't really need /x to build regexps
readably.  If #r"..." generated something other than strings, you'd
have to write code to deal with issues like building regexps using
concat.  I think format would be a huge can of worms.
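
To make the "mix and match" point concrete, here is a minimal sketch
(illustration only; `my-dashed-fields-regexp' is a made-up name, not
something from XEmacs or Emacs).  Ordinary string functions build the
pattern, and the regexp functions then consume the same string object:

```elisp
;; Illustrative sketch; `my-dashed-fields-regexp' is a hypothetical helper.
(defun my-dashed-fields-regexp (n)
  "Return a regexp matching exactly N non-empty fields separated by \"-\"."
  (concat "\\`"
          ;; Plain string functions assemble the pattern...
          (mapconcat (lambda (_) "\\([^-]+\\)") (number-sequence 1 n) "-")
          "\\'"))

;; ...and the regexp functions use the very same string.
(let ((name "adobe-courier-medium"))
  (when (string-match (my-dashed-fields-regexp 3) name)
    (match-string 2 name)))             ; => "courier"
```

If the reader produced a distinct regexp object instead, concat and
mapconcat would presumably each need a regexp-aware counterpart.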

 > This will be just as easy to implement as raw strings.

No, it won't.  Raw strings are just a different read syntax for
strings, and have exactly the same internal representation.  At
present we don't have a regular expression type (although we do have a
compiled regular expression type internally).  If you're not proposing
to define a regular expression type (good luck getting that past
RMS!), then you're just proposing a rawstring syntax tuned for regexp
use.

But there's no reason that couldn't be used for other purposes.  For
example, some people (Python programmers) would probably appreciate a
#r"..."/x rawstring syntax that automatically dedents -- for use in
docstrings.

 > Languages like Javascript, Perl, Ruby, Bash, and Groovy have shown that
 > having a special support for regexps at a language level is a very
 > effective way of dealing with them.

Lisp is not those languages, and in fact it is very unlike those
languages.

 > Plus it opens the door to extensions: #r//p for PCRE/Perl syntax[1]
 > or #r//x for more readable regexps[2], etc.

(defun emacsify-pcre (s)
  "Convert a PCRE to Emacs notation, properly ;-) ignoring unknown backslash."
  ;; exercise for the reader
  )

or

(require 'pcre)                         ; SXEmacs may have implemented this.
(let ((cre (pcre-compile "...")))
  (while (pcre-search-forward cre)
    (do-something)))

and as shown above /x isn't really necessary.  Like it or not, that's
the way these things are done in the Emacs Lisp world.  If you don't
like it, there are languages like Javascript, Perl, Ruby, Bash, and
Groovy.  (Python is too much like Lisp for you, I suspect. ;-)

 > I think using rawstrings is too generic an answer to the problem.

I think using rawstrings is the only sane answer to the problem.  You
can call them "regular expressions" as suggested by the #r notation
and their most prominent application, but in Emacs Lisp representing
them internally as a type other than string would be way too much work
given the idioms we have for constructing regexps that would need to
be reimplemented.  Given that internally they are (Just String), why
specialize to regular expressions?  Would you error on #r/*.*/, which
is invalid syntax for a regular expression?
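
As a rough sketch of what "just a string" means in practice: today a
string is only checked for regexp validity when some regexp function
tries to use it, at which point `invalid-regexp' is signaled.  The
predicate below is a hypothetical helper written for this message,
assuming nothing beyond `string-match' and `condition-case':

```elisp
;; Hypothetical helper: a string is tested as a regexp only on use.
(defun my-valid-regexp-p (string)
  "Return non-nil if STRING can be compiled as an Emacs regexp."
  (condition-case nil
      (progn (string-match string "") t)
    (invalid-regexp nil)))

(my-valid-regexp-p "\\(foo\\)")         ; => t
(my-valid-regexp-p "\\(foo")            ; => nil (unmatched group)
(length "\\(foo")                       ; => 5, still a fine string
```

A reader syntax specialized to regexps would have to decide whether to
run a check like this at read time.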

 > [1] And practically every other language on the planet. Really, it seems
 > like only Emacs is left in the dark ages of basic POSIX regexps where
 > '(' means literal paren and not matching.

Sure, but that's a different problem easily solved if anyone wants to
do it.  GNU grep shows how: use egrep.  (POSIX grep with its default
to basic REs and an argument -E to indicate modern syntax is a bad
example for Lisp, I think.)  The analog for Emacs is a suite of
"pcre-" functions.
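
A minimal sketch of how such a suite might start, assuming only that
the grouping, alternation and interval characters need their escaping
flipped.  Both function names are made up for illustration; this is
nowhere near a full PCRE translator.  Simple bracket expressions are
copied verbatim (a leading "]" or a backslash inside brackets is not
handled), and every other backslash sequence is passed through
untouched:

```elisp
;; Hypothetical sketch, not an existing library.
(defun my-pcre-to-emacs (pcre)
  "Flip the escaping of ( ) | { } in PCRE to get Emacs regexp syntax.
Simple bracket expressions are copied verbatim; all other backslash
sequences are passed through unchanged."
  (let ((i 0) (len (length pcre)) (in-class nil) (out '()))
    (while (< i len)
      (let ((c (aref pcre i)))
        (cond
         ;; Inside [...]: copy verbatim until the closing bracket.
         (in-class
          (when (eq c ?\]) (setq in-class nil))
          (push (string c) out))
         ;; Backslash escape: \( becomes a bare (, anything else is kept.
         ((and (eq c ?\\) (< (1+ i) len))
          (let ((next (aref pcre (1+ i))))
            (push (if (memq next '(?\( ?\) ?| ?{ ?}))
                      (string next)
                    (string c next))
                  out)
            (setq i (1+ i))))
         ;; Bare metacharacter: Emacs wants it backslashed.
         ((memq c '(?\( ?\) ?| ?{ ?}))
          (push (string ?\\ c) out))
         (t
          (when (eq c ?\[) (setq in-class t))
          (push (string c) out))))
      (setq i (1+ i)))
    (apply #'concat (nreverse out))))

(defun my-pcre-string-match (pcre string &optional start)
  "Like `string-match', but PCRE uses PCRE-style grouping syntax."
  (string-match (my-pcre-to-emacs pcre) string start))

(my-pcre-to-emacs "(foo|bar){2}")       ; => "\\(foo\\|bar\\)\\{2\\}"
(my-pcre-string-match "(foo|bar){2}" "xfoobar") ; => 1
```

The result of the conversion is still an ordinary string, so it can be
passed to concat or to any existing regexp function.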