From: Matthew Plant
Newsgroups: gmane.emacs.devel
Subject: Re: Raw string literals in Emacs lisp.
Date: Sun, 27 Jul 2014 18:17:46 -0500
To: David Caldwell
Cc: "emacs-devel@gnu.org"

I think this is a very good idea. However, agreeing on which semantics
are needed may prove problematic. Do you have any suggestions on this
point? The easiest approach would probably be to adopt some existing,
predefined set of rules, like Perl's (but definitely not Perl's).

-Matt

On Sunday, July 27, 2014, David Caldwell <david@porkrind.org> wrote:
On 7/27/14 6:03 AM, David Kastrup wrote:
> "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> writes:
>
>> Sure, you can do a lot for readability as PCRE or Python regexps have
>> done, but regexps are unreadable almost by design, and those regexp
>> syntaxes benefit from rawstrings, too. Almost anything (that doesn't
>> involve changing the meaning of existing legal programs) that improves
>> readability of regexps is worthwhile.
>>
>> Rawstrings are cheap and effective.
>
> When rawstrings are supported, it becomes more expedient to recognize
> things like \n and \t, probably also \f in regexps (\b is already
> taken). At the current point of time, they just evaluate to n and t.
> That makes input of tabs and newlines in raw strings a nuisance and a
> potential source of errors.
>
> It's not actually an issue with rawstrings as such, but rather of their
> use within regexps.
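
The behaviour asked for here (the regexp engine itself recognizing \n
inside a raw string) can be seen in Python's re module. The snippet
below is only a stand-in written in Python, not Emacs code, and the
variable names are made up for illustration.

```python
import re

# A raw string literal does no escape processing of its own, so this
# pattern holds four characters: 'a', backslash, 'n', 'b'.
raw_pattern = r"a\nb"
pattern_length = len(raw_pattern)

# Python's regexp engine interprets the backslash-n pair as a newline,
# so the pattern matches "a<newline>b" ...
matches_newline = re.fullmatch(raw_pattern, "a\nb") is not None

# ... and does not match a literal 'n' in that position, which is what
# the message above says Emacs regexps currently do with \n.
matches_letter_n = re.fullmatch(raw_pattern, "anb") is not None

print(pattern_length, matches_newline, matches_letter_n)
```

With engine-level support like this, a raw string can express a newline
or tab without embedding the actual control character in the source.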

Why not, then, skip rawstrings completely and go directly to a regular
expression reader: #r// (or even just #//) instead of #r""?

Then you can add whatever semantics are needed for good regexp reading
(i.e., let '\n', '\t', and others get escaped in the string reading, but
allow '\(' to go through unescaped). This will be just as easy to
implement as raw strings.
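
A minimal sketch of that reader-level rule, written in Python purely
for illustration. The function name read_regexp_literal and the choice
of exactly \n, \t and \f as translated escapes are assumptions; nothing
here is an existing Emacs reader function or an agreed design.

```python
# Hypothetical table of escapes the reader would translate itself.
TRANSLATED_ESCAPES = {"n": "\n", "t": "\t", "f": "\f"}


def read_regexp_literal(source: str) -> str:
    # Walk the literal's text. A backslash followed by n, t or f becomes
    # the control character; every other backslash pair (such as \( or
    # \|) is copied through unchanged for the regexp engine to handle.
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt in TRANSLATED_ESCAPES:
                out.append(TRANSLATED_ESCAPES[nxt])
            else:
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The group and alternation backslashes survive; the trailing \n
# becomes a real newline character.
pattern = read_regexp_literal(r"\(foo\|bar\)\n")
print(repr(pattern))
```

Consuming backslash pairs two characters at a time means an escaped
backslash followed by n stays a backslash pair plus a plain n, rather
than being misread as a newline escape.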

Languages like JavaScript, Perl, Ruby, Bash, and Groovy have shown that
having special support for regexps at the language level is a very
effective way of dealing with them. Plus it opens the door to
extensions: #r//p for PCRE/Perl syntax[1] or #r//x for more readable
regexps[2], etc.

I think using rawstrings is too generic an answer to the problem. Given
that so much of Emacs's functionality is reliant on regular expressions,
it makes sense to design something specifically for them. Doing that
means they can be tailored and tweaked for maximum functionality without
worrying about possible other usages that people might come up with
(which will undoubtedly happen with rawstrings).

-David

[1] And practically every other language on the planet. Really, it seems
like only Emacs is left in the dark ages of basic POSIX regexps where
'(' means a literal paren rather than the start of a group.

[2] Another Perl feature, it allows whitespace and comments in regexps,
for much improved readability. See http://perldoc.perl.org/perlre.html#/x
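
To give a feel for what an /x-style mode buys, here is a sketch using
Python's re.VERBOSE flag, which is Python's analogue of Perl's /x and is
used only as a stand-in. The date pattern and variable names are
invented for the example.

```python
import re

# Compact form: correct, but hard to scan.
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Verbose form: whitespace in the pattern is ignored and '#' starts a
# comment that runs to the end of the line.
verbose = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

date = "2014-07-27"
compact_groups = compact.fullmatch(date).groups()
verbose_groups = verbose.fullmatch(date).groups()
print(compact_groups == verbose_groups)
```

Both patterns capture the same three groups; the verbose one simply
carries its own documentation.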
