From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Interpret #r"..." as a raw string
Date: Fri, 26 Feb 2021 16:39:05 -0800
Message-ID: <87zgzqz6mu.fsf@db48x.net>
References: <20210227.031857.1351840144740816188.conao3@gmail.com> <83pn0mppjd.fsf@gnu.org>
In-Reply-To: <83pn0mppjd.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 26 Feb 2021 22:00:54 +0200")
To: Eli Zaretskii
Cc: Naoya Yamashita, emacs-devel@gnu.org

Eli Zaretskii writes:

>> Date: Sat, 27 Feb 2021 03:18:57 +0900 (JST)
>> From: Naoya Yamashita
>>
>> I write a patch to allow Emacs reader interpret raw string.
>
> What is a "raw string", and how does it differ from regular Lisp
> strings?
>
> Thanks.

Many languages have multiple string types because they simplify the
process of writing strings that contain quotation characters,
backslashes, or other syntax such as interpolation.

Think of sh, where double-quoted strings allow substitutions, while
single-quoted strings do not. The single-quoted strings are similar to
raw strings. Or Perl, where similar but more complex rules apply,
including strings that look like q{foo} and can be delimited by any
punctuation characters. Or Raku, which allows Unicode punctuation as
delimiters, such as q«foo». Or Rust, where r"foo" is a raw string that
can be delimited not just by double quotes, but also by double quotes
plus an arbitrary number of # characters.

For example, suppose I am writing a shell script and I want to print
out an HTML anchor:

    echo "click here for an example"

vs:

    echo 'click here for an example'

The single-quoted string is nicer because I don’t have to escape the
quotes. Of course, HTML also allows me to use single quotes in place of
double quotes (with no change to the semantics of the HTML), so
changing them would also be an option. Perhaps an even better example
would be a shell script that emits elisp, where strings must be
double-quoted.

Of course the primary difference between single- and double-quoted
strings in Shell and Perl is interpolation, rather than escape
characters.

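To make the quoting difference concrete, here is a minimal sketch in
POSIX sh. The URL http://example.com/ and every variable name are
assumptions chosen for illustration; they are not taken from the
original message.

```shell
#!/bin/sh
# Hypothetical URL, used only for illustration.
url='http://example.com/'

# Double quotes: inner double quotes need a backslash, and $url is
# interpolated into the string.
escaped="<a href=\"$url\">click here for an example</a>"

# Single quotes: nothing is escaped or interpolated, so the inner
# double quotes are written as-is.
raw='<a href="http://example.com/">click here for an example</a>'

# Both spellings produce the same text.
printf '%s\n' "$escaped"
printf '%s\n' "$raw"

# A script that emits elisp, where strings must be double-quoted,
# reads most easily with single quotes on the shell side.
elisp='(message "hello, world")'
printf '%s\n' "$elisp"

# Interpolation is the other difference between the two quote styles.
name=world
interp="hello, $name"    # $name is expanded
literal='hello, $name'   # $name is kept verbatim
printf '%s\n' "$interp" "$literal"
```

The single-quoted forms behave like raw strings: what is typed between
the quotes is exactly what the string contains.
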
In Raku this is extended so that there are half a dozen different
features that can be independently turned on or off for any given
quoted item. Q"foo" is a raw string. q"foo" adds the backslash escape
mechanism for concisely representing various characters such as tabs,
newlines, and so on. qq"foo" adds interpolation on top of escaping.
qw"foo bar" and qqw"foo bar" add word splitting, so that you get not a
single string but a list of the words in the string. qx"foo" is like
the backtick syntax in Shell; it runs the quoted item in a subshell.
qqx"foo" does interpolation on it before running it in the subshell.
Heredocs allow for multiline strings. All of these forms allow you to
use arbitrary punctuation characters as delimiters. Then there is a
whole thing with adjectives where you can pick and choose those
features using an even more uniform syntax. And finally regexes are yet
more fun on top of all of that. Raku even has an unquoting mechanism
that is rather similar to the Lisp unquote; it allows the nesting of
different string types.

Most languages don’t go to this extreme, but in languages that have raw
strings they are a way to turn off complicated features that you don’t
want to use in every instance.

As written, Naoya’s raw string patch allows the user to turn off string
escaping, but not to choose alternative delimiters (which has little or
no precedent in elisp) or to turn off string interpolation (which isn’t
built into the elisp syntax, but is instead implemented by library
functions such as format).

Naoya, your patch looks fairly good to my unpractised eye, but you
might consider adding an error message for malformed expressions such
as #r'foo', where the character after the r isn’t a double quote
character. It is probably best to start thinking about how to document
the syntax in the elisp manual too.

Personally, I quite like the idea. Raw strings are useful for a lot
more than just regular expressions.

db48x