From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: "Raw" string literals for elisp
Date: Sat, 02 Oct 2021 14:03:57 -0700
Message-ID: <87v92ft9z6.fsf@db48x.net>
References: <4209edd83cfee7c84b2d75ebfcd38784fa21b23c.camel@crossproduct.net>
In-Reply-To: (Anna Glasgall's message of "Wed, 08 Sep 2021 16:40:09 -0400")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
To: Anna Glasgall
Cc: emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:276082

Anna Glasgall writes:

> Alan (Dr. Mackenzie? Forgive me, not sure what standards are here),
> your point about strings ending in \ is very well taken and I'm frankly
> not sure what the easiest path forward here is. Having "raw literals
> cannot end in a \" is a weird and unpleasant restriction, although the
> fact that it is one that Python places on r-strings (to my considerable
> surprise; I've been using Python since the mid-00s and have never run
> across this particular syntax oddity before) may mean that it is
> perhaps not so bad. The C++ concept of allowing r-strings to specify
> their own delimiters is perhaps maximally flexible, but is definitely
> going to be a heavier lift to implement than any of the above. I'd love
> to hear people's opinions on the merits of the various possible
> approaches here.

I’ve written a little about raw strings on this mailing list. You might
read 87zgzqz6mu.fsf@db48x.net, but I can summarize or restate the parts
dealing with delimiters.

I happen to love Raku’s choice: you can use any matched pair of
nonalphanumeric unicode characters. U+2603 SNOWMAN is a perfectly
cromulent choice of delimiter as far as Raku is concerned; an example
would be q☃foo☃. Since you can always choose a character that will not
appear in your string, this essentially eliminates all need for escaping
of the delimiter.

Raku also lets you use characters that come in left– and right–handed
versions, as long as you order them correctly. For example q«foo» is
allowed, while q»foo« is not. There are unicode properties that allow
this to work without enumerating all of the possibilities, making it
future–proof.
(There are only a couple of dozen pairs, so enumerating them is not hard
either.)

Then of course there are languages where the delimiters can be chosen by
the programmer but from a much more constrained set of possibilities.
C++ and Rust seem like good ones that we could mimic.

All of these delimiter styles are quite easy to implement in the reader,
but as Alan points out they can cause some complexity in the
corresponding language modes:

Alan Mackenzie writes:

> When implementing the C++ raw strings, that flexibility caused me a lot
> of grief. For example, changing text in the middle of a C++ raw string,
> I had to check the new text didn't, by chance, form a closing delimiter
> matching the opening one. I would recommend not implementing anything
> like the C++ raw string identifiers.

As such, if we go this route I would recommend Rust–style over C++ style
raw strings. The Rust style is a lot like the C++ style, except that the
extra delimiter must be a sequence of # characters, matching on both
sides, rather than arbitrary source characters. Modes that want to check
for this will have an easier time with Rust–style than C++–style raw
strings.

But ultimately I prefer the exuberance and whimsy of Raku’s approach
over the more staid and pedestrian approaches taken by C++ and Rust.

db48x
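
For illustration only, here is a rough Python sketch of the Rust-style
rule described above. It assumes Rust's own surface syntax (an r, then N
# characters, then a double quote, closed by a double quote followed by
the same N # characters). The function name read_raw_string is
hypothetical, and none of this is taken from any proposed elisp reader
change; it only shows how little a reader or a mode has to look for.

```python
def read_raw_string(src, pos=0):
    """Read a Rust-style raw string literal starting at src[pos].

    Returns (contents, index just past the literal).
    Hypothetical helper for illustration; not part of any real reader.
    """
    if not src.startswith("r", pos):
        raise ValueError("expected 'r' at start of raw string")
    i = pos + 1

    # Count the '#' characters that form the extra delimiter.
    hashes = 0
    while i < len(src) and src[i] == "#":
        hashes += 1
        i += 1

    if i >= len(src) or src[i] != '"':
        raise ValueError("expected '\"' after 'r' and any '#' characters")
    start = i + 1

    # The literal ends at the first '"' followed by the same number of '#'.
    # No escape processing happens in between, so a backslash is just text.
    closer = '"' + "#" * hashes
    end = src.find(closer, start)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[start:end], end + len(closer)


if __name__ == "__main__":
    # A literal containing quotes and ending in a backslash.
    print(read_raw_string('r#"say "hi"\\"#'))
```

Because the closing delimiter is always a double quote followed by a
fixed number of # characters, a mode checking an edit inside the string
only has to search for that one sequence, and a string ending in a
backslash needs no special case.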