From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Brooks
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] Interpret #r"..." as a raw string
Date: Tue, 02 Mar 2021 01:56:43 -0800
Message-ID: <87blc1khes.fsf@db48x.net>
References: <20210227.031857.1351840144740816188.conao3@gmail.com>
In-Reply-To: (Matt Armstrong's message of "Mon, 01 Mar 2021 21:59:33 -0800")
To: Matt Armstrong
Cc: Alan Mackenzie, Naoya Yamashita, emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Content-Type: text/plain; charset=utf-8

Matt Armstrong writes:

> Alan Mackenzie writes:
>
> C++ has probably the most flexible "gold standard" raw string literals.

With respect, I think that Raku “wins” this fight.
https://docs.raku.org/language/quoting is really worth reading; it's a
work of art. You can think of the quote operator as a function that
takes 13 named boolean arguments plus a choice of opening and closing
delimiters.

> As Alan I think rightly points out, this makes the language and all
> tools that process the language more complex. This is a high cost, so
> the feature should deliver some real value.

Certainly true. As the ordinary Lisp string syntax already allows
multi-line strings, and interpolation is handled by the format function,
the primary benefit is to turn off escaping. We could also offer a
choice of opening and closing delimiters, though the proposed code
didn't implement that.

I think the benefit will be worth it. If we offered a little more choice
of delimiters, then we could gain more benefit when the string must also
contain double quotes. This need not have a large complexity cost.

> For those that don't know, C++'s raw string literals can be as simple as
> this for the string "raw-content":
>
>     R"(raw-content)"
>
> But if the content itself contains the character sequence )" then the
> programmer can specify any delimiter they want:
>
>     R"DELIMITER(raw-content)"more-raw-content)DELIMITER"
>
> But as you can see above, it isn't always clearer to write a raw string
> literal.

I would say that there are four ways to choose the delimiters.

The simplest way is to accept just one specific delimiter, often with no
way to include that character in the string.
For example, Scala's syntax is raw"foo", but without any form of
escaping that would allow a double quote inside the string. C#'s syntax
is @"foo", but you can include a double quote by repeating it, so
@"foo""bar" is the string “foo"bar”. Most languages are in this
category, and this is how the proposed code works.

Then there is the sed→perl→raku way, where the parser accepts a wide
variety of characters as the opening delimiter, and uses it to compute
which closing delimiter to look for. Raku allows any character not
allowed in identifiers, which is most characters not in the L or N
Unicode categories. Sed and Perl just allow punctuation characters.

There is the Rust way, where the parser looks for a double-quote
preceded by zero or more #'s. The closing delimiter is a double-quote
followed by the same number of #'s.

And finally the C++11 way, where it looks for a double-quote followed by
zero to sixteen source characters (with a few minor exceptions) followed
by an opening parenthesis. The closing delimiter is a closing
parenthesis followed by the same zero to sixteen characters in the same
order as in the opening delimiter followed by a double-quote character.

Of these, I think Raku's way is the most fun because it allows the
widest choice of characters (q🕶awesome!🕶, for example).

I'd be fine with the current proposal, but if others think that it is
important to allow double-quotes inside the raw string, then I think
Rust's syntax is the next logical step. #r##"foo"## would fit in well
with the rest of elisp; it won't look as out of place as the others, and
it's only a small increment in complexity.

Or maybe we want to invent something completely new. As Emacs buffers
may include images which are treated as if they were characters of
unusual size, perhaps we could use GIFs. A string bracketed by a GIF of
a dude putting on sunglasses would really show those other languages up.
As it's nicer when delimiters are paired, we could allow the closing GIF
to be horizontally mirrored so that both dudes are either looking
inwards at the string or outwards at the rest of the world.

db48x

PS: if anyone wants to go the Perl/Raku way, I happen to have built a
list of the paired punctuation characters recently:

    var _PiPf = map[rune]rune{
        '«': '»', '‘': '’', '“': '”', '‹': '›', '⸂': '⸃', '⸄': '⸅',
        '⸉': '⸊', '⸌': '⸍', '⸜': '⸝', '⸠': '⸡',
    }
    var _PsPf = map[rune]rune{
        '‚': '’', '„': '”',
    }
    var _PsPe = map[rune]rune{
        '(': ')', '[': ']', '{': '}', '༺': '༻', '༼': '༽',
        '᚛': '᚜', '⁅': '⁆', '⁽': '⁾', '₍': '₎', '❨': '❩',
        '❪': '❫', '❬': '❭', '❮': '❯', '❰': '❱', '❲': '❳',
        '❴': '❵', '⟅': '⟆', '⟦': '⟧', '⟨': '⟩', '⟪': '⟫',
        '⦃': '⦄', '⦅': '⦆', '⦇': '⦈', '⦉': '⦊', '⦋': '⦌',
        '⦑': '⦒', '⦓': '⦔', '⦕': '⦖', '⦗': '⦘', '⧘': '⧙',
        '⧚': '⧛', '⧼': '⧽', '〈': '〉', '《': '》', '「': '」',
        '『': '』', '【': '】', '〔': '〕', '〖': '〗', '〘': '〙',
        '〚': '〛', '〝': '〞', '︗': '︘', '︵': '︶', '︷': '︸',
        '︹': '︺', '︻': '︼', '︽': '︾', '︿': '﹀', '﹁': '﹂',
        '﹃': '﹄', '﹇': '﹈', '﹙': '﹚', '﹛': '﹜', '﹝': '﹞',
        '（': '）', '［': '］', '｛': '｝', '｟': '｠', '｢': '｣',
        '⸨': '⸩',
    }
    var _SmSm = map[rune]rune{
        '<': '>',
    }

This is obviously written in Go. My source code is at
https://github.com/db48x/goparsify/blob/master/literals.go#L298-L322.
Feel free to use these tables however you like; I consider them to be a
mere listing of facts and as such they're not copyrightable.

The basic algorithm that Perl uses is that the delimiter may be any
punctuation character, and if the opening delimiter is a key in any of
these tables then the closing delimiter is expected to be the
corresponding value; otherwise the closing delimiter is expected to be
identical to the opening delimiter. Raku is similar, except that it
allows any Unicode character that isn't designated as belonging to
identifiers, rather than just punctuation.

For speed you'll obviously prefer to do a single lookup into one hash
table, but for organizational purposes it's nicer to have them grouped
by Unicode category. This will help you update them when new characters
are added in the future.
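
The lookup described above could be sketched in Go roughly as follows.
This is only an illustration under several assumptions, not Perl's or
Raku's actual implementation: the names pairedDelimiters,
closingDelimiter and readQuoted are hypothetical, the merged table holds
just a handful of entries from the lists above, "punctuation" is taken
to mean Unicode categories P and S (so that '<' and '|' qualify), and
nested pairs and escapes are not handled at all.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// pairedDelimiters is a hypothetical merged table: a small subset of the
// per-category tables, flattened so that one map lookup is enough.
var pairedDelimiters = map[rune]rune{
	'(': ')', '[': ']', '{': '}', '<': '>',
	'«': '»', '‘': '’', '“': '”', '‚': '’', '„': '”',
	'⟦': '⟧', '「': '」',
}

// closingDelimiter returns the delimiter expected to end a string that was
// opened with open. The second result is false when open is not accepted
// as a delimiter (assumption: only categories P and S are accepted).
func closingDelimiter(open rune) (rune, bool) {
	if !unicode.IsPunct(open) && !unicode.IsSymbol(open) {
		return 0, false
	}
	if closer, found := pairedDelimiters[open]; found {
		return closer, true
	}
	// Unpaired characters close themselves.
	return open, true
}

// readQuoted treats the first rune of input as the opening delimiter and
// returns the raw text up to the first closing delimiter, plus whatever
// follows it. Nothing inside the body is interpreted.
func readQuoted(input string) (body, rest string, err error) {
	runes := []rune(input)
	if len(runes) == 0 {
		return "", "", errors.New("empty input")
	}
	closer, ok := closingDelimiter(runes[0])
	if !ok {
		return "", "", fmt.Errorf("%q cannot be used as a delimiter", runes[0])
	}
	for i := 1; i < len(runes); i++ {
		if runes[i] == closer {
			return string(runes[1:i]), string(runes[i+1:]), nil
		}
	}
	return "", "", fmt.Errorf("missing closing delimiter %q", closer)
}

func main() {
	for _, src := range []string{`(foo "bar") rest`, `|a\b|`, `«raw»`, `xfoo`} {
		body, rest, err := readQuoted(src)
		fmt.Printf("%q: body=%q rest=%q err=%v\n", src, body, rest, err)
	}
}
```

Keeping the tables grouped by category in the source and merging them
into one map at startup would give both the organization and the single
lookup mentioned above.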