From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Change of Lisp syntax for "fancy" quotes in Emacs 27?
Date: Sat, 3 Feb 2018 20:51:06 -0800
Organization: UCLA Computer Science Department
References: <87shaigcvs.fsf@gmail.com>
In-Reply-To: <87shaigcvs.fsf@gmail.com>
To: Aaron Ecay, Noam Postavsky, Emacs developers
Cc: Drew Adams
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:222490
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------A7D3690E7483A2CC2354DBBD"

This is a multi-part message in MIME format.
--------------A7D3690E7483A2CC2354DBBD
Content-Type: text/plain; charset=utf-8; format=flowed

Aaron Ecay wrote:

> I was surprised to learn that this is the case, in light of what is
> said in the Elisp reference about symbol names

Good point; thanks.

In the spirit of "be strict about what you generate", the Emacs printer
should escape any character that is not in the list of characters
documented in the Elisp manual as being safe (i.e., as not requiring
escaping). This is elementary future-proofing, and is independent of
whether we want Emacs to warn about or disallow confusable chars in
symbols.

Proposed patches against 'master' attached. The first merely simplifies
the code without changing its effect. The second fixes a bug in the
manual, which incorrectly states that '?' never needs escaping in symbol
names. These two patches are routine. (I assume the second one should be
applied to emacs26 instead of to master.)

The third patch changes the Lisp printer to escape characters as
suggested above.

The fourth patch changes the Lisp printer to escape '?' only at the
start of a symbol. This is nicer for programs using Scheme-style naming
conventions in Emacs Lisp, e.g., 'fooish?' rather than 'fooishp'. I
discovered the need for this patch when I wrote the second patch.

--------------A7D3690E7483A2CC2354DBBD
Content-Type: text/x-patch; name="0001-Simplify-print_object-a-bit.patch"
Content-Disposition: attachment; filename="0001-Simplify-print_object-a-bit.patch"

From c03b816016f8cc2f15d275e7ad23448366489277 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 3 Feb 2018 20:29:00 -0800
Subject: [PATCH 1/4] Simplify print_object a bit

* src/print.c (print_object): Simplify by using C99 constructs,
and by taking advantage of the fact that Lisp strings are
followed by null bytes.
--- src/print.c | 40 ++++++++++++++++------------------------ 1 file changed, 16 insertions(+), 24 deletions(-) diff --git a/src/print.c b/src/print.c index b3c0f6f..d3eb49d 100644 --- a/src/print.c +++ b/src/print.c @@ -1916,38 +1916,29 @@ print_object (Lisp_Object obj, Lisp_Object printc= harfun, bool escapeflag) case Lisp_Symbol: { bool confusing; - unsigned char *p =3D SDATA (SYMBOL_NAME (obj)); - unsigned char *end =3D p + SBYTES (SYMBOL_NAME (obj)); - int c; - ptrdiff_t i, i_byte; - ptrdiff_t size_byte; - Lisp_Object name; - - name =3D SYMBOL_NAME (obj); - - if (p !=3D end && (*p =3D=3D '-' || *p =3D=3D '+')) p++; - if (p =3D=3D end) - confusing =3D 0; + Lisp_Object name =3D SYMBOL_NAME (obj); + ptrdiff_t size_byte =3D SBYTES (name); + unsigned char *p =3D SDATA (name); + unsigned char *end =3D p + size_byte; + /* If symbol name begins with a digit, and ends with a digit, and contains nothing but digits and `e', it could be treated as a number. So set CONFUSING. =20 - Symbols that contain periods could also be taken as numbers, - but periods are always escaped, so we don't have to worry - about them here. */ - else if (*p >=3D '0' && *p <=3D '9' - && end[-1] >=3D '0' && end[-1] <=3D '9') + Symbols that contain '.' or '#' could also be taken as + numbers, but these are always escaped so don't worry about + them here. */ + if (c_isdigit (p[*p =3D=3D '-' || *p =3D=3D '+']) && c_isdigit (end[-1]= )) { - while (p !=3D end && ((*p >=3D '0' && *p <=3D '9') - /* Needed for \2e10. */ - || *p =3D=3D 'e' || *p =3D=3D 'E')) + /* Check for 'e' too; needed for \2e10. */ + do p++; + while (c_isdigit (*p) || *p =3D=3D 'e' || *p =3D=3D 'E'); + confusing =3D (end =3D=3D p); } else - confusing =3D 0; - - size_byte =3D SBYTES (name); + confusing =3D false; =20 if (! 
NILP (Vprint_gensym) && !SYMBOL_INTERNED_IN_INITIAL_OBARRAY_P (obj)) @@ -1958,10 +1949,11 @@ print_object (Lisp_Object obj, Lisp_Object printc= harfun, bool escapeflag) break; } =20 - for (i =3D 0, i_byte =3D 0; i_byte < size_byte;) + for (ptrdiff_t i =3D 0, i_byte =3D 0; i_byte < size_byte;) { /* Here, we must convert each multi-byte form to the corresponding character code before handing it to PRINTCHAR. */= + int c; FETCH_STRING_CHAR_ADVANCE (c, name, i, i_byte); maybe_quit (); =20 --=20 2.7.4 --------------A7D3690E7483A2CC2354DBBD Content-Type: text/x-patch; name="0002-Say-needs-escaping-at-start-of-symbol.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0002-Say-needs-escaping-at-start-of-symbol.patch" =46rom 4b945a3fcbf6ff2bde4595dd8b8f472d1b3d17af Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 3 Feb 2018 20:30:21 -0800 Subject: [PATCH 2/4] Say ? needs escaping at start of symbol. * doc/lispref/objects.texi: ? is also special. --- doc/lispref/objects.texi | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/lispref/objects.texi b/doc/lispref/objects.texi index af74062..f0420e6 100644 --- a/doc/lispref/objects.texi +++ b/doc/lispref/objects.texi @@ -557,7 +557,8 @@ Symbol Type of the name suffice as long as the name does not look like a number. (If it does, write a @samp{\} at the beginning of the name to force interpretation as a symbol.) The characters @samp{_~!@@$%^&:<>@{@}?} ar= e -less often used but also require no special punctuation. Any other +less often used but also require no special punctuation, except that +@samp{\} must precede @samp{?} at the start of a symbol. Any other characters may be included in a symbol's name by escaping them with a backslash. 
In contrast to its use in strings, however, a backslash in the name of a symbol simply quotes the single character that follows the= --=20 2.7.4 --------------A7D3690E7483A2CC2354DBBD Content-Type: text/x-patch; name="0003-prin1-etc.-now-escape-more-chars-in-symbols.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0003-prin1-etc.-now-escape-more-chars-in-symbols.patch" =46rom 2add3a1595f709bb071e2b775970038470b2fab2 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 3 Feb 2018 20:30:48 -0800 Subject: [PATCH 3/4] prin1 etc. now escape more chars in symbols Inspired by email from Aaron Ecay in: https://lists.gnu.org/r/emacs-devel/2018-02/msg00125.html * etc/NEWS: Mention this. * src/print.c (print_object): Escape any character that is not documented to not require escaping. --- etc/NEWS | 7 +++++++ src/print.c | 37 +++++++++++++++++++++++++++++++------ 2 files changed, 38 insertions(+), 6 deletions(-) diff --git a/etc/NEWS b/etc/NEWS index afd0fba..2a46002 100644 --- a/etc/NEWS +++ b/etc/NEWS @@ -87,6 +87,13 @@ regular expression was previously invalid, but is now = accepted: =20 x\{32768\} =20 +** 'print' and related functions now escape more chars in symbols. +They now escape any symbol character that is outside the documented +set of characters that do not need escaping. For example, (print +(intern "n\u0456l")) now outputs "n\=D1=96l" instead of "n=D1=96l", as a= hint to +the reader that the "=D1=96" is not the usual U+0069 LATIN SMALL LETTER = I, +but is instead U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I. 
+
 ^L
 * Editing Changes in Emacs 27.1
 
diff --git a/src/print.c b/src/print.c
index d3eb49d..7eca36a 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1959,12 +1959,37 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 
             if (escapeflag)
               {
-                if (c == '\"' || c == '\\' || c == '\''
-                    || c == ';' || c == '#' || c == '(' || c == ')'
-                    || c == ',' || c == '.' || c == '`'
-                    || c == '[' || c == ']' || c == '?' || c <= 040
-                    || confusing
-                    || (i == 1 && confusable_symbol_character_p (c)))
+                switch (c)
+                  {
+                    /* The Emacs Lisp manual lists these characters as
+                       not requiring escaping in symbols.  Although some
+                       other characters might also work, play it safe
+                       and escape all but these characters.  */
+                  case '!': case '$': case '%': case '&':
+                  case '*': case '-': case '+': case '/':
+                  case '0': case '1': case '2': case '3': case '4':
+                  case '5': case '6': case '7': case '8': case '9':
+                  case ':': case '<': case '=': case '>': case '@':
+                  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+                  case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+                  case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+                  case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+                  case 'Y': case 'Z':
+                  case '^': case '_':
+                  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+                  case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+                  case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+                  case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+                  case 'y': case 'z':
+                  case '{': case '}': case '~':
+                    break;
+
+                  default:
+                    confusing = true;
+                    break;
+                  }
+
+                if (confusing)
                   {
                     printchar ('\\', printcharfun);
                     confusing = false;
-- 
2.7.4

--------------A7D3690E7483A2CC2354DBBD
Content-Type: text/x-patch; name="0004-Escape-only-at-start-of-symbol.patch"
Content-Disposition: attachment; filename="0004-Escape-only-at-start-of-symbol.patch"

From 4289ea136de4876b5dfc20d83b5a2556d1b5d8e6 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 3 Feb 2018 20:39:48 -0800
Subject: [PATCH 4/4] Escape ? only at start of symbol

* src/print.c (print_object): Do it.
---
 etc/NEWS    | 4 ++++
 src/print.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 2a46002..c435136 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -94,6 +94,10 @@ set of characters that do not need escaping.  For example, (print
 the reader that the "і" is not the usual U+0069 LATIN SMALL LETTER I,
 but is instead U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I.
 
+** 'print' etc. no longer escape non-initial '?' in symbols.
+For example, the symbol 'list?' is now printed as-is.  Initial '?'
+is still escaped, e.g., (print (intern "?x")) still outputs "\?x".
+
 ^L
 * Editing Changes in Emacs 27.1
 
diff --git a/src/print.c b/src/print.c
index 7eca36a..dfd6c50 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1938,7 +1938,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
             confusing = (end == p);
           }
         else
-          confusing = false;
+          confusing = *p == '?';
 
         if (! NILP (Vprint_gensym)
             && !SYMBOL_INTERNED_IN_INITIAL_OBARRAY_P (obj))
@@ -1969,7 +1969,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
                   case '*': case '-': case '+': case '/':
                   case '0': case '1': case '2': case '3': case '4':
                   case '5': case '6': case '7': case '8': case '9':
-                  case ':': case '<': case '=': case '>': case '@':
+                  case ':': case '<': case '=': case '>': case '?': case '@':
                   case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                   case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
                   case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-- 
2.7.4

--------------A7D3690E7483A2CC2354DBBD--
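
For anyone who wants to try the combined effect of patches 3 and 4
without building Emacs, the rule can be sketched as a small standalone C
function. This is an illustration only, not the code in src/print.c. The
names is_ascii_digit, safe_symbol_char and escape_symbol_name are made up
for this example. It assumes ASCII-only symbol names (the real printer
walks multibyte characters), it always escapes as if escapeflag were
true, and it ignores print-gensym. It also uses range checks and strchr
where the patch uses a switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in Emacs): true if C is an ASCII digit.  */
bool
is_ascii_digit (char c)
{
  return '0' <= c && c <= '9';
}

/* Hypothetical helper: true if C is in the set that patches 3 and 4
   print without a backslash.  Assumes an ASCII execution charset.  */
bool
safe_symbol_char (char c)
{
  if (is_ascii_digit (c) || ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
    return true;
  return c != '\0' && strchr ("!$%&*-+/:<=>?@^_{}~", c) != NULL;
}

/* Write the escaped form of the NUL-terminated ASCII string NAME to OUT
   and return its length.  OUT_SIZE must be at least
   2 * strlen (NAME) + 1, the worst case where every character needs a
   backslash.  */
size_t
escape_symbol_name (const char *name, char *out, size_t out_size)
{
  size_t len = strlen (name);
  assert (out_size >= 2 * len + 1);

  const char *p = name;
  const char *end = name + len;
  bool confusing;

  /* A name such as "123", "-1" or "2e10" would read back as a number,
     so it gets a leading backslash.  */
  if (is_ascii_digit (p[*p == '-' || *p == '+']) && is_ascii_digit (end[-1]))
    {
      do
        p++;
      while (is_ascii_digit (*p) || *p == 'e' || *p == 'E');
      confusing = (p == end);
    }
  else
    confusing = (*p == '?');  /* Patch 4: '?' is special only at the start.  */

  size_t n = 0;
  for (const char *q = name; q != end; q++)
    {
      /* Patch 3: anything outside the documented safe set is escaped.  */
      if (!safe_symbol_char (*q))
        confusing = true;
      if (confusing)
        {
          out[n++] = '\\';
          confusing = false;
        }
      out[n++] = *q;
    }
  out[n] = '\0';
  return n;
}
```

With this sketch, "list?" comes out unchanged, "?x" becomes "\?x", "a.b"
becomes "a\.b", and "2e10" becomes "\2e10". The single CONFUSING flag
does double duty, as in the patch: it requests a backslash before the
first character of a number-like name, and it is set again for each
character outside the safe set.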