From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Naoya Yamashita
Newsgroups: gmane.emacs.devel
Subject: [PATCH] Interpret #r"..." as a raw string
Date: Sat, 27 Feb 2021 03:18:57 +0900 (JST)
Message-ID: <20210227.031857.1351840144740816188.conao3@gmail.com>
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary="--Next_Part(Sat_Feb_27_03_18_57_2021_399)--"
Content-Transfer-Encoding: 7bit
To: emacs-devel@gnu.org
X-Mailer: Mew version 6.8 on Emacs 27.1
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:265686

----Next_Part(Sat_Feb_27_03_18_57_2021_399)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi, all.

I have written a patch that lets the Emacs reader interpret raw
strings.

As you know, the reader already treats some markers that start with
`#` in a special way.  For example, `#[` introduces a byte-compiled
object and `#s(` introduces a record or a hash table.

The patch adds raw strings through the same mechanism: if a string is
prefixed with `#r`, the reader interprets it as a raw string, so every
character up to the closing `"` is taken literally and a backslash
does not need to be doubled.

Many programming languages have a raw string feature[^1], so I would
like to be able to use raw strings in Emacs Lisp as well.

For more concrete examples, please see the test cases in the attached
patch.

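As a rough illustration of the reading rule (a standalone sketch, not
the code from the patch): the function name `read_raw_string` is
hypothetical, and it scans a plain NUL-terminated C string instead of
going through the reader's READCHAR.  Characters are copied verbatim
until the closing double quote, and a backslash gets no special
treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs.  SRC points at the text that
   follows the opening #r" marker.  Copy characters into DST (capacity
   CAP bytes, including the terminating NUL) up to the closing double
   quote, treating backslashes as ordinary characters.  Return the
   number of bytes copied, or -1 if the input ends before the closing
   quote or DST is too small.  */
static ptrdiff_t
read_raw_string (const char *src, char *dst, size_t cap)
{
  assert (src != NULL && dst != NULL);
  if (cap == 0)
    return -1;                  /* no room even for the NUL */

  size_t n = 0;
  for (; *src != '\0' && *src != '"'; src++)
    {
      if (n + 1 >= cap)
        return -1;              /* no room for this byte plus the NUL */
      dst[n++] = *src;          /* a backslash is copied as-is */
    }

  if (*src != '"')
    return -1;                  /* unterminated raw string */

  dst[n] = '\0';
  return (ptrdiff_t) n;
}
```

Given the input `\n"`, the sketch yields the two characters `\` and
`n`, which mirrors the lread-raw-string-2 test case.  One consequence
of this rule, as written, is that a raw string cannot contain a double
quote.
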
^1: https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(strings)#Quoted_raw

Regards,
Naoya

----Next_Part(Sat_Feb_27_03_18_57_2021_399)--
Content-Type: Text/X-Patch; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="0001-Interpret-r-.-as-a-raw-string.patch"

>From 649c6f9c8aa994b992f3353d2ad373461ed24d15 Mon Sep 17 00:00:00 2001
From: Naoya Yamashita
Date: Sat, 27 Feb 2021 02:55:19 +0900
Subject: [PATCH] Interpret #r"..." as a raw string

* src/lread.c (read1): Add new reader symbol, #r", indicates raw string
* test/src/lread-tests.el (lread-raw-string-1, lread-raw-string-2,
lread-raw-string-usage-1, lread-raw-string-usage-2): Add testcases
---
 src/lread.c             | 67 +++++++++++++++++++++++++++++++++++++++++
 test/src/lread-tests.el | 36 ++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/src/lread.c b/src/lread.c
index dea1b232ff..d2d7eee407 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -2835,6 +2835,73 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
     case '#':
       c = READCHAR;
+      if (c == 'r')
+        {
+          c = READCHAR;
+          if (c == '"')
+            {
+              ptrdiff_t count = SPECPDL_INDEX ();
+              char *read_buffer = stackbuf;
+              ptrdiff_t read_buffer_size = sizeof stackbuf;
+              char *heapbuf = NULL;
+              char *p = read_buffer;
+              char *end = read_buffer + read_buffer_size;
+              int ch;
+              /* True if we saw an escape sequence specifying
+                 a multibyte character.  */
+              bool force_multibyte = false;
+              /* True if we saw an escape sequence specifying
+                 a single-byte character.  */
+              bool force_singlebyte = false;
+              bool cancel = false;
+              ptrdiff_t nchars = 0;
+
+              while ((ch = READCHAR) >= 0
+                     && ch != '\"')
+                {
+                  if (end - p < MAX_MULTIBYTE_LENGTH)
+                    {
+                      ptrdiff_t offset = p - read_buffer;
+                      read_buffer = grow_read_buffer (read_buffer, offset,
+                                                      &heapbuf, &read_buffer_size,
+                                                      count);
+                      p = read_buffer + offset;
+                      end = read_buffer + read_buffer_size;
+                    }
+
+                  p += CHAR_STRING (ch, (unsigned char *) p);
+                  if (CHAR_BYTE8_P (ch))
+                    force_singlebyte = true;
+                  else if (! ASCII_CHAR_P (ch))
+                    force_multibyte = true;
+                  nchars++;
+                }
+
+              if (ch < 0)
+                end_of_file_error ();
+
+              /* If purifying, and string starts with \ newline,
+                 return zero instead.  This is for doc strings
+                 that we are really going to find in etc/DOC.nn.nn.  */
+              if (!NILP (Vpurify_flag) && NILP (Vdoc_file_name) && cancel)
+                return unbind_to (count, make_fixnum (0));
+
+              if (! force_multibyte && force_singlebyte)
+                {
+                  /* READ_BUFFER contains raw 8-bit bytes and no multibyte
+                     forms.  Convert it to unibyte.  */
+                  nchars = str_as_unibyte ((unsigned char *) read_buffer,
+                                           p - read_buffer);
+                  p = read_buffer + nchars;
+                }
+
+              Lisp_Object result
+                = make_specified_string (read_buffer, nchars, p - read_buffer,
+                                         (force_multibyte
+                                          || (p - read_buffer != nchars)));
+              return unbind_to (count, result);
+            }
+        }
       if (c == 's')
         {
           c = READCHAR;
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index f2a60bcf32..4357c27ee0 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -28,6 +28,42 @@
 (require 'ert)
 (require 'ert-x)
+(ert-deftest lread-raw-string-1 ()
+  (should (string-equal
+           (read "#r\"\\(?:def\\(?:macro\\|un\\)\\)\"")
+           "\\(?:def\\(?:macro\\|un\\)\\)")))
+
+(ert-deftest lread-raw-string-2 ()
+  (should (string-equal
+           (read "#r\"\\n\"")
+           "\\n")))
+
+(ert-deftest lread-raw-string-usage-1 ()
+  (should (equal
+           (let ((str "(defmacro leaf () nil)"))
+             (string-match "(\\(def\\(?:macro\\|un\\)\\) \\([^ ]+\\)" str)
+             (list (match-string 1 str) (match-string 2 str)))
+           '("defmacro" "leaf")))
+
+  (should (equal
+           (let ((str "(defmacro leaf () nil)"))
+             (string-match #r"(\(def\(?:macro\|un\)\) \([^ ]+\)" str)
+             (list (match-string 1 str) (match-string 2 str)))
+           '("defmacro" "leaf"))))
+
+(ert-deftest lread-raw-string-usage-2 ()
+  (should (equal
+           (let ((str "(def\\macro leaf () nil)"))
+             (string-match "(\\(def\\\\macro\\) \\([^ ]+\\)" str)
+             (list (match-string 1 str) (match-string 2 str)))
+           '("def\\macro" "leaf")))
+
+  (should (equal
+           (let ((str "(def\\macro leaf () nil)"))
+             (string-match #r"(\(def\macro\) \([^ ]+\)" str)
+             (list (match-string 1 str) (match-string 2 str)))
+           '("def\\macro" "leaf"))))
+
 (ert-deftest lread-char-number ()
   (should (equal (read "?\\N{U+A817}") #xA817)))
--
2.30.1

----Next_Part(Sat_Feb_27_03_18_57_2021_399)----