bug#53260: char-syntax differs in interpreter and bytecode [PATCH]

From: "Mattias Engdegård" <mattiase@acm.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 53260@debbugs.gnu.org
Subject: bug#53260: char-syntax differs in interpreter and bytecode [PATCH]
Date: Sun, 16 Jan 2022 12:04:51 +0100
Message-ID: <AE02E5D0-BB58-4112-8DC6-A1B14CD35B11@acm.org>
In-Reply-To: <jwv5yqkipje.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 465 bytes --]

On 15 Jan 2022 at 23:51, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
> points) not by bytes, so we need to convert the byte into a char
> before indexing.

Sure, I'm happy to do it either way. Chars retrieved from unibyte buffers or strings really should be converted to multibyte before being used with char-syntax; unibyte buffers are not very common, but unibyte strings are slightly more so.
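
To make the byte-versus-char distinction concrete, here is a minimal standalone C sketch of the conversion step. It is a toy model, not the Emacs source: the names toy_make_char_multibyte, toy_syntax and toy_char_syntax are hypothetical, the 0x3FFF00 raw-byte offset is quoted from memory of src/character.h, and the one-entry "syntax table" only mirrors the setup used in the attached test.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: Emacs maps raw bytes 0x80..0xFF to the character codes
   0x3FFF80..0x3FFFFF, i.e. byte + 0x3FFF00.  */
enum { RAW_BYTE_OFFSET = 0x3FFF00 };

/* Toy stand-in for make_char_multibyte: ASCII is unchanged, any other
   single byte becomes the corresponding "raw byte" character.  */
static int
toy_make_char_multibyte (int c)
{
  assert (0 <= c && c <= 0xFF);   /* only single-byte values are valid */
  return c < 0x80 ? c : c + RAW_BYTE_OFFSET;
}

/* Hypothetical syntax table: symbol syntax '_' for the raw-byte
   character of 0x80, punctuation '.' for everything else.  */
static int
toy_syntax (int ch)
{
  return ch == 0x80 + RAW_BYTE_OFFSET ? '_' : '.';
}

/* Toy model of the patched lookup: convert first when the buffer is
   unibyte, then index the table by character.  */
static int
toy_char_syntax (int c, bool buffer_is_multibyte)
{
  if (!buffer_is_multibyte)
    c = toy_make_char_multibyte (c);
  return toy_syntax (c);
}
```

In this model, looking up 128 in a unibyte buffer finds the raw-byte entry, while skipping the conversion would index the table at U+0080 instead, which is the kind of divergence between the interpreter and bytecode paths that the bug describes.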


[-- Attachment #2: 0001-Fix-Fchar_syntax-for-non-ASCII-in-unibyte-buffers.patch --]
[-- Type: application/octet-stream, Size: 3163 bytes --]

From 2adb5c862232abf126f73d0aa514f5ae8b6babba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sun, 16 Jan 2022 11:58:00 +0100
Subject: [PATCH] Fix Fchar_syntax for non-ASCII in unibyte buffers

Fchar_syntax did not convert unibyte characters to multibyte when the
current buffer was unibyte, in contrast to `char-syntax` in
byte-compiled code (bug#53260).

* src/bytecode.c (exec_byte_code): Call out to Fchar_syntax;
the dynamic frequency is too low to justify inlining here, and it
did lead to implementations diverging.
* src/syntax.c (Fchar_syntax): Convert non-ASCII unibyte values to
multibyte.  Remove useless SETUP_BUFFER_SYNTAX_TABLE which has no
effect here.
* test/src/syntax-tests.el (syntax-char-syntax): New test.
---
 src/bytecode.c           |  8 +-------
 src/syntax.c             |  6 +++---
 test/src/syntax-tests.el | 15 +++++++++++++++
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/bytecode.c b/src/bytecode.c
index 472992be18..b7e65d05ae 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -1167,13 +1167,7 @@ #define DEFINE(name, value) LABEL (name) ,
 	  NEXT;
 
 	CASE (Bchar_syntax):
-	  {
-	    CHECK_CHARACTER (TOP);
-	    int c = XFIXNAT (TOP);
-	    if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
-	      c = make_char_multibyte (c);
-	    XSETFASTINT (TOP, syntax_code_spec[SYNTAX (c)]);
-	  }
+	  TOP = Fchar_syntax (TOP);
 	  NEXT;
 
 	CASE (Bbuffer_substring):
diff --git a/src/syntax.c b/src/syntax.c
index 9df878b8ed..c1e81dfa47 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1101,10 +1101,10 @@ DEFUN ("char-syntax", Fchar_syntax, Schar_syntax, 1, 1, 0,
 `syntax-after' instead.  */)
   (Lisp_Object character)
 {
-  int char_int;
   CHECK_CHARACTER (character);
-  char_int = XFIXNUM (character);
-  SETUP_BUFFER_SYNTAX_TABLE ();
+  int char_int = XFIXNAT (character);
+  if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
+    char_int = make_char_multibyte (char_int);
   return make_fixnum (syntax_code_spec[SYNTAX (char_int)]);
 }
 
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index 3b9f21cde3..501b5e067f 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -506,4 +506,19 @@ test-from-to-parse-partial-sexp
     (should (parse-partial-sexp 1 1))
     (should-error (parse-partial-sexp 2 1))))
 
+(ert-deftest syntax-char-syntax ()
+  ;; Verify that char-syntax behaves identically in interpreted and
+  ;; byte-compiled code (bug#53260).
+  (let ((cs (byte-compile (lambda (x) (char-syntax x)))))
+    ;; Use a unibyte buffer with a syntax table using symbol syntax
+    ;; for raw byte 128.
+    (with-temp-buffer
+      (set-buffer-multibyte nil)
+      (let ((st (make-syntax-table)))
+        (modify-syntax-entry (unibyte-char-to-multibyte 128) "_" st)
+        (set-syntax-table st)
+        (should (equal (char-syntax 128) ?_))
+        (should (equal (funcall cs 128) ?_))))
+    (list (char-syntax 128) (funcall cs 128))))
+
 ;;; syntax-tests.el ends here
-- 
2.32.0 (Apple Git-132)

Thread overview: 9+ messages
2022-01-14 16:43 bug#53260: char-syntax differs in interpreter and bytecode Mattias Engdegård
2022-01-15  8:36 ` Lars Ingebrigtsen
2022-01-15 14:46   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-01-15 17:29     ` Mattias Engdegård
2022-01-15 17:57       ` Eli Zaretskii
2022-01-15 22:51       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-01-16 11:04         ` Mattias Engdegård [this message]
2022-01-20  9:30           ` bug#53260: char-syntax differs in interpreter and bytecode [PATCH] Lars Ingebrigtsen
2022-01-20 10:47             ` Mattias Engdegård