From: "Mattias Engdegård" <mattiase@acm.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 53260@debbugs.gnu.org
Subject: bug#53260: char-syntax differs in interpreter and bytecode [PATCH]
Date: Sun, 16 Jan 2022 12:04:51 +0100
Message-ID: <AE02E5D0-BB58-4112-8DC6-A1B14CD35B11@acm.org>
In-Reply-To: <jwv5yqkipje.fsf-monnier+emacs@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 465 bytes --]
On 15 Jan 2022, at 23:51, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
> points) not by bytes, so we need to convert the byte into a char
> before indexing.
Sure, I'm happy to do it either way. Chars retrieved from unibyte buffers or strings really should be converted to multibyte before being used with char-syntax; unibyte buffers are not very common, but unibyte strings are slightly more so.
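To make the byte-versus-char distinction concrete, here is a minimal C sketch of the conversion step. It is not the Emacs source: every `sketch_` name is hypothetical, and the offset 0x3FFF00 is my assumption about how Emacs represents raw bytes 0x80..0xFF as characters (which would put raw byte 128 at character 4194176).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: raw bytes 0x80..0xFF map to characters 0x3FFF80..0x3FFFFF.  */
enum { SKETCH_BYTE8_OFFSET = 0x3FFF00 };

/* Hypothetical stand-in for the unibyte-to-multibyte conversion:
   ASCII values are unchanged, other bytes become raw-byte characters.  */
static int
sketch_unibyte_to_char (int byte)
{
  assert (0 <= byte && byte <= 0xFF);
  return byte < 0x80 ? byte : byte + SKETCH_BYTE8_OFFSET;
}

/* Hypothetical helper: the character code that should index a char table
   for a value C obtained in a buffer of the given kind.  */
static int
sketch_syntax_index (int c, bool buffer_is_multibyte)
{
  return buffer_is_multibyte ? c : sketch_unibyte_to_char (c);
}
```

Under these assumptions, `sketch_syntax_index (128, false)` yields 0x3FFF80, whereas `sketch_syntax_index (128, true)` stays at 128 (U+0080), so skipping the conversion in a unibyte buffer would look up the wrong table entry.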
[-- Attachment #2: 0001-Fix-Fchar_syntax-for-non-ASCII-in-unibyte-buffers.patch --]
[-- Type: application/octet-stream, Size: 3163 bytes --]
From 2adb5c862232abf126f73d0aa514f5ae8b6babba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sun, 16 Jan 2022 11:58:00 +0100
Subject: [PATCH] Fix Fchar_syntax for non-ASCII in unibyte buffers

Fchar_syntax did not convert unibyte characters to multibyte when the
current buffer was unibyte, in contrast to `char-syntax` in
byte-compiled code (bug#53260).

* src/bytecode.c (exec_byte_code): Call out to Fchar_syntax;
the dynamic frequency is too low to justify inlining here, and it
did lead to implementations diverging.
* src/syntax.c (Fchar_syntax): Convert non-ASCII unibyte values to
multibyte.  Remove useless SETUP_BUFFER_SYNTAX_TABLE which has no
effect here.
* test/src/syntax-tests.el (syntax-char-syntax): New test.
---
src/bytecode.c | 8 +-------
src/syntax.c | 6 +++---
test/src/syntax-tests.el | 15 +++++++++++++++
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/src/bytecode.c b/src/bytecode.c
index 472992be18..b7e65d05ae 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -1167,13 +1167,7 @@ #define DEFINE(name, value) LABEL (name) ,
           NEXT;
 
         CASE (Bchar_syntax):
-          {
-            CHECK_CHARACTER (TOP);
-            int c = XFIXNAT (TOP);
-            if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
-              c = make_char_multibyte (c);
-            XSETFASTINT (TOP, syntax_code_spec[SYNTAX (c)]);
-          }
+          TOP = Fchar_syntax (TOP);
           NEXT;
 
         CASE (Bbuffer_substring):
diff --git a/src/syntax.c b/src/syntax.c
index 9df878b8ed..c1e81dfa47 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1101,10 +1101,10 @@ DEFUN ("char-syntax", Fchar_syntax, Schar_syntax, 1, 1, 0,
 `syntax-after' instead. */)
   (Lisp_Object character)
 {
-  int char_int;
   CHECK_CHARACTER (character);
-  char_int = XFIXNUM (character);
-  SETUP_BUFFER_SYNTAX_TABLE ();
+  int char_int = XFIXNAT (character);
+  if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
+    char_int = make_char_multibyte (char_int);
   return make_fixnum (syntax_code_spec[SYNTAX (char_int)]);
 }
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index 3b9f21cde3..501b5e067f 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -506,4 +506,19 @@ test-from-to-parse-partial-sexp
(should (parse-partial-sexp 1 1))
(should-error (parse-partial-sexp 2 1))))
+(ert-deftest syntax-char-syntax ()
+  ;; Verify that char-syntax behaves identically in interpreted and
+  ;; byte-compiled code (bug#53260).
+  (let ((cs (byte-compile (lambda (x) (char-syntax x)))))
+    ;; Use a unibyte buffer with a syntax table using symbol syntax
+    ;; for raw byte 128.
+    (with-temp-buffer
+      (set-buffer-multibyte nil)
+      (let ((st (make-syntax-table)))
+        (modify-syntax-entry (unibyte-char-to-multibyte 128) "_" st)
+        (set-syntax-table st)
+        (should (equal (char-syntax 128) ?_))
+        (should (equal (funcall cs 128) ?_))))
+    (list (char-syntax 128) (funcall cs 128))))
+
;;; syntax-tests.el ends here
--
2.32.0 (Apple Git-132)