From: "Mattias Engdegård" <mattiase@acm.org>
To: 34492@debbugs.gnu.org
Subject: bug#34492: Acknowledgement (rx: ASCII-raw byte ranges comprise all of Unicode)
Date: Fri, 15 Feb 2019 19:29:28 +0100
Message-ID: <DAFD97B2-85EF-46C5-9431-D149CF47C09C@acm.org>
In-Reply-To: <handler.34492.B.15502550523602.ack@debbugs.gnu.org>
[-- Attachment #1: Type: text/plain, Size: 8 bytes --]
Patch.
[-- Attachment #2: 0001-Prevent-over-eager-rx-character-range-condensation.patch --]
[-- Type: application/octet-stream, Size: 2596 bytes --]
From 39a593336d00c3418f52fbe205b4dc284e8b65ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Fri, 15 Feb 2019 19:27:48 +0100
Subject: [PATCH] Prevent over-eager rx character range condensation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
`rx' incorrectly considers character ranges between ASCII and raw bytes to
cover all codes in-between, which includes all non-ASCII Unicode chars.
This causes (any "\000-\377" ?Å) to be simplified to (any "\000-\377"),
which is not at all the same thing: [\000-\377] really means
[\000-\177\200-\377] (Bug#34492).
* lisp/emacs-lisp/rx.el (rx-any-condense-range): Split ranges going
from ASCII to raw bytes.
* test/lisp/emacs-lisp/rx-tests.el (rx-char-any-raw-byte): Add test case.
---
lisp/emacs-lisp/rx.el | 7 +++++++
test/lisp/emacs-lisp/rx-tests.el | 6 +++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index b2299030a1..715cd608c4 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -429,6 +429,13 @@ Only both edges of each range is checked."
;; set L list of all ranges
(mapc (lambda (e) (cond ((stringp e) (push e str))
((numberp e) (push (cons e e) l))
+ ;; Ranges between ASCII and raw bytes are split,
+ ;; to prevent accidental inclusion of Unicode
+ ;; characters later on.
+ ((and (<= (car e) #x7f)
+ (>= (cdr e) #x3fff80))
+ (push (cons (car e) #x7f) l)
+ (push (cons #x3fff80 (cdr e)) l))
(t (push e l))))
args)
;; condense overlapped ranges in L
diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el
index f15e1016f7..e14feda347 100644
--- a/test/lisp/emacs-lisp/rx-tests.el
+++ b/test/lisp/emacs-lisp/rx-tests.el
@@ -53,7 +53,11 @@
;; Range of raw characters, multibyte.
(should (equal (string-match-p (rx (any "Å\211\326-\377\177"))
"XY\355\177\327")
- 2)))
+ 2))
+ ;; Split range; \177-\377ÿ should not be optimised to \177-\377.
+ (should (equal (string-match-p (rx (any "\177-\377" ?ÿ))
+ "ÿA\310B")
+ 0)))
(ert-deftest rx-pcase ()
(should (equal (pcase "a 1 2 3 1 1 b"
--
2.17.2 (Apple Git-113)
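Illustration of the split performed by the patch above. This is only a sketch: `my-split-ascii-raw-range' is a hypothetical stand-alone helper, not a function in rx.el. It mirrors the new `cond' clause in `rx-any-condense-range' and relies on raw bytes \200-\377 being represented as the character codes #x3fff80-#x3fffff.

```elisp
;; Hypothetical helper for illustration only; not part of rx.el.
(defun my-split-ascii-raw-range (range)
  "Split RANGE, a cons (FROM . TO), if it runs from ASCII into raw bytes.
Return a list of one or two ranges."
  (if (and (<= (car range) #x7f)        ; starts in ASCII
           (>= (cdr range) #x3fff80))   ; ends among the raw bytes
      ;; Keep the ASCII part and the raw-byte part and drop everything
      ;; in between, i.e. all non-ASCII Unicode characters.
      (list (cons (car range) #x7f)
            (cons #x3fff80 (cdr range)))
    (list range)))

(my-split-ascii-raw-range (cons 0 #x3fffff))
;; => ((0 . 127) (4194176 . 4194303))

(my-split-ascii-raw-range (cons ?a ?z))
;; => ((97 . 122))
```

With the range split this way, a character such as ?Å (code 197) lies in neither part, so the condensation step can no longer treat it as already covered and discard it.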
Thread overview: 8+ messages
2019-02-15 18:23 bug#34492: rx: ASCII-raw byte ranges comprise all of Unicode Mattias Engdegård
[not found] ` <handler.34492.B.15502550523602.ack@debbugs.gnu.org>
2019-02-15 18:29 ` Mattias Engdegård [this message]
2019-02-16 7:20 ` bug#34492: Acknowledgement (rx: ASCII-raw byte ranges comprise all of Unicode) Eli Zaretskii
2019-02-16 8:08 ` Mattias Engdegård
2019-02-16 10:14 ` Eli Zaretskii
2019-02-16 11:05 ` Mattias Engdegård
2019-02-16 11:40 ` Eli Zaretskii
2019-02-16 11:46 ` Mattias Engdegård