From: Robert Pluim <rpluim@gmail.com>
To: 73312@debbugs.gnu.org
Subject: bug#73312: 31.0.50; textsec test failure because of UTS #46 changes
Date: Tue, 17 Sep 2024 12:11:50 +0200
Message-ID: <87ed5ik2s9.fsf@gmail.com>
Following the update to Unicode 16, the textsec tests now fail:
GEN lisp/international/textsec-tests.log
Running 12 tests (2024-09-17 11:51:51+0200, selector `(not (or (tag :unstable) (tag :nativecomp)))')
passed 1/12 test-confusable (0.001562 sec)
passed 2/12 test-minimal-scripts (0.000155 sec)
passed 3/12 test-mixed-numbers (0.000846 sec)
passed 4/12 test-resolved (0.000120 sec)
passed 5/12 test-restriction-level (0.000259 sec)
passed 6/12 test-scripts (0.000354 sec)
passed 7/12 test-suspicious-email (0.001587 sec)
passed 8/12 test-suspicious-link (0.015283 sec)
passed 9/12 test-suspicious-local (0.000522 sec)
passed 10/12 test-suspicious-name (0.000420 sec)
passed 11/12 test-suspicious-url (0.000498 sec)
Test test-suspiction-domain backtrace:
signal(ert-test-failed (((should (textsec-domain-suspicious-p "foo/b
ert-fail(((should (textsec-domain-suspicious-p "foo/bar.org")) :form
(if (unwind-protect (setq value-222 (apply fn-220 args-221)) (setq f
(let (form-description-224) (if (unwind-protect (setq value-222 (app
(let ((value-222 'ert-form-evaluation-aborted-223)) (let (form-descr
(let* ((fn-220 #'textsec-domain-suspicious-p) (args-221 (condition-c
#f(lambda () [t] (let* ((fn-220 #'textsec-domain-suspicious-p) (args
#f(compiled-function () #<bytecode -0x167cc0e1752f76aa>)()
handler-bind-1(#f(compiled-function () #<bytecode -0x167cc0e1752f76a
ert--run-test-internal(#s(ert--test-execution-info :test #s(ert-test
ert-run-test(#s(ert-test :name test-suspiction-domain :documentation
ert-run-or-rerun-test(#s(ert--stats :selector ... :tests ... :test-m
ert-run-tests((not (or (tag :unstable) (tag :nativecomp))) #f(compil
ert-run-tests-batch((not (or (tag :unstable) (tag :nativecomp))))
ert-run-tests-batch-and-exit((not (or (tag :unstable) (tag :nativeco
eval((ert-run-tests-batch-and-exit '(not (or (tag :unstable) (tag :n
command-line-1(("-L" ":." "-l" "ert" "--eval" "(setq treesit-extra-l
command-line()
normal-top-level()
Test test-suspiction-domain condition:
(ert-test-failed
((should (textsec-domain-suspicious-p "foo/bar.org")) :form
(textsec-domain-suspicious-p "foo/bar.org") :value nil))
FAILED 12/12 test-suspiction-domain (0.000228 sec) at lisp/international/textsec-tests.el:114
Ran 12 tests, 11 results as expected, 1 unexpected (2024-09-17 11:51:51+0200, 0.102143 sec)
1 unexpected results:
FAILED test-suspiction-domain
This is because the authors of UTS #46, in their infinite wisdom, have
decided to change the rules for checking which characters are allowed in
a domain name. Previously, IdnaMappingTable.txt contained e.g.:
002F ; disallowed_STD3_valid # 1.1 SOLIDUS
but now it contains:
002F ; valid ; ; NV8 # 1.1 SOLIDUS
together with a change to section 4.1 of UTS #46 saying that only
[a-z0-9-] are allowed for ASCII. Note that theyʼve helpfully marked
valid-but-invalid-in-IDNA characters with either NV8 or XV8, but then
have unhelpfully said that those markings are not normative. <sigh>
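
To make the difference between the two table formats concrete, here is a
minimal sketch of the per-line check. It is illustrative only: the names
`my-idna-line-regexp' and `my-idna-line-disallowed-p' are hypothetical
and not part of Emacs, and the column padding in the sample lines is an
assumption about how the real IdnaMappingTable.txt is laid out.

```elisp
;; Sketch only: hypothetical helper names, not part of Emacs.
;; The regexp is: start[..end] ; status [; mapping] [; NV8|XV8]
(defconst my-idna-line-regexp
  (concat "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)?"
          " +; +\\([^ ]+\\) +"
          "\\(?:; +\\([ 0-9A-F]+\\)\\)?"
          "\\(?:; \\(NV8\\|XV8\\)\\)?")
  "Regexp for one data line of IdnaMappingTable.txt (assumed layout).")

(defun my-idna-line-disallowed-p (line)
  "Return non-nil if LINE should be treated as disallowed.
That is the case when the status field starts with \"disallowed\",
or when the line carries the NV8 or XV8 marker."
  (when (string-match my-idna-line-regexp line)
    (let ((status (match-string 3 line))
          (idna-status (match-string 5 line)))
      (or (string-prefix-p "disallowed" status)
          (member idna-status '("NV8" "XV8"))))))

;; Old-style entry: the status field alone says "disallowed".
(my-idna-line-disallowed-p
 "002F          ; disallowed_STD3_valid  # 1.1  SOLIDUS")

;; New-style entry: status is "valid", only the NV8 marker flags it.
(my-idna-line-disallowed-p
 "002F          ; valid                  ;      ; NV8    # 1.1  SOLIDUS")

;; Plain valid entry with no marker: not disallowed.
(my-idna-line-disallowed-p
 "0061..007A    ; valid                  # 1.1  LATIN SMALL LETTER A..Z")
```

The first two calls return non-nil and the third returns nil. Note that
this regexp only picks up the marker when the empty mapping column is
padded with at least two spaces, as assumed above; a single-spaced line
such as the abbreviated one quoted earlier would leave the marker group
unset.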
Anyway, willfully ignoring their verbiage about normative markings,
the following fixes it for me, at least until the next version of UTS
#46, I guess.
diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
index 7be03fe63af..adbe9c83670 100644
--- a/admin/unidata/unidata-gen.el
+++ b/admin/unidata/unidata-gen.el
@@ -1598,15 +1598,21 @@ unidata-gen-idna-mapping
(let ((map (make-char-table nil)))
(with-temp-buffer
(unidata-gen--insert-file "IdnaMappingTable.txt")
- (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?"
+ (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?\\(?:; \\(NV8\\|XV8\\)\\)?"
nil t)
(let ((start (match-string 1))
(end (match-string 2))
(status (match-string 3))
- (mapped (match-string 4)))
+ (mapped (match-string 4))
+ (idna-status (match-string 5)))
;; Make reading the file slightly faster by using `t'
;; instead of `disallowed' all over the place.
- (when (string-match-p "\\`disallowed" status)
+ (when (or (string-match-p "\\`disallowed" status)
+ ;; UTS#46 messed us about with "status = valid" for
+ ;; invalid characters, so we need to check for "NV8" or
+ ;; "XV8".
+ (string= idna-status "NV8")
+ (string= idna-status "XV8"))
(setq status "t"))
(unless (or (equal status "valid")
(equal status "deviation"))
Robert
--