From: Robert Pluim <rpluim@gmail.com>
To: 73312@debbugs.gnu.org
Subject: bug#73312: 31.0.50; textsec test failure because of UTS #46 changes
Date: Tue, 17 Sep 2024 12:11:50 +0200
Message-ID: <87ed5ik2s9.fsf@gmail.com>

Following the update to Unicode 16, the textsec tests now fail:

  GEN      lisp/international/textsec-tests.log
Running 12 tests (2024-09-17 11:51:51+0200, selector `(not (or (tag :unstable) (tag :nativecomp)))')
   passed   1/12  test-confusable (0.001562 sec)
   passed   2/12  test-minimal-scripts (0.000155 sec)
   passed   3/12  test-mixed-numbers (0.000846 sec)
   passed   4/12  test-resolved (0.000120 sec)
   passed   5/12  test-restriction-level (0.000259 sec)
   passed   6/12  test-scripts (0.000354 sec)
   passed   7/12  test-suspicious-email (0.001587 sec)
   passed   8/12  test-suspicious-link (0.015283 sec)
   passed   9/12  test-suspicious-local (0.000522 sec)
   passed  10/12  test-suspicious-name (0.000420 sec)
   passed  11/12  test-suspicious-url (0.000498 sec)
Test test-suspiction-domain backtrace:
  signal(ert-test-failed (((should (textsec-domain-suspicious-p "foo/b
  ert-fail(((should (textsec-domain-suspicious-p "foo/bar.org")) :form
  (if (unwind-protect (setq value-222 (apply fn-220 args-221)) (setq f
  (let (form-description-224) (if (unwind-protect (setq value-222 (app
  (let ((value-222 'ert-form-evaluation-aborted-223)) (let (form-descr
  (let* ((fn-220 #'textsec-domain-suspicious-p) (args-221 (condition-c
  #f(lambda () [t] (let* ((fn-220 #'textsec-domain-suspicious-p) (args
  #f(compiled-function () #<bytecode -0x167cc0e1752f76aa>)()
  handler-bind-1(#f(compiled-function () #<bytecode -0x167cc0e1752f76a
  ert--run-test-internal(#s(ert--test-execution-info :test #s(ert-test
  ert-run-test(#s(ert-test :name test-suspiction-domain :documentation
  ert-run-or-rerun-test(#s(ert--stats :selector ... :tests ... :test-m
  ert-run-tests((not (or (tag :unstable) (tag :nativecomp))) #f(compil
  ert-run-tests-batch((not (or (tag :unstable) (tag :nativecomp))))
  ert-run-tests-batch-and-exit((not (or (tag :unstable) (tag :nativeco
  eval((ert-run-tests-batch-and-exit '(not (or (tag :unstable) (tag :n
  command-line-1(("-L" ":." "-l" "ert" "--eval" "(setq treesit-extra-l
  command-line()
  normal-top-level()
Test test-suspiction-domain condition:
    (ert-test-failed
     ((should (textsec-domain-suspicious-p "foo/bar.org")) :form
      (textsec-domain-suspicious-p "foo/bar.org") :value nil))
   FAILED  12/12  test-suspiction-domain (0.000228 sec) at lisp/international/textsec-tests.el:114

Ran 12 tests, 11 results as expected, 1 unexpected (2024-09-17 11:51:51+0200, 0.102143 sec)

1 unexpected results:
   FAILED  test-suspiction-domain

This is because UTS #46, in its infinite wisdom, has changed the rules
for determining which characters are allowed in a domain
name. Previously, IdnaMappingTable.txt contained, e.g.:

002F          ; disallowed_STD3_valid                  # 1.1  SOLIDUS

but now it contains:

002F          ; valid      ;      ; NV8    # 1.1  SOLIDUS

with a change to section 4.1 of UTS #46 saying that only [a-z0-9-] are
allowed for ASCII. Note that theyʼve helpfully marked
valid-but-invalid-in-IDNA characters with either NV8 or XV8, but then
unhelpfully said that those markings are not normative. <sigh>
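
To illustrate the difference between the two line formats, here is a
minimal sketch that can be evaluated in *scratch*. It applies the
extended regexp from the patch further down to the two SOLIDUS lines
quoted above. The names my/idna-line-regexp, my/idna-parse-line and
my/idna-disallowed-p are hypothetical helpers for this illustration
only and are not part of Emacs; the third sample line is made up in
the same layout, not copied from IdnaMappingTable.txt.

```elisp
;; Sketch only: the `my/...' names are hypothetical, not part of Emacs.
(defconst my/idna-line-regexp
  (concat "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)?" ; code point or range
          " +; +\\([^ ]+\\) +"                             ; status field
          "\\(?:; +\\([ 0-9A-F]+\\)\\)?"                   ; optional mapping
          "\\(?:; \\(NV8\\|XV8\\)\\)?")                    ; optional IDNA2008 status
  "The extended regexp from the patch, split up for readability.")

(defun my/idna-parse-line (line)
  "Return (STATUS . IDNA-STATUS) for LINE, or nil if LINE does not match.
IDNA-STATUS is \"NV8\", \"XV8\", or nil when the field is absent."
  (when (string-match my/idna-line-regexp line)
    (cons (match-string 3 line) (match-string 5 line))))

(defun my/idna-disallowed-p (line)
  "Return t if LINE should be treated as disallowed, as in the patch."
  (let ((parsed (my/idna-parse-line line)))
    (and parsed
         (or (string-prefix-p "disallowed" (car parsed))
             (member (cdr parsed) '("NV8" "XV8")))
         t)))

;; Old format, quoted above: status starts with "disallowed".
(my/idna-disallowed-p
 "002F          ; disallowed_STD3_valid                  # 1.1  SOLIDUS")
;; => t

;; New format, quoted above: status is "valid", but NV8 is set.
(my/idna-disallowed-p
 "002F          ; valid      ;      ; NV8    # 1.1  SOLIDUS")
;; => t

;; Made-up line in the same layout: plain "valid", no NV8/XV8.
(my/idna-disallowed-p
 "0061          ; valid                                  # 1.1  LATIN SMALL LETTER A")
;; => nil
```

Without the check on the fifth match group, the second line would be
classified by its "valid" status alone, which is why
textsec-domain-suspicious-p stopped flagging "foo/bar.org".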

Anyway, willfully ignoring their verbiage about normative markings,
the following fixes it for me, at least until the next version of UTS
#46, I guess.

diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
index 7be03fe63af..adbe9c83670 100644
--- a/admin/unidata/unidata-gen.el
+++ b/admin/unidata/unidata-gen.el
@@ -1598,15 +1598,21 @@ unidata-gen-idna-mapping
   (let ((map (make-char-table nil)))
     (with-temp-buffer
       (unidata-gen--insert-file "IdnaMappingTable.txt")
-      (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?"
+      (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?\\(?:; \\(NV8\\|XV8\\)\\)?"
                                 nil t)
         (let ((start (match-string 1))
               (end (match-string 2))
               (status (match-string 3))
-              (mapped (match-string 4)))
+              (mapped (match-string 4))
+              (idna-status (match-string 5)))
           ;; Make reading the file slightly faster by using `t'
           ;; instead of `disallowed' all over the place.
-          (when (string-match-p "\\`disallowed" status)
+          (when (or (string-match-p "\\`disallowed" status)
+                    ;; UTS#46 messed us about with "status = valid" for
+                    ;; invalid characters, so we need to check for "NV8" or
+                    ;; "XV8".
+                    (string= idna-status "NV8")
+                    (string= idna-status "XV8"))
             (setq status "t"))
           (unless (or (equal status "valid")
                       (equal status "deviation"))



Robert
-- 




