* bug#73312: 31.0.50; textsec test failure because of UTS #46 changes
@ 2024-09-17 10:11 Robert Pluim
  2024-09-17 13:10 ` Eli Zaretskii
From: Robert Pluim @ 2024-09-17 10:11 UTC
  To: 73312

Following the update to Unicode 16, the textsec tests now fail:

  GEN      lisp/international/textsec-tests.log
Running 12 tests (2024-09-17 11:51:51+0200, selector `(not (or (tag :unstable) (tag :nativecomp)))')
   passed   1/12  test-confusable (0.001562 sec)
   passed   2/12  test-minimal-scripts (0.000155 sec)
   passed   3/12  test-mixed-numbers (0.000846 sec)
   passed   4/12  test-resolved (0.000120 sec)
   passed   5/12  test-restriction-level (0.000259 sec)
   passed   6/12  test-scripts (0.000354 sec)
   passed   7/12  test-suspicious-email (0.001587 sec)
   passed   8/12  test-suspicious-link (0.015283 sec)
   passed   9/12  test-suspicious-local (0.000522 sec)
   passed  10/12  test-suspicious-name (0.000420 sec)
   passed  11/12  test-suspicious-url (0.000498 sec)
Test test-suspiction-domain backtrace:
  signal(ert-test-failed (((should (textsec-domain-suspicious-p "foo/b
  ert-fail(((should (textsec-domain-suspicious-p "foo/bar.org")) :form
  (if (unwind-protect (setq value-222 (apply fn-220 args-221)) (setq f
  (let (form-description-224) (if (unwind-protect (setq value-222 (app
  (let ((value-222 'ert-form-evaluation-aborted-223)) (let (form-descr
  (let* ((fn-220 #'textsec-domain-suspicious-p) (args-221 (condition-c
  #f(lambda () [t] (let* ((fn-220 #'textsec-domain-suspicious-p) (args
  #f(compiled-function () #<bytecode -0x167cc0e1752f76aa>)()
  handler-bind-1(#f(compiled-function () #<bytecode -0x167cc0e1752f76a
  ert--run-test-internal(#s(ert--test-execution-info :test #s(ert-test
  ert-run-test(#s(ert-test :name test-suspiction-domain :documentation
  ert-run-or-rerun-test(#s(ert--stats :selector ... :tests ... :test-m
  ert-run-tests((not (or (tag :unstable) (tag :nativecomp))) #f(compil
  ert-run-tests-batch((not (or (tag :unstable) (tag :nativecomp))))
  ert-run-tests-batch-and-exit((not (or (tag :unstable) (tag :nativeco
  eval((ert-run-tests-batch-and-exit '(not (or (tag :unstable) (tag :n
  command-line-1(("-L" ":." "-l" "ert" "--eval" "(setq treesit-extra-l
  command-line()
  normal-top-level()
Test test-suspiction-domain condition:
    (ert-test-failed
     ((should (textsec-domain-suspicious-p "foo/bar.org")) :form
      (textsec-domain-suspicious-p "foo/bar.org") :value nil))
   FAILED  12/12  test-suspiction-domain (0.000228 sec) at lisp/international/textsec-tests.el:114

Ran 12 tests, 11 results as expected, 1 unexpected (2024-09-17 11:51:51+0200, 0.102143 sec)

1 unexpected results:
   FAILED  test-suspiction-domain

This is because the UTS #46 authors, in their infinite wisdom, have
decided to change the rules for checking which characters are allowed
in a domain name. Previously, IdnaMappingTable.txt contained e.g.:

002F          ; disallowed_STD3_valid                  # 1.1  SOLIDUS

but now it contains

002F          ; valid      ;      ; NV8    # 1.1  SOLIDUS

with a change to section 4.1 of UTS #46 saying that only [a-z0-9-] are
allowed for ASCII. Note that they've helpfully marked
valid-but-invalid-in-IDNA characters with either NV8 or XV8, but then
have unhelpfully said that those markings are not normative. <sigh>
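
To make the difference between the two table formats concrete, here is
a minimal sketch that classifies a single line of IdnaMappingTable.txt.
The function name `my-idna-line-disallowed-p' is hypothetical and not
part of Emacs; the regexp assumes the line layout shown in the two
samples above, and treating NV8/XV8 as "disallowed" is a deliberate
simplification that ignores the non-normative caveat.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs.
(defun my-idna-line-disallowed-p (line)
  "Return non-nil if LINE of IdnaMappingTable.txt should count as disallowed.
A line counts as disallowed when its status starts with \"disallowed\"
\(the old format), or when it carries an NV8 or XV8 marker in the
trailing field (the new format)."
  (when (string-match
         (concat "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)?"  ; code point or range
                 " +; +\\([^ ]+\\) +"                              ; status
                 "\\(?:; +\\([ 0-9A-F]+\\)\\)?"                    ; optional mapping
                 "\\(?:; \\(NV8\\|XV8\\)\\)?")                     ; optional IDNA marker
         line)
    (let ((status (match-string 3 line))
          (idna-status (match-string 5 line)))
      (or (string-prefix-p "disallowed" status)
          ;; `member' compares with `equal', so a nil marker is simply
          ;; not found.
          (member idna-status '("NV8" "XV8"))))))
```

With this, both the old and the new SOLIDUS line are classified as
disallowed, while a plain "valid" line without a marker is not:

```elisp
(my-idna-line-disallowed-p
 "002F          ; valid      ;      ; NV8    # 1.1  SOLIDUS")
```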

Anyway, willfully ignoring their verbiage about normative markings,
the following fixes it for me, at least until the next version of UTS
#46, I guess.

diff --git a/admin/unidata/unidata-gen.el b/admin/unidata/unidata-gen.el
index 7be03fe63af..adbe9c83670 100644
--- a/admin/unidata/unidata-gen.el
+++ b/admin/unidata/unidata-gen.el
@@ -1598,15 +1598,21 @@ unidata-gen-idna-mapping
   (let ((map (make-char-table nil)))
     (with-temp-buffer
       (unidata-gen--insert-file "IdnaMappingTable.txt")
-      (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?"
+      (while (re-search-forward "^\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)? +; +\\([^ ]+\\) +\\(?:; +\\([ 0-9A-F]+\\)\\)?\\(?:; \\(NV8\\|XV8\\)\\)?"
                                 nil t)
         (let ((start (match-string 1))
               (end (match-string 2))
               (status (match-string 3))
-              (mapped (match-string 4)))
+              (mapped (match-string 4))
+              (idna-status (match-string 5)))
           ;; Make reading the file slightly faster by using `t'
           ;; instead of `disallowed' all over the place.
-          (when (string-match-p "\\`disallowed" status)
+          (when (or (string-match-p "\\`disallowed" status)
+                    ;; UTS#46 messed us about with "status = valid" for
+                    ;; invalid characters, so we need to check for "NV8" or
+                    ;; "XV8".
+                    (string= idna-status "NV8")
+                    (string= idna-status "XV8"))
             (setq status "t"))
           (unless (or (equal status "valid")
                       (equal status "deviation"))



Robert