From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Riefenstahl
Newsgroups: gmane.emacs.bugs
Subject: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:24:07 +0100
Message-ID: <877cxeem88.fsf@turtle-trading.net>
References: <87bkmqempd.fsf@turtle-trading.net>
In-Reply-To: <87bkmqempd.fsf@turtle-trading.net> (Benjamin Riefenstahl's message of "Sun, 22 Jan 2023 14:13:50 +0100")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2.50 (gnu/linux)
To: 61005@debbugs.gnu.org
X-GNU-PR-Message: followup 61005
X-GNU-PR-Package: emacs

--=-=-=
Content-Type: text/plain

The promised patch. This is against master.

Also a small test suite for sgml-html-meta-auto-coding-function, if you want that.
If you care, I could also add one for sgml-xml-auto-coding-function.

--=-=-=
Content-Type: text/x-diff
Content-Disposition: attachment; filename=0001-Fix-decoding-HTML-files-from-archives.patch

>From 95b63baf1bf411422c61b76470abb1aa681f2db2 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl
Date: Tue, 17 Jan 2023 20:08:15 +0200
Subject: [PATCH 1/2] Fix decoding HTML files from archives

* lisp/international/mule.el (sgml-html-meta-auto-coding-function):
Avoid signaling an error from coding-system-equal when the HTML
meta tag specifies an encoding whose type is 'charset'. (Bug#61005)

This is the same fix as in #df7ed10e for
sgml-xml-auto-coding-function.
---
 lisp/international/mule.el | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 4f6addea387..9480213be9a 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
              (bfcs-type
               (coding-system-type buffer-file-coding-system)))
         (if (and enable-multibyte-characters
+                 ;; 'charset' will signal an error in
+                 ;; coding-system-equal, since it isn't a
+                 ;; coding-system. So test that up front.
+                 (not (equal sym-type 'charset))
                  (coding-system-equal 'utf-8 sym-type)
                  (coding-system-equal 'utf-8 bfcs-type))
             buffer-file-coding-system
-- 
2.30.2

--=-=-=
Content-Type: text/x-diff
Content-Disposition: attachment; filename=0002-Add-test-suite-for-sgml-html-meta-auto-coding-functi.patch

>From 29996e07c23c9716f731dde224c8ca47e321e697 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl
Date: Tue, 17 Jan 2023 20:13:39 +0200
Subject: [PATCH 2/2] Add test suite for sgml-html-meta-auto-coding-function

* test/lisp/international/mule-tests.el (sgml-html-meta-pre)
(sgml-html-meta-post, sgml-html-meta-run, sgml-html-meta-utf-8)
(sgml-html-meta-windows-hebrew, sgml-html-meta-none)
(sgml-html-meta-unknown-coding, sgml-html-meta-no-pre)
(sgml-html-meta-no-post-less-than-10lines)
(sgml-html-meta-no-post-10lines, sgml-html-meta-utf-8-with-bom): Add.
---
 test/lisp/international/mule-tests.el | 66 +++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
index 4f70b275848..6e23d8c5421 100644
--- a/test/lisp/international/mule-tests.el
+++ b/test/lisp/international/mule-tests.el
@@ -70,6 +70,72 @@ mule-hz
   ;; The chinese-hz encoding is not ASCII compatible.
   (should-not (coding-system-get 'chinese-hz :ascii-compatible-p)))
 
+;;; Testing `sgml-html-meta-auto-coding-function'.
+
+(defconst sgml-html-meta-pre ""
+  "The beginning of a minimal HTML document.")
+
+(defconst sgml-html-meta-post ""
+  "The end of a minimal HTML document.")
+
+(defun sgml-html-meta-run (coding-system)
+  "Run `sgml-html-meta-auto-coding-function' on a minimal HTML.
+When CODING-SYSTEM is not nil, insert it, wrapped in a ''
+element. When CODING-SYSTEM contains HTML meta characters or
+white space, insert it as-is, without additional formatting. Use
+the variables `sgml-html-meta-pre' and `sgml-html-meta-post' to
+provide HTML fragments. Some tests override those variables."
+  (with-temp-buffer
+    (insert sgml-html-meta-pre
+            (cond ((not coding-system)
+                   "")
+                  ((string-match "[<>'\"\n ]" coding-system)
+                   coding-system)
+                  (t
+                   (format "" coding-system)))
+            sgml-html-meta-post)
+    (goto-char (point-min))
+    (sgml-html-meta-auto-coding-function (- (point-max) (point-min)))))
+
+(ert-deftest sgml-html-meta-utf-8 ()
+  "Baseline: UTF-8."
+  (should (eq 'utf-8 (sgml-html-meta-run "utf-8"))))
+
+(ert-deftest sgml-html-meta-windows-hebrew ()
+  "A non-Unicode charset."
+  (should (eq 'windows-1255 (sgml-html-meta-run "windows-1255"))))
+
+(ert-deftest sgml-html-meta-none ()
+  (should (eq nil (sgml-html-meta-run nil))))
+
+(ert-deftest sgml-html-meta-unknown-coding ()
+  (should (eq nil (sgml-html-meta-run "XXX"))))
+
+(ert-deftest sgml-html-meta-no-pre ()
+  "Without the prefix, so not HTML."
+  (let ((sgml-html-meta-pre ""))
+    (should (eq nil (sgml-html-meta-run "utf-8")))))
+
+(ert-deftest sgml-html-meta-no-post-less-than-10lines ()
+  "No '', detect charset in the first 10 lines."
+  (let ((sgml-html-meta-post ""))
+    (should (eq 'utf-8 (sgml-html-meta-run
+                        (concat "\n\n\n\n\n\n\n\n\n"
+                                ""))))))
+
+(ert-deftest sgml-html-meta-no-post-10lines ()
+  "No '', do not detect charset after the first 10 lines."
+  (let ((sgml-html-meta-post ""))
+    (should (eq nil (sgml-html-meta-run
+                     (concat "\n\n\n\n\n\n\n\n\n\n"
+                             ""))))))
+
+(ert-deftest sgml-html-meta-utf-8-with-bom ()
+  "Requesting 'UTF-8' does not override `utf-8-with-signature'.
+Check fix for Bug#20623."
+  (let ((buffer-file-coding-system 'utf-8-with-signature))
+    (should (eq 'utf-8-with-signature (sgml-html-meta-run "utf-8")))))
+
 ;; Stop "Local Variables" above causing confusion when visiting this file.
-- 
2.30.2

--=-=-=--