From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
To: 61005@debbugs.gnu.org
Subject: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:24:07 +0100
Message-ID: <877cxeem88.fsf@turtle-trading.net>
In-Reply-To: <87bkmqempd.fsf@turtle-trading.net> (Benjamin Riefenstahl's message of "Sun, 22 Jan 2023 14:13:50 +0100")
[-- Attachment #1: Type: text/plain, Size: 200 bytes --]
Here is the promised patch, made against master.
I have also included a small test suite for
sgml-html-meta-auto-coding-function, if you want it. If there is
interest, I could add one for sgml-xml-auto-coding-function as well.
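For illustration only, a test for sgml-xml-auto-coding-function along the lines offered above might look roughly like the sketch below. It is not part of either attached patch. The names `demo-sgml-xml-run', `demo-sgml-xml-utf-8' and `demo-sgml-xml-windows-hebrew' are hypothetical, and the second test assumes an Emacs that already contains the earlier charset fix for that function.

```elisp
(require 'ert)

(defun demo-sgml-xml-run (encoding)
  "Run `sgml-xml-auto-coding-function' on a minimal XML document.
ENCODING is inserted as the value of the encoding attribute of the
XML declaration.  Hypothetical helper, not part of the patch."
  (with-temp-buffer
    ;; Pin the buffer's coding system so the result does not depend
    ;; on the locale the test happens to run in.
    (setq buffer-file-coding-system 'undecided)
    (insert (format "<?xml version=\"1.0\" encoding=\"%s\"?>\n<doc/>\n"
                    encoding))
    (goto-char (point-min))
    (sgml-xml-auto-coding-function (- (point-max) (point-min)))))

(ert-deftest demo-sgml-xml-utf-8 ()
  "Baseline: UTF-8."
  (should (eq 'utf-8 (demo-sgml-xml-run "utf-8"))))

(ert-deftest demo-sgml-xml-windows-hebrew ()
  "A coding system whose type is `charset'."
  (should (eq 'windows-1255 (demo-sgml-xml-run "windows-1255"))))
```

Setting buffer-file-coding-system inside the temporary buffer is a deliberate choice in this sketch: it keeps the UTF-8 case from returning a locale-dependent variant of utf-8.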
[-- Attachment #2: 0001-Fix-decoding-HTML-files-from-archives.patch --]
[-- Type: text/x-diff, Size: 1391 bytes --]
From 95b63baf1bf411422c61b76470abb1aa681f2db2 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
Date: Tue, 17 Jan 2023 20:08:15 +0200
Subject: [PATCH 1/2] Fix decoding HTML files from archives
* lisp/international/mule.el (sgml-html-meta-auto-coding-function):
Avoid signaling an error from coding-system-equal when the HTML meta
tag specifies an encoding whose type is 'charset'. (Bug#61005)
This is the same fix as in #df7ed10e for
sgml-xml-auto-coding-function.
---
lisp/international/mule.el | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 4f6addea387..9480213be9a 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
(bfcs-type
(coding-system-type buffer-file-coding-system)))
(if (and enable-multibyte-characters
+ ;; 'charset' will signal an error in
+ ;; coding-system-equal, since it isn't a
+ ;; coding-system. So test that up front.
+ (not (equal sym-type 'charset))
(coding-system-equal 'utf-8 sym-type)
(coding-system-equal 'utf-8 bfcs-type))
buffer-file-coding-system
--
2.30.2
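A minimal sketch of the failure that the hunk above guards against, for anyone who wants to reproduce it outside of mule.el. The helper name `demo-utf-8-type-check' is hypothetical and not part of Emacs. The sketch assumes, as the comment in the patch states, that coding-system-equal signals an error when handed the symbol `charset', and it catches the generic `error' condition rather than relying on a specific one.

```elisp
(defun demo-utf-8-type-check (coding-system)
  "Compare the type of CODING-SYSTEM against `utf-8', as mule.el does.
Return a list (TYPE TYPE-IS-CODING-SYSTEM RESULT), where RESULT is
the value of `coding-system-equal', or the symbol `signaled' if that
call raised an error.  Hypothetical helper for illustration only."
  (let ((type (coding-system-type coding-system)))
    (list type
          ;; Normalize to t or nil for easy comparison.
          (and (coding-system-p type) t)
          (condition-case nil
              (coding-system-equal 'utf-8 type)
            (error 'signaled)))))

;; The type of `utf-8' is itself a coding system, so the comparison works.
(demo-utf-8-type-check 'utf-8)

;; The type of `windows-1255' is `charset', which is not a coding system.
(demo-utf-8-type-check 'windows-1255)
```

With the added (not (equal sym-type 'charset)) test, the `and' form short-circuits before coding-system-equal is ever called with `charset', so the function falls through to returning the coding system named in the document.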
[-- Attachment #3: 0002-Add-test-suite-for-sgml-html-meta-auto-coding-functi.patch --]
[-- Type: text/x-diff, Size: 3803 bytes --]
From 29996e07c23c9716f731dde224c8ca47e321e697 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
Date: Tue, 17 Jan 2023 20:13:39 +0200
Subject: [PATCH 2/2] Add test suite for sgml-html-meta-auto-coding-function
* test/lisp/international/mule-tests.el (sgml-html-meta-pre)
(sgml-html-meta-post, sgml-html-meta-run, sgml-html-meta-utf-8)
(sgml-html-meta-windows-hebrew, sgml-html-meta-none)
(sgml-html-meta-unknown-coding, sgml-html-meta-no-pre)
(sgml-html-meta-no-post-less-than-10lines)
(sgml-html-meta-no-post-10lines, sgml-html-meta-utf-8-with-bom): Add.
---
test/lisp/international/mule-tests.el | 66 +++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
index 4f70b275848..6e23d8c5421 100644
--- a/test/lisp/international/mule-tests.el
+++ b/test/lisp/international/mule-tests.el
@@ -70,6 +70,72 @@ mule-hz
;; The chinese-hz encoding is not ASCII compatible.
(should-not (coding-system-get 'chinese-hz :ascii-compatible-p)))
+;;; Testing `sgml-html-meta-auto-coding-function'.
+
+(defconst sgml-html-meta-pre "<!doctype html><html><head>"
+ "The beginning of a minimal HTML document.")
+
+(defconst sgml-html-meta-post "</head></html>"
+ "The end of a minimal HTML document.")
+
+(defun sgml-html-meta-run (coding-system)
+ "Run `sgml-html-meta-auto-coding-function' on a minimal HTML.
+When CODING-SYSTEM is not nil, insert it, wrapped in a '<meta>'
+element. When CODING-SYSTEM contains HTML meta characters or
+white space, insert it as-is, without additional formatting. Use
+the variables `sgml-html-meta-pre' and `sgml-html-meta-post' to
+provide HTML fragments. Some tests override those variables."
+ (with-temp-buffer
+ (insert sgml-html-meta-pre
+ (cond ((not coding-system)
+ "")
+ ((string-match "[<>'\"\n ]" coding-system)
+ coding-system)
+ (t
+ (format "<meta charset='%s'>" coding-system)))
+ sgml-html-meta-post)
+ (goto-char (point-min))
+ (sgml-html-meta-auto-coding-function (- (point-max) (point-min)))))
+
+(ert-deftest sgml-html-meta-utf-8 ()
+ "Baseline: UTF-8."
+ (should (eq 'utf-8 (sgml-html-meta-run "utf-8"))))
+
+(ert-deftest sgml-html-meta-windows-hebrew ()
+ "A non-Unicode charset."
+ (should (eq 'windows-1255 (sgml-html-meta-run "windows-1255"))))
+
+(ert-deftest sgml-html-meta-none ()
+ (should (eq nil (sgml-html-meta-run nil))))
+
+(ert-deftest sgml-html-meta-unknown-coding ()
+ (should (eq nil (sgml-html-meta-run "XXX"))))
+
+(ert-deftest sgml-html-meta-no-pre ()
+ "Without the prefix, so not HTML."
+ (let ((sgml-html-meta-pre ""))
+ (should (eq nil (sgml-html-meta-run "utf-8")))))
+
+(ert-deftest sgml-html-meta-no-post-less-than-10lines ()
+ "No '</head>', detect charset in the first 10 lines."
+ (let ((sgml-html-meta-post ""))
+ (should (eq 'utf-8 (sgml-html-meta-run
+ (concat "\n\n\n\n\n\n\n\n\n"
+ "<meta charset='utf-8'>"))))))
+
+(ert-deftest sgml-html-meta-no-post-10lines ()
+ "No '</head>', do not detect charset after the first 10 lines."
+ (let ((sgml-html-meta-post ""))
+ (should (eq nil (sgml-html-meta-run
+ (concat "\n\n\n\n\n\n\n\n\n\n"
+ "<meta charset='utf-8'>"))))))
+
+(ert-deftest sgml-html-meta-utf-8-with-bom ()
+ "Requesting 'UTF-8' does not override `utf-8-with-signature'.
+Check fix for Bug#20623."
+ (let ((buffer-file-coding-system 'utf-8-with-signature))
+ (should (eq 'utf-8-with-signature (sgml-html-meta-run "utf-8")))))
+
;; Stop "Local Variables" above causing confusion when visiting this file.
\f
--
2.30.2