From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Benjamin Riefenstahl
Newsgroups: gmane.emacs.bugs
Subject: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:13:50 +0100
Message-ID: <87bkmqempd.fsf@turtle-trading.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: 61005@debbugs.gnu.org
X-GNU-PR-Message: report 61005
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:253933

--=-=-=
Content-Type: text/plain
Content-Disposition: inline

Problem
----

* Given an HTML file with charset "windows-1255".

* Opening the file from disk detects the encoding correctly.

* Opening a ZIP archive that contains the same file and then opening
  the HTML archive member does not detect the encoding.  Instead, the
  coding system for saving is the default, according to M-x
  describe-coding-system.

Attached are two files, test.html and test.zip.  Call "emacs -Q
test.html test.zip" and press RET on the archive member to reproduce.

--=-=-=
Content-Type: text/html; charset=windows-1255
Content-Disposition: attachment; filename=test.html
Content-Transfer-Encoding: quoted-printable

=F9=C8=D1=EC=E5=C9=ED

=F9=C8=D1=EC=E5=C9=ED

--=-=-=
Content-Type: application/zip
Content-Disposition: attachment; filename=test.zip
Content-Transfer-Encoding: base64

UEsDBBQAAAAIAPGdMVauwGXsbwAAAKIAAAAJABwAdGVzdC5odG1sVVQJAAM138Zj9d7GY3V4CwAB
BOgDAAAE6AMAALNRdPF3DokMcFXIKMnNseOygVAKCjYZqYkpIAaQmZtakqiQnJFYVJxaYqtUnpmX
kl9erGtoZGqqZGejD5KFKizJLMlJtVP4eeLim6cn3yrY6EMEQMbpw8yzScpPqYSqzzBEVgzkgVVC
FAD5YKcAAFBLAQIeAxQAAAAIAPGdMVauwGXsbwAAAKIAAAAJABgAAAAAAAEAAACkgQAAAAB0ZXN0
Lmh0bWxVVAUAAzXfxmN1eAsAAQToAwAABOgDAABQSwUGAAAAAAEAAQBPAAAAsgAAAAAA

--=-=-=
Content-Type: text/plain
Content-Disposition: inline

Solution
----

The problem seems to be in the function
sgml-html-meta-auto-coding-function.  It is missing a condition
similar to the one added to sgml-xml-auto-coding-function in commit
#df7ed10e in 2018.

modified   lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
               (bfcs-type (coding-system-type buffer-file-coding-system)))
          (if (and enable-multibyte-characters
+                  ;; 'charset' will signal an error in
+                  ;; coding-system-equal, since it isn't a
+                  ;; coding-system.  So test that up front.
+                  (not (equal sym-type 'charset))
                   (coding-system-equal 'utf-8 sym-type)
                   (coding-system-equal 'utf-8 bfcs-type))
              buffer-file-coding-system

I will send this as a patch as soon as I have a bug number to mention
in the commit message.
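
For illustration, the guard in the diff above can be tried in isolation
with a small sketch.  The helper name my/coding-type-equal-utf-8-p is
hypothetical and not part of Emacs; only coding-system-type and
coding-system-equal are existing functions.  The sketch assumes, as the
patch comment states, that passing the type symbol 'charset to
coding-system-equal signals an error because it is not a coding system.

```elisp
;; Hypothetical helper, for illustration only; it is not part of Emacs.
(defun my/coding-type-equal-utf-8-p (coding-system)
  "Return non-nil if the type of CODING-SYSTEM compares equal to `utf-8'.
The type symbol `charset' is rejected first, mirroring the guard in
the proposed patch, so `coding-system-equal' is never called with it."
  (let ((type (coding-system-type coding-system)))
    (and (not (eq type 'charset))
         (coding-system-equal 'utf-8 type))))

;; windows-1255 is a charset-type coding system, so the guard
;; short-circuits and `coding-system-equal' is not reached.
(my/coding-type-equal-utf-8-p 'windows-1255)

;; For utf-8 the type symbol is `utf-8' itself, so the comparison succeeds.
(my/coding-type-equal-utf-8-p 'utf-8)
```

Evaluating the two calls in *scratch* should give nil for windows-1255
and non-nil for utf-8, without an error in either case.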
----

In GNU Emacs 28.1.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2022-08-29 built on arrian
Repository revision: f4168b8143008b787a11366462c928d761e90dd0
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by date

Minor modes in effect:
  shell-dirtrack-mode: t
  desktop-save-mode: t
  display-time-mode: t
  xclip-mode: t
  xterm-mouse-mode: t
  delete-selection-mode: t
  cua-mode: t
  display-battery-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
~/Projects/ttf-mode/arc-mode-compat hides ~/emacs/arc-mode-compat
/home/benny/.emacs.d/elpa/transient-20210723.1601/transient hides /usr/local/share/emacs/28.1.91/lisp/transient
/home/benny/.emacs.d/elpa/dictionary-20201001.1727/dictionary hides /usr/local/share/emacs/28.1.91/lisp/net/dictionary

Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mailheader arc-mode
archive-mode benny-images dirtrack shell pcomplete misearch
multi-isearch thai-util thai-word lao-util enriched view tabify
benny-auto-insert ttf-glyphs rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri
rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap sgml-mode facemenu dom nxml-util nxml-enc xmltok mule-util jka-compr dired-aux time-date bug-reference imenu desktop frameset highline benny-calendar-cfg ange-ftp generic-x autoinsert cc-mode cc-fonts cc-guess cc-menus cc-styles cc-align cc-cmds cc-engine cc-vars cc-defs ps-print ps-print-loaddefs ps-def lpr advice cl-extra help-mode dired dired-loaddefs derived benny-x-clipboard disp-table time server protbuf xclip term/xterm xterm xt-mouse cal-china lunar solar cal-dst cal-bahai cal-islam cal-hebrew holidays hol-loaddefs vc-git diff-mode easy-mmode vc-dispatcher vc-fossil diary-lib diary-loaddefs cal-menu calendar cal-loaddefs delsel grep compile text-property-search comint ansi-color ring cua-base cus-load format-spec battery dbus xml sendmail mail-utils .loaddefs benny-tools autoload radix-tree lisp-mnt mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr edmacro kmacro info package browse-url url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json subr-x map url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev 
obarray cl-preloaded nadvice button loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote threads
dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting cairo move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 273770 13520)
 (symbols 48 18619 1)
 (strings 32 66582 2920)
 (string-bytes 1 2318045)
 (vectors 16 39996)
 (vector-slots 8 1131973 174560)
 (floats 8 762 66)
 (intervals 56 1039 60)
 (buffers 992 50))

--=-=-=--