From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
To: 61005@debbugs.gnu.org
Subject: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:13:50 +0100 [thread overview]
Message-ID: <87bkmqempd.fsf@turtle-trading.net> (raw)
[-- Attachment #1: Type: text/plain, Size: 483 bytes --]
Problem
----
* Given an HTML file with charset "windows-1255".
* Opening the file from disk detects the encoding correctly.
* Opening a ZIP archive with the same file inside and than opening the
HTML archive member does not detect the encoding, instead the coding
system for saving is the default according to M-x
describe-coding-system.
Attached are two files test.html and test.zip. Call "emacs -Q test.html
test.zip" and press RET on the archive member to reproduce.
[-- Attachment #2: test.html --]
[-- Type: text/html, Size: 172 bytes --]
[-- Attachment #3: test.zip --]
[-- Type: application/zip, Size: 279 bytes --]
[-- Attachment #4: Type: text/plain, Size: 5626 bytes --]
Solution
----
The problem seems to be the function
sgml-html-meta-auto-coding-function. It is missing a condition similar
to the one added to code in sgml-xml-auto-coding-function with commit
#df7ed10e in 2018.
modified lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
(bfcs-type
(coding-system-type buffer-file-coding-system)))
(if (and enable-multibyte-characters
+ ;; 'charset' will signal an error in
+ ;; coding-system-equal, since it isn't a
+ ;; coding-system. So test that up front.
+ (not (equal sym-type 'charset))
(coding-system-equal 'utf-8 sym-type)
(coding-system-equal 'utf-8 bfcs-type))
buffer-file-coding-system
I will send this as a patch as soon as I have a bug number to mention in
the commit message.
----
In GNU Emacs 28.1.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
of 2022-08-29 built on arrian
Repository revision: f4168b8143008b787a11366462c928d761e90dd0
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Dired by date
Minor modes in effect:
shell-dirtrack-mode: t
desktop-save-mode: t
display-time-mode: t
xclip-mode: t
xterm-mouse-mode: t
delete-selection-mode: t
cua-mode: t
display-battery-mode: t
tooltip-mode: t
global-eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
~/Projects/ttf-mode/arc-mode-compat hides ~/emacs/arc-mode-compat
/home/benny/.emacs.d/elpa/transient-20210723.1601/transient hides /usr/local/share/emacs/28.1.91/lisp/transient
/home/benny/.emacs.d/elpa/dictionary-20201001.1727/dictionary hides /usr/local/share/emacs/28.1.91/lisp/net/dictionary
Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mailheader arc-mode
archive-mode benny-images dirtrack shell pcomplete misearch
multi-isearch thai-util thai-word lao-util enriched view tabify
benny-auto-insert ttf-glyphs rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util
rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap sgml-mode facemenu dom
nxml-util nxml-enc xmltok mule-util jka-compr dired-aux time-date
bug-reference imenu desktop frameset highline benny-calendar-cfg
ange-ftp generic-x autoinsert cc-mode cc-fonts cc-guess cc-menus
cc-styles cc-align cc-cmds cc-engine cc-vars cc-defs ps-print
ps-print-loaddefs ps-def lpr advice cl-extra help-mode dired
dired-loaddefs derived benny-x-clipboard disp-table time server protbuf
xclip term/xterm xterm xt-mouse cal-china lunar solar cal-dst cal-bahai
cal-islam cal-hebrew holidays hol-loaddefs vc-git diff-mode easy-mmode
vc-dispatcher vc-fossil diary-lib diary-loaddefs cal-menu calendar
cal-loaddefs delsel grep compile text-property-search comint ansi-color
ring cua-base cus-load format-spec battery dbus xml sendmail mail-utils
.loaddefs benny-tools autoload radix-tree lisp-mnt mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr edmacro kmacro info
package browse-url url url-proxy url-privacy url-expand url-methods
url-history url-cookie url-domsuf url-util mailcap url-handlers
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache json subr-x map url-vars seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice
button loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote threads dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 273770 13520)
(symbols 48 18619 1)
(strings 32 66582 2920)
(string-bytes 1 2318045)
(vectors 16 39996)
(vector-slots 8 1131973 174560)
(floats 8 762 66)
(intervals 56 1039 60)
(buffers 992 50))
next reply other threads:[~2023-01-22 13:13 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-22 13:13 Benjamin Riefenstahl [this message]
2023-01-22 13:24 ` bug#61005: 28.1.91; Encoding not detected in HTML files inside archives Benjamin Riefenstahl
2023-01-22 14:09 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bkmqempd.fsf@turtle-trading.net \
--to=b.riefenstahl@turtle-trading.net \
--cc=61005@debbugs.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.