all messages for Emacs-related lists mirrored at yhetil.org
* bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
@ 2023-01-22 13:13 Benjamin Riefenstahl
  2023-01-22 13:24 ` Benjamin Riefenstahl
From: Benjamin Riefenstahl @ 2023-01-22 13:13 UTC
  To: 61005

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

Problem
----

* Given an HTML file with charset "windows-1255". 

* Opening the file from disk detects the encoding correctly.

* Opening a ZIP archive that contains the same file and then opening
  the HTML archive member does not detect the encoding.  Instead, the
  coding system for saving is the default, according to M-x
  describe-coding-system.

Attached are two files test.html and test.zip.  Call "emacs -Q test.html
test.zip" and press RET on the archive member to reproduce.


[-- Attachment #2: test.html --]
[-- Type: text/html, Size: 172 bytes --]

[-- Attachment #3: test.zip --]
[-- Type: application/zip, Size: 279 bytes --]

[-- Attachment #4: Type: text/plain, Size: 5626 bytes --]


Solution
----

The problem seems to be in the function
sgml-html-meta-auto-coding-function.  It is missing a condition similar
to the one added to sgml-xml-auto-coding-function in commit #df7ed10e
in 2018.

modified   lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
                   (bfcs-type
                    (coding-system-type buffer-file-coding-system)))
               (if (and enable-multibyte-characters
+                       ;; 'charset' will signal an error in
+                       ;; coding-system-equal, since it isn't a
+                       ;; coding-system.  So test that up front.
+                       (not (equal sym-type 'charset))
                        (coding-system-equal 'utf-8 sym-type)
                        (coding-system-equal 'utf-8 bfcs-type))
                   buffer-file-coding-system

I will send this as a patch as soon as I have a bug number to mention in
the commit message.
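
To illustrate why the extra condition matters, here is a minimal sketch
that can be evaluated with M-: or in *scratch*.  It assumes that
(coding-system-type 'windows-1255) returns the symbol charset, which is
what the comment in the diff implies.  The unguarded call is wrapped in
condition-case so that an error is shown instead of aborting.  This is
an illustration only, not the code in mule.el.

```elisp
;; Illustrative sketch only; not the actual code from mule.el.
(let* ((sym 'windows-1255)                    ; charset named in this report
       (sym-type (coding-system-type sym)))   ; assumed to be `charset'
  ;; Unguarded comparison: `charset' is a coding-system *type*, not a
  ;; coding system, so `coding-system-equal' is expected to signal.
  (message "unguarded: %S"
           (condition-case err
               (coding-system-equal 'utf-8 sym-type)
             (error (list 'error-signaled err))))
  ;; Guarded comparison, mirroring the condition added by the patch:
  ;; `and' short-circuits before `coding-system-equal' is reached.
  (message "guarded: %S"
           (and (not (equal sym-type 'charset))
                (coding-system-equal 'utf-8 sym-type))))
```

With the guard in place the test is simply false for such charsets, so
the function can go on to return the coding system named in the file
instead of failing.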

----

In GNU Emacs 28.1.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2022-08-29 built on arrian
Repository revision: f4168b8143008b787a11366462c928d761e90dd0
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by date

Minor modes in effect:
  shell-dirtrack-mode: t
  desktop-save-mode: t
  display-time-mode: t
  xclip-mode: t
  xterm-mouse-mode: t
  delete-selection-mode: t
  cua-mode: t
  display-battery-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
~/Projects/ttf-mode/arc-mode-compat hides ~/emacs/arc-mode-compat
/home/benny/.emacs.d/elpa/transient-20210723.1601/transient hides /usr/local/share/emacs/28.1.91/lisp/transient
/home/benny/.emacs.d/elpa/dictionary-20201001.1727/dictionary hides /usr/local/share/emacs/28.1.91/lisp/net/dictionary

Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mailheader arc-mode
archive-mode benny-images dirtrack shell pcomplete misearch
multi-isearch thai-util thai-word lao-util enriched view tabify
benny-auto-insert ttf-glyphs rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util
rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap sgml-mode facemenu dom
nxml-util nxml-enc xmltok mule-util jka-compr dired-aux time-date
bug-reference imenu desktop frameset highline benny-calendar-cfg
ange-ftp generic-x autoinsert cc-mode cc-fonts cc-guess cc-menus
cc-styles cc-align cc-cmds cc-engine cc-vars cc-defs ps-print
ps-print-loaddefs ps-def lpr advice cl-extra help-mode dired
dired-loaddefs derived benny-x-clipboard disp-table time server protbuf
xclip term/xterm xterm xt-mouse cal-china lunar solar cal-dst cal-bahai
cal-islam cal-hebrew holidays hol-loaddefs vc-git diff-mode easy-mmode
vc-dispatcher vc-fossil diary-lib diary-loaddefs cal-menu calendar
cal-loaddefs delsel grep compile text-property-search comint ansi-color
ring cua-base cus-load format-spec battery dbus xml sendmail mail-utils
.loaddefs benny-tools autoload radix-tree lisp-mnt mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr edmacro kmacro info
package browse-url url url-proxy url-privacy url-expand url-methods
url-history url-cookie url-domsuf url-util mailcap url-handlers
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache json subr-x map url-vars seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice
button loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote threads dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 273770 13520)
 (symbols 48 18619 1)
 (strings 32 66582 2920)
 (string-bytes 1 2318045)
 (vectors 16 39996)
 (vector-slots 8 1131973 174560)
 (floats 8 762 66)
 (intervals 56 1039 60)
 (buffers 992 50))

Thread overview: 3+ messages
2023-01-22 13:13 bug#61005: 28.1.91; Encoding not detected in HTML files inside archives Benjamin Riefenstahl
2023-01-22 13:24 ` Benjamin Riefenstahl
2023-01-22 14:09   ` Eli Zaretskii