unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#73133: 29.2; EWW fails to render some webpages
@ 2024-09-08 20:52 Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-10  6:06 ` Jim Porter
  0 siblings, 1 reply; 10+ messages in thread
From: Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-09-08 20:52 UTC
  To: 73133

[-- Attachment #1: Type: text/plain, Size: 7951 bytes --]

To whom it may concern,

I have recently discovered the website gastonle.ru; however, it does not
render with Emacs Web Wowser.  It appears to be a relatively simple
website and I cannot see what would prohibit it from rendering.

I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
but it also fails to render.  This therefore appears to be a bug in EWW.
---

In GNU Emacs 29.2 (build 1, aarch64-apple-darwin21.6.0, NS
appkit-2113.60 Version 12.6.6 (Build 21G646)) of 2024-01-19 built on
armbob.lan
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.2.1

Configured using:
'configure --with-ns '--enable-locallisppath=/Library/Application
Support/Emacs/${version}/site-lisp:/Library/Application
Support/Emacs/site-lisp' --with-modules 'CFLAGS=-DFD_SETSIZE=10000
-DDARWIN_UNLIMITED_SELECT' --with-x-toolkit=no'

Configured features:
ACL GLIB GMP GNUTLS JPEG JSON LIBXML2 MODULES NOTIFY KQUEUE NS PDUMPER
PNG RSVG SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER ZLIB

Important settings:
  value of $LANG: en_NZ.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Markdown

Minor modes in effect:
  yas-global-mode: t
  yas-minor-mode: t
  global-git-commit-mode: t
  magit-auto-revert-mode: t
  shell-dirtrack-mode: t
  server-mode: t
  TeX-PDF-mode: t
  TeX-source-correlate-mode: t
  global-display-line-numbers-mode: t
  display-line-numbers-mode: t
  whitespace-mode: t
  global-page-break-lines-mode: t
  override-global-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
/Users/ganimard/.emacs.d/elpa/transient-20230919.2146/transient hides /Applications/Emacs.app/Contents/Resources/lisp/transient

Features:
(shadow sort mail-extr emacsbug files-x vc-hg vc-bzr vc-src vc-sccs
vc-svn vc-cvs vc-rcs log-view vc bug-reference help-fns radix-tree
magit-patch magit-subtree magit-gitignore magit-ediff ediff ediff-merg
ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util
magit-extras face-remap misearch multi-isearch vc-git vc-dispatcher
markdown-mode color dired-aux disp-table hl-todo flycheck forth-mode
forth-spec forth-smie smie forth-syntax llvm-mode splunk-mode ess
lisp-mnt ess-utils ess-custom go-mode find-file ffap etags fileloop xref
rust-utils rust-mode rust-rustfmt rust-playpen rust-compile rust-cargo
yasnippet magit-submodule magit-blame magit-stash magit-reflog
magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote
magit-commit magit-sequence magit-notes magit-worktree magit-tag
magit-merge magit-branch magit-reset magit-files magit-refs magit-status
magit magit-repos magit-apply magit-wip magit-log which-func imenu
magit-diff smerge-mode diff diff-mode git-commit log-edit pcvs-util
add-log magit-core magit-autorevert autorevert magit-margin
magit-transient magit-process with-editor shell server magit-mode
transient magit-git magit-base magit-section cursor-sensor dash
auctex-latexmk latex latex-flymake flymake-proc flymake project compile
warnings tex-ispell tex-style tex texmathp latex-preview-pane doc-view
filenotify jka-compr image-mode exif auctex ebib ebib-reading-list
ebib-notes org-element org-persist xdg org-id org-refile org ob
ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-src ob-comint
org-pcomplete pcomplete comint ansi-osc ansi-color org-list org-footnote
org-faces org-entities noutline outline icons ob-emacs-lisp ob-core
ob-eval org-cycle org-table org-keys oc org-loaddefs find-func cal-menu
calendar cal-loaddefs ol org-fold org-fold-core org-compat ring avl-tree
generator org-version org-macs ebib-filters ebib-keywords ebib-utils
ebib-db message sendmail yank-media puny dired dired-loaddefs rfc822 mml
mml-sec epa derived epg rfc6068 epg-config gnus-util
text-property-search mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils
gmm-utils mailheader format-spec parsebib rx hl-line pp crm bibtex
iso8601 time-date writeroom-mode visual-fill-column olivetti
multiple-cursors mc-separate-operations rectangular-region-mode
mc-mark-pop mc-edit-lines mc-hide-unmatched-lines-mode mc-mark-more
thingatpt mc-cycle-cursors multiple-cursors-core advice rect move-text
no-littering compat paredit edmacro kmacro display-line-numbers
whitespace page-break-lines smart-mode-line-atom-one-dark-theme cl-extra
help-mode atom-one-dark-theme use-package use-package-ensure
use-package-delight use-package-diminish use-package-bind-key bind-key
easy-mmode use-package-core finder-inf atom-one-dark-theme-autoloads
auctex-latexmk-autoloads auctex-autoloads tex-site company-autoloads
dracula-theme-autoloads ebib-autoloads ess-autoloads flycheck-autoloads
forth-mode-autoloads gdscript-mode-autoloads go-mode-autoloads
hl-todo-autoloads impatient-mode-autoloads htmlize-autoloads
julia-formatter-autoloads just-mode-autoloads
latex-preview-pane-autoloads llvm-ts-mode-autoloads lsp-docker-autoloads
lsp-julia-autoloads julia-mode-autoloads lsp-ui-autoloads
lsp-mode-autoloads ht-autoloads lv-autoloads magit-autoloads pcase
git-commit-autoloads magit-section-autoloads move-text-autoloads
multiple-cursors-autoloads no-littering-autoloads olivetti-autoloads
package-lint-autoloads page-break-lines-autoloads paredit-autoloads
parsebib-autoloads pkg-info-autoloads epl-autoloads
quelpa-use-package-autoloads quelpa-autoloads rustic-autoloads
markdown-mode-autoloads f-autoloads dash-autoloads rust-mode-autoloads
s-autoloads session-async-autoloads simple-httpd-autoloads
smart-mode-line-atom-one-dark-theme-autoloads smart-mode-line-autoloads
rich-minority-autoloads spinner-autoloads splunk-mode-autoloads
transient-autoloads with-editor-autoloads compat-autoloads info
writeroom-mode-autoloads visual-fill-column-autoloads
xterm-color-autoloads yaml-autoloads yaml-mode-autoloads
yasnippet-autoloads package browse-url url url-proxy url-privacy
url-expand url-methods url-history url-cookie generate-lisp-file
url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq
eieio eieio-core cl-macs password-cache json subr-x map byte-opt gv
bytecomp byte-compile url-vars cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel term/ns-win ns-win ucs-normalize mule-util
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads kqueue cocoa ns multi-tty make-network-process emacs)

Memory information:
((conses 16 412027 70117)
(symbols 48 34112 0)
(strings 32 128155 6447)
(string-bytes 1 4038566)
(vectors 16 67754)
(vector-slots 8 739746 70880)
(floats 8 294 368)
(intervals 56 6200 53)
(buffers 984 43))

[-- Attachment #2: Type: text/html, Size: 12370 bytes --]


* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-09-10  6:06 ` Jim Porter
  2024-09-21  9:13   ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Porter @ 2024-09-10  6:06 UTC
  To: Ganimard, 73133

On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss 
army knife of text editors wrote:
> I have recently discovered the website gastonle.ru; however, it does not
> render with Emacs Web Wowser.  It appears to be a relatively simple
> website and I cannot see what would prohibit it from rendering.

Checking that page via curl, it appears that it doesn't return a 
Content-Type header. In the absence of that header, EWW assumes that the 
page is plain text.

> I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> but it also fails to render.  This therefore appears to be a bug in EWW.

From my reading of RFC9110[1], this is *technically* a bug (we should
assume application/octet-stream, not text/plain), but that wouldn't fix 
the rendering here; it would probably make things worse. However, per 
the RFC, EWW would be within its rights to guess that the page is HTML, 
e.g. by checking for "<!doctype html>". It also recommends having that 
be an option that can be disabled, which is reasonable (and in keeping 
with Emacs's design principles anyway).

[1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
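
The doctype check suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: `my-sniff-html-doctype' is a hypothetical helper that is not part of EWW, and it works on a string rather than on EWW's response buffer.

```elisp
;; Hypothetical helper, not part of EWW: guess "text/html" when the
;; response body starts with an HTML doctype declaration.
(defun my-sniff-html-doctype (body)
  "Return \"text/html\" if string BODY begins with an HTML doctype, else nil.
Leading whitespace is skipped and the match is case-insensitive."
  (let ((case-fold-search t))
    (when (string-match-p "\\`[ \t\r\n]*<!doctype[ \t\r\n]+html\\b" body)
      "text/html")))
```

A caller would only consult such a helper when the response carries no Content-Type header, and would treat the result as a guess that the user can switch off.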






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-10  6:06 ` Jim Porter
@ 2024-09-21  9:13   ` Eli Zaretskii
  2024-09-21 17:12     ` Jim Porter
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2024-09-21  9:13 UTC
  To: Jim Porter; +Cc: 73133, ganimard

> Date: Mon, 9 Sep 2024 23:06:56 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> 
> On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss 
> army knife of text editors wrote:
> > I have recently discovered the website gastonle.ru; however, it does not
> > render with Emacs Web Wowser.  It appears to be a relatively simple
> > website and I cannot see what would prohibit it from rendering.
> 
> Checking that page via curl, it appears that it doesn't return a 
> Content-Type header. In the absence of that header, EWW assumes that the 
> page is plain text.
> 
> > I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> > but it also fails to render.  This therefore appears to be a bug in EWW.
> 
>  From my reading of RFC9110[1], this is *technically* a bug (we should 
> assume application/octet-stream, not text/plain), but that wouldn't fix 
> the rendering here; it would probably make things worse. However, per 
> the RFC, EWW would be within its rights to guess that the page is HTML, 
> e.g. by checking for "<!doctype html>". It also recommends having that 
> be an option that can be disabled, which is reasonable (and in keeping 
> with Emacs's design principles anyway).
> 
> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5

Thanks.  Would someone like to submit a patch along these lines?






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21  9:13   ` Eli Zaretskii
@ 2024-09-21 17:12     ` Jim Porter
  2024-09-23 15:43       ` Sebastián Monía
  2024-09-23 15:56       ` Sebastián Monía
  0 siblings, 2 replies; 10+ messages in thread
From: Jim Porter @ 2024-09-21 17:12 UTC
  To: Eli Zaretskii; +Cc: 73133, ganimard

On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>>   From my reading of RFC9110[1], this is *technically* a bug (we should
>> assume application/octet-stream, not text/plain), but that wouldn't fix
>> the rendering here; it would probably make things worse. However, per
>> the RFC, EWW would be within its rights to guess that the page is HTML,
>> e.g. by checking for "<!doctype html>". It also recommends having that
>> be an option that can be disabled, which is reasonable (and in keeping
>> with Emacs's design principles anyway).
>>
>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
> 
> Thanks.  Would someone like to submit a patch along these lines?

It'll probably be a couple weeks until I have time to write a patch, but 
if no one has done so by then, I'll look into it.






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21 17:12     ` Jim Porter
@ 2024-09-23 15:43       ` Sebastián Monía
  2024-09-28 10:58         ` Eli Zaretskii
  2024-09-23 15:56       ` Sebastián Monía
  1 sibling, 1 reply; 10+ messages in thread
From: Sebastián Monía @ 2024-09-23 15:43 UTC
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Jim Porter <jporterbugs@gmail.com> writes:

> On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>>> From: Jim Porter <jporterbugs@gmail.com>
>>>
>>>   From my reading of RFC9110[1], this is *technically* a bug (we should
>>> assume application/octet-stream, not text/plain), but that wouldn't fix
>>> the rendering here; it would probably make things worse. However, per
>>> the RFC, EWW would be within its rights to guess that the page is HTML,
>>> e.g. by checking for "<!doctype html>". It also recommends having that
>>> be an option that can be disabled, which is reasonable (and in keeping
>>> with Emacs's design principles anyway).
>>>
>>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
>> Thanks.  Would someone like to submit a patch along these lines?
>
> It'll probably be a couple weeks until I have time to write a patch,
> but if no one has done so by then, I'll look into it.

Would the patch attached work?


[-- Attachment #2: eww-use-doctype-fallback --]
[-- Type: text/x-patch, Size: 2863 bytes --]

From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
 <sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 11:40:18 -0400
Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page
 has a valid doctype tag, and use it as alternative to a content-type header

---
 lisp/net/eww.el | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index a651d9d5020..59a146c8392 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments."
   :type '(choice (const :tag "Never" nil)
                  regexp))
 
+(defcustom eww-use-doctype-fallback t
+  "Accept a DOCTYPE tag as evidence that page content is HTML.
+This is used only when the page does not have a valid Content-Type
+header."
+  :version "30.1"
+  :group 'eww
+  :type 'boolean)
+
 (defcustom eww-browse-url-new-window-is-tab 'tab-bar
   "Whether to open up new windows in a tab or a new buffer.
 If t, then open the URL in a new tab rather than a new buffer if
@@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml."
   (member content-type '("text/html"
 			 "application/xhtml+xml")))
 
+(defun eww--doctype-html-p (data-buffer)
+  "Return non-nil if DATA-BUFFER contains a doctype declaration."
+  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+  (let ((case-fold-search t)
+        (target
+         "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+    (with-current-buffer data-buffer
+      (goto-char (point-min))
+      ;; match basic <!doctype html> and also legacy variants as
+      ;; specified in link above
+      (re-search-forward target nil t))))
+
 (defun eww--rename-buffer ()
   "Rename the current EWW buffer.
 The renaming scheme is performed in accordance with
@@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with
                               url))
               (goto-char (point-min))
               (eww-display-html (or encode charset) url nil point buffer))
-	     ((eww-html-p (car content-type))
+	     ((or (eww-html-p (car content-type))
+                  (and eww-use-doctype-fallback
+                       (eww--doctype-html-p data-buffer)))
               (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
@@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with
               (setq buffer-undo-list nil)))
         (kill-buffer data-buffer)))
     (unless (buffer-live-p buffer)
-      (kill-buffer data-buffer))))
+      (kill-buffer data-buffer)))
 
 (defun eww-parse-headers ()
   (let ((headers nil))
-- 
2.45.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 54 bytes --]


-- 
Sebastián Monía
https://site.sebasmonia.com/


* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21 17:12     ` Jim Porter
  2024-09-23 15:43       ` Sebastián Monía
@ 2024-09-23 15:56       ` Sebastián Monía
  2024-09-24 18:31         ` Jim Porter
  1 sibling, 1 reply; 10+ messages in thread
From: Sebastián Monía @ 2024-09-23 15:56 UTC
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

[-- Attachment #1: Type: text/plain, Size: 158 bytes --]


Hi all,

Would something like the attached patch work?

Thanks,
Seb

PS: I think I sent this to just one person by mistake instead of a wide
reply, my bad.


[-- Attachment #2: eww-use-doctype-fallback --]
[-- Type: text/x-patch, Size: 2863 bytes --]

From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
 <sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 11:40:18 -0400
Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page
 has a valid doctype tag, and use it as alternative to a content-type header

---
 lisp/net/eww.el | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index a651d9d5020..59a146c8392 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments."
   :type '(choice (const :tag "Never" nil)
                  regexp))
 
+(defcustom eww-use-doctype-fallback t
+  "Accept a DOCTYPE tag as evidence that page content is HTML.
+This is used only when the page does not have a valid Content-Type
+header."
+  :version "30.1"
+  :group 'eww
+  :type 'boolean)
+
 (defcustom eww-browse-url-new-window-is-tab 'tab-bar
   "Whether to open up new windows in a tab or a new buffer.
 If t, then open the URL in a new tab rather than a new buffer if
@@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml."
   (member content-type '("text/html"
 			 "application/xhtml+xml")))
 
+(defun eww--doctype-html-p (data-buffer)
+  "Return non-nil if DATA-BUFFER contains a doctype declaration."
+  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+  (let ((case-fold-search t)
+        (target
+         "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+    (with-current-buffer data-buffer
+      (goto-char (point-min))
+      ;; match basic <!doctype html> and also legacy variants as
+      ;; specified in link above
+      (re-search-forward target nil t))))
+
 (defun eww--rename-buffer ()
   "Rename the current EWW buffer.
 The renaming scheme is performed in accordance with
@@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with
                               url))
               (goto-char (point-min))
               (eww-display-html (or encode charset) url nil point buffer))
-	     ((eww-html-p (car content-type))
+	     ((or (eww-html-p (car content-type))
+                  (and eww-use-doctype-fallback
+                       (eww--doctype-html-p data-buffer)))
               (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
@@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with
               (setq buffer-undo-list nil)))
         (kill-buffer data-buffer)))
     (unless (buffer-live-p buffer)
-      (kill-buffer data-buffer))))
+      (kill-buffer data-buffer)))
 
 (defun eww-parse-headers ()
   (let ((headers nil))
-- 
2.45.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 54 bytes --]


-- 
Sebastián Monía
https://site.sebasmonia.com/


* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:56       ` Sebastián Monía
@ 2024-09-24 18:31         ` Jim Porter
  2024-09-25 20:46           ` Sebastián Monía
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Porter @ 2024-09-24 18:31 UTC
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/23/2024 8:56 AM, Sebastián Monía wrote:
> Would something like the attached patch work?

I was actually thinking something more general, like a defcustom named 
'eww-guess-content-type-functions', which would be a list of functions 
where the first non-nil result is the guessed Content-Type. That way, we 
could extend this to other content types (for example, maybe we'd want 
to look for the magic headers for various image formats too; we don't 
have to do that in this bug).

I think your 'eww--doctype-html-p' function would work nicely with a 
couple small tweaks as one of the functions in 
'eww-guess-content-type-functions' though.
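
As a rough sketch of that idea, the list of guessers could be run as an abnormal hook, stopping at the first non-nil result. Everything named `my-...' here is hypothetical; the thread only proposes the option name, not its definition, and this sketch passes the body as a string for simplicity. The PDF check relies on PDF files starting with "%PDF-".

```elisp
;; Hypothetical stand-in for the proposed `eww-guess-content-type-functions'.
(defvar my-guess-content-type-functions
  '(my-guess-html-from-doctype my-guess-pdf-from-magic)
  "Functions called with the response body (a string).
They are tried in order; the first non-nil result is the guessed type.")

(defun my-guess-html-from-doctype (body)
  "Return \"text/html\" if BODY starts with an HTML doctype, else nil."
  (let ((case-fold-search t))
    (when (string-match-p "\\`[ \t\r\n]*<!doctype[ \t\r\n]+html\\b" body)
      "text/html")))

(defun my-guess-pdf-from-magic (body)
  "Return \"application/pdf\" if BODY starts with the PDF magic string."
  (when (string-prefix-p "%PDF-" body)
    "application/pdf"))

(defun my-guess-content-type (body)
  "Return the first non-nil guess for BODY, else application/octet-stream."
  (or (run-hook-with-args-until-success
       'my-guess-content-type-functions body)
      "application/octet-stream"))
```

Users could then add or remove guessers with `add-hook' and `remove-hook', or set the list to nil to disable guessing entirely.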






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-24 18:31         ` Jim Porter
@ 2024-09-25 20:46           ` Sebastián Monía
  2024-09-26  1:59             ` Jim Porter
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastián Monía @ 2024-09-25 20:46 UTC
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

Hi Jim,

Jim Porter <jporterbugs@gmail.com> writes:
> I was actually thinking something more general, like a defcustom named
> 'eww-guess-content-type-functions', which would be a list of functions
> where the first non-nil result is the guessed Content-Type. That way,
> we could extend this to other content types (for example, maybe we'd
> want to look for the magic headers for various image formats too; we
> don't have to do that in this bug).

I think the functions for the new defcustom should accept the
content-type, headers (since both are already parsed by that time), and
the entire buffer. If you agree, I can give your suggestion a shot; if
not, let me know what you think would work.

> I think your 'eww--doctype-html-p' function would work nicely with a
> couple small tweaks as one of the functions in
> 'eww-guess-content-type-functions' though.

Thanks!
I would also have the current '(eww-html-p (car content-type))' wrapped
in a function `eww--content-type-html-p` and put both functions in the
defcustom, first content type then doctype.



-- 
Sebastián Monía
https://site.sebasmonia.com/






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-25 20:46           ` Sebastián Monía
@ 2024-09-26  1:59             ` Jim Porter
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Porter @ 2024-09-26  1:59 UTC
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/25/2024 1:46 PM, Sebastián Monía wrote:
> Jim Porter <jporterbugs@gmail.com> writes:
>> I was actually thinking something more general, like a defcustom named
>> 'eww-guess-content-type-functions', which would be a list of functions
>> where the first non-nil result is the guessed Content-Type. That way,
>> we could extend this to other content types (for example, maybe we'd
>> want to look for the magic headers for various image formats too; we
>> don't have to do that in this bug).
> 
> I think the functions for the new defcustom should accept the
> content-type, headers (since both are already parsed by that time), and
> the entire buffer. If you agree, I can give your suggestion a shot; if
> not, let me know what you think would work.

I think we'd only want to run this hook if the Content-Type is absent 
from the headers (its job is to *guess* a content type, after all), so 
I'd expect the signature to be the list of headers + the buffer.
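
A minimal sketch of that signature, assuming the headers arrive as an alist with lowercase string keys and that the guessers only run when no Content-Type is present. All `my-...' names are hypothetical, and the application/octet-stream fallback follows the RFC reading earlier in this thread.

```elisp
;; Hypothetical hook: each function receives HEADERS (an alist of
;; (NAME . VALUE) strings, names lowercased) and the data BUFFER.
(defvar my-guess-functions '(my-guess-html-in-buffer)
  "Functions called with HEADERS and BUFFER until one returns non-nil.")

(defun my-guess-html-in-buffer (_headers buffer)
  "Return \"text/html\" if BUFFER contains an HTML doctype, else nil."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      ;; Bind inside the target buffer so the search there ignores case.
      (let ((case-fold-search t))
        (when (re-search-forward "<!doctype +html" nil t)
          "text/html")))))

(defun my-effective-content-type (headers buffer)
  "Return the Content-Type from HEADERS, or a guess based on BUFFER.
The guessers run only when HEADERS has no content-type entry."
  (or (cdr (assoc "content-type" headers))
      (run-hook-with-args-until-success 'my-guess-functions headers buffer)
      "application/octet-stream"))
```

Because `or' short-circuits, a server-supplied Content-Type always wins and the hook is never consulted for it.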






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:43       ` Sebastián Monía
@ 2024-09-28 10:58         ` Eli Zaretskii
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2024-09-28 10:58 UTC
  To: Sebastián Monía; +Cc: jporterbugs, 73133, ganimard

> From: Sebastián Monía <sebastian@sebasmonia.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  73133@debbugs.gnu.org,  ganimard@tuta.io
> Date: Mon, 23 Sep 2024 11:43:36 -0400
> 
> +(defcustom eww-use-doctype-fallback t
> +  "Accept a DOCTYPE tag as evidence that page content is HTML.

This should say

  "Whether to accept the DOCTYPE tag as evidence that page content is HTML."

> +This is used only when the page does not have a valid Content-Type
> +header."
> +  :version "30.1"
               ^^^^
This should be "31.1"

> +(defun eww--doctype-html-p (data-buffer)
> +  "Return non-nil if DATA-BUFFER contains a doctype declaration."

Not just "doctype declaration", but "HTML doctype declaration", right?
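
For illustration, the two definitions from the patch might read as follows with these comments applied. This is only a sketch of the requested wording changes, with the regexp copied unchanged from the patch and the function docstring wording assumed; it is not a reviewed or committed version.

```elisp
(defcustom eww-use-doctype-fallback t
  "Whether to accept the DOCTYPE tag as evidence that page content is HTML.
This is used only when the page does not have a valid Content-Type
header."
  :version "31.1"
  :group 'eww
  :type 'boolean)

(defun eww--doctype-html-p (data-buffer)
  "Return non-nil if DATA-BUFFER contains an HTML doctype declaration."
  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
  (let ((case-fold-search t)
        (target
         "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
    (with-current-buffer data-buffer
      (goto-char (point-min))
      ;; Match the basic <!doctype html> and also the legacy variant
      ;; described at the link above.
      (re-search-forward target nil t))))
```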






end of thread, other threads:[~2024-09-28 10:58 UTC | newest]

Thread overview: 10+ messages
2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-10  6:06 ` Jim Porter
2024-09-21  9:13   ` Eli Zaretskii
2024-09-21 17:12     ` Jim Porter
2024-09-23 15:43       ` Sebastián Monía
2024-09-28 10:58         ` Eli Zaretskii
2024-09-23 15:56       ` Sebastián Monía
2024-09-24 18:31         ` Jim Porter
2024-09-25 20:46           ` Sebastián Monía
2024-09-26  1:59             ` Jim Porter
