unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#73133: 29.2; EWW fails to render some webpages
@ 2024-09-08 20:52 Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-09-10  6:06 ` Jim Porter
  0 siblings, 1 reply; 12+ messages in thread
From: Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-09-08 20:52 UTC (permalink / raw)
  To: 73133

[-- Attachment #1: Type: text/plain, Size: 7951 bytes --]

To Whom it may concern,

I have recently discovered the website gastonle.ru, however it does not
render with Emacs Web Wowser.  It appears to be a relatively simple
website and I cannot see what would prohibit it from rendering.

I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
but it also fails to render.  This therefore appears to be a bug in EWW.
---

In GNU Emacs 29.2 (build 1, aarch64-apple-darwin21.6.0, NS
appkit-2113.60 Version 12.6.6 (Build 21G646)) of 2024-01-19 built on
armbob.lan
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.2.1

Configured using:
'configure --with-ns '--enable-locallisppath=/Library/Application
Support/Emacs/${version}/site-lisp:/Library/Application
Support/Emacs/site-lisp' --with-modules 'CFLAGS=-DFD_SETSIZE=10000
-DDARWIN_UNLIMITED_SELECT' --with-x-toolkit=no'

Configured features:
ACL GLIB GMP GNUTLS JPEG JSON LIBXML2 MODULES NOTIFY KQUEUE NS PDUMPER
PNG RSVG SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER ZLIB

Important settings:
  value of $LANG: en_NZ.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Markdown

Minor modes in effect:
  yas-global-mode: t
  yas-minor-mode: t
  global-git-commit-mode: t
  magit-auto-revert-mode: t
  shell-dirtrack-mode: t
  server-mode: t
  TeX-PDF-mode: t
  TeX-source-correlate-mode: t
  global-display-line-numbers-mode: t
  display-line-numbers-mode: t
  whitespace-mode: t
  global-page-break-lines-mode: t
  override-global-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
/Users/ganimard/.emacs.d/elpa/transient-20230919.2146/transient hides /Applications/Emacs.app/Contents/Resources/lisp/transient

Features:
(shadow sort mail-extr emacsbug files-x vc-hg vc-bzr vc-src vc-sccs
vc-svn vc-cvs vc-rcs log-view vc bug-reference help-fns radix-tree
magit-patch magit-subtree magit-gitignore magit-ediff ediff ediff-merg
ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util
magit-extras face-remap misearch multi-isearch vc-git vc-dispatcher
markdown-mode color dired-aux disp-table hl-todo flycheck forth-mode
forth-spec forth-smie smie forth-syntax llvm-mode splunk-mode ess
lisp-mnt ess-utils ess-custom go-mode find-file ffap etags fileloop xref
rust-utils rust-mode rust-rustfmt rust-playpen rust-compile rust-cargo
yasnippet magit-submodule magit-blame magit-stash magit-reflog
magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote
magit-commit magit-sequence magit-notes magit-worktree magit-tag
magit-merge magit-branch magit-reset magit-files magit-refs magit-status
magit magit-repos magit-apply magit-wip magit-log which-func imenu
magit-diff smerge-mode diff diff-mode git-commit log-edit pcvs-util
add-log magit-core magit-autorevert autorevert magit-margin
magit-transient magit-process with-editor shell server magit-mode
transient magit-git magit-base magit-section cursor-sensor dash
auctex-latexmk latex latex-flymake flymake-proc flymake project compile
warnings tex-ispell tex-style tex texmathp latex-preview-pane doc-view
filenotify jka-compr image-mode exif auctex ebib ebib-reading-list
ebib-notes org-element org-persist xdg org-id org-refile org ob
ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-src ob-comint
org-pcomplete pcomplete comint ansi-osc ansi-color org-list org-footnote
org-faces org-entities noutline outline icons ob-emacs-lisp ob-core
ob-eval org-cycle org-table org-keys oc org-loaddefs find-func cal-menu
calendar cal-loaddefs ol org-fold org-fold-core org-compat ring avl-tree
generator org-version org-macs ebib-filters ebib-keywords ebib-utils
ebib-db message sendmail yank-media puny dired dired-loaddefs rfc822 mml
mml-sec epa derived epg rfc6068 epg-config gnus-util
text-property-search mm-decode mm-bodies mm-encode mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils
gmm-utils mailheader format-spec parsebib rx hl-line pp crm bibtex
iso8601 time-date writeroom-mode visual-fill-column olivetti
multiple-cursors mc-separate-operations rectangular-region-mode
mc-mark-pop mc-edit-lines mc-hide-unmatched-lines-mode mc-mark-more
thingatpt mc-cycle-cursors multiple-cursors-core advice rect move-text
no-littering compat paredit edmacro kmacro display-line-numbers
whitespace page-break-lines smart-mode-line-atom-one-dark-theme cl-extra
help-mode atom-one-dark-theme use-package use-package-ensure
use-package-delight use-package-diminish use-package-bind-key bind-key
easy-mmode use-package-core finder-inf atom-one-dark-theme-autoloads
auctex-latexmk-autoloads auctex-autoloads tex-site company-autoloads
dracula-theme-autoloads ebib-autoloads ess-autoloads flycheck-autoloads
forth-mode-autoloads gdscript-mode-autoloads go-mode-autoloads
hl-todo-autoloads impatient-mode-autoloads htmlize-autoloads
julia-formatter-autoloads just-mode-autoloads
latex-preview-pane-autoloads llvm-ts-mode-autoloads lsp-docker-autoloads
lsp-julia-autoloads julia-mode-autoloads lsp-ui-autoloads
lsp-mode-autoloads ht-autoloads lv-autoloads magit-autoloads pcase
git-commit-autoloads magit-section-autoloads move-text-autoloads
multiple-cursors-autoloads no-littering-autoloads olivetti-autoloads
package-lint-autoloads page-break-lines-autoloads paredit-autoloads
parsebib-autoloads pkg-info-autoloads epl-autoloads
quelpa-use-package-autoloads quelpa-autoloads rustic-autoloads
markdown-mode-autoloads f-autoloads dash-autoloads rust-mode-autoloads
s-autoloads session-async-autoloads simple-httpd-autoloads
smart-mode-line-atom-one-dark-theme-autoloads smart-mode-line-autoloads
rich-minority-autoloads spinner-autoloads splunk-mode-autoloads
transient-autoloads with-editor-autoloads compat-autoloads info
writeroom-mode-autoloads visual-fill-column-autoloads
xterm-color-autoloads yaml-autoloads yaml-mode-autoloads
yasnippet-autoloads package browse-url url url-proxy url-privacy
url-expand url-methods url-history url-cookie generate-lisp-file
url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq
eieio eieio-core cl-macs password-cache json subr-x map byte-opt gv
bytecomp byte-compile url-vars cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel term/ns-win ns-win ucs-normalize mule-util
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads kqueue cocoa ns multi-tty make-network-process emacs)

Memory information:
((conses 16 412027 70117)
(symbols 48 34112 0)
(strings 32 128155 6447)
(string-bytes 1 4038566)
(vectors 16 67754)
(vector-slots 8 739746 70880)
(floats 8 294 368)
(intervals 56 6200 53)
(buffers 984 43))

[-- Attachment #2: Type: text/html, Size: 12370 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-09-10  6:06 ` Jim Porter
  2024-09-21  9:13   ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Porter @ 2024-09-10  6:06 UTC (permalink / raw)
  To: Ganimard, 73133

On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss 
army knife of text editors wrote:
> I have recently discovered the website gastonle.ru, however it does not
> render with Emacs Web Wowser.  It appears to be a relatively simple
> website and I cannot see what would prohibit it from rendering.

Checking that page via curl, it appears that it doesn't return a 
Content-Type header. In the absence of that header, EWW assumes that the 
page is plain text.
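
As a minimal sketch of that check (not EWW's actual code), the lookup
can be illustrated on a raw block of response headers.  The function
name `my-http-content-type' is hypothetical, and any header text passed
to it here would be made-up sample data:

```elisp
(require 'subr-x)                       ; string-trim, string-empty-p

;; Hypothetical helper, not part of EWW: pull the media type out of a
;; raw block of HTTP response headers.
(defun my-http-content-type (raw-headers)
  "Return the Content-Type media type found in RAW-HEADERS, or nil.
RAW-HEADERS is a string holding the response's header lines.
Parameters such as \"; charset=utf-8\" are dropped."
  (let ((case-fold-search t))           ; header names are case-insensitive
    (when (string-match "^content-type:[ \t]*\\([^;\r\n]*\\)" raw-headers)
      (let ((value (string-trim (match-string 1 raw-headers))))
        ;; Treat an empty header value the same as a missing header.
        (unless (string-empty-p value)
          (downcase value))))))
```

A nil result corresponds to the situation described above, where the
server sends no Content-Type at all and the client has to fall back to
some default.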

> I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> but it also fails to render.  This therefore appears to be a bug in EWW.

From my reading of RFC9110[1], this is *technically* a bug (we should
assume application/octet-stream, not text/plain), but that wouldn't fix 
the rendering here; it would probably make things worse. However, per 
the RFC, EWW would be within its rights to guess that the page is HTML, 
e.g. by checking for "<!doctype html>". It also recommends having that 
be an option that can be disabled, which is reasonable (and in keeping 
with Emacs's design principles anyway).

[1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-10  6:06 ` Jim Porter
@ 2024-09-21  9:13   ` Eli Zaretskii
  2024-09-21 17:12     ` Jim Porter
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2024-09-21  9:13 UTC (permalink / raw)
  To: Jim Porter; +Cc: 73133, ganimard

> Date: Mon, 9 Sep 2024 23:06:56 -0700
> From: Jim Porter <jporterbugs@gmail.com>
> 
> On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss 
> army knife of text editors wrote:
> > I have recently discovered the website gastonle.ru, however it does not
> > render with Emacs Web Wowser.  It appears to be a relatively simple
> > website and I cannot see what would prohibit it from rendering.
> 
> Checking that page via curl, it appears that it doesn't return a 
> Content-Type header. In the absence of that header, EWW assumes that the 
> page is plain text.
> 
> > I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> > but it also fails to render.  This therefore appears to be a bug in EWW.
> 
>  From my reading of RFC9110[1], this is *technically* a bug (we should 
> assume application/octet-stream, not text/plain), but that wouldn't fix 
> the rendering here; it would probably make things worse. However, per 
> the RFC, EWW would be within its rights to guess that the page is HTML, 
> e.g. by checking for "<!doctype html>". It also recommends having that 
> be an option that can be disabled, which is reasonable (and in keeping 
> with Emacs's design principles anyway).
> 
> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5

Thanks.  Would someone like to submit a patch along these lines?






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21  9:13   ` Eli Zaretskii
@ 2024-09-21 17:12     ` Jim Porter
  2024-09-23 15:43       ` Sebastián Monía
  2024-09-23 15:56       ` Sebastián Monía
  0 siblings, 2 replies; 12+ messages in thread
From: Jim Porter @ 2024-09-21 17:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 73133, ganimard

On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>>   From my reading of RFC9110[1], this is *technically* a bug (we should
>> assume application/octet-stream, not text/plain), but that wouldn't fix
>> the rendering here; it would probably make things worse. However, per
>> the RFC, EWW would be within its rights to guess that the page is HTML,
>> e.g. by checking for "<!doctype html>". It also recommends having that
>> be an option that can be disabled, which is reasonable (and in keeping
>> with Emacs's design principles anyway).
>>
>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
> 
> Thanks.  Would someone like to submit a patch along these lines?

It'll probably be a couple weeks until I have time to write a patch, but 
if no one has done so by then, I'll look into it.






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21 17:12     ` Jim Porter
@ 2024-09-23 15:43       ` Sebastián Monía
  2024-09-28 10:58         ` Eli Zaretskii
  2024-09-23 15:56       ` Sebastián Monía
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastián Monía @ 2024-09-23 15:43 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Jim Porter <jporterbugs@gmail.com> writes:

> On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>>> From: Jim Porter <jporterbugs@gmail.com>
>>>
>>>   From my reading of RFC9110[1], this is *technically* a bug (we should
>>> assume application/octet-stream, not text/plain), but that wouldn't fix
>>> the rendering here; it would probably make things worse. However, per
>>> the RFC, EWW would be within its rights to guess that the page is HTML,
>>> e.g. by checking for "<!doctype html>". It also recommends having that
>>> be an option that can be disabled, which is reasonable (and in keeping
>>> with Emacs's design principles anyway).
>>>
>>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
>> Thanks.  Would someone like to submit a patch along these lines?
>
> It'll probably be a couple weeks until I have time to write a patch,
> but if no one has done so by then, I'll look into it.

Would the patch attached work?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: eww-use-doctype-fallback --]
[-- Type: text/x-patch, Size: 2863 bytes --]

From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
 <sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 11:40:18 -0400
Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page
 has a valid doctype tag, and use it as alternative to a content-type header

---
 lisp/net/eww.el | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index a651d9d5020..59a146c8392 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments."
   :type '(choice (const :tag "Never" nil)
                  regexp))
 
+(defcustom eww-use-doctype-fallback t
+  "Accept a DOCTYPE tag as evidence that page content is HTML.
+This is used only when the page does not have a valid Content-Type
+header."
+  :version "30.1"
+  :group 'eww
+  :type 'boolean)
+
 (defcustom eww-browse-url-new-window-is-tab 'tab-bar
   "Whether to open up new windows in a tab or a new buffer.
 If t, then open the URL in a new tab rather than a new buffer if
@@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml."
   (member content-type '("text/html"
 			 "application/xhtml+xml")))
 
+(defun eww--doctype-html-p (data-buffer)
+  "Return non-nil if DATA-BUFFER contains a doctype declaration."
+  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+  (let ((case-fold-search t)
+        (target
+         "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+    (with-current-buffer data-buffer
+      (goto-char (point-min))
+      ;; match basic <!doctype html> and also legacy variants as
+      ;; specified in link above
+      (re-search-forward target nil t))))
+
 (defun eww--rename-buffer ()
   "Rename the current EWW buffer.
 The renaming scheme is performed in accordance with
@@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with
                               url))
               (goto-char (point-min))
               (eww-display-html (or encode charset) url nil point buffer))
-	     ((eww-html-p (car content-type))
+	     ((or (eww-html-p (car content-type))
+                  (and eww-use-doctype-fallback
+                       (eww--doctype-html-p data-buffer)))
               (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
@@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with
               (setq buffer-undo-list nil)))
         (kill-buffer data-buffer)))
     (unless (buffer-live-p buffer)
-      (kill-buffer data-buffer))))
+      (kill-buffer data-buffer)))
 
 (defun eww-parse-headers ()
   (let ((headers nil))
-- 
2.45.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 54 bytes --]


-- 
Sebastián Monía
https://site.sebasmonia.com/


* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21 17:12     ` Jim Porter
  2024-09-23 15:43       ` Sebastián Monía
@ 2024-09-23 15:56       ` Sebastián Monía
  2024-09-24 18:31         ` Jim Porter
  1 sibling, 1 reply; 12+ messages in thread
From: Sebastián Monía @ 2024-09-23 15:56 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

[-- Attachment #1: Type: text/plain, Size: 158 bytes --]


Hi all,

Would something like the attached patch work?

Thanks,
Seb

PS: I think I sent this to just one person by mistake instead of a wide
reply, my bad.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: eww-use-doctype-fallback --]
[-- Type: text/x-patch, Size: 2863 bytes --]



[-- Attachment #3: Type: text/plain, Size: 54 bytes --]


-- 
Sebastián Monía
https://site.sebasmonia.com/


* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:56       ` Sebastián Monía
@ 2024-09-24 18:31         ` Jim Porter
  2024-09-25 20:46           ` Sebastián Monía
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Porter @ 2024-09-24 18:31 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/23/2024 8:56 AM, Sebastián Monía wrote:
> Would something like the attached patch work?

I was actually thinking something more general, like a defcustom named 
'eww-guess-content-type-functions', which would be a list of functions 
where the first non-nil result is the guessed Content-Type. That way, we 
could extend this to other content types (for example, maybe we'd want 
to look for the magic headers for various image formats too; we don't 
have to do that in this bug).

I think your 'eww--doctype-html-p' function would work nicely with a 
couple small tweaks as one of the functions in 
'eww-guess-content-type-functions' though.






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-24 18:31         ` Jim Porter
@ 2024-09-25 20:46           ` Sebastián Monía
  2024-09-26  1:59             ` Jim Porter
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastián Monía @ 2024-09-25 20:46 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

Hi Jim,

Jim Porter <jporterbugs@gmail.com> writes:
> I was actually thinking something more general, like a defcustom named
> 'eww-guess-content-type-functions', which would be a list of functions
> where the first non-nil result is the guessed Content-Type. That way,
> we could extend this to other content types (for example, maybe we'd
> want to look for the magic headers for various image formats too; we
> don't have to do that in this bug).

I think the functions for the new defcustom should accept the
content-type, headers (since both are already parsed by that time), and
the entire buffer. If you agree, I can give your suggestion a shot; if
not, let me know what you think would work.

> I think your 'eww--doctype-html-p' function would work nicely with a
> couple small tweaks as one of the functions in
> 'eww-guess-content-type-functions' though.

Thanks!
I would also have the current '(eww-html-p (car content-type))' wrapped
in a function `eww--content-type-html-p` and put both functions in the
defcustom, first content type then doctype.



-- 
Sebastián Monía
https://site.sebasmonia.com/






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-25 20:46           ` Sebastián Monía
@ 2024-09-26  1:59             ` Jim Porter
  2024-09-30 17:10               ` Sebastián Monía
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Porter @ 2024-09-26  1:59 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/25/2024 1:46 PM, Sebastián Monía wrote:
> Jim Porter <jporterbugs@gmail.com> writes:
>> I was actually thinking something more general, like a defcustom named
>> 'eww-guess-content-type-functions', which would be a list of functions
>> where the first non-nil result is the guessed Content-Type. That way,
>> we could extend this to other content types (for example, maybe we'd
>> want to look for the magic headers for various image formats too; we
>> don't have to do that in this bug).
> 
> I think the functions for the new defcustom should accept the
> content-type, headers (since both are already parsed by that time), and
> the entire buffer. If you agree, I can give your suggestion a shot; if
> not, let me know what you think would work.

I think we'd only want to run this hook if the Content-Type is absent 
from the headers (its job is to *guess* a content type, after all), so 
I'd expect the signature to be the list of headers + the buffer.
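
Roughly, and only as a sketch of the idea rather than a proposed
implementation, that could look like the following.  Every name here is
hypothetical, and HEADERS is assumed to be an alist of
(LOWERCASE-NAME . VALUE) pairs:

```elisp
;; Hypothetical guesser: report "text/html" when the data buffer
;; contains an HTML doctype.  HEADERS is accepted but unused here.
(defun my-guess-html-from-doctype (_headers data-buffer)
  "Return \"text/html\" if DATA-BUFFER contains an HTML doctype, else nil."
  (with-current-buffer data-buffer
    (save-excursion
      (goto-char (point-min))
      (let ((case-fold-search t))
        (when (re-search-forward "<!doctype +html" nil t)
          "text/html")))))

(defvar my-guess-content-type-functions
  (list #'my-guess-html-from-doctype)
  "Abnormal hook of content-type guessers.
Each function is called with HEADERS and DATA-BUFFER and returns a
media type string, or nil if it has no guess.")

(defun my-effective-content-type (headers data-buffer)
  "Return the Content-Type from HEADERS, guessing only if it is absent.
Falls back to \"application/octet-stream\" when no guesser succeeds."
  (or (cdr (assoc "content-type" headers))
      ;; Only reached when the server sent no Content-Type.
      (run-hook-with-args-until-success
       'my-guess-content-type-functions headers data-buffer)
      "application/octet-stream"))
```

Since the guessers run only when the header lookup returns nil, a
server-supplied type is never overridden, and emptying the hook list
turns the guessing off entirely.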






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:43       ` Sebastián Monía
@ 2024-09-28 10:58         ` Eli Zaretskii
  2024-09-30 15:52           ` Sebastián Monía
  0 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2024-09-28 10:58 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: jporterbugs, 73133, ganimard

> From: Sebastián Monía <sebastian@sebasmonia.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  73133@debbugs.gnu.org,  ganimard@tuta.io
> Date: Mon, 23 Sep 2024 11:43:36 -0400
> 
> +(defcustom eww-use-doctype-fallback t
> +  "Accept a DOCTYPE tag as evidence that page content is HTML.

This should say

  "Whether to accept the DOCTYPE tag as evidence that page content is HTML."

> +This is used only when the page does not have a valid Content-Type
> +header."
> +  :version "30.1"
               ^^^^
This should be "31.1"

> +(defun eww--doctype-html-p (data-buffer)
> +  "Return non-nil if DATA-BUFFER contains a doctype declaration."

Not just "doctype declaration", but "HTML doctype declaration", right?






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-28 10:58         ` Eli Zaretskii
@ 2024-09-30 15:52           ` Sebastián Monía
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastián Monía @ 2024-09-30 15:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: jporterbugs, 73133, ganimard


Eli Zaretskii <eliz@gnu.org> writes:
>> +(defcustom eww-use-doctype-fallback t
>> +  "Accept a DOCTYPE tag as evidence that page content is HTML.
>
> This should say
>
>   "Whether to accept the DOCTYPE tag as evidence that page content is HTML."

>> +  :version "30.1"
>                ^^^^
> This should be "31.1"

Will correct these (although the defcustom might change completely)

>> +(defun eww--doctype-html-p (data-buffer)
>> +  "Return non-nil if DATA-BUFFER contains a doctype declaration."
>
> Not just "doctype declaration", but "HTML doctype declaration", right?

Same here.

Thanks for the feedback!


-- 
Sebastián Monía
https://site.sebasmonia.com/






* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-26  1:59             ` Jim Porter
@ 2024-09-30 17:10               ` Sebastián Monía
  0 siblings, 0 replies; 12+ messages in thread
From: Sebastián Monía @ 2024-09-30 17:10 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

Hello!

I was looking into this today and considering our options.

Jim Porter <jporterbugs@gmail.com> writes:
> On 9/25/2024 1:46 PM, Sebastián Monía wrote:
>> Jim Porter <jporterbugs@gmail.com> writes:
>>> I was actually thinking something more general, like a defcustom named
>>> 'eww-guess-content-type-functions', which would be a list of functions
>>> where the first non-nil result is the guessed Content-Type. That way,
>>> we could extend this to other content types (for example, maybe we'd
>>> want to look for the magic headers for various image formats too; we
>>> don't have to do that in this bug).

We aren't really guessing the content-type, at least in the scope of my
original patch, and probably this bug. We just want to know whether the
page is HTML so that we can render it, as in these snippets (part of
'eww-render'):

;; original cond
((eww-html-p (car content-type))
   (eww-display-html (or encode charset) url nil point buffer))

;; one possible alternative 
((or (eww-html-p (car content-type))
     ;; alternative mechanism to detect if the page is HTML
     ;; via <!doctype ...>, or other tests.
     )
   (eww-display-html (or encode charset) url nil point buffer))

We could instead change 'eww-html-p' to accept the content-type, other
headers and buffer. And in that function, as a fallback, call the
functions in 'eww-guess-content-type-functions' and return non-nil for
HTML.
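
A rough, self-contained sketch of that shape follows.  The names are
all hypothetical stand-ins (this is not the real 'eww-html-p'), and the
single guesser only looks for a doctype at the very start of the data:

```elisp
;; Hypothetical guesser: look for an HTML doctype at the start of the
;; buffer, ignoring leading whitespace.
(defun my-sniff-leading-doctype (_headers data-buffer)
  "Return \"text/html\" if DATA-BUFFER starts with an HTML doctype."
  (with-current-buffer data-buffer
    (save-excursion
      (goto-char (point-min))
      (skip-chars-forward " \t\r\n")
      (let ((case-fold-search t))
        (when (looking-at-p "<!doctype +html")
          "text/html")))))

(defvar my-guess-functions (list #'my-sniff-leading-doctype)
  "Hypothetical list of guessers called with HEADERS and DATA-BUFFER.")

(defun my-html-type-p (content-type)
  "Return t if CONTENT-TYPE names an HTML media type."
  (and (member content-type '("text/html" "application/xhtml+xml")) t))

(defun my-html-p (content-type headers data-buffer)
  "Return t if the page should be rendered as HTML.
Check CONTENT-TYPE first, then fall back to `my-guess-functions'."
  (or (my-html-type-p content-type)
      (my-html-type-p
       (run-hook-with-args-until-success
        'my-guess-functions headers data-buffer))))
```

Note that in this form the fallback runs whenever CONTENT-TYPE is not
an HTML type, including when the server explicitly sent something like
text/plain.  Limiting it to a missing header would need an extra check
against HEADERS.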

The reason I am suggesting this is that there is no benefit to having a
generic mechanism to detect the Content-Type without heavily modifying
'eww-render'. It only matters in the context of deciding whether to
render the HTML or display it as-is; other cases are already handled in
'eww-render'.

Hope that made sense!

I can always address Eli's comments in the context of my original patch,
too, for a much simpler (and of course, limited) solution.

-- 
Sebastián Monía
https://site.sebasmonia.com/




