From: "Sebastián Monía" <sebastian@sebasmonia.com>
To: Jim Porter <jporterbugs@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>, 73133@debbugs.gnu.org, ganimard@tuta.io
Subject: bug#73133: 29.2; EWW fails to render some webpages
Date: Mon, 23 Sep 2024 11:43:36 -0400
Message-ID: <thqnv7ymxtnb.fsf@sebasmonia.com>
In-Reply-To: <7eb7b048-06ea-5751-56e1-590689c8c318@gmail.com> (Jim Porter's message of "Sat, 21 Sep 2024 10:12:22 -0700")
[-- Attachment #1: Type: text/plain, Size: 970 bytes --]
Jim Porter <jporterbugs@gmail.com> writes:
> On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>>> From: Jim Porter <jporterbugs@gmail.com>
>>>
>>> From my reading of RFC9110[1], this is *technically* a bug (we should
>>> assume application/octet-stream, not text/plain), but that wouldn't fix
>>> the rendering here; it would probably make things worse. However, per
>>> the RFC, EWW would be within its rights to guess that the page is HTML,
>>> e.g. by checking for "<!doctype html>". It also recommends having that
>>> be an option that can be disabled, which is reasonable (and in keeping
>>> with Emacs's design principles anyway).
>>>
>>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
>> Thanks. Would someone like to submit a patch along these lines?
>
> It'll probably be a couple weeks until I have time to write a patch,
> but if no one has done so by then, I'll look into it.
Would the attached patch work?
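For anyone who wants to poke at the regexp without applying the patch, something like the sketch below can be evaluated in *scratch*. The helper name `my-doctype-html-p` and the sample strings are made up for illustration and are not part of the patch; only the regexp is copied from it. `case-fold-search` is bound inside the temporary buffer here so the binding applies to the buffer actually being searched.

```elisp
;; Hypothetical helper for experimenting: NOT part of the patch.
;; It runs the patch's regexp against a string instead of a
;; buffer retrieved from the network.
(defun my-doctype-html-p (text)
  "Return non-nil if TEXT matches the doctype regexp from the patch."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Bind inside the temp buffer so the case-insensitive setting
    ;; is in effect for the buffer we search.
    (let ((case-fold-search t)
          (target
           "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
      (re-search-forward target nil t))))

;; Sample inputs (invented for this sketch):
(my-doctype-html-p "<!DOCTYPE html>\n<p>hi</p>")                      ; non-nil
(my-doctype-html-p "<!doctype html SYSTEM \"about:legacy-compat\">")  ; non-nil
(my-doctype-html-p "Just some plain text")                            ; nil
```

With the patch applied, the fallback could presumably be switched off with `(setq eww-use-doctype-fallback nil)`, since the option is a plain boolean defcustom.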
[-- Attachment #2: eww-use-doctype-fallback --]
[-- Type: text/x-patch, Size: 2863 bytes --]
From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
<sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 11:40:18 -0400
Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page
has a valid doctype tag, and use it as alternative to a content-type header
---
lisp/net/eww.el | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index a651d9d5020..59a146c8392 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments."
:type '(choice (const :tag "Never" nil)
regexp))
+(defcustom eww-use-doctype-fallback t
+ "Accept a DOCTYPE tag as evidence that page content is HTML.
+This is used only when the page does not have a valid Content-Type
+header."
+ :version "30.1"
+ :group 'eww
+ :type 'boolean)
+
(defcustom eww-browse-url-new-window-is-tab 'tab-bar
"Whether to open up new windows in a tab or a new buffer.
If t, then open the URL in a new tab rather than a new buffer if
@@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml."
(member content-type '("text/html"
"application/xhtml+xml")))
+(defun eww--doctype-html-p (data-buffer)
+ "Return non-nil if DATA-BUFFER contains a doctype declaration."
+ ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+ (let ((case-fold-search t)
+ (target
+ "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+ (with-current-buffer data-buffer
+ (goto-char (point-min))
+ ;; match basic <!doctype html> and also legacy variants as
+ ;; specified in link above
+ (re-search-forward target nil t))))
+
(defun eww--rename-buffer ()
"Rename the current EWW buffer.
The renaming scheme is performed in accordance with
@@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with
url))
(goto-char (point-min))
(eww-display-html (or encode charset) url nil point buffer))
- ((eww-html-p (car content-type))
+ ((or (eww-html-p (car content-type))
+ (and eww-use-doctype-fallback
+ (eww--doctype-html-p data-buffer)))
(eww-display-html (or encode charset) url nil point buffer))
((equal (car content-type) "application/pdf")
(eww-display-pdf))
@@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with
(setq buffer-undo-list nil)))
(kill-buffer data-buffer)))
(unless (buffer-live-p buffer)
- (kill-buffer data-buffer))))
+ (kill-buffer data-buffer)))
(defun eww-parse-headers ()
(let ((headers nil))
--
2.45.2.windows.1
[-- Attachment #3: Type: text/plain, Size: 54 bytes --]
--
Sebastián Monía
https://site.sebasmonia.com/