* bug#73133: 29.2; EWW fails to render some webpages @ 2024-09-08 20:52 Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors 2024-09-10 6:06 ` Jim Porter 2024-10-23 10:43 ` Mattias Engdegård 0 siblings, 2 replies; 34+ messages in thread From: Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-09-08 20:52 UTC (permalink / raw) To: 73133 [-- Attachment #1: Type: text/plain, Size: 7951 bytes --] To Whom it may concern, I have recently discovered the website gastonle.ru, however it does not render with Emacs Web Wowser. It appears to be a relatively simple website and I cannot see what would prohibit it from rendering. I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1 but it also fails to render. This therefore appears to be a bug in EWW. --- In GNU Emacs 29.2 (build 1, aarch64-apple-darwin21.6.0, NS appkit-2113.60 Version 12.6.6 (Build 21G646)) of 2024-01-19 built on armbob.lan Windowing system distributor 'Apple', version 10.3.2487 System Description: macOS 14.2.1 Configured using: 'configure --with-ns '--enable-locallisppath=/Library/Application Support/Emacs/${version}/site-lisp:/Library/Application Support/Emacs/site-lisp' --with-modules 'CFLAGS=-DFD_SETSIZE=10000 -DDARWIN_UNLIMITED_SELECT' --with-x-toolkit=no' Configured features: ACL GLIB GMP GNUTLS JPEG JSON LIBXML2 MODULES NOTIFY KQUEUE NS PDUMPER PNG RSVG SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER ZLIB Important settings: value of $LANG: en_NZ.UTF-8 locale-coding-system: utf-8-unix Major mode: Markdown Minor modes in effect: yas-global-mode: t yas-minor-mode: t global-git-commit-mode: t magit-auto-revert-mode: t shell-dirtrack-mode: t server-mode: t TeX-PDF-mode: t TeX-source-correlate-mode: t global-display-line-numbers-mode: t display-line-numbers-mode: t whitespace-mode: t global-page-break-lines-mode: t override-global-mode: t tooltip-mode: t global-eldoc-mode: t eldoc-mode: t show-paren-mode: t electric-indent-mode: 
t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t line-number-mode: t transient-mark-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t Load-path shadows: /Users/ganimard/.emacs.d/elpa/transient-20230919.2146/transient hides /Applications/Emacs.app/Contents/Resources/lisp/transient <http://Emacs.app/Contents/Resources/lisp/transient> Features: (shadow sort mail-extr emacsbug files-x vc-hg vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs log-view vc bug-reference help-fns radix-tree magit-patch magit-subtree magit-gitignore magit-ediff ediff ediff-merg ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util magit-extras face-remap misearch multi-isearch vc-git vc-dispatcher markdown-mode color dired-aux disp-table hl-todo flycheck forth-mode forth-spec forth-smie smie forth-syntax llvm-mode splunk-mode ess lisp-mnt ess-utils ess-custom go-mode find-file ffap etags fileloop xref rust-utils rust-mode rust-rustfmt rust-playpen rust-compile rust-cargo yasnippet magit-submodule magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote magit-commit magit-sequence magit-notes magit-worktree magit-tag magit-merge magit-branch magit-reset magit-files magit-refs magit-status magit magit-repos magit-apply magit-wip magit-log which-func imenu magit-diff smerge-mode diff diff-mode git-commit log-edit pcvs-util add-log magit-core magit-autorevert autorevert magit-margin magit-transient magit-process with-editor shell server magit-mode transient magit-git magit-base magit-section cursor-sensor dash auctex-latexmk latex latex-flymake flymake-proc flymake project compile warnings tex-ispell tex-style tex texmathp latex-preview-pane doc-view filenotify jka-compr image-mode exif auctex ebib ebib-reading-list ebib-notes org-element org-persist xdg org-id org-refile org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-src ob-comint 
org-pcomplete pcomplete comint ansi-osc ansi-color org-list org-footnote org-faces org-entities noutline outline icons ob-emacs-lisp ob-core ob-eval org-cycle org-table org-keys oc org-loaddefs find-func cal-menu calendar cal-loaddefs ol org-fold org-fold-core org-compat ring avl-tree generator org-version org-macs ebib-filters ebib-keywords ebib-utils ebib-db message sendmail yank-media puny dired dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068 epg-config gnus-util text-property-search mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils gmm-utils mailheader format-spec parsebib rx hl-line pp crm bibtex iso8601 time-date writeroom-mode visual-fill-column olivetti multiple-cursors mc-separate-operations rectangular-region-mode mc-mark-pop mc-edit-lines mc-hide-unmatched-lines-mode mc-mark-more thingatpt mc-cycle-cursors multiple-cursors-core advice rect move-text no-littering compat paredit edmacro kmacro display-line-numbers whitespace page-break-lines smart-mode-line-atom-one-dark-theme cl-extra help-mode atom-one-dark-theme use-package use-package-ensure use-package-delight use-package-diminish use-package-bind-key bind-key easy-mmode use-package-core finder-inf atom-one-dark-theme-autoloads auctex-latexmk-autoloads auctex-autoloads tex-site company-autoloads dracula-theme-autoloads ebib-autoloads ess-autoloads flycheck-autoloads forth-mode-autoloads gdscript-mode-autoloads go-mode-autoloads hl-todo-autoloads impatient-mode-autoloads htmlize-autoloads julia-formatter-autoloads just-mode-autoloads latex-preview-pane-autoloads llvm-ts-mode-autoloads lsp-docker-autoloads lsp-julia-autoloads julia-mode-autoloads lsp-ui-autoloads lsp-mode-autoloads ht-autoloads lv-autoloads magit-autoloads pcase git-commit-autoloads magit-section-autoloads move-text-autoloads multiple-cursors-autoloads no-littering-autoloads olivetti-autoloads package-lint-autoloads page-break-lines-autoloads paredit-autoloads 
parsebib-autoloads pkg-info-autoloads epl-autoloads quelpa-use-package-autoloads quelpa-autoloads rustic-autoloads markdown-mode-autoloads f-autoloads dash-autoloads rust-mode-autoloads s-autoloads session-async-autoloads simple-httpd-autoloads smart-mode-line-atom-one-dark-theme-autoloads smart-mode-line-autoloads rich-minority-autoloads spinner-autoloads splunk-mode-autoloads transient-autoloads with-editor-autoloads compat-autoloads info writeroom-mode-autoloads visual-fill-column-autoloads xterm-color-autoloads yaml-autoloads yaml-mode-autoloads yasnippet-autoloads package browse-url url url-proxy url-privacy url-expand url-methods url-history url-cookie generate-lisp-file url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs password-cache json subr-x map byte-opt gv bytecomp byte-compile url-vars cl-loaddefs cl-lib rmc iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic indonesian philippine cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads kqueue cocoa ns multi-tty make-network-process emacs) Memory information: ((conses 16 412027 70117) (symbols 48 34112 0) (strings 32 
128155 6447) (string-bytes 1 4038566) (vectors 16 67754) (vector-slots 8 739746 70880) (floats 8 294 368) (intervals 56 6200 53) (buffers 984 43)) [-- Attachment #2: Type: text/html, Size: 12370 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-09-10 6:06 ` Jim Porter
  2024-09-21 9:13 ` Eli Zaretskii
  2024-10-23 10:43 ` Mattias Engdegård
  1 sibling, 1 reply; 34+ messages in thread
From: Jim Porter @ 2024-09-10 6:06 UTC (permalink / raw)
  To: Ganimard, 73133

On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss
army knife of text editors wrote:
> I have recently discovered the website gastonle.ru, however it does not
> render with Emacs Web Wowser. It appears to be a relatively simple
> website and I cannot see what would prohibit it from rendering.

Checking that page via curl, it appears that it doesn't return a
Content-Type header. In the absence of that header, EWW assumes that the
page is plain text.

> I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> but it also fails to render. This therefore appears to be a bug in EWW.

From my reading of RFC9110[1], this is *technically* a bug (we should
assume application/octet-stream, not text/plain), but that wouldn't fix
the rendering here; it would probably make things worse. However, per
the RFC, EWW would be within its rights to guess that the page is HTML,
e.g. by checking for "<!doctype html>". It also recommends having that
be an option that can be disabled, which is reasonable (and in keeping
with Emacs's design principles anyway).

[1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5

^ permalink raw reply [flat|nested] 34+ messages in thread
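The check suggested in the message above (look for "<!doctype html>" when no Content-Type header arrives) can be sketched roughly as follows. This is only an illustration, not EWW code: the function name `my/looks-like-html-p` is hypothetical, it works on a string rather than a response buffer, and the regexp covers only the basic doctype form.

```elisp
;; Hypothetical helper, NOT part of EWW: sniff a response body for a
;; basic HTML doctype when the server sent no Content-Type header.
(defun my/looks-like-html-p (body)
  "Return non-nil if the string BODY contains a basic HTML doctype."
  (let ((case-fold-search t))           ; the doctype is case-insensitive
    (and (string-match-p "<!doctype +html *>" body) t)))

(my/looks-like-html-p "<!DOCTYPE html><html></html>") ; non-nil
(my/looks-like-html-p "just some plain text")         ; nil
```

Binding `case-fold-search` locally keeps the match case-insensitive regardless of the caller's settings.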
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-10 6:06 ` Jim Porter
@ 2024-09-21 9:13 ` Eli Zaretskii
  2024-09-21 17:12 ` Jim Porter
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-09-21 9:13 UTC (permalink / raw)
  To: Jim Porter; +Cc: 73133, ganimard

> Date: Mon, 9 Sep 2024 23:06:56 -0700
> From: Jim Porter <jporterbugs@gmail.com>
>
> On 9/8/2024 1:52 PM, Ganimard via Bug reports for GNU Emacs, the Swiss
> army knife of text editors wrote:
> > I have recently discovered the website gastonle.ru, however it does not
> > render with Emacs Web Wowser. It appears to be a relatively simple
> > website and I cannot see what would prohibit it from rendering.
>
> Checking that page via curl, it appears that it doesn't return a
> Content-Type header. In the absence of that header, EWW assumes that the
> page is plain text.
>
> > I have also tried it on an Ubuntu 22.04.4 LTS distro running Emacs 28.1
> > but it also fails to render. This therefore appears to be a bug in EWW.
>
> From my reading of RFC9110[1], this is *technically* a bug (we should
> assume application/octet-stream, not text/plain), but that wouldn't fix
> the rendering here; it would probably make things worse. However, per
> the RFC, EWW would be within its rights to guess that the page is HTML,
> e.g. by checking for "<!doctype html>". It also recommends having that
> be an option that can be disabled, which is reasonable (and in keeping
> with Emacs's design principles anyway).
>
> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5

Thanks. Would someone like to submit a patch along these lines?

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-21 9:13 ` Eli Zaretskii
@ 2024-09-21 17:12 ` Jim Porter
  2024-09-23 15:43 ` Sebastián Monía
  2024-09-23 15:56 ` Sebastián Monía
  0 siblings, 2 replies; 34+ messages in thread
From: Jim Porter @ 2024-09-21 17:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 73133, ganimard

On 9/21/2024 2:13 AM, Eli Zaretskii wrote:
>> Date: Mon, 9 Sep 2024 23:06:56 -0700
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> From my reading of RFC9110[1], this is *technically* a bug (we should
>> assume application/octet-stream, not text/plain), but that wouldn't fix
>> the rendering here; it would probably make things worse. However, per
>> the RFC, EWW would be within its rights to guess that the page is HTML,
>> e.g. by checking for "<!doctype html>". It also recommends having that
>> be an option that can be disabled, which is reasonable (and in keeping
>> with Emacs's design principles anyway).
>>
>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5
>
> Thanks. Would someone like to submit a patch along these lines?

It'll probably be a couple weeks until I have time to write a patch, but
if no one has done so by then, I'll look into it.

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-09-21 17:12 ` Jim Porter @ 2024-09-23 15:43 ` Sebastián Monía 2024-09-28 10:58 ` Eli Zaretskii 2024-09-23 15:56 ` Sebastián Monía 1 sibling, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-09-23 15:43 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard [-- Attachment #1: Type: text/plain, Size: 970 bytes --] Jim Porter <jporterbugs@gmail.com> writes: > On 9/21/2024 2:13 AM, Eli Zaretskii wrote: >>> Date: Mon, 9 Sep 2024 23:06:56 -0700 >>> From: Jim Porter <jporterbugs@gmail.com> >>> >>> From my reading of RFC9110[1], this is *technically* a bug (we should >>> assume application/octet-stream, not text/plain), but that wouldn't fix >>> the rendering here; it would probably make things worse. However, per >>> the RFC, EWW would be within its rights to guess that the page is HTML, >>> e.g. by checking for "<!doctype html>". It also recommends having that >>> be an option that can be disabled, which is reasonable (and in keeping >>> with Emacs's design principles anyway). >>> >>> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5 >> Thanks. Would someone like to submit a patch along these lines? > > It'll probably be a couple weeks until I have time to write a patch, > but if no one has done so by then, I'll look into it. Would the patch attached work? 
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: eww-use-doctype-fallback --] [-- Type: text/x-patch, Size: 2863 bytes --] From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian.monia@sebasmonia.com> Date: Mon, 23 Sep 2024 11:40:18 -0400 Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page has a valid doctype tag, and use it as alternative to a content-type header --- lisp/net/eww.el | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index a651d9d5020..59a146c8392 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments." :type '(choice (const :tag "Never" nil) regexp)) +(defcustom eww-use-doctype-fallback t + "Accept a DOCTYPE tag as evidence that page content is HTML. +This is used only when the page does not have a valid Content-Type +header." + :version "30.1" + :group 'eww + :type 'boolean) + (defcustom eww-browse-url-new-window-is-tab 'tab-bar "Whether to open up new windows in a tab or a new buffer. If t, then open the URL in a new tab rather than a new buffer if @@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml." (member content-type '("text/html" "application/xhtml+xml"))) +(defun eww--doctype-html-p (data-buffer) + "Return non-nil if DATA-BUFFER contains a doctype declaration." + ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype + (let ((case-fold-search t) + (target + "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) + (with-current-buffer data-buffer + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above + (re-search-forward target nil t)))) + (defun eww--rename-buffer () "Rename the current EWW buffer. 
The renaming scheme is performed in accordance with @@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with url)) (goto-char (point-min)) (eww-display-html (or encode charset) url nil point buffer)) - ((eww-html-p (car content-type)) + ((or (eww-html-p (car content-type)) + (and eww-use-doctype-fallback + (eww--doctype-html-p data-buffer))) (eww-display-html (or encode charset) url nil point buffer)) ((equal (car content-type) "application/pdf") (eww-display-pdf)) @@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with (setq buffer-undo-list nil))) (kill-buffer data-buffer))) (unless (buffer-live-p buffer) - (kill-buffer data-buffer)))) + (kill-buffer data-buffer))) (defun eww-parse-headers () (let ((headers nil)) -- 2.45.2.windows.1 [-- Attachment #3: Type: text/plain, Size: 54 bytes --] -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:43 ` Sebastián Monía
@ 2024-09-28 10:58 ` Eli Zaretskii
  2024-09-30 15:52 ` Sebastián Monía
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-09-28 10:58 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: jporterbugs, 73133, ganimard

> From: Sebastián Monía <sebastian@sebasmonia.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, 73133@debbugs.gnu.org, ganimard@tuta.io
> Date: Mon, 23 Sep 2024 11:43:36 -0400
>
> +(defcustom eww-use-doctype-fallback t
> +  "Accept a DOCTYPE tag as evidence that page content is HTML.

This should say

  "Whether to accept the DOCTYPE tag as evidence that page content is HTML."

> +This is used only when the page does not have a valid Content-Type
> +header."
> +  :version "30.1"
               ^^^^
This should be "31.1"

> +(defun eww--doctype-html-p (data-buffer)
> +  "Return non-nil if DATA-BUFFER contains a doctype declaration."

Not just "doctype declaration", but "HTML doctype declaration", right?

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-28 10:58 ` Eli Zaretskii
@ 2024-09-30 15:52 ` Sebastián Monía
  0 siblings, 0 replies; 34+ messages in thread
From: Sebastián Monía @ 2024-09-30 15:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: jporterbugs, 73133, ganimard

Eli Zaretskii <eliz@gnu.org> writes:

>> +(defcustom eww-use-doctype-fallback t
>> +  "Accept a DOCTYPE tag as evidence that page content is HTML.
>
> This should say
>
>   "Whether to accept the DOCTYPE tag as evidence that page content is HTML."

>> +  :version "30.1"
>               ^^^^
> This should be "31.1"

Will correct these (although the defcustom might change completely)

>> +(defun eww--doctype-html-p (data-buffer)
>> +  "Return non-nil if DATA-BUFFER contains a doctype declaration."
>
> Not just "doctype declaration", but "HTML doctype declaration", right?

Same here.

Thanks for the feedback!

-- 
Sebastián Monía
https://site.sebasmonia.com/

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-09-21 17:12 ` Jim Porter 2024-09-23 15:43 ` Sebastián Monía @ 2024-09-23 15:56 ` Sebastián Monía 2024-09-24 18:31 ` Jim Porter 1 sibling, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-09-23 15:56 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard [-- Attachment #1: Type: text/plain, Size: 158 bytes --] Hi all, Would something like the attached patch work? Thanks, Seb PS: I think I sent this to just one person by mistake instead of a wide reply, my bad. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: eww-use-doctype-fallback --] [-- Type: text/x-patch, Size: 2863 bytes --] From 499abe197e6d245228be853731314e19148bb658 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian.monia@sebasmonia.com> Date: Mon, 23 Sep 2024 11:40:18 -0400 Subject: [PATCH] Add option eww-use-doctype-fallback, code to detect if a page has a valid doctype tag, and use it as alternative to a content-type header --- lisp/net/eww.el | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index a651d9d5020..59a146c8392 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -170,6 +170,14 @@ the first item is the program, and the rest are the arguments." :type '(choice (const :tag "Never" nil) regexp)) +(defcustom eww-use-doctype-fallback t + "Accept a DOCTYPE tag as evidence that page content is HTML. +This is used only when the page does not have a valid Content-Type +header." + :version "30.1" + :group 'eww + :type 'boolean) + (defcustom eww-browse-url-new-window-is-tab 'tab-bar "Whether to open up new windows in a tab or a new buffer. If t, then open the URL in a new tab rather than a new buffer if @@ -630,6 +638,18 @@ Currently this means either text/html or application/xhtml+xml." 
(member content-type '("text/html" "application/xhtml+xml"))) +(defun eww--doctype-html-p (data-buffer) + "Return non-nil if DATA-BUFFER contains a doctype declaration." + ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype + (let ((case-fold-search t) + (target + "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) + (with-current-buffer data-buffer + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above + (re-search-forward target nil t)))) + (defun eww--rename-buffer () "Rename the current EWW buffer. The renaming scheme is performed in accordance with @@ -695,7 +715,9 @@ The renaming scheme is performed in accordance with url)) (goto-char (point-min)) (eww-display-html (or encode charset) url nil point buffer)) - ((eww-html-p (car content-type)) + ((or (eww-html-p (car content-type)) + (and eww-use-doctype-fallback + (eww--doctype-html-p data-buffer))) (eww-display-html (or encode charset) url nil point buffer)) ((equal (car content-type) "application/pdf") (eww-display-pdf)) @@ -717,7 +739,7 @@ The renaming scheme is performed in accordance with (setq buffer-undo-list nil))) (kill-buffer data-buffer))) (unless (buffer-live-p buffer) - (kill-buffer data-buffer)))) + (kill-buffer data-buffer))) (defun eww-parse-headers () (let ((headers nil)) -- 2.45.2.windows.1 [-- Attachment #3: Type: text/plain, Size: 54 bytes --] -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-23 15:56 ` Sebastián Monía
@ 2024-09-24 18:31 ` Jim Porter
  2024-09-25 20:46 ` Sebastián Monía
  0 siblings, 1 reply; 34+ messages in thread
From: Jim Porter @ 2024-09-24 18:31 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/23/2024 8:56 AM, Sebastián Monía wrote:
> Would something like the attached patch work?

I was actually thinking something more general, like a defcustom named
'eww-guess-content-type-functions', which would be a list of functions
where the first non-nil result is the guessed Content-Type. That way,
we could extend this to other content types (for example, maybe we'd
want to look for the magic headers for various image formats too; we
don't have to do that in this bug).

I think your 'eww--doctype-html-p' function would work nicely with a
couple small tweaks as one of the functions in
'eww-guess-content-type-functions' though.

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-24 18:31 ` Jim Porter
@ 2024-09-25 20:46 ` Sebastián Monía
  2024-09-26 1:59 ` Jim Porter
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastián Monía @ 2024-09-25 20:46 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

Hi Jim,

Jim Porter <jporterbugs@gmail.com> writes:

> I was actually thinking something more general, like a defcustom named
> 'eww-guess-content-type-functions', which would be a list of functions
> where the first non-nil result is the guessed Content-Type. That way,
> we could extend this to other content types (for example, maybe we'd
> want to look for the magic headers for various image formats too; we
> don't have to do that in this bug).

I think the functions for the new defcustom should accept the
content-type, headers (since both are already parsed by that time), and
the entire buffer. If you agree, I can give your suggestion a shot, if
not let me know what do you think would work.

> I think your 'eww--doctype-html-p' function would work nicely with a
> couple small tweaks as one of the functions in
> 'eww-guess-content-type-functions' though.

Thanks! I would also have the current '(eww-html-p (car content-type))'
wrapped in a function `eww--content-type-html-p` and put both functions
in the defcustom, first content type then doctype.

-- 
Sebastián Monía
https://site.sebasmonia.com/

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-25 20:46 ` Sebastián Monía
@ 2024-09-26 1:59 ` Jim Porter
  2024-09-30 17:10 ` Sebastián Monía
  0 siblings, 1 reply; 34+ messages in thread
From: Jim Porter @ 2024-09-26 1:59 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/25/2024 1:46 PM, Sebastián Monía wrote:
> Jim Porter <jporterbugs@gmail.com> writes:
>> I was actually thinking something more general, like a defcustom named
>> 'eww-guess-content-type-functions', which would be a list of functions
>> where the first non-nil result is the guessed Content-Type. That way,
>> we could extend this to other content types (for example, maybe we'd
>> want to look for the magic headers for various image formats too; we
>> don't have to do that in this bug).
>
> I think the functions for the new defcustom should accept the
> content-type, headers (since both are already parsed by that time), and
> the entire buffer. If you agree, I can give your suggestion a shot, if
> not let me know what do you think would work.

I think we'd only want to run this hook if the Content-Type is absent
from the headers (its job is to *guess* a content type, after all), so
I'd expect the signature to be the list of headers + the buffer.

^ permalink raw reply [flat|nested] 34+ messages in thread
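As a rough illustration of the two-argument signature described above (headers plus buffer), a guesser for one image format might look like the sketch below. Everything here is an assumption made for illustration: the name `my/guess-gif` is hypothetical, no such function appears in the thread, and the sketch assumes the buffer holds only the response body, so that the signature sits at the very start.

```elisp
;; Hypothetical guesser, for illustration only; not code from the thread.
;; Assumes RESPONSE-BUFFER contains just the body (no HTTP headers).
(defun my/guess-gif (_headers response-buffer)
  "Return \"image/gif\" if RESPONSE-BUFFER starts with a GIF signature.
The first argument, the alist of response headers, is ignored."
  (with-current-buffer response-buffer
    (save-excursion
      (goto-char (point-min))
      (let ((case-fold-search nil))     ; the signature is case-sensitive
        (when (looking-at-p "GIF8[79]a")
          "image/gif")))))

;; Call it the way the proposed hook would: headers first, then the buffer.
(with-temp-buffer
  (insert "GIF89a...")
  (my/guess-gif nil (current-buffer)))  ; returns "image/gif"
```

Returning nil when the signature is absent lets the next function in the list have a turn.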
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-26 1:59 ` Jim Porter
@ 2024-09-30 17:10 ` Sebastián Monía
  2024-10-03 23:39 ` Jim Porter
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastián Monía @ 2024-09-30 17:10 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

Hello!
I was looking into this today and considering our options.

Jim Porter <jporterbugs@gmail.com> writes:

> On 9/25/2024 1:46 PM, Sebastián Monía wrote:
>> Jim Porter <jporterbugs@gmail.com> writes:
>>> I was actually thinking something more general, like a defcustom named
>>> 'eww-guess-content-type-functions', which would be a list of functions
>>> where the first non-nil result is the guessed Content-Type. That way,
>>> we could extend this to other content types (for example, maybe we'd
>>> want to look for the magic headers for various image formats too; we
>>> don't have to do that in this bug).

We aren't really guessing the content-type, at least in the scope of my
original patch, and probably this bug. We just want to know if the page
is HTML to render it, in these snippets (part of eww-render):

  ;; original cond
  ((eww-html-p (car content-type))
   (eww-display-html (or encode charset) url nil point buffer))

  ;; one possible alternative
  ((or (eww-html-p (car content-type))
       ;; alternative mechanism to detect if the page is HTML
       ;; via <doctype...>, or other tests.
       )
   (eww-display-html (or encode charset) url nil point buffer))

We could instead change 'eww-html-p' to accept the content-type, other
headers and buffer. And in that function, as a fallback, call the
functions in 'eww-guess-content-type-functions' and return non-nil for
HTML.

The reason I am suggesting this is that there is no benefit to have a
generic mechanism to detect the Content Type, without heavily modifying
'eww-render'. It only matters in the context of deciding whether to
render the HTML or displaying it as-is, other cases are handled in
eww-render already.

Hope that made sense!
I can always address Eli's comments in the context of my original patch,
too, for a much simpler (and of course, limited) solution.

-- 
Sebastián Monía
https://site.sebasmonia.com/

^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-09-30 17:10 ` Sebastián Monía
@ 2024-10-03 23:39 ` Jim Porter
  2024-10-09 3:30 ` Sebastián Monía
  0 siblings, 1 reply; 34+ messages in thread
From: Jim Porter @ 2024-10-03 23:39 UTC (permalink / raw)
  To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard

On 9/30/2024 10:10 AM, Sebastián Monía wrote:
> We aren't really guessing the content-type, at least in the scope of my
> original patch, and probably this bug. We just want to know if the page
> is HTML to render it, in these snippets (part of eww-render):

What I was thinking about was something like this (with some appropriate
implementation for 'eww--guess-content-type', possibly accepting args as
needed):

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index b5d2f20781a..1c134717cc9 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -659,7 +659,7 @@ eww-render
            (content-type
             (mail-header-parse-content-type
              (if (zerop (length (cdr (assoc "content-type" headers))))
-                 "text/plain"
+                 (eww--guess-content-type)
                (cdr (assoc "content-type" headers)))))
            (charset (intern
                      (downcase

^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages
  2024-10-03 23:39 ` Jim Porter
@ 2024-10-09 3:30 ` Sebastián Monía
  2024-10-09 3:42 ` Jim Porter
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastián Monía @ 2024-10-09 3:30 UTC (permalink / raw)
  To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

Jim Porter <jporterbugs@gmail.com> writes:

> On 9/30/2024 10:10 AM, Sebastián Monía wrote:
>> We aren't really guessing the content-type, at least in the scope of my
>> original patch, and probably this bug. We just want to know if the page
>> is HTML to render it, in these snippets (part of eww-render):
>
> What I was thinking about was something like this (with some
> appropriate implementation for 'eww--guess-content-type', possibly
> accepting args as needed):
>
> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
> index b5d2f20781a..1c134717cc9 100644
> --- a/lisp/net/eww.el
> +++ b/lisp/net/eww.el
> @@ -659,7 +659,7 @@ eww-render
>             (content-type
>              (mail-header-parse-content-type
>               (if (zerop (length (cdr (assoc "content-type" headers))))
> -                 "text/plain"
> +                 (eww--guess-content-type)
>                 (cdr (assoc "content-type" headers)))))
>             (charset (intern
>                       (downcase

Hello!
Attached a new patch that goes in the direction outlined above, let me
know what you think.

Cheers,
Seb

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 3056 bytes --]

From 309a7d729665f14964a550f57f589a79705e23d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian@sebasmonia.com>
Date: Tue, 8 Oct 2024 23:26:42 -0400
Subject: [PATCH] Add customization to let EWW guess content-type if needed
 (bug#73133)

---
 lisp/net/eww.el | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index b5d2f20781a..0a9a621f3e5 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -108,6 +108,19 @@ eww-suggest-uris
     eww-current-url
     eww-bookmark-urls))
 
+(defcustom eww-guess-content-type-functions
+  '(eww--html-if-doctype)
+  "List of functions used to guess a page's content-type.
+These are only used when the page does not have a valid Content-Type
+header. Functions are called in order, until one of them returns the
+value to be used as Content-Type. They receive two parameters: an alist
+of headers, and the buffer that holds the complete response. If the
+list is exhausted, eww assumes \"text/plain\" so the user can see the
+markup."
+  :version "31.1"
+  :group 'eww
+  :type '(repeat function))
+
 (defcustom eww-bookmarks-directory user-emacs-directory
   "Directory where bookmark files will be stored."
   :version "25.1"
@@ -630,6 +643,31 @@ eww-html-p
   (member content-type '("text/html"
                          "application/xhtml+xml")))
 
+(defun eww--guess-content-type (headers response-buffer)
+  "Use HEADERS and RESPONSE to guess the Content-Type.
+Will call each function in `eww-guess-content-type-functions', until one
+of them returns a value. This mechanism is used only if there isn't a
+valid Content-Type header. If none of the functions can guess, return
+\"text/plain\", so at least the mark up is displayed."
+  (let ((first-guess (seq-some
+                      (lambda (f) (funcall f headers response-buffer))
+                      eww-guess-content-type-functions)))
+    (or first-guess "text/plain")))
+
+(defun eww--html-if-doctype (headers response-buffer)
+  "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration.
+HEADERS is unused."
+  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+  (let ((case-fold-search t)
+        (target
+         "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+    (with-current-buffer response-buffer
+      (goto-char (point-min))
+      ;; match basic <!doctype html> and also legacy variants as
+      ;; specified in link above
+      (when (re-search-forward target nil t)
+        "text/html"))))
+
 (defun eww--rename-buffer ()
   "Rename the current EWW buffer.
 The renaming scheme is performed in accordance with
@@ -659,7 +697,7 @@ eww-render
            (content-type
             (mail-header-parse-content-type
              (if (zerop (length (cdr (assoc "content-type" headers))))
-                 "text/plain"
+                 (eww--guess-content-type headers buffer)
                (cdr (assoc "content-type" headers)))))
            (charset (intern
                      (downcase
-- 
2.43.0

^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-09 3:30 ` Sebastián Monía @ 2024-10-09 3:42 ` Jim Porter 2024-10-10 2:08 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Jim Porter @ 2024-10-09 3:42 UTC (permalink / raw) To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard On 10/8/2024 8:30 PM, Sebastián Monía wrote: > Attached a new patch that goes in the direction outlined above, let me > know what you think. Thanks, I think this looks good overall (though I haven't run with your patch locally). Just one comment below. > + (let ((first-guess (seq-some > + (lambda (f) (funcall f headers response-buffer)) > + eww-guess-content-type-functions))) > + (or first-guess "text/plain"))) I believe this could be: (or (run-hook-with-args-until-success 'eww-guess-content-type-functions headers response-buffer) "text/plain") ^ permalink raw reply [flat|nested] 34+ messages in thread
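A minimal sketch of the suggestion above, for readers who have not used `run-hook-with-args-until-success': it calls each function in a hook variable with the given arguments and returns the first non-nil result, which is the behaviour the `seq-some' form was written to get. The names `my-guess-functions', `my-guess-nothing', `my-guess-html' and `my-guess-content-type' are hypothetical stand-ins and are not part of eww.el.

```elisp
;; Sketch only: every `my-' name below is hypothetical, not part of EWW.
(defvar my-guess-functions nil
  "Hypothetical hook of functions called with HEADERS and a response buffer.")

(defun my-guess-nothing (_headers _buffer)
  "Decline to guess, so the next function in the hook is tried."
  nil)

(defun my-guess-html (_headers _buffer)
  "Always guess HTML."
  "text/html")

(defun my-guess-content-type (headers buffer)
  "Return the first non-nil guess from `my-guess-functions', else \"text/plain\"."
  (or (run-hook-with-args-until-success 'my-guess-functions headers buffer)
      "text/plain"))

(setq my-guess-functions '(my-guess-nothing my-guess-html))
(my-guess-content-type nil (current-buffer)) ; first non-nil guess: "text/html"
```

Because the hook is passed as a quoted symbol, the function list is looked up at call time, so reordering or emptying the list changes the result without touching the caller.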
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-09 3:42 ` Jim Porter @ 2024-10-10 2:08 ` Sebastián Monía 2024-10-14 4:35 ` Jim Porter 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-10 2:08 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard [-- Attachment #1: Type: text/plain, Size: 838 bytes --] Jim Porter <jporterbugs@gmail.com> writes: > I believe this could be: > > (or (run-hook-with-args-until-success > 'eww-guess-content-type-functions headers response-buffer) > "text/plain") TIL. I landed on seq-some while looking for something like run-hook-with-args-until-success. So I actually learned two days in a row! :) Attached a modified patch. I also noticed and corrected another error that broke things when using the "g" (reload) command. As for testing, I used this: (defun do-ask (headers response) (when (y-or-n-p "decide?") (if (y-or-n-p "render?") "text/html" "text/plain"))) (setq eww-guess-content-type-functions '(do-ask eww--html-if-doctype)) I then reversed the order of the functions, testing with "regular" pages and the one reported in the bug. Also tested with no functions. 
[-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: patch bug 73133 --] [-- Type: text/x-patch, Size: 2995 bytes --] From 5239cf0add09f69276ae21c13efb2fe665297234 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian@sebasmonia.com> Date: Tue, 8 Oct 2024 23:26:42 -0400 Subject: [PATCH] Add customization to let EWW guess content-type if needed (bug#73133) --- lisp/net/eww.el | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index b5d2f20781a..30e780a44d9 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -108,6 +108,19 @@ eww-suggest-uris eww-current-url eww-bookmark-urls)) +(defcustom eww-guess-content-type-functions + '(eww--html-if-doctype) + "List of functions used to guess a page's content-type. +These are only used when the page does not have a valid Content-Type +header. Functions are called in order, until one of them returns the +value to be used as Content-Type. They receive two parameters: an alist +of headers, and the buffer that holds the complete response. If the +list is exhausted, eww assumes \"text/plain\" so the user can see the +markup." + :version "31.1" + :group 'eww + :type '(repeat function)) + (defcustom eww-bookmarks-directory user-emacs-directory "Directory where bookmark files will be stored." :version "25.1" @@ -630,6 +643,30 @@ eww-html-p (member content-type '("text/html" "application/xhtml+xml"))) +(defun eww--guess-content-type (headers response-buffer) + "Use HEADERS and RESPONSE to guess the Content-Type. +Will call each function in `eww-guess-content-type-functions', until one +of them returns a value. This mechanism is used only if there isn't a +valid Content-Type header. If none of the functions can guess, return +\"text/plain\", so at least the mark up is displayed." 
+ (or (run-hook-with-args-until-success + 'eww-guess-content-type-functions headers response-buffer) + "text/plain")) + +(defun eww--html-if-doctype (headers response-buffer) + "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration. +HEADERS is unused." + ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype + (let ((case-fold-search t) + (target + "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) + (with-current-buffer response-buffer + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above + (when (re-search-forward target nil t) + "text/html")))) + (defun eww--rename-buffer () "Rename the current EWW buffer. The renaming scheme is performed in accordance with @@ -659,7 +696,7 @@ eww-render (content-type (mail-header-parse-content-type (if (zerop (length (cdr (assoc "content-type" headers)))) - "text/plain" + (eww--guess-content-type headers (current-buffer)) (cdr (assoc "content-type" headers))))) (charset (intern (downcase -- 2.43.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-10 2:08 ` Sebastián Monía @ 2024-10-14 4:35 ` Jim Porter 2024-10-14 14:03 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Jim Porter @ 2024-10-14 4:35 UTC (permalink / raw) To: Sebastián Monía; +Cc: Eli Zaretskii, 73133, ganimard On 10/9/2024 7:08 PM, Sebastián Monía wrote: > Attached a modified patch. I also noticed and corrected another error, > that broke things when using the "g" (reload) command. Thanks, I think this looks good overall. I just noticed one small nit (which I can fix when merging): > +(defun eww--html-if-doctype (headers response-buffer) > + "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration. > +HEADERS is unused." If an argument is unused, the convention is to prefix it with an underscore like "_headers". Then Flymake won't complain about an unused variable. :) One last question: do you have FSF copyright assignment paperwork filled out? If you haven't already, you'll need to fill that out before we can merge this. (I don't think I have access to the full list of people who've filled out paperwork, so I'm not sure if you've already done this.) ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-14 4:35 ` Jim Porter @ 2024-10-14 14:03 ` Eli Zaretskii 2024-10-15 11:43 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2024-10-14 14:03 UTC (permalink / raw) To: Jim Porter; +Cc: sebastian, 73133, ganimard > Date: Sun, 13 Oct 2024 21:35:33 -0700 > Cc: Eli Zaretskii <eliz@gnu.org>, 73133@debbugs.gnu.org, ganimard@tuta.io > From: Jim Porter <jporterbugs@gmail.com> > > One last question: do you have FSF copyright assignment paperwork filled > out? AFAIK, Sebastián is in the middle of the assignment process, but it was not yet completed. ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-14 14:03 ` Eli Zaretskii @ 2024-10-15 11:43 ` Sebastián Monía 2024-10-19 7:46 ` Eli Zaretskii 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-15 11:43 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Jim Porter, 73133, ganimard Eli Zaretskii <eliz@gnu.org> writes: >> Date: Sun, 13 Oct 2024 21:35:33 -0700 >> Cc: Eli Zaretskii <eliz@gnu.org>, 73133@debbugs.gnu.org, ganimard@tuta.io >> From: Jim Porter <jporterbugs@gmail.com> >> >> One last question: do you have FSF copyright assignment paperwork filled >> out? > > AFAIK, Sebastián is in the middle of the assignment process, but it > was not yet completed. This is correct. I sent the form signed, didn't hear back yet. -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-15 11:43 ` Sebastián Monía @ 2024-10-19 7:46 ` Eli Zaretskii 2024-10-19 17:56 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Eli Zaretskii @ 2024-10-19 7:46 UTC (permalink / raw) To: Sebastián Monía; +Cc: jporterbugs, 73133, ganimard > From: Sebastián Monía <sebastian@sebasmonia.com> > Cc: Jim Porter <jporterbugs@gmail.com>, 73133@debbugs.gnu.org, > ganimard@tuta.io > Date: Tue, 15 Oct 2024 07:43:40 -0400 > > > Eli Zaretskii <eliz@gnu.org> writes: > >> Date: Sun, 13 Oct 2024 21:35:33 -0700 > >> Cc: Eli Zaretskii <eliz@gnu.org>, 73133@debbugs.gnu.org, ganimard@tuta.io > >> From: Jim Porter <jporterbugs@gmail.com> > >> > >> One last question: do you have FSF copyright assignment paperwork filled > >> out? > > > > AFAIK, Sebastián is in the middle of the assignment process, but it > > was not yet completed. > > This is correct. I sent the form signed, didn't hear back yet. The legal paperwork is now done, so Sebastián, please update the patch to fix the nit with unused argument HEADERS in eww--html-if-doctype, and resubmit, so we could install the changes. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-19 7:46 ` Eli Zaretskii @ 2024-10-19 17:56 ` Sebastián Monía 2024-10-20 19:17 ` Jim Porter 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-19 17:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: jporterbugs, 73133, ganimard [-- Attachment #1: Type: text/plain, Size: 449 bytes --] Eli Zaretskii <eliz@gnu.org> writes: > The legal paperwork is now done, so Sebastián, please update the patch > to fix the nit with unused argument HEADERS in eww--html-if-doctype, > and resubmit, so we could install the changes. > > Thanks. What a momentous occasion :) Attached is the patch with that correction (and a small docstring fix that 'checkdoc' caught). Thank you everyone for your help in this process. Regards, Seb [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: bug73133-doctype --] [-- Type: text/x-patch, Size: 3003 bytes --] From e35f4502383e368747d5f2bd8bcb9ed872315029 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian@sebasmonia.com> Date: Tue, 8 Oct 2024 23:26:42 -0400 Subject: [PATCH] Add customization to let EWW guess content-type if needed (bug#73133) --- lisp/net/eww.el | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index b5d2f20781a..147982057c5 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -108,6 +108,19 @@ eww-suggest-uris eww-current-url eww-bookmark-urls)) +(defcustom eww-guess-content-type-functions + '(eww--html-if-doctype) + "List of functions used to guess a page's content-type. +These are only used when the page does not have a valid Content-Type +header. Functions are called in order, until one of them returns the +value to be used as Content-Type. They receive two parameters: an alist +of headers, and the buffer that holds the complete response. 
If the +list is exhausted, eww assumes \"text/plain\" so the user can see the +markup." + :version "31.1" + :group 'eww + :type '(repeat function)) + (defcustom eww-bookmarks-directory user-emacs-directory "Directory where bookmark files will be stored." :version "25.1" @@ -630,6 +643,30 @@ eww-html-p (member content-type '("text/html" "application/xhtml+xml"))) +(defun eww--guess-content-type (headers response-buffer) + "Use HEADERS and RESPONSE-BUFFER to guess the Content-Type. +Will call each function in `eww-guess-content-type-functions', until one +of them returns a value. This mechanism is used only if there isn't a +valid Content-Type header. If none of the functions can guess, return +\"text/plain\", so at least the mark up is displayed." + (or (run-hook-with-args-until-success + 'eww-guess-content-type-functions headers response-buffer) + "text/plain")) + +(defun eww--html-if-doctype (_headers response-buffer) + "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration. +HEADERS is unused." + ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype + (let ((case-fold-search t) + (target + "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) + (with-current-buffer response-buffer + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above + (when (re-search-forward target nil t) + "text/html")))) + (defun eww--rename-buffer () "Rename the current EWW buffer. The renaming scheme is performed in accordance with @@ -659,7 +696,7 @@ eww-render (content-type (mail-header-parse-content-type (if (zerop (length (cdr (assoc "content-type" headers)))) - "text/plain" + (eww--guess-content-type headers (current-buffer)) (cdr (assoc "content-type" headers))))) (charset (intern (downcase -- 2.43.0 [-- Attachment #3: Type: text/plain, Size: 56 bytes --] -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-19 17:56 ` Sebastián Monía @ 2024-10-20 19:17 ` Jim Porter 2024-10-21 1:48 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Jim Porter @ 2024-10-20 19:17 UTC (permalink / raw) To: Sebastián Monía, Eli Zaretskii; +Cc: 73133, ganimard On 10/19/2024 10:56 AM, Sebastián Monía wrote: > Thank you everyone for your help in this process. One last thought before I merge this: I notice that when we can't guess a Content-Type, we use "text/plain" as a fallback. Per RFC-9110[1], the fallback should be "application/octet-stream". I tested this out in EWW, and we still display "application/octet-stream" pages as text in EWW, so there's no difference in behavior by default vs "text/plain". However, users who customize 'eww-use-external-browser-for-content-type' could make pages like that open externally, which I think makes sense. For non-HTML pages with no actual Content-Type header, they're at least reasonably likely to be binary files, so you'd probably want to download them rather than display them. Does anyone else have any thoughts on the relative merits of falling back to "application/octet-stream" vs "text/plain"? If we go with the former, I can update the patch when I merge. [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.3-5 ^ permalink raw reply [flat|nested] 34+ messages in thread
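As a rough illustration of the customization mentioned above: `eww-use-external-browser-for-content-type' holds a regexp that EWW matches against a page's content type. The value below is a hypothetical user setting chosen for this example, not EWW's default, and whether the guessed fallback ever reaches this check depends on which fallback string is finally merged.

```elisp
;; Hypothetical user configuration, e.g. in an init file.
;; The regexp is an illustrative value, NOT the default shipped with EWW.
(setq eww-use-external-browser-for-content-type
      "\\`\\(?:video/\\|audio/\\|application/octet-stream\\)")

;; With this value, a response classified as "application/octet-stream"
;; would match the regexp, while one classified as "text/plain" would not.
(list (and (string-match-p eww-use-external-browser-for-content-type
                           "application/octet-stream")
           t)
      (and (string-match-p eww-use-external-browser-for-content-type
                           "text/plain")
           t))
```

Under this assumed setting, the choice of fallback string decides whether untyped, non-HTML responses can be routed to an external browser at all.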
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-20 19:17 ` Jim Porter @ 2024-10-21 1:48 ` Sebastián Monía 2024-10-22 4:59 ` Jim Porter 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-21 1:48 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, 73133, ganimard Jim Porter <jporterbugs@gmail.com> writes: > On 10/19/2024 10:56 AM, Sebastián Monía wrote: >> Thank you everyone for your help in this process. > > One last thought before I merge this: I notice that when we can't > guess a Content-Type, we use "text/plain" as a fallback. Per > RFC-9110[1], the fallback should be "application/octet-stream". I used text/plain only because it was the original behaviour, not a particularly interesting reason! > However, users who customize > 'eww-use-external-browser-for-content-type' could make pages like that > open externally, which I think makes sense. > [...] > Does anyone else have any thoughts on the relative merits of falling > back to "application/octet-stream" vs "text/plain"? If we go with the > former, I can update the patch when I merge. I think it is a reasonable change. TIL about that option, too. Regards, Seb -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-21 1:48 ` Sebastián Monía @ 2024-10-22 4:59 ` Jim Porter 2024-10-22 12:35 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Jim Porter @ 2024-10-22 4:59 UTC (permalink / raw) To: Sebastián Monía; +Cc: Eli Zaretskii, 73133-done, ganimard On 10/20/2024 6:48 PM, Sebastián Monía wrote: > I used text/plain only because it was the original behaviour, not a > particularly interesting reason! Thanks, I've now pushed this change to the master branch as 9074a9f496b, so I'm closing this bug. (Of course, if there's anything remaining to do here, just let me know.) ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-22 4:59 ` Jim Porter @ 2024-10-22 12:35 ` Sebastián Monía 2024-10-22 12:36 ` Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-22 12:35 UTC (permalink / raw) To: Jim Porter; +Cc: Eli Zaretskii, 73133-done, ganimard Jim Porter <jporterbugs@gmail.com> writes: > On 10/20/2024 6:48 PM, Sebastián Monía wrote: >> I used text/plain only because it was the original behaviour, not a >> particularly interesting reason! > > Thanks, I've now pushed this change to the master branch as > 9074a9f496b, so I'm closing this bug. (Of course, if there's anything > remaining to do here, just let me know.) Not that I can think of. Thank you for fixing this changelog too, will keep that in mind for future patches :) -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-22 12:35 ` Sebastián Monía @ 2024-10-22 12:36 ` Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 0 replies; 34+ messages in thread From: Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-10-22 12:36 UTC (permalink / raw) To: Sebastián Monía; +Cc: Jim Porter, Eli Zaretskii, 73133 Done [-- Attachment #1: Type: text/plain, Size: 712 bytes --] Thanks for all your work, Jim and Sebastián and others! G 23 Oct 2024, 1:35 am by sebastian@sebasmonia.com: > Jim Porter <jporterbugs@gmail.com> writes: > >> On 10/20/2024 6:48 PM, Sebastián Monía wrote: >> >>> I used text/plain only because it was the original behaviour, not a >>> particularly interesting reason! >>> >> >> Thanks, I've now pushed this change to the master branch as >> 9074a9f496b, so I'm closing this bug. (Of course, if there's anything >> remaining to do here, just let me know.) >> > > Not that I can think of. Thank you for fixing this changelog too, will > keep that in mind for future patches :) > > -- > Sebastián Monía > https://site.sebasmonia.com/ > [-- Attachment #2: Type: text/html, Size: 1471 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors 2024-09-10 6:06 ` Jim Porter @ 2024-10-23 10:43 ` Mattias Engdegård 2024-10-23 16:19 ` Mattias Engdegård ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Mattias Engdegård @ 2024-10-23 10:43 UTC (permalink / raw) To: Sebastián Monía; +Cc: Jim Porter, Eli Zaretskii, 73133, ganimard Sebastián, thanks for your contribution! A few minor points about this part: 663 (let ((case-fold-search t) 664 (target 665 "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) 666 (with-current-buffer response-buffer First of all, `case-fold-search` becomes buffer-local if set, so binding it before changing buffer won't help. You need to do it the other way around. The regexp is a bit muddled. (Carets here apply to the quoted line below.) 665 "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) ...................................^ Why match the terminating `>` in one branch (without DOCTYPE legacy string) but not the other? ..................................................^^ Useless backslash(es) here. Did you mean to include something else? (Relint found this one, which is what brought me here.) .............................................................^ Why the `+`? According to the reference, there should be one single or double quote here. (https://html.spec.whatwg.org/multipage/syntax.html#doctype-legacy-string) ................................^^^............^^^ These two capture groups don't seem to be used; you probably meant to use non-capturing \(?:...\) brackets. ..................................................^^^^^^^^ A character alternative would be better here: ["']. 
An exact translation of your regexp to the rx notation might be: (rx "<!doctype" (+ " ") "html" (* " ") (group (| ">" (: "system" (+ " ") (+ (group (| "\"" "'"))) "about:legacy-compat")))) but perhaps you meant something like (rx "<!doctype" (+ " ") "html" (? (* " ") "system" (+ " ") (| "\"" "'") "about:legacy-compat" (| "\"" "'")) (* " ") ">") ^ permalink raw reply [flat|nested] 34+ messages in thread
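The point about `case-fold-search' can be reproduced with a small sketch. It assumes a response buffer in which `case-fold-search' has a buffer-local value of nil; the function names and the " *my-response*" buffer name are hypothetical. A `let' around `with-current-buffer' takes effect in the buffer that was current when the `let' was entered (or on the default value), so a local value in the response buffer still wins there. Binding after the switch affects the buffer that is actually searched.

```elisp
;; Sketch only: all `my-' names and the buffer name are hypothetical.
(defun my-find-doctype-outer-binding (buffer)
  "Bind `case-fold-search' BEFORE switching to BUFFER (the fragile order)."
  (let ((case-fold-search t))
    (with-current-buffer buffer
      (save-excursion
        (goto-char (point-min))
        (and (search-forward "<!doctype html" nil t) t)))))

(defun my-find-doctype-inner-binding (buffer)
  "Bind `case-fold-search' AFTER switching to BUFFER (the robust order)."
  (with-current-buffer buffer
    (let ((case-fold-search t))
      (save-excursion
        (goto-char (point-min))
        (and (search-forward "<!doctype html" nil t) t)))))

(defun my-compare-binding-orders ()
  "Return (OUTER INNER) results for a buffer with a local nil value."
  (let ((response (generate-new-buffer " *my-response*")))
    (unwind-protect
        (progn
          (with-current-buffer response
            (insert "<!DOCTYPE html>\n<html></html>")
            ;; Setting the variable makes it buffer-local in RESPONSE.
            (setq case-fold-search nil))
          ;; Call from a different buffer, as `eww-render' might.
          (with-temp-buffer
            (list (my-find-doctype-outer-binding response)
                  (my-find-doctype-inner-binding response))))
      (kill-buffer response))))

(my-compare-binding-orders) ; outer binding misses the uppercase doctype: (nil t)
```

When the response buffer has no local value, both orders behave the same, which may explain why the original ordering appeared to work during testing.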
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-23 10:43 ` Mattias Engdegård @ 2024-10-23 16:19 ` Mattias Engdegård 2024-10-23 18:51 ` Jim Porter 2024-10-24 3:32 ` Sebastián Monía 2 siblings, 0 replies; 34+ messages in thread From: Mattias Engdegård @ 2024-10-23 16:19 UTC (permalink / raw) To: Sebastián Monía Cc: Jim Porter, Eli Zaretskii, control, 73133, ganimard reopen 73133 stop Re-opening the bug so that we don't forget to remedy the above points. ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-23 10:43 ` Mattias Engdegård 2024-10-23 16:19 ` Mattias Engdegård @ 2024-10-23 18:51 ` Jim Porter 2024-10-24 3:35 ` Sebastián Monía 2024-10-24 3:32 ` Sebastián Monía 2 siblings, 1 reply; 34+ messages in thread From: Jim Porter @ 2024-10-23 18:51 UTC (permalink / raw) To: Mattias Engdegård, Sebastián Monía Cc: Eli Zaretskii, 73133, ganimard On 10/23/2024 3:43 AM, Mattias Engdegård wrote: > An exact translation of your regexp to the rx notation might be: > > (rx "<!doctype" (+ " ") "html" (* " ") > (group > (| ">" > (: "system" (+ " ") (+ (group (| "\"" "'"))) > "about:legacy-compat")))) > > but perhaps you meant something like > > (rx "<!doctype" (+ " ") "html" > (? (* " ") "system" (+ " ") > (| "\"" "'") "about:legacy-compat" (| "\"" "'")) > (* " ") ">") Thoughts on just simplifying to checking for "<!doctype html"? That way, we'd also guess "text/html" for all the (mostly obsolete) HTML doctypes here: <https://www.w3.org/QA/2002/04/valid-dtd-list.html>. (Technically the XHTML ones should be "application/xhtml+xml" but I don't think that makes any difference for EWW.) ^ permalink raw reply [flat|nested] 34+ messages in thread
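To make the trade-off between the two proposals concrete, here is a hedged comparison. The strict pattern is adapted from the second rx form quoted above, written with `opt' and `any' instead of `?' and `|'; the lax check is the bare "<!doctype html" prefix. The constant and function names are hypothetical and do not appear in eww.el.

```elisp
(require 'rx)

;; Sketch only: all `my-' names are hypothetical.
(defconst my-strict-doctype-regexp
  (rx "<!doctype" (+ " ") "html"
      (opt (* " ") "system" (+ " ")
           (any "\"'") "about:legacy-compat" (any "\"'"))
      (* " ") ">")
  "Strict pattern: modern doctype, optionally with the legacy-compat string.")

(defconst my-lax-doctype-prefix "<!doctype html"
  "Lax check: only the common prefix of HTML doctype declarations.")

(defun my-doctype-checks (text)
  "Return (STRICT LAX) booleans for TEXT, ignoring letter case."
  (let ((case-fold-search t))
    (list (and (string-match-p my-strict-doctype-regexp text) t)
          (and (string-match-p (regexp-quote my-lax-doctype-prefix) text) t))))

(mapcar #'my-doctype-checks
        '("<!DOCTYPE html>"
          "<!doctype html SYSTEM \"about:legacy-compat\">"
          "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">"))
;; The HTML 4.01 declaration fails the strict pattern but passes the lax one.
```

The third sample is the kind of older declaration the lax prefix is meant to catch, at the cost of also accepting malformed declarations that merely start with the prefix.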
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-23 18:51 ` Jim Porter @ 2024-10-24 3:35 ` Sebastián Monía 2024-10-24 17:13 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-24 3:35 UTC (permalink / raw) To: Jim Porter; +Cc: 73133, Mattias Engdegård, Eli Zaretskii, ganimard Jim Porter <jporterbugs@gmail.com> writes: > Thoughts on just simplifying to checking for "<!doctype html"? That > way, we'd also guess "text/html" for all the (mostly obsolete) HTML > doctypes here: <https://www.w3.org/QA/2002/04/valid-dtd-list.html>. It sounds like a good idea, can provide a patch in a couple days (maybe tomorrow). That leaves some time for dissenting voices to express any concerns with this approach. -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-24 3:35 ` Sebastián Monía @ 2024-10-24 17:13 ` Sebastián Monía 2024-10-28 15:45 ` Mattias Engdegård 0 siblings, 1 reply; 34+ messages in thread From: Sebastián Monía @ 2024-10-24 17:13 UTC (permalink / raw) To: Jim Porter; +Cc: 73133, Mattias Engdegård, Eli Zaretskii, ganimard [-- Attachment #1: Type: text/plain, Size: 558 bytes --] Sebastián Monía <sebastian@sebasmonia.com> writes: > Jim Porter <jporterbugs@gmail.com> writes: >> Thoughts on just simplifying to checking for "<!doctype html"? That >> way, we'd also guess "text/html" for all the (mostly obsolete) HTML >> doctypes here: <https://www.w3.org/QA/2002/04/valid-dtd-list.html>. > > It sounds like a good idea, can provide a patch in a couple days (maybe > tomorrow). That leaves some time for dissenting voices to express any > concerns with this approach. Attached a patch with the corrections mentioned so far. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: bug#73133 --] [-- Type: text/x-patch, Size: 1683 bytes --] From 952930c78dcfe7e4bb3a32504805239ae32073e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian.monia@sebasmonia.com> Date: Thu, 24 Oct 2024 13:09:11 -0400 Subject: [PATCH] More lax doctype check in EWW (bug#73133) The regexp to match doctype tags was simplified and will match more legacy entries; also correct binding of case-fold-search. * lisp/net/eww.el (eww--html buffer-list): Update function. --- lisp/net/eww.el | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index 7bbbeadaedd..71e4d720b74 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -660,15 +660,14 @@ eww--html-if-doctype "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration. HEADERS is unused." 
;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype - (let ((case-fold-search t) - (target - "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) - (with-current-buffer response-buffer - (goto-char (point-min)) - ;; match basic <!doctype html> and also legacy variants as - ;; specified in link above - (when (re-search-forward target nil t) - "text/html")))) + (with-current-buffer response-buffer + (let ((case-fold-search t)) + (save-excursion + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above - being purposely lax about it + (when (re-search-forward "<!doctype html" nil t) + "text/html"))))) (defun eww--rename-buffer () "Rename the current EWW buffer. -- 2.45.2.windows.1 [-- Attachment #3: Type: text/plain, Size: 54 bytes --] -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-24 17:13 ` Sebastián Monía @ 2024-10-28 15:45 ` Mattias Engdegård 2024-10-30 15:21 ` Sebastián Monía 0 siblings, 1 reply; 34+ messages in thread From: Mattias Engdegård @ 2024-10-28 15:45 UTC (permalink / raw) To: Sebastián Monía; +Cc: Jim Porter, Eli Zaretskii, 73133, ganimard 24 okt. 2024 kl. 19.13 skrev Sebastián Monía <sebastian@sebasmonia.com>: > Attached a patch with the corrections mentioned so far. Fine as far as I'm concerned. You could use `search-forward` instead of `re-search-forward` since you aren't actually using a regexp any more. ^ permalink raw reply [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-28 15:45 ` Mattias Engdegård @ 2024-10-30 15:21 ` Sebastián Monía 0 siblings, 0 replies; 34+ messages in thread From: Sebastián Monía @ 2024-10-30 15:21 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Jim Porter, Eli Zaretskii, 73133, ganimard [-- Attachment #1: Type: text/plain, Size: 358 bytes --] Mattias Engdegård <mattias.engdegard@gmail.com> writes: > 24 okt. 2024 kl. 19.13 skrev Sebastián Monía <sebastian@sebasmonia.com>: > >> Attached a patch with the corrections mentioned so far. > > Fine as far as I'm concerned. You could use `search-forward` instead > of `re-search-forward` since you aren't actually using a regexp any > more. > [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: search-forward --] [-- Type: text/x-patch, Size: 1680 bytes --] From ab4a00e3ae5c8b2f6a9d3355df0ee406dbccaee8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?= <sebastian.monia@sebasmonia.com> Date: Thu, 24 Oct 2024 13:09:11 -0400 Subject: [PATCH] More lax doctype check in EWW (bug#73133) The regexp to match doctype tags was simplified and will match more legacy entries; also correct binding of case-fold-search. * lisp/net/eww.el (eww--html buffer-list): Update function. --- lisp/net/eww.el | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/lisp/net/eww.el b/lisp/net/eww.el index 7bbbeadaedd..ec2f4e494e4 100644 --- a/lisp/net/eww.el +++ b/lisp/net/eww.el @@ -660,15 +660,14 @@ eww--html-if-doctype "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration. HEADERS is unused." 
;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype - (let ((case-fold-search t) - (target - "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) - (with-current-buffer response-buffer - (goto-char (point-min)) - ;; match basic <!doctype html> and also legacy variants as - ;; specified in link above - (when (re-search-forward target nil t) - "text/html")))) + (with-current-buffer response-buffer + (let ((case-fold-search t)) + (save-excursion + (goto-char (point-min)) + ;; match basic <!doctype html> and also legacy variants as + ;; specified in link above - being purposely lax about it + (when (search-forward "<!doctype html" nil t) + "text/html"))))) (defun eww--rename-buffer () "Rename the current EWW buffer. -- 2.45.2.windows.1 [-- Attachment #3: Type: text/plain, Size: 95 bytes --] New patch that uses search-forward :) -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply related [flat|nested] 34+ messages in thread
* bug#73133: 29.2; EWW fails to render some webpages 2024-10-23 10:43 ` Mattias Engdegård 2024-10-23 16:19 ` Mattias Engdegård 2024-10-23 18:51 ` Jim Porter @ 2024-10-24 3:32 ` Sebastián Monía 2 siblings, 0 replies; 34+ messages in thread From: Sebastián Monía @ 2024-10-24 3:32 UTC (permalink / raw) To: Mattias Engdegård; +Cc: Jim Porter, Eli Zaretskii, 73133, ganimard Mattias Engdegård <mattias.engdegard@gmail.com> writes: > Sebastián, thanks for your contribution! A few minor points about this part: > > 663 (let ((case-fold-search t) > 664 (target > 665 "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) > 666 (with-current-buffer response-buffer > > First of all, `case-fold-search` becomes buffer-local if set, so binding it before changing buffer won't help. You need to do it the other way around. Thank you for picking this up! Makes me wonder what I did wrong when testing, since it worked OK. Will correct it in the next patch. > The regexp is a bit muddled. (Carets here apply to the quoted line below.) > > 665 "<!doctype +html *\\(>\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)")) > ...................................^ > Why match the terminating `>` in one branch (without DOCTYPE legacy string) but not the other? The idea was to match a "modern" doctype declaration exactly, or a legacy one loosely, since they are more likely to have...wonky? markup. > ..................................................^^ > Useless backslash(es) here. Did you mean to include something else? > (Relint found this one, which is what brought me here.) I don't think so, it is an honest mistake. I rarely write regexps in elisp code (or any code, for that matter :) haha), only in interactive use. > .............................................................^ > Why the `+`? According to the reference, there should be one single or double quote here. 
> (https://html.spec.whatwg.org/multipage/syntax.html#doctype-legacy-string) > > ................................^^^............^^^ > These two capture groups don't seem to be used; you probably meant to use non-capturing \(?:...\) brackets. This is correct (just read on non-capturing groups). > ..................................................^^^^^^^^ > A character alternative would be better here: ["']. > > An exact translation of your regexp to the rx notation might be: Despite all the mistakes in the regex above, and a few tries to understand it, the rx notation doesn't really click for me. I am more than happy to use either of the versions you provided. Thank you for your review! -- Sebastián Monía https://site.sebasmonia.com/ ^ permalink raw reply [flat|nested] 34+ messages in thread
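The two review points above can be illustrated with a short sketch. Everything named `my/...` is hypothetical and belongs neither to EWW nor to the patch. The regexp is only one possible reading of the review comments (non-capturing group, character alternative, no stray backslashes, no `+` after the quote); it is not the version Mattias proposed, which is not quoted in this message.

```elisp
;; Sketch only: all `my/...' names are hypothetical.

(defun my/doctype-found-p (response-buffer bind-inside)
  "Search RESPONSE-BUFFER for \"<!doctype html\", trying to ignore case.
If BIND-INSIDE is non-nil, bind `case-fold-search' after switching
to RESPONSE-BUFFER; otherwise bind it before switching."
  (if bind-inside
      (with-current-buffer response-buffer
        (let ((case-fold-search t))   ; rebinds the value this buffer sees
          (save-excursion
            (goto-char (point-min))
            (and (search-forward "<!doctype html" nil t) t))))
    (let ((case-fold-search t))       ; bound while the caller is current
      (with-current-buffer response-buffer
        (save-excursion
          (goto-char (point-min))
          (and (search-forward "<!doctype html" nil t) t))))))

(defvar my/doctype-regexp
  "<!doctype +html *\\(?:>\\|system +[\"']about:legacy-compat\\)"
  "Doctype regexp using a shy group and a character alternative.")

;; Usage: the response buffer has its own local `case-fold-search'.
(with-temp-buffer                     ; plays the calling buffer
  (let ((response (generate-new-buffer " *response*")))
    (unwind-protect
        (progn
          (with-current-buffer response
            (setq-local case-fold-search nil)
            (insert "<!DOCTYPE HTML>"))
          (list (my/doctype-found-p response nil)   ; bound outside
                (my/doctype-found-p response t)))   ; bound inside
      (kill-buffer response))))
```

With the outer binding, the search runs under the response buffer's own local value and misses the uppercase declaration; with the inner binding it matches. When neither buffer has a local value, the outer `let` changes the default and the search still folds case, which may be why the first version of the patch appeared to work in testing.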
end of thread, other threads:[~2024-10-30 15:21 UTC | newest]

Thread overview: 34+ messages
2024-09-08 20:52 bug#73133: 29.2; EWW fails to render some webpages Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-09-10  6:06 ` Jim Porter
2024-09-21  9:13 ` Eli Zaretskii
2024-09-21 17:12 ` Jim Porter
2024-09-23 15:43 ` Sebastián Monía
2024-09-28 10:58 ` Eli Zaretskii
2024-09-30 15:52 ` Sebastián Monía
2024-09-23 15:56 ` Sebastián Monía
2024-09-24 18:31 ` Jim Porter
2024-09-25 20:46 ` Sebastián Monía
2024-09-26  1:59 ` Jim Porter
2024-09-30 17:10 ` Sebastián Monía
2024-10-03 23:39 ` Jim Porter
2024-10-09  3:30 ` Sebastián Monía
2024-10-09  3:42 ` Jim Porter
2024-10-10  2:08 ` Sebastián Monía
2024-10-14  4:35 ` Jim Porter
2024-10-14 14:03 ` Eli Zaretskii
2024-10-15 11:43 ` Sebastián Monía
2024-10-19  7:46 ` Eli Zaretskii
2024-10-19 17:56 ` Sebastián Monía
2024-10-20 19:17 ` Jim Porter
2024-10-21  1:48 ` Sebastián Monía
2024-10-22  4:59 ` Jim Porter
2024-10-22 12:35 ` Sebastián Monía
2024-10-22 12:36 ` Ganimard via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-10-23 10:43 ` Mattias Engdegård
2024-10-23 16:19 ` Mattias Engdegård
2024-10-23 18:51 ` Jim Porter
2024-10-24  3:35 ` Sebastián Monía
2024-10-24 17:13 ` Sebastián Monía
2024-10-28 15:45 ` Mattias Engdegård
2024-10-30 15:21 ` Sebastián Monía
2024-10-24  3:32 ` Sebastián Monía