From: Sebastián Monía
Newsgroups: gmane.emacs.bugs
Subject: bug#73133: 29.2; EWW fails to render some webpages
Date: Tue, 08 Oct 2024 23:30:03 -0400
Message-ID: <87y12y7y2s.fsf@sebasmonia.com>
References: <86plox4bef.fsf@gnu.org> <7eb7b048-06ea-5751-56e1-590689c8c318@gmail.com> <8e285069-6e95-de49-dd46-92ce49b94372@gmail.com> <5e49a521-a191-15db-6368-6ca0f046d68a@gmail.com>
In-Reply-To: <5e49a521-a191-15db-6368-6ca0f046d68a@gmail.com> (Jim Porter's message of "Thu, 3 Oct 2024 16:39:06 -0700")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13)
To: Jim Porter
Cc: Eli Zaretskii, 73133@debbugs.gnu.org, ganimard@tuta.io
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Jim Porter writes:

> On 9/30/2024 10:10 AM, Sebastián Monía wrote:
>> We aren't really guessing the content-type, at least in the scope of my
>> original patch, and probably this bug. We just want to know if the page
>> is HTML to render it, in these snippets (part of eww-render):
>
> What I was thinking about was something like this (with some
> appropriate implementation for 'eww--guess-content-type', possibly
> accepting args as needed):
>
> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
> index b5d2f20781a..1c134717cc9 100644
> --- a/lisp/net/eww.el
> +++ b/lisp/net/eww.el
> @@ -659,7 +659,7 @@ eww-render
>           (content-type
>            (mail-header-parse-content-type
>             (if (zerop (length (cdr (assoc "content-type" headers))))
> -               "text/plain"
> +               (eww--guess-content-type)
>               (cdr (assoc "content-type" headers)))))
>           (charset (intern
>                     (downcase

Hello!

Attached is a new patch that goes in the direction outlined above; let me
know what you think.

Cheers,
Seb

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0001-Add-customization-to-let-EWW-guess-content-type-if-n.patch
Content-Description: patch

>From 309a7d729665f14964a550f57f589a79705e23d6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
Date: Tue, 8 Oct 2024 23:26:42 -0400
Subject: [PATCH] Add customization to let EWW guess content-type if needed (bug#73133)

---
 lisp/net/eww.el | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index b5d2f20781a..0a9a621f3e5 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -108,6 +108,19 @@ eww-suggest-uris
              eww-current-url
              eww-bookmark-urls))
 
+(defcustom eww-guess-content-type-functions
+  '(eww--html-if-doctype)
+  "List of functions used to guess a page's content-type.
+These are only used when the page does not have a valid Content-Type
+header. Functions are called in order, until one of them returns the
+value to be used as Content-Type. They receive two parameters: an alist
+of headers, and the buffer that holds the complete response. If the
+list is exhausted, eww assumes \"text/plain\" so the user can see the
+markup."
+  :version "31.1"
+  :group 'eww
+  :type '(repeat function))
+
 (defcustom eww-bookmarks-directory user-emacs-directory
   "Directory where bookmark files will be stored."
   :version "25.1"
@@ -630,6 +643,31 @@ eww-html-p
   (member content-type '("text/html"
                          "application/xhtml+xml")))
 
+(defun eww--guess-content-type (headers response-buffer)
+  "Use HEADERS and RESPONSE-BUFFER to guess the Content-Type.
+Will call each function in `eww-guess-content-type-functions', until one
+of them returns a value. This mechanism is used only if there isn't a
+valid Content-Type header. If none of the functions can guess, return
+\"text/plain\", so at least the markup is displayed."
+  (let ((first-guess (seq-some
+                      (lambda (f) (funcall f headers response-buffer))
+                      eww-guess-content-type-functions)))
+    (or first-guess "text/plain")))
+
+(defun eww--html-if-doctype (headers response-buffer)
+  "Return \"text/html\" if RESPONSE-BUFFER has an HTML doctype declaration.
+HEADERS is unused."
+  ;; https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
+  (let ((case-fold-search t)
+        (target
+         "\\|system +\\(\\\"\\|'\\)+about:legacy-compat\\)"))
+    (with-current-buffer response-buffer
+      (goto-char (point-min))
+      ;; match basic and also legacy variants as
+      ;; specified in link above
+      (when (re-search-forward target nil t)
+        "text/html"))))
+
 (defun eww--rename-buffer ()
   "Rename the current EWW buffer.
 The renaming scheme is performed in accordance with
@@ -659,7 +697,7 @@ eww-render
          (content-type
           (mail-header-parse-content-type
            (if (zerop (length (cdr (assoc "content-type" headers))))
-               "text/plain"
+               (eww--guess-content-type headers buffer)
              (cdr (assoc "content-type" headers)))))
          (charset (intern
                    (downcase
-- 
2.43.0

--=-=-=--