unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#68254: EWW ‘readable’ by default
@ 2024-01-05  7:35 Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-05 11:52 ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-05  7:35 UTC (permalink / raw)
  To: 68254

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]

I nearly always prefer reading webpages in EWW after running the eww-readable command. Would it be possible to have EWW open webpages in the ‘readable’ view by default, but let you display the full (pre-eww-readable) render of a webpage with a command? I.e., have the inverse of the current setup, where you have to manually toggle the readable view.
—Navajeeth



* bug#68254: EWW ‘readable’ by default
  2024-01-05  7:35 bug#68254: EWW ‘readable’ by default Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-05 11:52 ` Eli Zaretskii
       [not found]   ` <poNSnv1DQ7L71-FirbCx9nuQ8gqLlPGTIjDYk2pKo2_H3BPuJArYQ2ziQ4pyADSxHCY5cU40D6MUzRqBAZE3pEcFmnzFPD49xunpLyh1UqI=@proton.me>
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-01-05 11:52 UTC (permalink / raw)
  To: Navajeeth; +Cc: 68254

> Date: Fri, 05 Jan 2024 07:35:56 +0000
> From:  Navajeeth via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I nearly always prefer reading webpages in EWW after running the eww-readable command. Would
> it be possible to have EWW open webpages in the ‘readable’ view by default, but let you display the
> full (pre-eww-readable) render of a webpage with a command? I.e., have the inverse of the current
> setup, where you have to manually toggle the readable view.

Did you try to add 'eww-readable' to eww-after-render-hook?
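
A minimal sketch of that suggestion, assuming it goes in the init file
('eww-after-render-hook' and 'eww-readable' are existing EWW names; where
exactly the line lives is an assumption):

```elisp
;; Run `eww-readable' each time EWW finishes rendering a page.
(add-hook 'eww-after-render-hook #'eww-readable)
```

The same function can be taken off the hook again with 'remove-hook' if
the result turns out not to be what you want.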






* bug#68254: EWW ‘readable’ by default
       [not found]   ` <poNSnv1DQ7L71-FirbCx9nuQ8gqLlPGTIjDYk2pKo2_H3BPuJArYQ2ziQ4pyADSxHCY5cU40D6MUzRqBAZE3pEcFmnzFPD49xunpLyh1UqI=@proton.me>
@ 2024-01-05 13:35     ` Eli Zaretskii
  2024-03-17 19:24       ` Jim Porter
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-01-05 13:35 UTC (permalink / raw)
  To: Navajeeth; +Cc: 68254

[Please use Reply All to reply, so that the bug tracker is CC'ed.]

> Date: Fri, 05 Jan 2024 12:08:29 +0000
> From: Navajeeth <yvv0@proton.me>
> 
> I’ve tried that method. While at first it appears to work how I want, it’s sub-optimal because it clutters
> your history with two versions of every webpage you open: first the full non-readable version and then
> the readable version generated by the after-render-hook. Going back in the history is a chore,
> you need to press ‘l’ twice to go back one webpage.
> 
> I used to tolerate it for a while, but now I feel that there could be a better way.






* bug#68254: EWW ‘readable’ by default
  2024-01-05 13:35     ` Eli Zaretskii
@ 2024-03-17 19:24       ` Jim Porter
  2024-03-18  4:32         ` Adam Porter
  2024-03-18 12:37         ` Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Jim Porter @ 2024-03-17 19:24 UTC (permalink / raw)
  To: Eli Zaretskii, Navajeeth; +Cc: 68254

[-- Attachment #1: Type: text/plain, Size: 1397 bytes --]

On 1/5/2024 5:35 AM, Eli Zaretskii wrote:
> [Please use Reply All to reply, so that the bug tracker is CC'ed.]
> 
>> Date: Fri, 05 Jan 2024 12:08:29 +0000
>> From: Navajeeth <yvv0@proton.me>
>>
>> I’ve tried that method. While at first it appears to work how I want, it’s sub-optimal because it clutters
>> your history with two versions of every webpage you open: first the full non-readable version and then
>> the readable version generated by the after-render-hook. Going back in the history is a chore,
>> you need to press ‘l’ twice to go back one webpage.
>>
>> I used to tolerate it for a while, but now I feel that there could be a better way.

Here's a patch for this. It turns 'eww-readable' into a toggle (using 
the same semantics as minor modes), and also adds an option to prevent 
adding a new history entry for each call.

After this patch, you could set 'eww-readable-adds-to-history' to nil 
and add 'eww-readable' to 'eww-after-render-hook', and then everything 
should work ok. With those settings, you could then call 'eww-readable' 
to display the full page if needed.
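
As a rough sketch, the setup described above might look like this in an
init file.  It assumes the patch is applied, since
'eww-readable-adds-to-history' does not exist without it:

```elisp
;; Assumes the patched eww.el: this option is introduced by the patch.
;; Do not push a second history entry when switching to readable view.
(setq eww-readable-adds-to-history nil)

;; Switch every freshly rendered page to the readable view.
(add-hook 'eww-after-render-hook #'eww-readable)
```

When run from the hook, 'eww-readable' is called without an argument,
which the patched function treats as "display the readable parts"
rather than as a toggle.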

(There might be some value in adding another new option that lets you 
specify a list of regexps to match pages that should start in readable 
mode; then it would be easy for users to enable that for 
"https://example\.com/.*" or similar. We can do that later if there's 
any demand for it, though.)

[-- Attachment #2: 0001-Allow-toggling-readable-mode-in-EWW.patch --]
[-- Type: text/plain, Size: 5305 bytes --]

From 345df3a8f255717a653465513ac9ad9a43c4945f Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sun, 17 Mar 2024 12:01:59 -0700
Subject: [PATCH] Allow toggling "readable" mode in EWW

Additionally, add an option to prevent adding a new history entry for
each call of 'eww-readable' (bug#68254).

* lisp/net/eww.el (eww-readable-adds-to-history): New option.
(eww-readable): Toggle "readable" mode interactively, like with a minor
mode.  Consult 'eww-readable-adds-to-history'.

* doc/misc/eww.texi (Basics): Describe the new behavior.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi |  5 +++++
 etc/NEWS          | 12 ++++++++++++
 lisp/net/eww.el   | 46 ++++++++++++++++++++++++++++++++++++----------
 3 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index d31fcf1802b..bec58da3e21 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -146,6 +146,11 @@ Basics
 which part of the document contains the ``readable'' text, and will
 only display this part.  This usually gets rid of menus and the like.
 
+When called interactively, this command toggles the display of the
+readable parts.  With a positive prefix argument, always display the
+readable parts, and with a zero or negative prefix, display the full
+page.
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b02712dd21c..b23754fb17f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1054,6 +1054,18 @@ entries newer than the current page.  To change the behavior when
 browsing from "historical" pages, you can customize
 'eww-before-browse-history-function'.
 
++++
+*** 'eww-readable' now toggles display of the readable parts of a web page.
+When called interactively, 'eww-readable' toggles whether to display
+only the readable parts of a page or the full page.  With a positive
+prefix argument, always display the readable parts, and with a zero or
+negative prefix, always display the full page.
+
+---
+*** New option 'eww-readable-adds-to-history'.
+When non-nil (the default), calling 'eww-readable' adds a new entry to
+the EWW page history.
+
 ** go-ts-mode
 
 +++
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54847bdf396..305357f8f2f 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,11 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-adds-to-history t
+  "If non-nil, calling `eww-readable' adds a new entry to the history."
+  :type 'boolean
+  :version "30.1")
+
 (defface eww-form-submit
   '((((type x w32 ns haiku pgtk android) (class color))	; Like default mode line
      :box (:line-width 2 :style released-button)
@@ -1055,14 +1060,31 @@ eww-toggle-paragraph-direction
                "automatic"
              bidi-paragraph-direction)))
 
-(defun eww-readable ()
-  "View the main \"readable\" parts of the current web page.
+(defun eww-readable (&optional arg)
+  "Toggle display of only the main \"readable\" parts of the current web page.
 This command uses heuristics to find the parts of the web page that
-contains the main textual portion, leaving out navigation menus and
-the like."
-  (interactive nil eww-mode)
+contains the main textual portion, leaving out navigation menus and the
+like.
+
+If called interactively, toggle the display of the readable parts.  If
+the prefix argument is positive, display the readable parts, and if it
+is zero or negative, display the full page.
+
+If called from Lisp, toggle the display of the readable parts if ARG is
+`toggle'.  Display the readable parts if ARG is nil, omitted, or is a
+positive number.  Display the full page if ARG is a negative number."
+  (interactive (list (if current-prefix-arg
+                         (prefix-numeric-value current-prefix-arg)
+                       'toggle))
+               eww-mode)
   (let* ((old-data eww-data)
-	 (dom (with-temp-buffer
+	 (make-readable (cond
+                         ((eq arg 'toggle)
+                          (not (plist-get old-data :readable)))
+                         ((and (numberp arg) (< arg 1))
+                          nil)
+                         (t t)))
+         (dom (with-temp-buffer
 		(insert (plist-get old-data :source))
 		(condition-case nil
 		    (decode-coding-region (point-min) (point-max) 'utf-8)
@@ -1071,14 +1093,18 @@ eww-readable
 		(libxml-parse-html-region (point-min) (point-max))))
          (base (plist-get eww-data :url)))
     (eww-score-readability dom)
-    (eww-save-history)
-    (eww--before-browse)
+    (when eww-readable-adds-to-history
+      (eww-save-history)
+      (eww--before-browse))
     (eww-display-html nil nil
-                      (list 'base (list (cons 'href base))
-                            (eww-highest-readability dom))
+                      (if make-readable
+                          (list 'base (list (cons 'href base))
+                                (eww-highest-readability dom))
+                        dom)
 		      nil (current-buffer))
     (dolist (elem '(:source :url :title :next :previous :up :peer))
       (plist-put eww-data elem (plist-get old-data elem)))
+    (plist-put eww-data :readable make-readable)
     (eww--after-page-change)))
 
 (defun eww-score-readability (node)
-- 
2.25.1



* bug#68254: EWW ‘readable’ by default
  2024-03-17 19:24       ` Jim Porter
@ 2024-03-18  4:32         ` Adam Porter
  2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-18  5:18           ` Jim Porter
  2024-03-18 12:37         ` Eli Zaretskii
  1 sibling, 2 replies; 14+ messages in thread
From: Adam Porter @ 2024-03-18  4:32 UTC (permalink / raw)
  To: jporterbugs; +Cc: 68254, eliz, yvv0

Hi all,

I'm not sure it would be a good idea to enable eww-readable by default.
IME eww-readable is not reliably effective enough to be used by default.
I think that if it were, too many users would find that EWW would
produce unusable results by default, and they'd likely blame EWW itself
rather than eww-readable, being unaware that eww-readable was even
involved.

I wish this weren't the case, but the modern Web is too, er, modern, I'm 
afraid.

I like Jim's idea of having an option of URL-matching regexps that 
automatically activate eww-readable.  That does sound useful.

My two cents.

--Adam






* bug#68254: EWW ‘readable’ by default
  2024-03-18  4:32         ` Adam Porter
@ 2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-18  5:44             ` Jim Porter
  2024-03-18  5:18           ` Jim Porter
  1 sibling, 1 reply; 14+ messages in thread
From: Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-18  5:17 UTC (permalink / raw)
  To: Adam Porter; +Cc: 68254, jporterbugs, eliz

[-- Attachment #1: Type: text/plain, Size: 1394 bytes --]

Thank you so much for the patch, @Jim! I dunno how to apply patches, but I’ll learn and try yours out as soon as I can.

Having regexps to match to turn readability on is a good start. I hope there will be a more convenient way to do that, other than having to manually add to that list in your init.el; maybe a function that proactively asks you, when you apply readability, if you’d like to add it to that list with a ‘y or n’.

That said, I find myself opening a lot of small blogs and personal websites in EWW, across many different domain names. Both a function that offers to add sites to a readability-on list and manually maintaining that list sound like a hassle.

I think a better way to go would be to have a readability-off list for the readability-minor-mode. In my experience, with the kind of sites I open with EWW (textual sites without a lot of graphics or JavaScript), the list of ones where ‘eww-readable’ doesn’t work is a lot smaller than the ones where it does.

But I agree with @Adam that readability shouldn’t be on as the default behaviour. I gave this thread a bad subject line. I meant in the sense: I wanted the option to turn it on and replace the default behaviour for me, because I was finding that most of the sites I was opening with EWW were working better with readability. And perhaps have it as an option for everyone to turn on.

—Navajeeth



* bug#68254: EWW ‘readable’ by default
  2024-03-18  4:32         ` Adam Porter
  2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-18  5:18           ` Jim Porter
  1 sibling, 0 replies; 14+ messages in thread
From: Jim Porter @ 2024-03-18  5:18 UTC (permalink / raw)
  To: Adam Porter; +Cc: 68254, eliz, yvv0

On 3/17/2024 9:32 PM, Adam Porter wrote:
> I'm not sure it would be a good idea to enable eww-readable by default. 
> IME eww-readable is not reliably effective enough to be used by default. 
>   I think that if it were, too many users would find that EWW would 
> produce unusable results by default, and they'd likely blame EWW itself 
> rather than eww-readable, being unaware that eww-readable were even 
> involved.

I agree overall. It's hard to know for sure if a web page will look ok 
in readable mode without trying it first.

That's why I opted to keep the default behavior unchanged in my patch. 
It just makes it possible to add 'eww-readable' to 
'eww-after-render-hook' without producing duplicate history entries. 
That way, if most of the pages you visit *are* readable, you can set it 
up like that and still get to the full view by calling 'eww-readable' again.

> I like Jim's idea of having an option of URL-matching regexps that 
> automatically activate eww-readable.  That does sound useful.

Yeah, I think I might add that in, since 1) I'd find it useful, 2) it 
should be easy, and 3) the Safari browser already supports this, so 
there's already precedent elsewhere. (It's arguably even more relevant 
for EWW than Safari, since many webpages are a real mess in EWW without 
readable-mode.)






* bug#68254: EWW ‘readable’ by default
  2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-18  5:44             ` Jim Porter
  0 siblings, 0 replies; 14+ messages in thread
From: Jim Porter @ 2024-03-18  5:44 UTC (permalink / raw)
  To: Navajeeth, Adam Porter; +Cc: 68254, eliz

On 3/17/2024 10:17 PM, Navajeeth via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> Thank you so much for the patch, @Jim! I dunno how to apply patches, but 
> I’ll learn and try yours out as soon as I can.

Even though I contribute to Emacs, I tend to use the latest proper 
release as my daily editor (so I'm using 29.2 now).[1] If I want to use 
a patch I wrote on an older release, I take the new version of all the 
relevant functions and then override the old ones using 'advice-add':

   (defun updated-eww-readable (&optional arg)
     ;; new implementation here
     )
   (advice-add 'eww-readable :override #'updated-eww-readable)

If the patch has merged to the master branch, I usually wrap that with 
'(when (< emacs-major-version 30) ...)' so that it doesn't do anything 
on the master builds, and also so I know to remove it when 30.1 comes 
out and I prune my init.el.
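
Put together, the version-guarded override might look roughly like the
sketch below.  The name 'my/updated-eww-readable' and its placeholder
body are hypothetical; the real body would be copied from the patched
eww.el:

```elisp
;; Only needed on releases that predate the patch; 30 is the major
;; version mentioned above.
(when (< emacs-major-version 30)
  (defun my/updated-eww-readable (&optional arg)
    "Placeholder for the patched `eww-readable' (hypothetical name)."
    (interactive "P")
    ;; Paste the new implementation here.  Until then, fail loudly
    ;; instead of silently doing nothing.
    (ignore arg)
    (user-error "my/updated-eww-readable: body not filled in yet"))
  ;; Replace the stock command with the function defined above.
  (advice-add 'eww-readable :override #'my/updated-eww-readable))
```

Once the running Emacs is new enough, the whole form evaluates to nil
and leaves 'eww-readable' untouched.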

> I think a better way to go would be to have a /readability-off/ list for 
> the readability-minor-mode. In my experience, with the kind of sites I 
> open with EWW (textual sites without a lot of graphics or JavaScript), 
> the list of ones where ‘eww-readable’ doesn’t work is a lot smaller than 
> the ones where it does.

I was thinking about doing something like this. The list of regexps 
could include a way to say both "if this regexp matches, use readable 
mode" and "if this regexp matches, *don't* use readable mode". Then you 
could make the list look something like this:

  '(("^https://example.com/" . not-readable)
    ".*")

That would make every page except those from https://example.com use 
readable mode. I think that would be the most flexible for complex 
cases, while still being simple for the common case (a list of "plain" 
regexps for readable-mode pages). It would also make it easy to have 
most of a site (except for one section) use readable-mode.
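
To make the idea concrete, here is a small sketch of how such a list
could be evaluated.  It is only an illustration: the function name
'my/readable-by-default-p' is hypothetical, and "first matching entry
wins" is an assumption about the semantics, not something settled in
this thread:

```elisp
(defun my/readable-by-default-p (url rules)
  "Return non-nil if URL should start in readable mode according to RULES.
Each element of RULES is either a regexp string, meaning \"use readable
mode\", or a cons cell (REGEXP . not-readable).  The first entry whose
regexp matches URL decides; if nothing matches, return nil."
  (catch 'found
    (dolist (rule rules)
      (let ((regexp (if (consp rule) (car rule) rule))
            (verdict (if (consp rule) (cdr rule) 'readable)))
        (when (string-match-p regexp url)
          (throw 'found (not (eq verdict 'not-readable))))))
    nil))

;; Example rule list: everything is readable except one site.
(defvar my/readable-rules
  '(("^https://example\\.com/" . not-readable)
    ".*"))
```

With these rules, a URL under https://example.com/ hits the first entry
and stays in the full view, while any other URL falls through to ".*"
and starts in readable mode.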

[1] Mainly I just want to avoid having to worry about updating Emacs 
master and then ending up with a broken Emacs. Murphy's Law dictates 
that that will always occur at the worst possible time.






* bug#68254: EWW ‘readable’ by default
  2024-03-17 19:24       ` Jim Porter
  2024-03-18  4:32         ` Adam Porter
@ 2024-03-18 12:37         ` Eli Zaretskii
  2024-03-19  0:00           ` Jim Porter
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-03-18 12:37 UTC (permalink / raw)
  To: Jim Porter; +Cc: 68254, yvv0

> Date: Sun, 17 Mar 2024 12:24:26 -0700
> Cc: 68254@debbugs.gnu.org
> From: Jim Porter <jporterbugs@gmail.com>
> 
> Here's a patch for this. It turns 'eww-readable' into a toggle (using 
> the same semantics as minor modes), and also adds an option to prevent 
> adding a new history entry for each call.

Thanks.

> +When called interactively, this command toggles the display of the
> +readable parts.  With a positive prefix argument, always display the
> +readable parts, and with a zero or negative prefix, display the full
> +page.

The imperative form ("display") is what we use in the doc strings, but
it is not really appropriate for the manual.  Here we say "the
function displays" or "it displays" instead, which is consistent with
the first sentence in the above paragraph.

> +(defun eww-readable (&optional arg)
> +  "Toggle display of only the main \"readable\" parts of the current web page.
>  This command uses heuristics to find the parts of the web page that
> -contains the main textual portion, leaving out navigation menus and
> -the like."
> -  (interactive nil eww-mode)
> +contains the main textual portion, leaving out navigation menus and the

"contain" (since it refers to "parts", in plural).

> +If called interactively, toggle the display of the readable parts.  If
> +the prefix argument is positive, display the readable parts, and if it
> +is zero or negative, display the full page.
> +
> +If called from Lisp, toggle the display of the readable parts if ARG is
> +`toggle'.  Display the readable parts if ARG is nil, omitted, or is a
> +positive number.  Display the full page if ARG is a negative number."

This doc string should mention eww-readable-adds-to-history.






* bug#68254: EWW ‘readable’ by default
  2024-03-18 12:37         ` Eli Zaretskii
@ 2024-03-19  0:00           ` Jim Porter
  2024-03-21 10:51             ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Porter @ 2024-03-19  0:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68254, yvv0

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On 3/18/2024 5:37 AM, Eli Zaretskii wrote:
>> Date: Sun, 17 Mar 2024 12:24:26 -0700
>> Cc: 68254@debbugs.gnu.org
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> Here's a patch for this. It turns 'eww-readable' into a toggle (using
>> the same semantics as minor modes), and also adds an option to prevent
>> adding a new history entry for each call.
> 
> Thanks.

Thanks for looking. I've addressed all of your comments, and made some 
more extensive changes to the implementation. I split up some of the 
logic in the first patch so that it's easier to reuse without error, and 
then added 'eww-readable-urls' in the second.

Because of how much I changed, I'd like to add some regression tests to 
make sure everything still works correctly, but otherwise these patches 
should be ready to go.

[-- Attachment #2: 0001-Allow-toggling-readable-mode-in-EWW.patch --]
[-- Type: text/plain, Size: 10793 bytes --]

From 4839990148e2a58cf44c04547994611392ff1955 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sun, 17 Mar 2024 12:01:59 -0700
Subject: [PATCH 1/2] Allow toggling "readable" mode in EWW

Additionally, add an option to prevent adding a new history entry for
each call of 'eww-readable' (bug#68254).

* lisp/net/eww.el (eww-readable-adds-to-history): New option.
(eww-retrieve): Make sure we call CALLBACK in all configurations.
(eww-render): Simplify how to pass encoding.
(eww--parse-html-region, eww-display-document): New functions, extracted
from...
(eww-display-html): ... here.
(eww-document-base): New function.
(eww-readable): Toggle "readable" mode interactively, like with a minor
mode.  Consult 'eww-readable-adds-to-history'.
(eww-reload): Use 'eww-display-document'.

* doc/misc/eww.texi (Basics): Describe the new behavior.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi |   5 ++
 etc/NEWS          |  12 +++++
 lisp/net/eww.el   | 127 ++++++++++++++++++++++++++++++----------------
 3 files changed, 99 insertions(+), 45 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index d31fcf1802b..522034c874d 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -146,6 +146,11 @@ Basics
 which part of the document contains the ``readable'' text, and will
 only display this part.  This usually gets rid of menus and the like.
 
+  When called interactively, this command toggles the display of the
+readable parts.  With a positive prefix argument, this command always
+displays the readable parts, and with a zero or negative prefix, it
+always displays the full page.
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b02712dd21c..b23754fb17f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1054,6 +1054,18 @@ entries newer than the current page.  To change the behavior when
 browsing from "historical" pages, you can customize
 'eww-before-browse-history-function'.
 
++++
+*** 'eww-readable' now toggles display of the readable parts of a web page.
+When called interactively, 'eww-readable' toggles whether to display
+only the readable parts of a page or the full page.  With a positive
+prefix argument, always display the readable parts, and with a zero or
+negative prefix, always display the full page.
+
+---
+*** New option 'eww-readable-adds-to-history'.
+When non-nil (the default), calling 'eww-readable' adds a new entry to
+the EWW page history.
+
 ** go-ts-mode
 
 +++
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54847bdf396..fd697846988 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,11 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-adds-to-history t
+  "If non-nil, calling `eww-readable' adds a new entry to the history."
+  :type 'boolean
+  :version "30.1")
+
 (defface eww-form-submit
   '((((type x w32 ns haiku pgtk android) (class color))	; Like default mode line
      :box (:line-width 2 :style released-button)
@@ -464,11 +469,11 @@ eww
 (defun eww-retrieve (url callback cbargs)
   (cond
    ((null eww-retrieve-command)
-    (url-retrieve url #'eww-render cbargs))
+    (url-retrieve url callback cbargs))
    ((eq eww-retrieve-command 'sync)
     (let ((data-buffer (url-retrieve-synchronously url)))
       (with-current-buffer data-buffer
-        (apply #'eww-render nil cbargs))))
+        (apply callback nil cbargs))))
    (t
     (let ((buffer (generate-new-buffer " *eww retrieve*"))
           (error-buffer (generate-new-buffer " *eww error*")))
@@ -673,9 +678,9 @@ eww-render
               (insert (format "<a href=%S>Direct link to the document</a>"
                               url))
               (goto-char (point-min))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((eww-html-p (car content-type))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
 	     ((string-match-p "\\`image/" (car content-type))
@@ -723,37 +728,43 @@ eww-detect-charset
 	      "[\t\n\r ]*<\\?xml[\t\n\r ]+[^>]*encoding=\"\\([^\"]+\\)")
 	     (match-string 1)))))
 
+(defun eww--parse-html-region (start end &optional encode)
+  "Parse the HTML between START and END, returning the DOM as an S-expression.
+Use ENCODE to decode the region; if nil, decode as UTF-8.
+
+This replaces the region with the preprocessed HTML."
+  (setq encode (or encode 'utf-8))
+  (with-restriction start end
+    (condition-case nil
+        (decode-coding-region (point-min) (point-max) encode)
+      (coding-system-error nil))
+    ;; Remove CRLF and replace NUL with &#0; before parsing.
+    (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
+      (replace-match (if (match-beginning 1) "" "&#0;") t t))
+    (eww--preprocess-html (point-min) (point-max))
+    (libxml-parse-html-region (point-min) (point-max))))
+
+(defsubst eww-document-base (url dom)
+  `(base ((href . ,url)) ,dom))
+
 (declare-function libxml-parse-html-region "xml.c"
 		  (start end &optional base-url discard-comments))
 
-(defun eww-display-html (charset url &optional document point buffer encode)
+(defun eww-display-document (document &optional point buffer)
   (unless (fboundp 'libxml-parse-html-region)
     (error "This function requires Emacs to be compiled with libxml2"))
+  (setq buffer (or buffer (current-buffer)))
   (unless (buffer-live-p buffer)
     (error "Buffer %s doesn't exist" buffer))
   ;; There should be a better way to abort loading images
   ;; asynchronously.
   (setq url-queue nil)
-  (let ((document
-	 (or document
-	     (list
-	      'base (list (cons 'href url))
-	      (progn
-		(setq encode (or encode charset 'utf-8))
-		(condition-case nil
-		    (decode-coding-region (point) (point-max) encode)
-		  (coding-system-error nil))
-		(save-excursion
-		  ;; Remove CRLF and replace NUL with &#0; before parsing.
-		  (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
-		    (replace-match (if (match-beginning 1) "" "&#0;") t t)))
-                (eww--preprocess-html (point) (point-max))
-		(libxml-parse-html-region (point) (point-max))))))
-	(source (and (null document)
-		     (buffer-substring (point) (point-max)))))
+  (let ((url (when (eq (car document) 'base)
+               (alist-get 'href (cadr document)))))
+    (unless url
+      (error "Document is missing base URL"))
     (with-current-buffer buffer
       (setq bidi-paragraph-direction nil)
-      (plist-put eww-data :source source)
       (plist-put eww-data :dom document)
       (let ((inhibit-read-only t)
 	    (inhibit-modification-hooks t)
@@ -794,6 +805,16 @@ eww-display-html
 	    (forward-line 1)))))
       (eww-size-text-inputs))))
 
+(defun eww-display-html (charset url &optional document point buffer)
+  (let ((source (buffer-substring (point) (point-max))))
+    (with-current-buffer buffer
+      (plist-put eww-data :source source)))
+  (eww-display-document
+   (or document
+       (eww-document-base
+        url (eww--parse-html-region (point) (point-max) charset)))
+   point buffer))
+
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
 	 (href (dom-attr dom 'href))
@@ -1055,30 +1076,47 @@ eww-toggle-paragraph-direction
                "automatic"
              bidi-paragraph-direction)))
 
-(defun eww-readable ()
-  "View the main \"readable\" parts of the current web page.
+(defun eww-readable (&optional arg)
+  "Toggle display of only the main \"readable\" parts of the current web page.
 This command uses heuristics to find the parts of the web page that
-contains the main textual portion, leaving out navigation menus and
-the like."
-  (interactive nil eww-mode)
+contain the main textual portion, leaving out navigation menus and the
+like.
+
+If called interactively, toggle the display of the readable parts.  If
+the prefix argument is positive, display the readable parts, and if it
+is zero or negative, display the full page.
+
+If called from Lisp, toggle the display of the readable parts if ARG is
+`toggle'.  Display the readable parts if ARG is nil, omitted, or is a
+positive number.  Display the full page if ARG is zero or a negative number.
+
+When `eww-readable-adds-to-history' is non-nil, calling this function
+adds a new entry to `eww-history'."
+  (interactive (list (if current-prefix-arg
+                         (prefix-numeric-value current-prefix-arg)
+                       'toggle))
+               eww-mode)
   (let* ((old-data eww-data)
-	 (dom (with-temp-buffer
+	 (make-readable (cond
+                         ((eq arg 'toggle)
+                          (not (plist-get old-data :readable)))
+                         ((and (numberp arg) (< arg 1))
+                          nil)
+                         (t t)))
+         (dom (with-temp-buffer
 		(insert (plist-get old-data :source))
-		(condition-case nil
-		    (decode-coding-region (point-min) (point-max) 'utf-8)
-		  (coding-system-error nil))
-                (eww--preprocess-html (point-min) (point-max))
-		(libxml-parse-html-region (point-min) (point-max))))
+                (eww--parse-html-region (point-min) (point-max))))
          (base (plist-get eww-data :url)))
-    (eww-score-readability dom)
-    (eww-save-history)
-    (eww--before-browse)
-    (eww-display-html nil nil
-                      (list 'base (list (cons 'href base))
-                            (eww-highest-readability dom))
-		      nil (current-buffer))
-    (dolist (elem '(:source :url :title :next :previous :up :peer))
-      (plist-put eww-data elem (plist-get old-data elem)))
+    (when make-readable
+      (eww-score-readability dom)
+      (setq dom (eww-highest-readability dom)))
+    (when eww-readable-adds-to-history
+      (eww-save-history)
+      (eww--before-browse)
+      (dolist (elem '(:source :url :title :next :previous :up :peer))
+        (plist-put eww-data elem (plist-get old-data elem))))
+    (eww-display-document (eww-document-base base dom))
+    (plist-put eww-data :readable make-readable)
     (eww--after-page-change)))
 
 (defun eww-score-readability (node)
@@ -1398,8 +1436,7 @@ eww-reload
     (if local
 	(if (null (plist-get eww-data :dom))
 	    (error "No current HTML data")
-	  (eww-display-html 'utf-8 url (plist-get eww-data :dom)
-			    (point) (current-buffer)))
+	  (eww-display-document (plist-get eww-data :dom) (point)))
       (let ((parsed (url-generic-parse-url url)))
         (if (equal (url-type parsed) "file")
             ;; Use Tramp instead of url.el for files (since url.el
-- 
2.25.1


[-- Attachment #3: 0002-Add-eww-readable-urls.patch --]
[-- Type: text/plain, Size: 4771 bytes --]

From a6634a1d5d0cb440554eeaa5a014406e40ffeee9 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Mon, 18 Mar 2024 16:52:34 -0700
Subject: [PATCH 2/2] Add 'eww-readable-urls'

* lisp/net/eww.el (eww-readable-urls): New option.
(eww-default-readable-p): New function...
(eww-display-html): ... use it.

* doc/misc/eww.texi (Basics): Document 'eww-readable-urls'.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi | 16 ++++++++++++++++
 etc/NEWS          |  6 ++++++
 lisp/net/eww.el   | 38 +++++++++++++++++++++++++++++++++-----
 3 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index 522034c874d..a08d6694892 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -151,6 +151,22 @@ Basics
 displays the readable parts, and with a zero or negative prefix, it
 always displays the full page.
 
+@vindex eww-readable-urls
+  If you want EWW to render a certain page in ``readable'' mode by
+default, you can add a regular expression matching its URL to
+@code{eww-readable-urls}.  Each entry can either be a regular expression
+as a string or a cons cell of the form @code{(@var{regexp}
+. @var{readability})}. If @var{readability} is non-@code{nil}, this
+behaves the same as the string form; otherwise, URLs matching
+@var{regexp} will never be displayed in readable mode by default.  For
+example, you can use this to make all pages default to readable mode,
+except for a few outliers:
+
+@example
+(setq eww-readable-urls '(("https://example\\.com/" . nil)
+                          ".*"))
+@end example
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b23754fb17f..2af00f712a4 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1061,6 +1061,12 @@ only the readable parts of a page or the full page.  With a positive
 prefix argument, always display the readable parts, and with a zero or
 negative prefix, always display the full page.
 
++++
+*** New option 'eww-readable-urls'.
+This is a list of regular expressions matching the URLs where EWW should
+display only the readable parts by default.  For more details, see
+"(eww) Basics" in the EWW manual.
+
 ---
 *** New option 'eww-readable-adds-to-history'.
 When non-nil (the default), calling 'eww-readable' adds a new entry to
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index fd697846988..9505378e040 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,19 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-urls nil
+  "A list of regexps matching URLs to display in readable mode by default.
+Each element can be either a string regexp or a cons cell of the
+form (REGEXP . READABILITY).  If READABILITY is non-nil, this behaves
+the same as the string form; otherwise, URLs matching REGEXP will never
+be displayed in readable mode by default."
+  :type '(repeat (choice (string :tag "Readable URL")
+                         (cons :tag "URL and Readability"
+                               (string :tag "URL")
+                               (radio (const :tag "Readable" t)
+                                      (const :tag "Non-readable" nil)))))
+  :version "30.1")
+
 (defcustom eww-readable-adds-to-history t
   "If non-nil, calling `eww-readable' adds a new entry to the history."
   :type 'boolean
@@ -809,11 +822,13 @@ eww-display-html
   (let ((source (buffer-substring (point) (point-max))))
     (with-current-buffer buffer
       (plist-put eww-data :source source)))
-  (eww-display-document
-   (or document
-       (eww-document-base
-        url (eww--parse-html-region (point) (point-max) charset)))
-   point buffer))
+  (unless document
+    (let ((dom (eww--parse-html-region (point) (point-max) charset)))
+      (when (eww-default-readable-p url)
+        (eww-score-readability dom)
+        (setq dom (eww-highest-readability dom)))
+      (setq document (eww-document-base url dom))))
+  (eww-display-document document point buffer))
 
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
@@ -1159,6 +1174,19 @@ eww-highest-readability
 	  (setq result highest))))
     result))
 
+(defun eww-default-readable-p (url)
+  "Return non-nil if URL should be displayed in readable mode by default.
+This consults the entries in `eww-readable-urls' (which see)."
+  (catch 'found
+    (let (result)
+      (dolist (regexp eww-readable-urls)
+        (if (consp regexp)
+            (setq result (cdr regexp)
+                  regexp (car regexp))
+          (setq result t))
+        (when (string-match regexp url)
+          (throw 'found result))))))
+
 (defvar-keymap eww-mode-map
   "g" #'eww-reload             ;FIXME: revert-buffer-function instead!
   "G" #'eww
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* bug#68254: EWW ‘readable’ by default
  2024-03-19  0:00           ` Jim Porter
@ 2024-03-21 10:51             ` Eli Zaretskii
  2024-03-22  5:46               ` Jim Porter
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-03-21 10:51 UTC (permalink / raw)
  To: Jim Porter; +Cc: 68254, yvv0

> Date: Mon, 18 Mar 2024 17:00:33 -0700
> Cc: 68254@debbugs.gnu.org, yvv0@proton.me
> From: Jim Porter <jporterbugs@gmail.com>
> 
> Thanks for looking. I've addressed all of your comments, and made some 
> more extensive changes to the implementation. I split up some of the 
> logic in the first patch so that it's easier to reuse without error, and 
> then added 'eww-readable-urls' in the second.

Thanks, I have some minor nits below.

> Because of how much I changed, I'd like to add some regression tests to 
> make sure everything still works correctly, but otherwise these patches 
> should be ready to go.

Yes, tests would be good.

> ++++
> +*** 'eww-readable' now toggles display of the readable parts of a web page.
> +When called interactively, 'eww-readable' toggles whether to display
> +only the readable parts of a page or the full page.  With a positive
> +prefix argument, always display the readable parts, and with a zero or
> +negative prefix, always display the full page.

You say "toggles", but then "display".  It is better to make the style
consistent.

> +(defun eww--parse-html-region (start end &optional encode)
> +  "Parse the HTML between START and END, returning the DOM as an S-expression.
> +Use ENCODE to decode the region; if nil, decode as UTF-8.

It is better to call the argument DECODE, not ENCODE.

> +@vindex eww-readable-urls
> +  If you want EWW to render a certain page in ``readable'' mode by
> +default, you can add a regular expression matching its URL to
> +@code{eww-readable-urls}.  Each entry can either be a regular expression
> +as a string or a cons cell of the form @code{(@var{regexp}
> +. @var{readability})}. If @var{readability} is non-@code{nil}, this
                        ^^
Please use @w to prevent breaking long expressions between two lines.
Also, please leave two spaces between sentences.

> +(defcustom eww-readable-urls nil
> +  "A list of regexps matching URLs to display in readable mode by default.
> +Each element can be either a string regexp or a cons cell of the
> +form (REGEXP . READABILITY).  If READABILITY is non-nil, this behaves
> +the same as the string form; otherwise, URLs matching REGEXP will never
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
What do you mean by "the same as the string form"? which string form?





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#68254: EWW ‘readable’ by default
  2024-03-21 10:51             ` Eli Zaretskii
@ 2024-03-22  5:46               ` Jim Porter
  2024-03-23  7:48                 ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Porter @ 2024-03-22  5:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68254, yvv0

[-- Attachment #1: Type: text/plain, Size: 2174 bytes --]

On 3/21/2024 3:51 AM, Eli Zaretskii wrote:
> Yes, tests would be good.

I've now added tests. (Good thing too, since I found a minor bug while 
writing them!)

>> ++++
>> +*** 'eww-readable' now toggles display of the readable parts of a web page.
>> +When called interactively, 'eww-readable' toggles whether to display
>> +only the readable parts of a page or the full page.  With a positive
>> +prefix argument, always display the readable parts, and with a zero or
>> +negative prefix, always display the full page.
> 
> You say "toggles", but then "display".  It is better to make the style
> consistent.

Fixed.

>> +(defun eww--parse-html-region (start end &optional encode)
>> +  "Parse the HTML between START and END, returning the DOM as an S-expression.
>> +Use ENCODE to decode the region; if nil, decode as UTF-8.
> 
> It is better to call the argument DECODE, not ENCODE.

I changed this to CODING-SYSTEM, since that's what 
'decode-coding-region' calls the argument.
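
For illustration, here is a minimal sketch of that defaulting behavior. 
`my/decode-region' is a hypothetical helper, not part of eww.el; it only 
mirrors the "if nil, decode as UTF-8" convention from the docstring:

```elisp
;; Hypothetical helper (not in eww.el): decode the text between START
;; and END, falling back to UTF-8 when CODING-SYSTEM is nil.
(defun my/decode-region (start end &optional coding-system)
  "Decode the region from START to END using CODING-SYSTEM, or UTF-8 if nil."
  (condition-case nil
      (decode-coding-region start end (or coding-system 'utf-8))
    ;; As in the patch, ignore an unknown coding system and leave the
    ;; text undecoded.
    (coding-system-error nil)))

(with-temp-buffer
  ;; Insert raw UTF-8 bytes, roughly as they would arrive from a server.
  (insert (encode-coding-string "naïve" 'utf-8))
  (my/decode-region (point-min) (point-max))
  (buffer-string))
```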

>> +@vindex eww-readable-urls
>> +  If you want EWW to render a certain page in ``readable'' mode by
>> +default, you can add a regular expression matching its URL to
>> +@code{eww-readable-urls}.  Each entry can either be a regular expression
>> +as a string or a cons cell of the form @code{(@var{regexp}
>> +. @var{readability})}. If @var{readability} is non-@code{nil}, this
>                          ^^
> Please use @w to prevent breaking long expressions between two lines.
> Also, please leave two spaces between sentences.

Thanks, both fixed. I never knew about @w.

>> +(defcustom eww-readable-urls nil
>> +  "A list of regexps matching URLs to display in readable mode by default.
>> +Each element can be either a string regexp or a cons cell of the
>> +form (REGEXP . READABILITY).  If READABILITY is non-nil, this behaves
>> +the same as the string form; otherwise, URLs matching REGEXP will never
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> What do you mean by "the same as the string form"? which string form?

I've tried to clarify this. By "string form", I meant the "string 
regexp" mentioned previously; in my new patch, I describe that as "a 
regular expression in string form".
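
To make the two entry forms concrete, here is a small sketch. 
`my/readable-p' is a hypothetical stand-in that copies the matching loop 
of `eww-default-readable-p' from the attached patch but takes the list as 
an argument, so it can be evaluated without loading eww.el. The URLs are 
placeholders.

```elisp
;; Hypothetical stand-in for `eww-default-readable-p'; ENTRIES plays
;; the role of `eww-readable-urls'.
(defun my/readable-p (url entries)
  "Return the readability of URL according to ENTRIES; the first match wins."
  (catch 'found
    (dolist (entry entries)
      (let ((regexp (if (consp entry) (car entry) entry))
            ;; A bare string means "readable"; a cons carries its own flag.
            (result (if (consp entry) (cdr entry) t)))
        (when (string-match-p regexp url)
          (throw 'found result))))))

(defvar my/entries
  '(("https://example\\.com/" . nil)   ; cons form: full page by default
    ".*"))                             ; string form: readable by default

(my/readable-p "https://example.com/page" my/entries) ; => nil
(my/readable-p "https://example.org/page" my/entries) ; => t
```

Since the loop stops at the first matching entry, exceptions have to come 
before any catch-all regexp such as ".*".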

[-- Attachment #2: 0001-Allow-toggling-readable-mode-in-EWW.patch --]
[-- Type: text/plain, Size: 14455 bytes --]

From 59d3e64d46434cf8ad13d3329ef73e78bd0b56b6 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sun, 17 Mar 2024 12:01:59 -0700
Subject: [PATCH 1/2] Allow toggling "readable" mode in EWW

Additionally, add an option to prevent adding a new history entry for
each call of 'eww-readable' (bug#68254).

* lisp/net/eww.el (eww-readable-adds-to-history): New option.
(eww-retrieve): Make sure we call CALLBACK in all configurations.
(eww-render): Simplify how to pass encoding.
(eww--parse-html-region, eww-display-document): New functions, extracted
from...
(eww-display-html): ... here.
(eww-document-base): New function.
(eww-readable): Toggle "readable" mode interactively, like with a minor
mode.  Consult 'eww-readable-adds-to-history'.
(eww-reload): Use 'eww-display-document'.

* test/lisp/net/eww-tests.el (eww-test--with-mock-retrieve): Fix indent.
(eww-test/display/html, eww-test/readable/toggle-display): New tests.

* doc/misc/eww.texi (Basics): Describe the new behavior.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi          |   5 ++
 etc/NEWS                   |  12 ++++
 lisp/net/eww.el            | 127 ++++++++++++++++++++++++-------------
 test/lisp/net/eww-tests.el |  57 ++++++++++++++++-
 4 files changed, 155 insertions(+), 46 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index d31fcf1802b..522034c874d 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -146,6 +146,11 @@ Basics
 which part of the document contains the ``readable'' text, and will
 only display this part.  This usually gets rid of menus and the like.
 
+  When called interactively, this command toggles the display of the
+readable parts.  With a positive prefix argument, this command always
+displays the readable parts, and with a zero or negative prefix, it
+always displays the full page.
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b02712dd21c..dd4c1ea2fac 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1054,6 +1054,18 @@ entries newer than the current page.  To change the behavior when
 browsing from "historical" pages, you can customize
 'eww-before-browse-history-function'.
 
++++
+*** 'eww-readable' now toggles display of the readable parts of a web page.
+When called interactively, 'eww-readable' toggles whether to display
+only the readable parts of a page or the full page.  With a positive
+prefix argument, it always displays the readable parts, and with a zero
+or negative prefix, it always displays the full page.
+
+---
+*** New option 'eww-readable-adds-to-history'.
+When non-nil (the default), calling 'eww-readable' adds a new entry to
+the EWW page history.
+
 ** go-ts-mode
 
 +++
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54847bdf396..54b65d35164 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,11 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-adds-to-history t
+  "If non-nil, calling `eww-readable' adds a new entry to the history."
+  :type 'boolean
+  :version "30.1")
+
 (defface eww-form-submit
   '((((type x w32 ns haiku pgtk android) (class color))	; Like default mode line
      :box (:line-width 2 :style released-button)
@@ -464,11 +469,11 @@ eww
 (defun eww-retrieve (url callback cbargs)
   (cond
    ((null eww-retrieve-command)
-    (url-retrieve url #'eww-render cbargs))
+    (url-retrieve url callback cbargs))
    ((eq eww-retrieve-command 'sync)
     (let ((data-buffer (url-retrieve-synchronously url)))
       (with-current-buffer data-buffer
-        (apply #'eww-render nil cbargs))))
+        (apply callback nil cbargs))))
    (t
     (let ((buffer (generate-new-buffer " *eww retrieve*"))
           (error-buffer (generate-new-buffer " *eww error*")))
@@ -673,9 +678,9 @@ eww-render
               (insert (format "<a href=%S>Direct link to the document</a>"
                               url))
               (goto-char (point-min))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((eww-html-p (car content-type))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
 	     ((string-match-p "\\`image/" (car content-type))
@@ -726,34 +731,40 @@ eww-detect-charset
 (declare-function libxml-parse-html-region "xml.c"
 		  (start end &optional base-url discard-comments))
 
-(defun eww-display-html (charset url &optional document point buffer encode)
+(defun eww--parse-html-region (start end &optional coding-system)
+  "Parse the HTML between START and END, returning the DOM as an S-expression.
+Use CODING-SYSTEM to decode the region; if nil, decode as UTF-8.
+
+This replaces the region with the preprocessed HTML."
+  (setq coding-system (or coding-system 'utf-8))
+  (with-restriction start end
+    (condition-case nil
+        (decode-coding-region (point-min) (point-max) coding-system)
+      (coding-system-error nil))
+    ;; Remove CRLF and replace NUL with &#0; before parsing.
+    (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
+      (replace-match (if (match-beginning 1) "" "&#0;") t t))
+    (eww--preprocess-html (point-min) (point-max))
+    (libxml-parse-html-region (point-min) (point-max))))
+
+(defsubst eww-document-base (url dom)
+  `(base ((href . ,url)) ,dom))
+
+(defun eww-display-document (document &optional point buffer)
   (unless (fboundp 'libxml-parse-html-region)
     (error "This function requires Emacs to be compiled with libxml2"))
+  (setq buffer (or buffer (current-buffer)))
   (unless (buffer-live-p buffer)
     (error "Buffer %s doesn't exist" buffer))
   ;; There should be a better way to abort loading images
   ;; asynchronously.
   (setq url-queue nil)
-  (let ((document
-	 (or document
-	     (list
-	      'base (list (cons 'href url))
-	      (progn
-		(setq encode (or encode charset 'utf-8))
-		(condition-case nil
-		    (decode-coding-region (point) (point-max) encode)
-		  (coding-system-error nil))
-		(save-excursion
-		  ;; Remove CRLF and replace NUL with &#0; before parsing.
-		  (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
-		    (replace-match (if (match-beginning 1) "" "&#0;") t t)))
-                (eww--preprocess-html (point) (point-max))
-		(libxml-parse-html-region (point) (point-max))))))
-	(source (and (null document)
-		     (buffer-substring (point) (point-max)))))
+  (let ((url (when (eq (car document) 'base)
+               (alist-get 'href (cadr document)))))
+    (unless url
+      (error "Document is missing base URL"))
     (with-current-buffer buffer
       (setq bidi-paragraph-direction nil)
-      (plist-put eww-data :source source)
       (plist-put eww-data :dom document)
       (let ((inhibit-read-only t)
 	    (inhibit-modification-hooks t)
@@ -794,6 +805,16 @@ eww-display-html
 	    (forward-line 1)))))
       (eww-size-text-inputs))))
 
+(defun eww-display-html (charset url &optional document point buffer)
+  (let ((source (buffer-substring (point) (point-max))))
+    (with-current-buffer buffer
+      (plist-put eww-data :source source)))
+  (eww-display-document
+   (or document
+       (eww-document-base
+        url (eww--parse-html-region (point) (point-max) charset)))
+   point buffer))
+
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
 	 (href (dom-attr dom 'href))
@@ -1055,30 +1076,47 @@ eww-toggle-paragraph-direction
                "automatic"
              bidi-paragraph-direction)))
 
-(defun eww-readable ()
-  "View the main \"readable\" parts of the current web page.
+(defun eww-readable (&optional arg)
+  "Toggle display of only the main \"readable\" parts of the current web page.
 This command uses heuristics to find the parts of the web page that
-contains the main textual portion, leaving out navigation menus and
-the like."
-  (interactive nil eww-mode)
+contain the main textual portion, leaving out navigation menus and the
+like.
+
+If called interactively, toggle the display of the readable parts.  If
+the prefix argument is positive, display the readable parts, and if it
+is zero or negative, display the full page.
+
+If called from Lisp, toggle the display of the readable parts if ARG is
+`toggle'.  Display the readable parts if ARG is nil, omitted, or is a
+positive number.  Display the full page if ARG is zero or a negative number.
+
+When `eww-readable-adds-to-history' is non-nil, calling this function
+adds a new entry to `eww-history'."
+  (interactive (list (if current-prefix-arg
+                         (prefix-numeric-value current-prefix-arg)
+                       'toggle))
+               eww-mode)
   (let* ((old-data eww-data)
-	 (dom (with-temp-buffer
+	 (make-readable (cond
+                         ((eq arg 'toggle)
+                          (not (plist-get old-data :readable)))
+                         ((and (numberp arg) (< arg 1))
+                          nil)
+                         (t t)))
+         (dom (with-temp-buffer
 		(insert (plist-get old-data :source))
-		(condition-case nil
-		    (decode-coding-region (point-min) (point-max) 'utf-8)
-		  (coding-system-error nil))
-                (eww--preprocess-html (point-min) (point-max))
-		(libxml-parse-html-region (point-min) (point-max))))
+                (eww--parse-html-region (point-min) (point-max))))
          (base (plist-get eww-data :url)))
-    (eww-score-readability dom)
-    (eww-save-history)
-    (eww--before-browse)
-    (eww-display-html nil nil
-                      (list 'base (list (cons 'href base))
-                            (eww-highest-readability dom))
-		      nil (current-buffer))
-    (dolist (elem '(:source :url :title :next :previous :up :peer))
-      (plist-put eww-data elem (plist-get old-data elem)))
+    (when make-readable
+      (eww-score-readability dom)
+      (setq dom (eww-highest-readability dom)))
+    (when eww-readable-adds-to-history
+      (eww-save-history)
+      (eww--before-browse)
+      (dolist (elem '(:source :url :title :next :previous :up :peer))
+        (plist-put eww-data elem (plist-get old-data elem))))
+    (eww-display-document (eww-document-base base dom))
+    (plist-put eww-data :readable make-readable)
     (eww--after-page-change)))
 
 (defun eww-score-readability (node)
@@ -1398,8 +1436,7 @@ eww-reload
     (if local
 	(if (null (plist-get eww-data :dom))
 	    (error "No current HTML data")
-	  (eww-display-html 'utf-8 url (plist-get eww-data :dom)
-			    (point) (current-buffer)))
+	  (eww-display-document (plist-get eww-data :dom) (point)))
       (let ((parsed (url-generic-parse-url url)))
         (if (equal (url-type parsed) "file")
             ;; Use Tramp instead of url.el for files (since url.el
diff --git a/test/lisp/net/eww-tests.el b/test/lisp/net/eww-tests.el
index bd00893d503..a09e0a4f279 100644
--- a/test/lisp/net/eww-tests.el
+++ b/test/lisp/net/eww-tests.el
@@ -33,7 +33,7 @@ eww-test--with-mock-retrieve
   "Evaluate BODY with a mock implementation of `eww-retrieve'.
 This avoids network requests during our tests.  Additionally, prepare a
 temporary EWW buffer for our tests."
-  (declare (indent 1))
+  (declare (indent 0))
     `(cl-letf (((symbol-function 'eww-retrieve)
                 (lambda (url callback args)
                   (with-temp-buffer
@@ -48,6 +48,24 @@ eww-test--history-urls
 
 ;;; Tests:
 
+(ert-deftest eww-test/display/html ()
+  "Test displaying a simple HTML page."
+  (eww-test--with-mock-retrieve
+    (let ((eww-test--response-function
+           (lambda (url)
+             (concat "Content-Type: text/html\n\n"
+                     (format "<html><body><h1>Hello</h1>%s</body></html>"
+                             url)))))
+      (eww "example.invalid")
+      ;; Check that the buffer contains the rendered HTML.
+      (should (equal (buffer-string) "Hello\n\n\nhttp://example.invalid/\n"))
+      (should (equal (get-text-property (point-min) 'face)
+                     '(shr-text shr-h1)))
+      ;; Check that the DOM includes the `base'.
+      (should (equal (pcase (plist-get eww-data :dom)
+                       (`(base ((href . ,url)) ,_) url))
+                     "http://example.invalid/")))))
+
 (ert-deftest eww-test/history/new-page ()
   "Test that when visiting a new page, the previous one goes into the history."
   (eww-test--with-mock-retrieve
@@ -176,5 +194,42 @@ eww-test/history/before-navigate/clone-previous
                        "http://one.invalid/")))
       (should (= eww-history-position 0)))))
 
+(ert-deftest eww-test/readable/toggle-display ()
+  "Test toggling the display of the \"readable\" parts of a web page."
+  (eww-test--with-mock-retrieve
+    (let* ((shr-width most-positive-fixnum)
+           (shr-use-fonts nil)
+           (words (string-join
+                   (make-list
+                    20 "All work and no play makes Jack a dull boy.")
+                   " "))
+           (eww-test--response-function
+            (lambda (_url)
+              (concat "Content-Type: text/html\n\n"
+                      "<html><body>"
+                      "<a>This is an uninteresting sentence.</a>"
+                      "<div>"
+                      words
+                      "</div>"
+                      "</body></html>"))))
+      (eww "example.invalid")
+      ;; Make sure EWW renders the whole document.
+      (should-not (plist-get eww-data :readable))
+      (should (string-prefix-p
+               "This is an uninteresting sentence."
+               (buffer-substring-no-properties (point-min) (point-max))))
+      (eww-readable 'toggle)
+      ;; Now, EWW should render just the "readable" parts.
+      (should (plist-get eww-data :readable))
+      (should (string-match-p
+               (concat "\\`" (regexp-quote words) "\n*\\'")
+               (buffer-substring-no-properties (point-min) (point-max))))
+      (eww-readable 'toggle)
+      ;; Finally, EWW should render the whole document again.
+      (should-not (plist-get eww-data :readable))
+      (should (string-prefix-p
+               "This is an uninteresting sentence."
+               (buffer-substring-no-properties (point-min) (point-max)))))))
+
 (provide 'eww-tests)
 ;; eww-tests.el ends here
-- 
2.25.1


[-- Attachment #3: 0002-Add-eww-readable-urls.patch --]
[-- Type: text/plain, Size: 6013 bytes --]

From e12c94d8590f25e9bc9f64b3e269bc4ab10996dd Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Mon, 18 Mar 2024 16:52:34 -0700
Subject: [PATCH 2/2] Add 'eww-readable-urls'

* lisp/net/eww.el (eww-readable-urls): New option.
(eww-default-readable-p): New function...
(eww-display-html): ... use it.

* test/lisp/net/eww-tests.el (eww-test/readable/default-readable): New
test.

* doc/misc/eww.texi (Basics): Document 'eww-readable-urls'.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi          | 16 +++++++++++++++
 etc/NEWS                   |  6 ++++++
 lisp/net/eww.el            | 41 +++++++++++++++++++++++++++++++++-----
 test/lisp/net/eww-tests.el | 12 +++++++++++
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index 522034c874d..eec6b3c3299 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -151,6 +151,22 @@ Basics
 displays the readable parts, and with a zero or negative prefix, it
 always displays the full page.
 
+@vindex eww-readable-urls
+  If you want EWW to render a certain page in ``readable'' mode by
+default, you can add a regular expression matching its URL to
+@code{eww-readable-urls}.  Each entry can either be a regular expression
+in string form or a cons cell of the form
+@w{@code{(@var{regexp} . @var{readability})}}.  If @var{readability} is
+non-@code{nil}, this behaves the same as the string form; otherwise,
+URLs matching @var{regexp} will never be displayed in readable mode by
+default.  For example, you can use this to make all pages default to
+readable mode, except for a few outliers:
+
+@example
+(setq eww-readable-urls '(("https://example\\.com/" . nil)
+                          ".*"))
+@end example
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index dd4c1ea2fac..3704888dd9f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1061,6 +1061,12 @@ only the readable parts of a page or the full page.  With a positive
 prefix argument, it always displays the readable parts, and with a zero
 or negative prefix, it always displays the full page.
 
++++
+*** New option 'eww-readable-urls'.
+This is a list of regular expressions matching the URLs where EWW should
+display only the readable parts by default.  For more details, see
+"(eww) Basics" in the EWW manual.
+
 ---
 *** New option 'eww-readable-adds-to-history'.
 When non-nil (the default), calling 'eww-readable' adds a new entry to
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54b65d35164..a04bae7b931 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,20 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-urls nil
+  "A list of regexps matching URLs to display in readable mode by default.
+Each element can be one of the following forms: a regular expression in
+string form or a cons cell of the form (REGEXP . READABILITY).  If
+READABILITY is non-nil, this behaves the same as the string form;
+otherwise, URLs matching REGEXP will never be displayed in readable mode
+by default."
+  :type '(repeat (choice (string :tag "Readable URL")
+                         (cons :tag "URL and Readability"
+                               (string :tag "URL")
+                               (radio (const :tag "Readable" t)
+                                      (const :tag "Non-readable" nil)))))
+  :version "30.1")
+
 (defcustom eww-readable-adds-to-history t
   "If non-nil, calling `eww-readable' adds a new entry to the history."
   :type 'boolean
@@ -809,11 +823,15 @@ eww-display-html
   (let ((source (buffer-substring (point) (point-max))))
     (with-current-buffer buffer
       (plist-put eww-data :source source)))
-  (eww-display-document
-   (or document
-       (eww-document-base
-        url (eww--parse-html-region (point) (point-max) charset)))
-   point buffer))
+  (unless document
+    (let ((dom (eww--parse-html-region (point) (point-max) charset)))
+      (when (eww-default-readable-p url)
+        (eww-score-readability dom)
+        (setq dom (eww-highest-readability dom))
+        (with-current-buffer buffer
+          (plist-put eww-data :readable t)))
+      (setq document (eww-document-base url dom))))
+  (eww-display-document document point buffer))
 
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
@@ -1159,6 +1177,19 @@ eww-highest-readability
 	  (setq result highest))))
     result))
 
+(defun eww-default-readable-p (url)
+  "Return non-nil if URL should be displayed in readable mode by default.
+This consults the entries in `eww-readable-urls' (which see)."
+  (catch 'found
+    (let (result)
+      (dolist (regexp eww-readable-urls)
+        (if (consp regexp)
+            (setq result (cdr regexp)
+                  regexp (car regexp))
+          (setq result t))
+        (when (string-match regexp url)
+          (throw 'found result))))))
+
 (defvar-keymap eww-mode-map
   "g" #'eww-reload             ;FIXME: revert-buffer-function instead!
   "G" #'eww
diff --git a/test/lisp/net/eww-tests.el b/test/lisp/net/eww-tests.el
index a09e0a4f279..b83435e0bd9 100644
--- a/test/lisp/net/eww-tests.el
+++ b/test/lisp/net/eww-tests.el
@@ -231,5 +231,17 @@ eww-test/readable/toggle-display
                "This is an uninteresting sentence."
                (buffer-substring-no-properties (point-min) (point-max)))))))
 
+(ert-deftest eww-test/readable/default-readable ()
+  "Test that EWW displays readable parts of pages by default when applicable."
+  (eww-test--with-mock-retrieve
+    (let* ((eww-test--response-function
+            (lambda (_url)
+              (concat "Content-Type: text/html\n\n"
+                      "<html><body>Hello there</body></html>")))
+           (eww-readable-urls '("://example\\.invalid/")))
+      (eww "example.invalid")
+      ;; Make sure EWW uses "readable" mode.
+      (should (plist-get eww-data :readable)))))
+
 (provide 'eww-tests)
 ;; eww-tests.el ends here
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* bug#68254: EWW ‘readable’ by default
  2024-03-22  5:46               ` Jim Porter
@ 2024-03-23  7:48                 ` Eli Zaretskii
  2024-03-23 17:26                   ` Jim Porter
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-03-23  7:48 UTC (permalink / raw)
  To: Jim Porter; +Cc: 68254, yvv0

> Date: Thu, 21 Mar 2024 22:46:50 -0700
> Cc: 68254@debbugs.gnu.org, yvv0@proton.me
> From: Jim Porter <jporterbugs@gmail.com>
> 
> I've tried to clarify this. By "string form", I meant the "string 
> regexp" mentioned previously; in my new patch, I describe that as "a 
> regular expression in string form".

Thanks.  The updated changeset LGTM, with a single minor comment:

> +(defcustom eww-readable-urls nil
> +  "A list of regexps matching URLs to display in readable mode by default.
> +Each element can be one of the following forms: a regular expression in
> +string form or a cons cell of the form (REGEXP . READABILITY).  If
> +READABILITY is non-nil, this behaves the same as the string form;
> +otherwise, URLs matching REGEXP will never be displayed in readable mode
> +by default."

The doc string of this user option should explain what the
"readable mode" is, or at least have a hyperlink to eww-readable (which
does explain that).  Users who read this doc string should understand
what that mode does, and (unlike in the manual) there's no prior
context to rely upon.

* bug#68254: EWW ‘readable’ by default
  2024-03-23  7:48                 ` Eli Zaretskii
@ 2024-03-23 17:26                   ` Jim Porter
  0 siblings, 0 replies; 14+ messages in thread
From: Jim Porter @ 2024-03-23 17:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68254-done, yvv0

On 3/23/2024 12:48 AM, Eli Zaretskii wrote:
> The doc string of this user option should explain what the
> "readable mode" is, or at least have a hyperlink to eww-readable (which
> does explain that).  Users who read this doc string should understand
> what that mode does, and (unlike in the manual) there's no prior
> context to rely upon.

Good point. I added the following to the docstring: "EWW will display 
matching URLs using `eww-readable' (which see)." I also merged this to 
the master branch as 4b0f5cdb01f, so closing this bug.
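
For readers landing on this thread, here is a minimal sketch of how the two
entry forms described in the `eww-readable-urls' doc string could be used.
The URLs are placeholders, and `my/example-readable-urls' and
`my/readable-by-default-p' are hypothetical names made up for illustration;
they are not part of EWW. The helper assumes that entries are checked in
order and that the first matching entry decides the result, which is an
assumption about the lookup and not something stated in this thread.

```elisp
;; Example value in the format documented for `eww-readable-urls'.
;; Placeholder sites only.
(defvar my/example-readable-urls
  '(;; Cons form with nil READABILITY: never readable by default.
    ("://example\\.invalid/forum/" . nil)
    ;; String form: matching URLs open in readable mode by default.
    "://example\\.invalid/")
  "Hypothetical example value for `eww-readable-urls'.")

(defun my/readable-by-default-p (url entries)
  "Return non-nil if URL should open in readable mode according to ENTRIES.
ENTRIES uses the format documented for `eww-readable-urls'.
Hypothetical helper for illustration; assumes the first match wins."
  (catch 'found
    (dolist (entry entries)
      ;; A bare string behaves like (REGEXP . t).
      (let ((regexp (if (consp entry) (car entry) entry))
            (readable (if (consp entry) (cdr entry) t)))
        (when (string-match-p regexp url)
          (throw 'found readable))))
    ;; No entry matched: not readable by default.
    nil))

(my/readable-by-default-p "https://example.invalid/post"
                          my/example-readable-urls)   ; => t
(my/readable-by-default-p "https://example.invalid/forum/thread"
                          my/example-readable-urls)   ; => nil
```

To try the same value in EWW itself, one would assign it to
`eww-readable-urls' in an Emacs that includes the merged change; under the
first-match assumption above, the more specific exception goes before the
general pattern.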

end of thread (2024-03-23 17:26 UTC)

Thread overview: 14+ messages
2024-01-05  7:35 bug#68254: EWW ‘readable’ by default Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-05 11:52 ` Eli Zaretskii
     [not found]   ` <poNSnv1DQ7L71-FirbCx9nuQ8gqLlPGTIjDYk2pKo2_H3BPuJArYQ2ziQ4pyADSxHCY5cU40D6MUzRqBAZE3pEcFmnzFPD49xunpLyh1UqI=@proton.me>
2024-01-05 13:35     ` Eli Zaretskii
2024-03-17 19:24       ` Jim Porter
2024-03-18  4:32         ` Adam Porter
2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-18  5:44             ` Jim Porter
2024-03-18  5:18           ` Jim Porter
2024-03-18 12:37         ` Eli Zaretskii
2024-03-19  0:00           ` Jim Porter
2024-03-21 10:51             ` Eli Zaretskii
2024-03-22  5:46               ` Jim Porter
2024-03-23  7:48                 ` Eli Zaretskii
2024-03-23 17:26                   ` Jim Porter