unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Jim Porter <jporterbugs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 68254@debbugs.gnu.org, yvv0@proton.me
Subject: bug#68254: EWW ‘readable’ by default
Date: Mon, 18 Mar 2024 17:00:33 -0700	[thread overview]
Message-ID: <c917fe41-45d8-8ad9-0f61-75ff62e73f50@gmail.com> (raw)
In-Reply-To: <86ttl34swt.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On 3/18/2024 5:37 AM, Eli Zaretskii wrote:
>> Date: Sun, 17 Mar 2024 12:24:26 -0700
>> Cc: 68254@debbugs.gnu.org
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> Here's a patch for this. It turns 'eww-readable' into a toggle (using
>> the same semantics as minor modes), and also adds an option to prevent
>> adding a new history entry for each call.
> 
> Thanks.

Thanks for looking. I've addressed all of your comments, and made some 
more extensive changes to the implementation. I split up some of the 
logic in the first patch so that it's easier to reuse without error, and 
then added 'eww-readable-urls' in the second.

Because of how much I changed, I'd like to add some regression tests to 
make sure everything still works correctly, but otherwise these patches 
should be ready to go.

[-- Attachment #2: 0001-Allow-toggling-readable-mode-in-EWW.patch --]
[-- Type: text/plain, Size: 10793 bytes --]

From 4839990148e2a58cf44c04547994611392ff1955 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sun, 17 Mar 2024 12:01:59 -0700
Subject: [PATCH 1/2] Allow toggling "readable" mode in EWW

Additionally, add an option to prevent adding a new history entry for
each call of 'eww-readable' (bug#68254).

* lisp/net/eww.el (eww-retrieve):

* lisp/net/eww.el (eww-readable-adds-to-history): New option.
(eww-retrieve): Make sure we call CALLBACK in all configurations.
(eww-render): Simplify how to pass encoding.
(eww--parse-html-region, eww-display-document): New functions, extracted
from...
(eww-display-html): ... here.
(eww-document-base): New function.
(eww-readable): Toggle "readable" mode interactively, like with a minor
mode.  Consult 'eww-readable-adds-to-history'.
(eww-reload): Use 'eshell-display-document'.

* doc/misc/eww.texi (Basics): Describe the new behavior.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi |   5 ++
 etc/NEWS          |  12 +++++
 lisp/net/eww.el   | 127 ++++++++++++++++++++++++++++++----------------
 3 files changed, 99 insertions(+), 45 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index d31fcf1802b..522034c874d 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -146,6 +146,11 @@ Basics
 which part of the document contains the ``readable'' text, and will
 only display this part.  This usually gets rid of menus and the like.
 
+  When called interactively, this command toggles the display of the
+readable parts.  With a positive prefix argument, this command always
+displays the readable parts, and with a zero or negative prefix, it
+always displays the full page.
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b02712dd21c..b23754fb17f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1054,6 +1054,18 @@ entries newer than the current page.  To change the behavior when
 browsing from "historical" pages, you can customize
 'eww-before-browse-history-function'.
 
++++
+*** 'eww-readable' now toggles display of the readable parts of a web page.
+When called interactively, 'eww-readable' toggles whether to display
+only the readable parts of a page or the full page.  With a positive
+prefix argument, always display the readable parts, and with a zero or
+negative prefix, always display the full page.
+
+---
+*** New option 'eww-readable-adds-to-history'.
+When non-nil (the default), calling 'eww-readable' adds a new entry to
+the EWW page history.
+
 ** go-ts-mode
 
 +++
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54847bdf396..fd697846988 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,11 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-adds-to-history t
+  "If non-nil, calling `eww-readable' adds a new entry to the history."
+  :type 'boolean
+  :version "30.1")
+
 (defface eww-form-submit
   '((((type x w32 ns haiku pgtk android) (class color))	; Like default mode line
      :box (:line-width 2 :style released-button)
@@ -464,11 +469,11 @@ eww
 (defun eww-retrieve (url callback cbargs)
   (cond
    ((null eww-retrieve-command)
-    (url-retrieve url #'eww-render cbargs))
+    (url-retrieve url callback cbargs))
    ((eq eww-retrieve-command 'sync)
     (let ((data-buffer (url-retrieve-synchronously url)))
       (with-current-buffer data-buffer
-        (apply #'eww-render nil cbargs))))
+        (apply callback nil cbargs))))
    (t
     (let ((buffer (generate-new-buffer " *eww retrieve*"))
           (error-buffer (generate-new-buffer " *eww error*")))
@@ -673,9 +678,9 @@ eww-render
               (insert (format "<a href=%S>Direct link to the document</a>"
                               url))
               (goto-char (point-min))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((eww-html-p (car content-type))
-	      (eww-display-html charset url nil point buffer encode))
+              (eww-display-html (or encode charset) url nil point buffer))
 	     ((equal (car content-type) "application/pdf")
 	      (eww-display-pdf))
 	     ((string-match-p "\\`image/" (car content-type))
@@ -723,37 +728,43 @@ eww-detect-charset
 	      "[\t\n\r ]*<\\?xml[\t\n\r ]+[^>]*encoding=\"\\([^\"]+\\)")
 	     (match-string 1)))))
 
+(defun eww--parse-html-region (start end &optional encode)
+  "Parse the HTML between START and END, returning the DOM as an S-expression.
+Use ENCODE to decode the region; if nil, decode as UTF-8.
+
+This replaces the region with the preprocessed HTML."
+  (setq encode (or encode 'utf-8))
+  (with-restriction start end
+    (condition-case nil
+        (decode-coding-region (point-min) (point-max) encode)
+      (coding-system-error nil))
+    ;; Remove CRLF and replace NUL with &#0; before parsing.
+    (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
+      (replace-match (if (match-beginning 1) "" "&#0;") t t))
+    (eww--preprocess-html (point-min) (point-max))
+    (libxml-parse-html-region (point-min) (point-max))))
+
+(defsubst eww-document-base (url dom)
+  `(base ((href . ,url)) ,dom))
+
 (declare-function libxml-parse-html-region "xml.c"
 		  (start end &optional base-url discard-comments))
 
-(defun eww-display-html (charset url &optional document point buffer encode)
+(defun eww-display-document (document &optional point buffer)
   (unless (fboundp 'libxml-parse-html-region)
     (error "This function requires Emacs to be compiled with libxml2"))
+  (setq buffer (or buffer (current-buffer)))
   (unless (buffer-live-p buffer)
     (error "Buffer %s doesn't exist" buffer))
   ;; There should be a better way to abort loading images
   ;; asynchronously.
   (setq url-queue nil)
-  (let ((document
-	 (or document
-	     (list
-	      'base (list (cons 'href url))
-	      (progn
-		(setq encode (or encode charset 'utf-8))
-		(condition-case nil
-		    (decode-coding-region (point) (point-max) encode)
-		  (coding-system-error nil))
-		(save-excursion
-		  ;; Remove CRLF and replace NUL with &#0; before parsing.
-		  (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
-		    (replace-match (if (match-beginning 1) "" "&#0;") t t)))
-                (eww--preprocess-html (point) (point-max))
-		(libxml-parse-html-region (point) (point-max))))))
-	(source (and (null document)
-		     (buffer-substring (point) (point-max)))))
+  (let ((url (when (eq (car document) 'base)
+               (alist-get 'href (cadr document)))))
+    (unless url
+      (error "Document is missing base URL"))
     (with-current-buffer buffer
       (setq bidi-paragraph-direction nil)
-      (plist-put eww-data :source source)
       (plist-put eww-data :dom document)
       (let ((inhibit-read-only t)
 	    (inhibit-modification-hooks t)
@@ -794,6 +805,16 @@ eww-display-html
 	    (forward-line 1)))))
       (eww-size-text-inputs))))
 
+(defun eww-display-html (charset url &optional document point buffer)
+  (let ((source (buffer-substring (point) (point-max))))
+    (with-current-buffer buffer
+      (plist-put eww-data :source source)))
+  (eww-display-document
+   (or document
+       (eww-document-base
+        url (eww--parse-html-region (point) (point-max) charset)))
+   point buffer))
+
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
 	 (href (dom-attr dom 'href))
@@ -1055,30 +1076,47 @@ eww-toggle-paragraph-direction
                "automatic"
              bidi-paragraph-direction)))
 
-(defun eww-readable ()
-  "View the main \"readable\" parts of the current web page.
+(defun eww-readable (&optional arg)
+  "Toggle display of only the main \"readable\" parts of the current web page.
 This command uses heuristics to find the parts of the web page that
-contains the main textual portion, leaving out navigation menus and
-the like."
-  (interactive nil eww-mode)
+contain the main textual portion, leaving out navigation menus and the
+like.
+
+If called interactively, toggle the display of the readable parts.  If
+the prefix argument is positive, display the readable parts, and if it
+is zero or negative, display the full page.
+
+If called from Lisp, toggle the display of the readable parts if ARG is
+`toggle'.  Display the readable parts if ARG is nil, omitted, or is a
+positive number.  Display the full page if ARG is a negative number.
+
+When `eww-readable-adds-to-history' is non-nil, calling this function
+adds a new entry to `eww-history'."
+  (interactive (list (if current-prefix-arg
+                         (prefix-numeric-value current-prefix-arg)
+                       'toggle))
+               eww-mode)
   (let* ((old-data eww-data)
-	 (dom (with-temp-buffer
+	 (make-readable (cond
+                         ((eq arg 'toggle)
+                          (not (plist-get old-data :readable)))
+                         ((and (numberp arg) (< arg 1))
+                          nil)
+                         (t t)))
+         (dom (with-temp-buffer
 		(insert (plist-get old-data :source))
-		(condition-case nil
-		    (decode-coding-region (point-min) (point-max) 'utf-8)
-		  (coding-system-error nil))
-                (eww--preprocess-html (point-min) (point-max))
-		(libxml-parse-html-region (point-min) (point-max))))
+                (eww--parse-html-region (point-min) (point-max))))
          (base (plist-get eww-data :url)))
-    (eww-score-readability dom)
-    (eww-save-history)
-    (eww--before-browse)
-    (eww-display-html nil nil
-                      (list 'base (list (cons 'href base))
-                            (eww-highest-readability dom))
-		      nil (current-buffer))
-    (dolist (elem '(:source :url :title :next :previous :up :peer))
-      (plist-put eww-data elem (plist-get old-data elem)))
+    (when make-readable
+      (eww-score-readability dom)
+      (setq dom (eww-highest-readability dom)))
+    (when eww-readable-adds-to-history
+      (eww-save-history)
+      (eww--before-browse)
+      (dolist (elem '(:source :url :title :next :previous :up :peer))
+        (plist-put eww-data elem (plist-get old-data elem))))
+    (eww-display-document (eww-document-base base dom))
+    (plist-put eww-data :readable make-readable)
     (eww--after-page-change)))
 
 (defun eww-score-readability (node)
@@ -1398,8 +1436,7 @@ eww-reload
     (if local
 	(if (null (plist-get eww-data :dom))
 	    (error "No current HTML data")
-	  (eww-display-html 'utf-8 url (plist-get eww-data :dom)
-			    (point) (current-buffer)))
+	  (eww-display-document (plist-get eww-data :dom) (point)))
       (let ((parsed (url-generic-parse-url url)))
         (if (equal (url-type parsed) "file")
             ;; Use Tramp instead of url.el for files (since url.el
-- 
2.25.1


[-- Attachment #3: 0002-Add-eww-readable-urls.patch --]
[-- Type: text/plain, Size: 4771 bytes --]

From a6634a1d5d0cb440554eeaa5a014406e40ffeee9 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Mon, 18 Mar 2024 16:52:34 -0700
Subject: [PATCH 2/2] Add 'eww-readable-urls'

* lisp/net/eww.el (eww-readable-urls): New option.
(eww-default-readable-p): New function...
(eww-display-html): ... use it.

* doc/misc/eww.texi (Basics): Document 'eww-readable-urls'.

* etc/NEWS: Announce this change.
---
 doc/misc/eww.texi | 16 ++++++++++++++++
 etc/NEWS          |  6 ++++++
 lisp/net/eww.el   | 38 +++++++++++++++++++++++++++++++++-----
 3 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index 522034c874d..a08d6694892 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -151,6 +151,22 @@ Basics
 displays the readable parts, and with a zero or negative prefix, it
 always displays the full page.
 
+@vindex eww-readable-urls
+  If you want EWW to render a certain page in ``readable'' mode by
+default, you can add a regular expression matching its URL to
+@code{eww-readable-urls}.  Each entry can either be a regular expression
+as a string or a cons cell of the form @code{(@var{regexp}
+. @var{readability})}. If @var{readability} is non-@code{nil}, this
+behaves the same as the string form; otherwise, URLs matching
+@var{regexp} will never be displayed in readable mode by default.  For
+example, you can use this to make all pages default to readable mode,
+except for a few outliers:
+
+@example
+(setq eww-readable-urls '(("https://example\\.com/" . nil)
+                          ".*"))
+@end example
+
 @findex eww-toggle-fonts
 @vindex shr-use-fonts
 @kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b23754fb17f..2af00f712a4 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1061,6 +1061,12 @@ only the readable parts of a page or the full page.  With a positive
 prefix argument, always display the readable parts, and with a zero or
 negative prefix, always display the full page.
 
++++
+*** New option 'eww-readable-urls'.
+This is a list of regular expressions matching the URLs where EWW should
+display only the readable parts by default.  For more details, see
+"(eww) Basics" in the EWW manual.
+
 ---
 *** New option 'eww-readable-adds-to-history'.
 When non-nil (the default), calling 'eww-readable' adds a new entry to
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index fd697846988..9505378e040 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,19 @@ eww-url-transformers
   :type '(repeat function)
   :version "29.1")
 
+(defcustom eww-readable-urls nil
+  "A list of regexps matching URLs to display in readable mode by default.
+Each element can be either a string regexp or a cons cell of the
+form (REGEXP . READABILITY).  If READABILITY is non-nil, this behaves
+the same as the string form; otherwise, URLs matching REGEXP will never
+be displayed in readable mode by default."
+  :type '(repeat (choice (string :tag "Readable URL")
+                         (cons :tag "URL and Readability"
+                               (string :tag "URL")
+                               (radio (const :tag "Readable" t)
+                                      (const :tag "Non-readable" nil)))))
+  :version "30.1")
+
 (defcustom eww-readable-adds-to-history t
   "If non-nil, calling `eww-readable' adds a new entry to the history."
   :type 'boolean
@@ -809,11 +822,13 @@ eww-display-html
   (let ((source (buffer-substring (point) (point-max))))
     (with-current-buffer buffer
       (plist-put eww-data :source source)))
-  (eww-display-document
-   (or document
-       (eww-document-base
-        url (eww--parse-html-region (point) (point-max) charset)))
-   point buffer))
+  (unless document
+    (let ((dom (eww--parse-html-region (point) (point-max) charset)))
+      (when (eww-default-readable-p url)
+        (eww-score-readability dom)
+        (setq dom (eww-highest-readability dom)))
+      (setq document (eww-document-base url dom))))
+  (eww-display-document document point buffer))
 
 (defun eww-handle-link (dom)
   (let* ((rel (dom-attr dom 'rel))
@@ -1159,6 +1174,19 @@ eww-highest-readability
 	  (setq result highest))))
     result))
 
+(defun eww-default-readable-p (url)
+  "Return non-nil if URL should be displayed in readable mode by default.
+This consults the entries in `eww-readable-urls' (which see)."
+  (catch 'found
+    (let (result)
+      (dolist (regexp eww-readable-urls)
+        (if (consp regexp)
+            (setq result (cdr regexp)
+                  regexp (car regexp))
+          (setq result t))
+        (when (string-match regexp url)
+          (throw 'found result))))))
+
 (defvar-keymap eww-mode-map
   "g" #'eww-reload             ;FIXME: revert-buffer-function instead!
   "G" #'eww
-- 
2.25.1


  reply	other threads:[~2024-03-19  0:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-05  7:35 bug#68254: EWW ‘readable’ by default Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-05 11:52 ` Eli Zaretskii
     [not found]   ` <poNSnv1DQ7L71-FirbCx9nuQ8gqLlPGTIjDYk2pKo2_H3BPuJArYQ2ziQ4pyADSxHCY5cU40D6MUzRqBAZE3pEcFmnzFPD49xunpLyh1UqI=@proton.me>
2024-01-05 13:35     ` Eli Zaretskii
2024-03-17 19:24       ` Jim Porter
2024-03-18  4:32         ` Adam Porter
2024-03-18  5:17           ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-18  5:44             ` Jim Porter
2024-03-18  5:18           ` Jim Porter
2024-03-18 12:37         ` Eli Zaretskii
2024-03-19  0:00           ` Jim Porter [this message]
2024-03-21 10:51             ` Eli Zaretskii
2024-03-22  5:46               ` Jim Porter
2024-03-23  7:48                 ` Eli Zaretskii
2024-03-23 17:26                   ` Jim Porter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c917fe41-45d8-8ad9-0f61-75ff62e73f50@gmail.com \
    --to=jporterbugs@gmail.com \
    --cc=68254@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=yvv0@proton.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).