From: Jim Porter <jporterbugs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 68254@debbugs.gnu.org, yvv0@proton.me
Subject: bug#68254: EWW ‘readable’ by default
Date: Mon, 18 Mar 2024 17:00:33 -0700 [thread overview]
Message-ID: <c917fe41-45d8-8ad9-0f61-75ff62e73f50@gmail.com> (raw)
In-Reply-To: <86ttl34swt.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 797 bytes --]
On 3/18/2024 5:37 AM, Eli Zaretskii wrote:
>> Date: Sun, 17 Mar 2024 12:24:26 -0700
>> Cc: 68254@debbugs.gnu.org
>> From: Jim Porter <jporterbugs@gmail.com>
>>
>> Here's a patch for this. It turns 'eww-readable' into a toggle (using
>> the same semantics as minor modes), and also adds an option to prevent
>> adding a new history entry for each call.
>
> Thanks.
Thanks for looking. I've addressed all of your comments, and made some
more extensive changes to the implementation. I split up some of the
logic in the first patch so that it's easier to reuse without error, and
then added 'eww-readable-urls' in the second.
Because of how much I changed, I'd like to add some regression tests to
make sure everything still works correctly, but otherwise these patches
should be ready to go.
[-- Attachment #2: 0001-Allow-toggling-readable-mode-in-EWW.patch --]
[-- Type: text/plain, Size: 10793 bytes --]
From 4839990148e2a58cf44c04547994611392ff1955 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Sun, 17 Mar 2024 12:01:59 -0700
Subject: [PATCH 1/2] Allow toggling "readable" mode in EWW
Additionally, add an option to prevent adding a new history entry for
each call of 'eww-readable' (bug#68254).
* lisp/net/eww.el (eww-retrieve):
* lisp/net/eww.el (eww-readable-adds-to-history): New option.
(eww-retrieve): Make sure we call CALLBACK in all configurations.
(eww-render): Simplify how to pass encoding.
(eww--parse-html-region, eww-display-document): New functions, extracted
from...
(eww-display-html): ... here.
(eww-document-base): New function.
(eww-readable): Toggle "readable" mode interactively, like with a minor
mode. Consult 'eww-readable-adds-to-history'.
(eww-reload): Use 'eshell-display-document'.
* doc/misc/eww.texi (Basics): Describe the new behavior.
* etc/NEWS: Announce this change.
---
doc/misc/eww.texi | 5 ++
etc/NEWS | 12 +++++
lisp/net/eww.el | 127 ++++++++++++++++++++++++++++++----------------
3 files changed, 99 insertions(+), 45 deletions(-)
diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index d31fcf1802b..522034c874d 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -146,6 +146,11 @@ Basics
which part of the document contains the ``readable'' text, and will
only display this part. This usually gets rid of menus and the like.
+ When called interactively, this command toggles the display of the
+readable parts. With a positive prefix argument, this command always
+displays the readable parts, and with a zero or negative prefix, it
+always displays the full page.
+
@findex eww-toggle-fonts
@vindex shr-use-fonts
@kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b02712dd21c..b23754fb17f 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1054,6 +1054,18 @@ entries newer than the current page. To change the behavior when
browsing from "historical" pages, you can customize
'eww-before-browse-history-function'.
++++
+*** 'eww-readable' now toggles display of the readable parts of a web page.
+When called interactively, 'eww-readable' toggles whether to display
+only the readable parts of a page or the full page. With a positive
+prefix argument, always display the readable parts, and with a zero or
+negative prefix, always display the full page.
+
+---
+*** New option 'eww-readable-adds-to-history'.
+When non-nil (the default), calling 'eww-readable' adds a new entry to
+the EWW page history.
+
** go-ts-mode
+++
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 54847bdf396..fd697846988 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,11 @@ eww-url-transformers
:type '(repeat function)
:version "29.1")
+(defcustom eww-readable-adds-to-history t
+ "If non-nil, calling `eww-readable' adds a new entry to the history."
+ :type 'boolean
+ :version "30.1")
+
(defface eww-form-submit
'((((type x w32 ns haiku pgtk android) (class color)) ; Like default mode line
:box (:line-width 2 :style released-button)
@@ -464,11 +469,11 @@ eww
(defun eww-retrieve (url callback cbargs)
(cond
((null eww-retrieve-command)
- (url-retrieve url #'eww-render cbargs))
+ (url-retrieve url callback cbargs))
((eq eww-retrieve-command 'sync)
(let ((data-buffer (url-retrieve-synchronously url)))
(with-current-buffer data-buffer
- (apply #'eww-render nil cbargs))))
+ (apply callback nil cbargs))))
(t
(let ((buffer (generate-new-buffer " *eww retrieve*"))
(error-buffer (generate-new-buffer " *eww error*")))
@@ -673,9 +678,9 @@ eww-render
(insert (format "<a href=%S>Direct link to the document</a>"
url))
(goto-char (point-min))
- (eww-display-html charset url nil point buffer encode))
+ (eww-display-html (or encode charset) url nil point buffer))
((eww-html-p (car content-type))
- (eww-display-html charset url nil point buffer encode))
+ (eww-display-html (or encode charset) url nil point buffer))
((equal (car content-type) "application/pdf")
(eww-display-pdf))
((string-match-p "\\`image/" (car content-type))
@@ -723,37 +728,43 @@ eww-detect-charset
"[\t\n\r ]*<\\?xml[\t\n\r ]+[^>]*encoding=\"\\([^\"]+\\)")
(match-string 1)))))
+(defun eww--parse-html-region (start end &optional encode)
+ "Parse the HTML between START and END, returning the DOM as an S-expression.
+Use ENCODE to decode the region; if nil, decode as UTF-8.
+
+This replaces the region with the preprocessed HTML."
+ (setq encode (or encode 'utf-8))
+ (with-restriction start end
+ (condition-case nil
+ (decode-coding-region (point-min) (point-max) encode)
+ (coding-system-error nil))
+ ;; Remove CRLF and replace NUL with � before parsing.
+ (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
+ (replace-match (if (match-beginning 1) "" "�") t t))
+ (eww--preprocess-html (point-min) (point-max))
+ (libxml-parse-html-region (point-min) (point-max))))
+
+(defsubst eww-document-base (url dom)
+ `(base ((href . ,url)) ,dom))
+
(declare-function libxml-parse-html-region "xml.c"
(start end &optional base-url discard-comments))
-(defun eww-display-html (charset url &optional document point buffer encode)
+(defun eww-display-document (document &optional point buffer)
(unless (fboundp 'libxml-parse-html-region)
(error "This function requires Emacs to be compiled with libxml2"))
+ (setq buffer (or buffer (current-buffer)))
(unless (buffer-live-p buffer)
(error "Buffer %s doesn't exist" buffer))
;; There should be a better way to abort loading images
;; asynchronously.
(setq url-queue nil)
- (let ((document
- (or document
- (list
- 'base (list (cons 'href url))
- (progn
- (setq encode (or encode charset 'utf-8))
- (condition-case nil
- (decode-coding-region (point) (point-max) encode)
- (coding-system-error nil))
- (save-excursion
- ;; Remove CRLF and replace NUL with � before parsing.
- (while (re-search-forward "\\(\r$\\)\\|\0" nil t)
- (replace-match (if (match-beginning 1) "" "�") t t)))
- (eww--preprocess-html (point) (point-max))
- (libxml-parse-html-region (point) (point-max))))))
- (source (and (null document)
- (buffer-substring (point) (point-max)))))
+ (let ((url (when (eq (car document) 'base)
+ (alist-get 'href (cadr document)))))
+ (unless url
+ (error "Document is missing base URL"))
(with-current-buffer buffer
(setq bidi-paragraph-direction nil)
- (plist-put eww-data :source source)
(plist-put eww-data :dom document)
(let ((inhibit-read-only t)
(inhibit-modification-hooks t)
@@ -794,6 +805,16 @@ eww-display-html
(forward-line 1)))))
(eww-size-text-inputs))))
+(defun eww-display-html (charset url &optional document point buffer)
+ (let ((source (buffer-substring (point) (point-max))))
+ (with-current-buffer buffer
+ (plist-put eww-data :source source)))
+ (eww-display-document
+ (or document
+ (eww-document-base
+ url (eww--parse-html-region (point) (point-max) charset)))
+ point buffer))
+
(defun eww-handle-link (dom)
(let* ((rel (dom-attr dom 'rel))
(href (dom-attr dom 'href))
@@ -1055,30 +1076,47 @@ eww-toggle-paragraph-direction
"automatic"
bidi-paragraph-direction)))
-(defun eww-readable ()
- "View the main \"readable\" parts of the current web page.
+(defun eww-readable (&optional arg)
+ "Toggle display of only the main \"readable\" parts of the current web page.
This command uses heuristics to find the parts of the web page that
-contains the main textual portion, leaving out navigation menus and
-the like."
- (interactive nil eww-mode)
+contain the main textual portion, leaving out navigation menus and the
+like.
+
+If called interactively, toggle the display of the readable parts. If
+the prefix argument is positive, display the readable parts, and if it
+is zero or negative, display the full page.
+
+If called from Lisp, toggle the display of the readable parts if ARG is
+`toggle'. Display the readable parts if ARG is nil, omitted, or is a
+positive number. Display the full page if ARG is a negative number.
+
+When `eww-readable-adds-to-history' is non-nil, calling this function
+adds a new entry to `eww-history'."
+ (interactive (list (if current-prefix-arg
+ (prefix-numeric-value current-prefix-arg)
+ 'toggle))
+ eww-mode)
(let* ((old-data eww-data)
- (dom (with-temp-buffer
+ (make-readable (cond
+ ((eq arg 'toggle)
+ (not (plist-get old-data :readable)))
+ ((and (numberp arg) (< arg 1))
+ nil)
+ (t t)))
+ (dom (with-temp-buffer
(insert (plist-get old-data :source))
- (condition-case nil
- (decode-coding-region (point-min) (point-max) 'utf-8)
- (coding-system-error nil))
- (eww--preprocess-html (point-min) (point-max))
- (libxml-parse-html-region (point-min) (point-max))))
+ (eww--parse-html-region (point-min) (point-max))))
(base (plist-get eww-data :url)))
- (eww-score-readability dom)
- (eww-save-history)
- (eww--before-browse)
- (eww-display-html nil nil
- (list 'base (list (cons 'href base))
- (eww-highest-readability dom))
- nil (current-buffer))
- (dolist (elem '(:source :url :title :next :previous :up :peer))
- (plist-put eww-data elem (plist-get old-data elem)))
+ (when make-readable
+ (eww-score-readability dom)
+ (setq dom (eww-highest-readability dom)))
+ (when eww-readable-adds-to-history
+ (eww-save-history)
+ (eww--before-browse)
+ (dolist (elem '(:source :url :title :next :previous :up :peer))
+ (plist-put eww-data elem (plist-get old-data elem))))
+ (eww-display-document (eww-document-base base dom))
+ (plist-put eww-data :readable make-readable)
(eww--after-page-change)))
(defun eww-score-readability (node)
@@ -1398,8 +1436,7 @@ eww-reload
(if local
(if (null (plist-get eww-data :dom))
(error "No current HTML data")
- (eww-display-html 'utf-8 url (plist-get eww-data :dom)
- (point) (current-buffer)))
+ (eww-display-document (plist-get eww-data :dom) (point)))
(let ((parsed (url-generic-parse-url url)))
(if (equal (url-type parsed) "file")
;; Use Tramp instead of url.el for files (since url.el
--
2.25.1
[-- Attachment #3: 0002-Add-eww-readable-urls.patch --]
[-- Type: text/plain, Size: 4771 bytes --]
From a6634a1d5d0cb440554eeaa5a014406e40ffeee9 Mon Sep 17 00:00:00 2001
From: Jim Porter <jporterbugs@gmail.com>
Date: Mon, 18 Mar 2024 16:52:34 -0700
Subject: [PATCH 2/2] Add 'eww-readable-urls'
* lisp/net/eww.el (eww-readable-urls): New option.
(eww-default-readable-p): New function...
(eww-display-html): ... use it.
* doc/misc/eww.texi (Basics): Document 'eww-readable-urls'.
* etc/NEWS: Announce this change.
---
doc/misc/eww.texi | 16 ++++++++++++++++
etc/NEWS | 6 ++++++
lisp/net/eww.el | 38 +++++++++++++++++++++++++++++++++-----
3 files changed, 55 insertions(+), 5 deletions(-)
diff --git a/doc/misc/eww.texi b/doc/misc/eww.texi
index 522034c874d..a08d6694892 100644
--- a/doc/misc/eww.texi
+++ b/doc/misc/eww.texi
@@ -151,6 +151,22 @@ Basics
displays the readable parts, and with a zero or negative prefix, it
always displays the full page.
+@vindex eww-readable-urls
+ If you want EWW to render a certain page in ``readable'' mode by
+default, you can add a regular expression matching its URL to
+@code{eww-readable-urls}. Each entry can either be a regular expression
+as a string or a cons cell of the form @code{(@var{regexp}
+. @var{readability})}. If @var{readability} is non-@code{nil}, this
+behaves the same as the string form; otherwise, URLs matching
+@var{regexp} will never be displayed in readable mode by default. For
+example, you can use this to make all pages default to readable mode,
+except for a few outliers:
+
+@example
+(setq eww-readable-urls '(("https://example\\.com/" . nil)
+ ".*"))
+@end example
+
@findex eww-toggle-fonts
@vindex shr-use-fonts
@kindex F
diff --git a/etc/NEWS b/etc/NEWS
index b23754fb17f..2af00f712a4 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1061,6 +1061,12 @@ only the readable parts of a page or the full page. With a positive
prefix argument, always display the readable parts, and with a zero or
negative prefix, always display the full page.
++++
+*** New option 'eww-readable-urls'.
+This is a list of regular expressions matching the URLs where EWW should
+display only the readable parts by default. For more details, see
+"(eww) Basics" in the EWW manual.
+
---
*** New option 'eww-readable-adds-to-history'.
When non-nil (the default), calling 'eww-readable' adds a new entry to
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index fd697846988..9505378e040 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -275,6 +275,19 @@ eww-url-transformers
:type '(repeat function)
:version "29.1")
+(defcustom eww-readable-urls nil
+ "A list of regexps matching URLs to display in readable mode by default.
+Each element can be either a string regexp or a cons cell of the
+form (REGEXP . READABILITY). If READABILITY is non-nil, this behaves
+the same as the string form; otherwise, URLs matching REGEXP will never
+be displayed in readable mode by default."
+ :type '(repeat (choice (string :tag "Readable URL")
+ (cons :tag "URL and Readability"
+ (string :tag "URL")
+ (radio (const :tag "Readable" t)
+ (const :tag "Non-readable" nil)))))
+ :version "30.1")
+
(defcustom eww-readable-adds-to-history t
"If non-nil, calling `eww-readable' adds a new entry to the history."
:type 'boolean
@@ -809,11 +822,13 @@ eww-display-html
(let ((source (buffer-substring (point) (point-max))))
(with-current-buffer buffer
(plist-put eww-data :source source)))
- (eww-display-document
- (or document
- (eww-document-base
- url (eww--parse-html-region (point) (point-max) charset)))
- point buffer))
+ (unless document
+ (let ((dom (eww--parse-html-region (point) (point-max) charset)))
+ (when (eww-default-readable-p url)
+ (eww-score-readability dom)
+ (setq dom (eww-highest-readability dom)))
+ (setq document (eww-document-base url dom))))
+ (eww-display-document document point buffer))
(defun eww-handle-link (dom)
(let* ((rel (dom-attr dom 'rel))
@@ -1159,6 +1174,19 @@ eww-highest-readability
(setq result highest))))
result))
+(defun eww-default-readable-p (url)
+ "Return non-nil if URL should be displayed in readable mode by default.
+This consults the entries in `eww-readable-urls' (which see)."
+ (catch 'found
+ (let (result)
+ (dolist (regexp eww-readable-urls)
+ (if (consp regexp)
+ (setq result (cdr regexp)
+ regexp (car regexp))
+ (setq result t))
+ (when (string-match regexp url)
+ (throw 'found result))))))
+
(defvar-keymap eww-mode-map
"g" #'eww-reload ;FIXME: revert-buffer-function instead!
"G" #'eww
--
2.25.1
next prev parent reply other threads:[~2024-03-19 0:00 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-05 7:35 bug#68254: EWW ‘readable’ by default Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-05 11:52 ` Eli Zaretskii
[not found] ` <poNSnv1DQ7L71-FirbCx9nuQ8gqLlPGTIjDYk2pKo2_H3BPuJArYQ2ziQ4pyADSxHCY5cU40D6MUzRqBAZE3pEcFmnzFPD49xunpLyh1UqI=@proton.me>
2024-01-05 13:35 ` Eli Zaretskii
2024-03-17 19:24 ` Jim Porter
2024-03-18 4:32 ` Adam Porter
2024-03-18 5:17 ` Navajeeth via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-18 5:44 ` Jim Porter
2024-03-18 5:18 ` Jim Porter
2024-03-18 12:37 ` Eli Zaretskii
2024-03-19 0:00 ` Jim Porter [this message]
2024-03-21 10:51 ` Eli Zaretskii
2024-03-22 5:46 ` Jim Porter
2024-03-23 7:48 ` Eli Zaretskii
2024-03-23 17:26 ` Jim Porter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c917fe41-45d8-8ad9-0f61-75ff62e73f50@gmail.com \
--to=jporterbugs@gmail.com \
--cc=68254@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=yvv0@proton.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).