From: "Sebastián Monía" <sebastian@sebasmonia.com>
To: Nicholas Drozd <nicholasdrozd@gmail.com>
Cc: 59549@debbugs.gnu.org
Subject: bug#59549: EWW ordered list display irregularity
Date: Mon, 23 Sep 2024 15:07:32 -0400
Message-ID: <thqned5axk7f.fsf@sebasmonia.com>
In-Reply-To: <CABAiW0qaXTontrjxu0wCZ1f0d7bKqVhgD73zmYS7rpqpmwdafA@mail.gmail.com> (Nicholas Drozd's message of "Thu, 24 Nov 2022 12:29:13 -0600")
[-- Attachment #1: Type: text/plain, Size: 931 bytes --]
Hi everyone,

> Here is the Wiktionary definition of the word "locus":
> https://en.wiktionary.org/wiki/locus#Noun When I open that page in
> EWW, I see five definition entries. But the second entry is blank.

I was able to reproduce.

> So EWW is not failing to display anything; instead, it is inserting
> something extra. That extra something comes from this piece of HTML:
>
> <li class="mw-empty-elt"></li>

I wouldn't say for sure that EWW is in the wrong here. Apparently,
inserting empty li elements for styling purposes is a somewhat common
practice. I couldn't confirm how "correct" it is, but it is accepted.

(Side note: stuff like this makes me glad I haven't worked on web
stuff in many, many years.)

> I don't know why that's in there or how Firefox knows not to display
> it. It would be cool if EWW also knew not to display it.

The attached patch does exactly that: it skips any li element that has
no text content (empty or whitespace-only).
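
For illustration, here is a minimal sketch of the emptiness test, which you can evaluate in *scratch* or with `emacs --batch`. The helper `my/li-empty-p` is a hypothetical stand-in with the same body as the patch's `shr-tag-empty-content-p`. The DOM nodes are hand-built lists in the `(TAG ATTRIBUTES . CHILDREN)` shape that `libxml-parse-html-region` returns. The sketch also highlights one behavior of the check: `dom-text` joins only a node's direct string children, so an li whose text sits entirely inside a child element (a link, say) also tests as empty. `dom-texts`, by contrast, recurses into child elements.

```elisp
(require 'dom)      ; dom-text, dom-texts
(require 'subr-x)   ; string-trim, string-empty-p

;; Hypothetical helper: same test as the patch's shr-tag-empty-content-p.
(defun my/li-empty-p (dom)
  "Return t if DOM's direct text children are empty or only whitespace."
  (string-empty-p (string-trim (dom-text dom))))

(let ((empty-li  '(li ((class . "mw-empty-elt"))))          ; the Wiktionary case
      (text-li   '(li nil "  A definition  "))               ; ordinary entry
      (nested-li '(li nil (a ((href . "#")) "a link"))))     ; text only in a child
  (list (my/li-empty-p empty-li)    ; t:   no children at all, skipped
        (my/li-empty-p text-li)     ; nil: trimmed text is non-empty, rendered
        (my/li-empty-p nested-li)   ; t:   dom-text ignores the <a> child
        ;; dom-texts recurses, so the nested text is found:
        (string-empty-p (string-trim (dom-texts nested-li)))))  ; nil
;; => (t nil t nil)
```

If entries whose only content is a link or an image matter, a recursive check along the lines of `dom-texts` (or a test for any child elements) would avoid skipping them.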
[-- Attachment #2: shr.el: don't render empty li tags --]
[-- Type: text/x-patch, Size: 2789 bytes --]
From afa3cccda43ea17933d0e782243cf2adc9ee51c6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
<sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 15:00:44 -0400
Subject: [PATCH] shr: don't render empty li tags (bug#59549)
---
lisp/net/shr.el | 49 +++++++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cd0e482aee7..2a72621fec4 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1656,6 +1656,11 @@ Based on https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-infore
(shr-generic dom)
(shr-ensure-paragraph))
+(defun shr-tag-empty-content-p (dom)
+ "Return t if DOM has no content.
+By \"content\" we mean \"text between the tags\"."
+ (string-empty-p (string-trim (dom-text dom))))
+
(defun shr-tag-div (dom)
(let ((display (cdr (assq 'display shr-stylesheet))))
(if (or (equal display "inline")
@@ -2163,26 +2168,30 @@ BASE is the URL of the HTML being rendered."
(shr-ensure-paragraph))
(defun shr-tag-li (dom)
- (shr-ensure-newline)
- (let ((start (point)))
- (let* ((bullet
- (if (numberp shr-list-mode)
- (prog1
- (format "%d " shr-list-mode)
- (setq shr-list-mode (1+ shr-list-mode)))
- (car shr-internal-bullet)))
- (width (if (numberp shr-list-mode)
- (shr-string-pixel-width bullet)
- (cdr shr-internal-bullet))))
- (insert bullet)
- (shr-mark-fill start)
- (let ((shr-indentation (+ shr-indentation width)))
- (put-text-property start (1+ start)
- 'shr-continuation-indentation shr-indentation)
- (put-text-property start (1+ start) 'shr-prefix-length (length bullet))
- (shr-generic dom))))
- (unless (bolp)
- (insert "\n")))
+ ;; bug#59549: EWW ordered list display irregularity
+ ;; empty li tags are used sometimes for styling purposes: do not
+ ;; render such tags
+ (unless (shr-tag-empty-content-p dom)
+ (shr-ensure-newline)
+ (let ((start (point)))
+ (let* ((bullet
+ (if (numberp shr-list-mode)
+ (prog1
+ (format "%d " shr-list-mode)
+ (setq shr-list-mode (1+ shr-list-mode)))
+ (car shr-internal-bullet)))
+ (width (if (numberp shr-list-mode)
+ (shr-string-pixel-width bullet)
+ (cdr shr-internal-bullet))))
+ (insert bullet)
+ (shr-mark-fill start)
+ (let ((shr-indentation (+ shr-indentation width)))
+ (put-text-property start (1+ start)
+ 'shr-continuation-indentation shr-indentation)
+ (put-text-property start (1+ start) 'shr-prefix-length (length bullet))
+ (shr-generic dom))))
+ (unless (bolp)
+ (insert "\n"))))
(defun shr-mark-fill (start)
;; We may not have inserted any text to fill.
--
2.45.2.windows.1
[-- Attachment #3: Type: text/plain, Size: 166 bytes --]
I tested it with the Wiktionary page and a few other offline test
cases, and it worked as intended.

Regards,
Seb
--
Sebastián Monía
https://site.sebasmonia.com/