unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Sebastián Monía" <sebastian@sebasmonia.com>
To: Nicholas Drozd <nicholasdrozd@gmail.com>
Cc: 59549@debbugs.gnu.org
Subject: bug#59549: EWW ordered list display irregularity
Date: Mon, 23 Sep 2024 15:07:32 -0400
Message-ID: <thqned5axk7f.fsf@sebasmonia.com>
In-Reply-To: <CABAiW0qaXTontrjxu0wCZ1f0d7bKqVhgD73zmYS7rpqpmwdafA@mail.gmail.com> (Nicholas Drozd's message of "Thu, 24 Nov 2022 12:29:13 -0600")

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

Hi everyone,


>  Here is the Wiktionary definition of the word "locus":
>  https://en.wiktionary.org/wiki/locus#Noun When I open that page in
>  EWW, I see five definition entries. But the second entry is blank.

I was able to reproduce.

>  So EWW is not failing to display anything; instead, it is inserting
>  something extra. That extra something comes from this piece of HTML:
>
>    <li class="mw-empty-elt"></li>

I wouldn't say for sure EWW is in the wrong here. Apparently inserting
empty li elements for styling purposes is a somewhat common practice.
I couldn't confirm how "correct" it is, but it is accepted.
(Side note: stuff like this makes me glad I haven't worked in web stuff
in many, many years.)

>  I don't know why that's in there or how Firefox knows not to display
>  it. It would be cool if EWW also knew not to display it.

The attached patch does exactly that: skip any li elements that don't
have content.
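
For illustration, here is a minimal standalone sketch of the emptiness
check the patch relies on. It is not part of the patch:
`my/li-empty-p' is a hypothetical name that mirrors
`shr-tag-empty-content-p', and the sample DOM nodes are made up. The
sketch also brings out that `dom-text' only concatenates the direct
string children of a node, so an item whose text sits entirely inside
child elements also counts as empty. `dom-texts' walks descendants and
may be closer to the intent, but that is an assumption, not something
the patch does.

```elisp
;; Sketch only.  `my/li-empty-p' is a hypothetical helper, not shr.el code.
(require 'dom)
(require 'subr-x)   ; `string-trim' and `string-empty-p' on older Emacs

(defun my/li-empty-p (dom)
  "Return non-nil if DOM has no direct text other than whitespace."
  (string-empty-p (string-trim (dom-text dom))))

;; DOM nodes are lists of the form (TAG ATTRIBUTES . CHILDREN).
(defvar my/empty-li   '(li ((class . "mw-empty-elt"))))   ; the reported case
(defvar my/blank-li   '(li nil "  \n "))                  ; whitespace only
(defvar my/text-li    '(li nil "A definition"))           ; plain text
(defvar my/nested-li  '(li nil (a ((href . "https://example.org")) "link")))

(my/li-empty-p my/empty-li)    ; non-nil: nothing between the tags
(my/li-empty-p my/blank-li)    ; non-nil: only whitespace
(my/li-empty-p my/text-li)     ; nil: the item is rendered

;; Caveat: the only text here belongs to the <a> child, so `dom-text'
;; sees nothing and the item is treated as empty.
(my/li-empty-p my/nested-li)   ; non-nil
(dom-texts my/nested-li)       ; "link"
```

The nested case suggests testing any change against list items whose
content is wrapped in links or other inline elements.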


[-- Attachment #2: shr.el: don't render empty li tags --]
[-- Type: text/x-patch, Size: 2789 bytes --]

From afa3cccda43ea17933d0e782243cf2adc9ee51c6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebasti=C3=A1n=20Mon=C3=ADa?=
 <sebastian.monia@sebasmonia.com>
Date: Mon, 23 Sep 2024 15:00:44 -0400
Subject: [PATCH] shr: don't render empty li tags (bug#59549)

---
 lisp/net/shr.el | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cd0e482aee7..2a72621fec4 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1656,6 +1656,11 @@ Based on https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-infore
   (shr-generic dom)
   (shr-ensure-paragraph))
 
+(defun shr-tag-empty-content-p (dom)
+  "Return t if DOM has no content.
+By \"content\" we mean \"text between the tags\"."
+  (string-empty-p (string-trim (dom-text dom))))
+
 (defun shr-tag-div (dom)
   (let ((display (cdr (assq 'display shr-stylesheet))))
     (if (or (equal display "inline")
@@ -2163,26 +2168,30 @@ BASE is the URL of the HTML being rendered."
   (shr-ensure-paragraph))
 
 (defun shr-tag-li (dom)
-  (shr-ensure-newline)
-  (let ((start (point)))
-    (let* ((bullet
-	    (if (numberp shr-list-mode)
-		(prog1
-		    (format "%d " shr-list-mode)
-		  (setq shr-list-mode (1+ shr-list-mode)))
-	      (car shr-internal-bullet)))
-	   (width (if (numberp shr-list-mode)
-		      (shr-string-pixel-width bullet)
-		    (cdr shr-internal-bullet))))
-      (insert bullet)
-      (shr-mark-fill start)
-      (let ((shr-indentation (+ shr-indentation width)))
-	(put-text-property start (1+ start)
-			   'shr-continuation-indentation shr-indentation)
-	(put-text-property start (1+ start) 'shr-prefix-length (length bullet))
-	(shr-generic dom))))
-  (unless (bolp)
-    (insert "\n")))
+  ;; bug#59549: EWW ordered list display irregularity
+  ;; empty li tags are used sometimes for styling purposes: do not
+  ;; render such tags
+  (unless (shr-tag-empty-content-p dom)
+    (shr-ensure-newline)
+    (let ((start (point)))
+      (let* ((bullet
+	      (if (numberp shr-list-mode)
+		  (prog1
+		      (format "%d " shr-list-mode)
+		    (setq shr-list-mode (1+ shr-list-mode)))
+	        (car shr-internal-bullet)))
+	     (width (if (numberp shr-list-mode)
+		        (shr-string-pixel-width bullet)
+		      (cdr shr-internal-bullet))))
+        (insert bullet)
+        (shr-mark-fill start)
+        (let ((shr-indentation (+ shr-indentation width)))
+	  (put-text-property start (1+ start)
+			     'shr-continuation-indentation shr-indentation)
+	  (put-text-property start (1+ start) 'shr-prefix-length (length bullet))
+	  (shr-generic dom))))
+    (unless (bolp)
+      (insert "\n"))))
 
 (defun shr-mark-fill (start)
   ;; We may not have inserted any text to fill.
-- 
2.45.2.windows.1


[-- Attachment #3: Type: text/plain, Size: 166 bytes --]


I tested the patch against the Wiktionary page and a few other offline
test cases, and it worked as intended.

Regards,
Seb

-- 
Sebastián Monía
https://site.sebasmonia.com/

Thread overview: 8+ messages
2022-11-24 18:29 bug#59549: EWW ordered list display irregularity Nicholas Drozd
2024-09-23 19:07 ` Sebastián Monía [this message]
2024-10-03 18:27   ` Sebastián Monía
2024-11-05 14:38     ` Sebastián Monía
2024-11-05 17:10       ` Eli Zaretskii
2024-11-05 19:50         ` Jim Porter
2024-11-06  3:20           ` Sebastián Monía
2024-11-06 12:37             ` Eli Zaretskii