From: Stefan Kangas <stefan@marxist.se>
To: "Basil L. Contovounesios" <contovob@tcd.ie>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 48211@debbugs.gnu.org
Subject: bug#48211: 28.0.50; eww strips whitespace between <mark> elements
Date: Mon, 3 May 2021 19:35:35 -0500
Message-ID: <CADwFkmkKmqjbLuEQNJD+Z7Q8CFgc2Oia+YUEkPXgSZ7rFx5_ag@mail.gmail.com>
In-Reply-To: <87y2cvl6eg.fsf@tcd.ie> (Basil L. Contovounesios's message of "Tue, 04 May 2021 00:55:03 +0100")

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

> I think this is because libxml-parse-html-region specifies
> HTML_PARSE_NOBLANKS:
>
> Return CDATA sections (like <style>foo</style>) as text nodes.
> 3c2317e891 2010-12-06 17:59:52 +0100
> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=3c2317e89100833812a7194c0d9d39ae0f52cb33

Hmm, okay.  So far, I'm seeing this issue with basically any tag that
libxml2 does not already know about, e.g. "<summary>" or "<bdi>".

This is what I came up with before reading Basil's reply:

(with-temp-buffer
  (insert "<p><tt>foo</tt> <tt>bar</tt></p>")
  (libxml-parse-html-region (point-min) (point-max)))
=> (html nil (body nil (p nil (tt nil "foo") " " (tt nil "bar"))))

(with-temp-buffer
  (insert "<p><mark>foo</mark> <mark>bar</mark></p>")
  (libxml-parse-html-region (point-min) (point-max)))
=> (html nil (body nil (p nil (mark nil "foo") (mark nil "bar"))))

I guess this is a bug in libxml2, so I reported it here:
https://gitlab.gnome.org/GNOME/libxml2/-/issues/247

FWIW, the diff below works around this bug for me.

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cbdeb65ba8..3eb3a5bc49 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1485,6 +1485,12 @@ shr-tag-tt
   ;; The `tt' tag is deprecated in favor of `code'.
   (shr-tag-code dom))
 
+(defun shr-tag-mark (dom)
+  (shr-generic dom)
+  ;; Hack to work around bug in libxml2 (Bug#48211):
+  ;; https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
+  (insert " "))
+
 (defun shr-tag-ins (cont)
   (let* ((start (point))
          (color "green")