unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Kangas <stefan@marxist.se>
To: "Basil L. Contovounesios" <contovob@tcd.ie>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 48211@debbugs.gnu.org
Subject: bug#48211: 28.0.50; eww strips whitespace between <mark> elements
Date: Mon, 3 May 2021 19:35:35 -0500
Message-ID: <CADwFkmkKmqjbLuEQNJD+Z7Q8CFgc2Oia+YUEkPXgSZ7rFx5_ag@mail.gmail.com>
In-Reply-To: <87y2cvl6eg.fsf@tcd.ie> (Basil L. Contovounesios's message of "Tue, 04 May 2021 00:55:03 +0100")

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

> I think this is because libxml-parse-html-region specifies
> HTML_PARSE_NOBLANKS:
>
> Return CDATA sections (like <style>foo</style>) as text nodes.
> 3c2317e891 2010-12-06 17:59:52 +0100
> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=3c2317e89100833812a7194c0d9d39ae0f52cb33

Hmm, okay.  For now, I'm seeing this issue with basically any tag that
libxml2 does not already know about, e.g. "<summary>" or "<bdi>".

This is what I came up with before reading Basil's reply:

(with-temp-buffer
  (insert "<p><tt>foo</tt> <tt>bar</tt></p>")
  (libxml-parse-html-region (point-min) (point-max)))

=> (html nil (body nil (p nil (tt nil "foo") " " (tt nil "bar"))))

(with-temp-buffer
  (insert "<p><mark>foo</mark> <mark>bar</mark></p>")
  (libxml-parse-html-region (point-min) (point-max)))

=> (html nil (body nil (p nil (mark nil "foo") (mark nil "bar"))))
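The same check can be repeated for several tags at once. The following is only a rough sketch: `my/whitespace-kept-p' is a hypothetical helper name (not part of Emacs), it assumes an Emacs built with libxml2, and the result will depend on the libxml2 version in use, so no expected output is given.

```elisp
(require 'dom)

;; Hypothetical helper for illustration only; not part of Emacs.
(defun my/whitespace-kept-p (tag)
  "Return non-nil if the space between two TAG elements survives parsing.
TAG is a string such as \"mark\".  Returns nil as well if the parser
does not produce a <p> element for the test input."
  (with-temp-buffer
    (insert (format "<p><%s>foo</%s> <%s>bar</%s></p>" tag tag tag tag))
    (let* ((dom (libxml-parse-html-region (point-min) (point-max)))
           ;; First <p> element in the parsed tree, if any.
           (p (car (dom-by-tag dom 'p))))
      ;; The whitespace shows up as a " " text node among the children.
      (and (member " " (dom-children p)) t))))

;; Build an alist of (TAG . KEPT) for a few tags mentioned in this thread.
(mapcar (lambda (tag) (cons tag (my/whitespace-kept-p tag)))
        '("tt" "mark" "summary" "bdi"))
```

A nil entry for a tag would mean the " " text node was dropped, as in the <mark> result above.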

I guess this is a bug in libxml2, so I reported it here:

    https://gitlab.gnome.org/GNOME/libxml2/-/issues/247

FWIW, the below diff works around this bug for me.

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cbdeb65ba8..3eb3a5bc49 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1485,6 +1485,12 @@ shr-tag-tt
   ;; The `tt' tag is deprecated in favor of `code'.
   (shr-tag-code dom))

+(defun shr-tag-mark (dom)
+  (shr-generic dom)
+  ;; Hack to work around bug in libxml2 (Bug#48211):
+  ;; https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
+  (insert " "))
+
 (defun shr-tag-ins (cont)
   (let* ((start (point))
          (color "green")




