From: Tim Landscheidt <tim@tim-landscheidt.de>
To: 40794@debbugs.gnu.org
Subject: bug#40794: 26.3; HTML entities &star; and &starf; (inter alia) are not parsed by libxml-parse-html-region
Date: Thu, 23 Apr 2020 13:24:12 +0000
Message-ID: <87368uwd1f.fsf@passepartout.tim-landscheidt.de>

(Prologue: This bug showed up in the "ALT" attribute of an
"IMG" element of an HTML mail in Gnus. I am reasonably
certain that this stems from libxml-parse-html-region and
should be fixed there, but there may be more prudent
solutions.)

With GNU Emacs 26.3 on Fedora:
| ELISP> (with-temp-buffer
|          (insert "<!DOCTYPE html>
| <html lang=\"en\">
| <head><title>Title</title></head>
| <body>
| <p>Hello world</p>
| <p>ä</p>
| <p>&star;</p>
| <p>&starf;</p>
| </body>
| </html>")
|          (libxml-parse-html-region (point-min) (point-max)))
| (html
|  ((lang . "en"))
|  (head nil
|        (title nil "Title"))
|  (body nil "\n "
|        (p nil "Hello world")
|        "\n "
|        (p nil "ä")
|        "\n "
|        (p nil "&star;")
|        "\n "
|        (p nil "&starf;")
|        "\n"))
| ELISP>

These should instead yield "ä" (228), "☆" (9734) and
"★" (9733).

lisp/leim/quail/sgml-input.el seems to contain the necessary
data for &star; and &starf; that could probably be fed to
libxml.
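
As a rough illustration of a workaround on the Lisp side (not a
proposed patch), named entities could be replaced with their
characters before the region is handed to libxml. Everything
prefixed with "my-" below is made up for this sketch, and the
two-entry table is an assumption standing in for a full entity
list; I have not checked what format the data in sgml-input.el
would need to be converted from.

```elisp
;; Hypothetical helper code, not part of Emacs.  The alist is an
;; illustrative subset; a real table would be far larger.
(defvar my-html-entity-alist
  '(("star"  . #x2606)    ; WHITE STAR
    ("starf" . #x2605))   ; BLACK STAR
  "Alist mapping entity names (without `&' and `;') to characters.")

(defun my-html-decode-entities-region (start end)
  "Replace known named entities between START and END with characters.
Entities not listed in `my-html-entity-alist' are left untouched."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward "&\\([A-Za-z][A-Za-z0-9]*\\);" nil t)
        ;; `assoc' compares with `equal', so the lookup is
        ;; case-sensitive, as entity names are.
        (let ((char (cdr (assoc (match-string 1)
                                my-html-entity-alist))))
          (when char
            (replace-match (string char) t t)))))))

(defun my-libxml-parse-html-string (html)
  "Parse the string HTML after decoding the entities known to this sketch."
  (with-temp-buffer
    (insert html)
    (my-html-decode-entities-region (point-min) (point-max))
    (libxml-parse-html-region (point-min) (point-max))))
```

This is a blunt text substitution: it does not know about
<script> or <style> content, where entities should not be
expanded, so it is at best a stopgap compared to a fix in or
around libxml-parse-html-region itself.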