From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lars Ingebrigtsen
Newsgroups: gmane.emacs.bugs
Subject: bug#40794: 26.3; HTML entities &star; and &starf; (inter alia) are not parsed by libxml-parse-html-region
Date: Wed, 29 Jul 2020 07:26:15 +0200
Message-ID: <878sf23n9k.fsf@gnus.org>
References: <87368uwd1f.fsf@passepartout.tim-landscheidt.de>
In-Reply-To: <87368uwd1f.fsf@passepartout.tim-landscheidt.de> (Tim Landscheidt's message of "Thu, 23 Apr 2020 13:24:12 +0000")
To: Tim Landscheidt
Cc: 40794@debbugs.gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8

Tim Landscheidt writes:

> (Prologue: This bug showed up in the "ALT" attribute of an
> "IMG" element of an HTML mail in Gnus. I am reasonably
> certain that this stems from libxml-parse-html-region and
> should be fixed there, but there may be more prudent
> solutions.)

[...]

> These should instead yield "ä" (228), "☆" (9734) and
> "★" (9733).
>
> lisp/leim/quail/sgml-input.el seems to contain the necessary
> data for &star; and &starf; that could probably be fed to
> libxml.

As far as I can tell, libxml2 doesn't take a list of entities as an
input when parsing HTML? I may have missed something...

Hm, a bit of googling shows

http://xmlsoft.org/html/libxml-entities.html

and there is apparently a way to tell libxml2 about further entities?

But I think this all sounds more like a libxml2 than an Emacs bug,
really?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
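
For reference, the code points quoted in the report can be cross-checked against the standard entity tables. What follows is only a minimal sketch in Python, not Emacs Lisp and not libxml2: it uses the standard library's `html` module, the helper name `resolve` is made up for illustration, and it says nothing about how libxml2 itself resolves entities.

```python
import html
from html.entities import html5, name2codepoint

# Entity names discussed in the report, mapped to the code points it expects.
expected = {"auml": 228, "star": 9734, "starf": 9733}


def resolve(name):
    """Return the code point that `&name;` decodes to, or None if unresolved.

    Hypothetical helper for this sketch; it relies on html.unescape, which
    uses the HTML5 named character reference table.
    """
    decoded = html.unescape(f"&{name};")
    return ord(decoded) if len(decoded) == 1 else None


for name, code_point in expected.items():
    in_html4 = name in name2codepoint   # HTML 4.01 entity table
    in_html5 = f"{name};" in html5      # HTML5 named character references
    print(name, resolve(name), code_point, in_html4, in_html5)
```

In Python's tables, `auml` is present in both lists, while `star` and `starf` appear only in the HTML5 one. Whether libxml2's HTML parser has the same gap is an assumption that would need checking against its source; this snippet does not show it.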