From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Katsumi Yamaoka
Newsgroups: gmane.emacs.bugs
Subject: bug#30789: 26.0.91; xml-parse-region works but libxml-parse-html-region doesn't
Date: Tue, 13 Mar 2018 08:38:09 +0900
Organization: Emacsen advocacy group
Message-ID:
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: blaine.gmane.org 1520897896 19293 195.159.176.226 (12 Mar 2018 23:38:16 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Mon, 12 Mar 2018 23:38:16 +0000 (UTC)
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.91 (x86_64-unknown-cygwin)
Cc: =?UTF-8?Q?=E7=A9=8D=E4=B8=B9=E5=B0=BC?= Dan Jacobson
To: 30789@debbugs.gnu.org
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Mar 13 00:38:11 2018
Return-path:
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1evX19-0004vY-7F for geb-bug-gnu-emacs@m.gmane.org; Tue, 13 Mar 2018 00:38:11 +0100
Original-Received: from localhost ([::1]:36184 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1evX39-0001gG-Ao for geb-bug-gnu-emacs@m.gmane.org; Mon, 12 Mar 2018 19:40:15 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:51908) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1evX32-0001ck-2D for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:40:08 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1evX2x-0004Oi-5y for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:40:08 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:49930) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1evX2w-0004OV-TF for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:40:03 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1evX2w-0007ha-IQ for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:40:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Katsumi Yamaoka
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Mon, 12 Mar 2018 23:40:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 30789
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
X-Debbugs-Original-To: bug-gnu-emacs@gnu.org
Original-Received: via spool by submit@debbugs.gnu.org id=B.152089795129535 (code B ref -1); Mon, 12 Mar 2018 23:40:02 +0000
Original-Received: (at submit) by debbugs.gnu.org; 12 Mar 2018 23:39:11 +0000
Original-Received: from localhost ([127.0.0.1]:57827 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1evX25-0007gH-U9 for submit@debbugs.gnu.org; Mon, 12 Mar 2018 19:39:10 -0400
Original-Received: from eggs.gnu.org ([208.118.235.92]:60713) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1evX23-0007g2-LX for submit@debbugs.gnu.org; Mon, 12 Mar 2018 19:39:07 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1evX1x-0003ZI-P9 for submit@debbugs.gnu.org; Mon, 12 Mar 2018 19:39:02 -0400
Original-Received: from lists.gnu.org ([2001:4830:134:3::11]:49161) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1evX1x-0003Yu-FV for submit@debbugs.gnu.org; Mon, 12 Mar 2018 19:39:01 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:51654) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1evX1w-0001OV-8M for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:39:01 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1evX1r-0003TE-Cg for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:39:00 -0400
Original-Received: from hampton.hostforweb.net ([181.214.31.159]:38088) by eggs.gnu.org with esmtps
 (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1evX1r-0002lt-87 for bug-gnu-emacs@gnu.org; Mon, 12 Mar 2018 19:38:55 -0400
Original-Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.89_1) (envelope-from ) id 1evX19-003aKW-Fs; Mon, 12 Mar 2018 18:38:12 -0500
X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&(
Cancel-Lock: sha1:dzfg6kMpHpBh01vsiAZAPUW9MW8=
X-OutGoing-Spam-Status: No, score=-1.5
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - hampton.hostforweb.net
X-AntiAbuse: Original Domain - gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - jpl.org
X-Get-Message-Sender-Via: hampton.hostforweb.net: authenticated_id: yamaoka/from_h
X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org
X-Source:
X-Source-Args:
X-Source-Dir:
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (barebone) [generic] [fuzzy]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
Xref: news.gmane.org gmane.emacs.bugs:144166
Archived-At:

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,

Jidanni mailed me an example html mail that contains broken encoded
text, as follows:

.......=E5=85=AC=E5=91=8A=E8=BE=A6=E7=90=86=E7=8F=BE=E9=87=91=E6=95=91=
=E5=8A=A9=E5=8F=8A=E4=BD=8E=E5=88=A9=E8=B2=B8=E6=AC=BE\343\200
 \202=E5=9B=A02=E6=9C=88
=E4=BD=8E=E6=BA=AB=E5=8D=B1=E5=AE=B3=E8=BE=B2=E4=BD=9C=E7=89=A9=E7=82=BA=
=E5=BB=B6=E9=81=B2=E6=80=A7=E6=90=8D=E5=AE=B3=EF=BC=8C.......

This is part of the contents.  The original is encoded in utf-8 with
8-bit transfer encoding (attached to this mail).  Here
"\343\200\n \202" is the encoded form of "=E3=80=82", i.e.,
"\343\200\202", but it is broken in the middle of the byte sequence.
It seems that some mail software breaks it because of a long encoded
line.

When I read the mail using Gnus + shr, all the text after the broken
point is cut off.  That is what libxml-parse-html-region does, whereas
xml-parse-region doesn't cut the text.  Moreover, a web browser, to
which I sent the html data using the `K H' command, shows all the text
(the broken character is shown as is, though).

This is not necessarily a libxml bug, but I hope it can work like
xml-parse-region.

Thanks.

In GNU Emacs 26.0.91 (build 1, x86_64-unknown-cygwin, GTK+ Version 3.22.28)
 of 2018-03-12 built on localhost
Windowing system distributor 'The Cygwin/X Project', version 11.0.11906000

--=-=-=
Content-Type: application/x-gunzip; charset=utf-8
Content-Disposition: attachment; filename=example-html-mail.gz
Content-Transfer-Encoding: base64

H4sICKsNp1oCA2V4YW1wbGUtaHRtbC1tYWlsALPJKMnNseNSULDJSE1MATGAzNzUkkSFjJKSAt3U
wtLMMlsl5/y8ktS8Et2QyoJUJYVkCM9WqSS1okQfZIC1QnJGYlFxaoltaUmaroUS2EB9mIk2Sfkp
lRCj9SDgaeuapxO7Xuxb9nxC2/P+fS/bJz6bOvFp18qn/V1P9vY97Vj5YtOOZ2v2PW7gUmh6OnuB
0bM5HUDxZ7tWP+3d+HTd5hf7Nj3ZO+d558rnTbue7t72snHTs4blzyb0AqXe7+mB2gF2AsRmG32I
LwFZLaXI7QAAAA==
--=-=-=--
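
To make the byte-level breakage in the report concrete, here is a
minimal Python sketch (not Emacs code).  It only illustrates why the
split sequence is no longer valid UTF-8 and how a lenient decoder can
keep the text that follows it; it does not reproduce what libxml or
xml-parse-region actually do.  The helper names decode_lenient and
strict_decode_fails are hypothetical, and the context bytes are copied
from the quoted-printable sample in the report.

```python
# Bytes taken from the report; "\343\200\202" in octal is E3 80 82 in hex.
INTACT = b"\343\200\202"
BROKEN = b"\343\200\n \202"  # newline and space inserted after the second byte

# Context bytes copied from the quoted-printable sample in the report.
BEFORE = bytes.fromhex("E8B2B8E6ACBE")   # the two characters before the break
AFTER = bytes.fromhex("E59BA032E69C88")  # the three characters after it


def decode_lenient(raw: bytes) -> str:
    """Decode UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def strict_decode_fails(raw: bytes) -> bool:
    """Return True if a strict UTF-8 decode rejects the input."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    sample = BEFORE + BROKEN + AFTER
    # The intact sequence decodes strictly; the split one does not.
    print(strict_decode_fails(BEFORE + INTACT + AFTER))
    print(strict_decode_fails(sample))
    # A lenient decode keeps the text on both sides of the break.
    print(decode_lenient(sample))
```

With the lenient decode, only the split character is lost and the text
after it survives, which is the kind of behavior the report hopes for.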