From mboxrd@z Thu Jan 1 00:00:00 1970
From: Katsumi Yamaoka
Newsgroups: gmane.emacs.bugs
Subject: bug#30789: 26.0.91; xml-parse-region works but libxml-parse-html-region doesn't
Date: Tue, 13 Mar 2018 12:31:09 +0900
Organization: Emacsen advocacy group
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-unknown-cygwin)
To: Lars Ingebrigtsen, 積丹尼 Dan Jacobson
Cc: 30789@debbugs.gnu.org
X-GNU-PR-Message: followup 30789
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:144177
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=

On Tue, 13 Mar 2018 11:28:45 +0900, Katsumi Yamaoka wrote:

> +	    ;; Remove extra bytes in utf-8 encoded data.
> +	    (when (eq coding 'utf-8)
> +	      (goto-char (point-min))
> +	      (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
> +		(replace-match "\\1")))

Corrected:

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

--- mm-decode.el~	2018-02-28 02:01:37.897607000 +0000
+++ mm-decode.el	2018-03-13 03:27:56.885844100 +0000
@@ -1810,6 +1810,13 @@
 	  (when (and (or coding
 			 (setq coding (mm-charset-to-coding-system charset nil t)))
 		     (not (eq coding 'ascii)))
+	    ;; Remove extra bytes in utf-8 encoded data.
+	    (when (eq coding 'utf-8)
+	      (goto-char (point-min))
+	      (while (re-search-forward
+		      "\\([\xc2-\xf7][\x80-\xbf]?\\)[\x00-\x7f]+\\([\x80-\xbf]\\)"
+		      nil t)
+		(replace-match "\\1\\2")))
 	    (insert (prog1
 			(decode-coding-string (buffer-string) coding)
 		      (erase-buffer)
--=-=-=--
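
For readers who want to try the corrected regexp outside Emacs, here is a minimal sketch of the same byte pattern in Python. It is not part of the patch: the names `STRAY_ASCII` and `strip_stray_ascii` are made up for illustration, and the sample byte strings are invented test data, not taken from the bug report.

```python
import re

# Same pattern as the corrected patch, applied to raw bytes:
#   group 1: a UTF-8 lead byte (0xC2-0xF7), optionally followed by one
#            continuation byte (0x80-0xBF)
#   then:    one or more ASCII bytes (0x00-0x7F) that do not belong there
#   group 2: the continuation byte that should have followed directly
STRAY_ASCII = re.compile(rb"([\xc2-\xf7][\x80-\xbf]?)[\x00-\x7f]+([\x80-\xbf])")


def strip_stray_ascii(data: bytes) -> bytes:
    """Drop ASCII bytes wedged inside a multibyte UTF-8 sequence.

    Hypothetical helper mirroring the (replace-match "\\1\\2") loop.
    """
    return STRAY_ASCII.sub(rb"\1\2", data)


if __name__ == "__main__":
    # Invented sample: a three-byte sequence with CR LF inserted before
    # its last byte.
    repaired = strip_stray_ascii(b"\xe3\x81\r\n\x82")
    print(repaired.decode("utf-8"))
```

Requiring a lead byte before the ASCII run is what separates this from the first version of the regexp, which would also match ordinary ASCII text followed by any continuation byte. As in the Elisp loop, each multibyte sequence is matched at most once per pass, so a sequence interrupted in two places is only partly repaired.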