From mboxrd@z Thu Jan 1 00:00:00 1970
From: "D. Schmudde"
Newsgroups: gmane.emacs.bugs
Subject: bug#70076: 28.3; xml-escape-string parse issue
Date: Sun, 31 Mar 2024 13:15:29 +0200
Message-ID: <87cyraaby6.fsf@schmud.de>
References: <87h6gp9gte.fsf@schmud.de> <86il14ews3.fsf@gnu.org>
In-reply-to: <86il14ews3.fsf@gnu.org>
User-Agent: mu4e 1.10.7; emacs 28.3
To: Eli Zaretskii
Cc: public@protesilaos.com, 70076@debbugs.gnu.org
Xref: news.gmane.io gmane.emacs.bugs:282431

Okay, good to know. Thanks for taking a look.

Here is some additional context. It occurs when using Elfeed's
~elfeed-export-opml~ on my list of RSS feeds. It seems the library
relies on ~xml-escape-string~ to escape each element. It's worth
noting that this happens on several feeds, not just the feed for
leancrew.com listed below.

I can file a bug with the package maintainers, but I wasn't sure if
the XML parser was a better place to start.

Here is the specific backtrace, if it's useful:

Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
  signal(xml-invalid-character (4194274 11))
  xml-escape-string("And now it\342\200\231s all this")
  xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) " ")
  ...

/David

Eli Zaretskii writes:

>> Cc: Protesilaos Stavrou
>> From: "D. Schmudde"
>> Date: Fri, 29 Mar 2024 16:44:48 +0100
>>
>> Starting with `emacs -Q`:
>>
>> (require 'xml)
>> (xml-escape-string "And now it\342\200\231s all this")
>>
>> The result is: `xml-escape-string: Invalid XML character:
>> 4194274, 11`
>>
>> I expect that the string will parse correctly with these escape
>> characters. Or is this expectation wrong?
>
> Your expectation is wrong, AFAIU: you are inserting a unibyte
> string (a string made out of raw bytes) instead of inserting a
> non-ASCII multibyte string, which is what XML expects.
>
> Why did you need to insert those bytes, and where did they come
> from?

--
w: http://schmud.de
e: d@schmud.de
t: @dschmudde
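
To illustrate the unibyte/multibyte distinction raised in the quoted reply, here is a minimal sketch. It assumes the raw bytes in the title are UTF-8 (\342\200\231 is the UTF-8 encoding of U+2019, the right single quotation mark). The helper name ~my/xml-escape-utf8-bytes~ is hypothetical and is not part of xml.el or Elfeed. It only shows that decoding the bytes first yields a multibyte string that ~xml-escape-string~ accepts. It says nothing about where the bytes come from or where a fix belongs.

```elisp
(require 'xml)

(defun my/xml-escape-utf8-bytes (string)
  "Hypothetical helper: escape STRING for XML, decoding raw bytes first.
Assumption: a unibyte STRING holds UTF-8 encoded text."
  (xml-escape-string
   (if (multibyte-string-p string)
       ;; Already a multibyte (character) string: pass it through as-is.
       string
     ;; Unibyte string: interpret its bytes as UTF-8 to get characters.
     (decode-coding-string string 'utf-8))))

;; The octal escapes make the reader produce a unibyte string, which is
;; the form that signals `xml-invalid-character' when escaped directly.
(my/xml-escape-utf8-bytes "And now it\342\200\231s all this")
```

If the feed titles are stored in some other encoding, the coding system passed to ~decode-coding-string~ would have to change accordingly.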