unofficial mirror of bug-gnu-emacs@gnu.org 
From: "D. Schmudde" <d@schmud.de>
To: Eli Zaretskii <eliz@gnu.org>
Cc: public@protesilaos.com, 70076@debbugs.gnu.org
Subject: bug#70076: 28.3; xml-escape-string parse issue
Date: Sun, 31 Mar 2024 13:15:29 +0200
Message-ID: <87cyraaby6.fsf@schmud.de>
In-Reply-To: <86il14ews3.fsf@gnu.org>

Okay, good to know. Thanks for taking a look.

Here is some additional context. The error occurs when I run Elfeed's 
~elfeed-export-opml~ on my list of RSS feeds. It seems the library 
relies on ~xml-escape-string~ to escape each element. It's worth 
noting that this happens with several feeds, not just the 
leancrew.com feed shown below.

I can file a bug with the package maintainers but I wasn't sure if 
the XML parser was a better place to start. Here is the specific 
backtrace, if it's useful:

Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
  signal(xml-invalid-character (4194274 11))
  xml-escape-string("And now it\342\200\231s all this")
  xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) "    ")
  ...
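
For reference, here is a minimal sketch of what the error code in the backtrace seems to correspond to. It assumes the octal escapes \342\200\231 are the UTF-8 bytes of U+2019 (RIGHT SINGLE QUOTATION MARK) and that decoding them first is acceptable. The names ~demo-raw-title~ and ~demo-decoded-title~ are hypothetical, used only for illustration; this is not what Elfeed itself does.

```elisp
(require 'xml)

;; Hypothetical name: the title exactly as it appears in the backtrace.
;; A literal with only ASCII characters plus octal escapes above 127
;; is read as a unibyte string (a string of raw bytes).
(defconst demo-raw-title "And now it\342\200\231s all this")

;; Assumption: the raw bytes are UTF-8.  Decoding yields a multibyte
;; string in which the three bytes become the single character U+2019.
(defconst demo-decoded-title
  (decode-coding-string demo-raw-title 'utf-8))

;; The raw string is unibyte; the decoded one is multibyte.
(multibyte-string-p demo-raw-title)      ; nil
(multibyte-string-p demo-decoded-title)  ; t

;; Emacs represents the raw byte #xE2 (octal 342) as the character
;; code 4194274, which is the number reported in the error.
(unibyte-char-to-multibyte #xE2)         ; 4194274

;; Escaping the decoded string does not signal an error.
(xml-escape-string demo-decoded-title)
```

The 11 in the error matches the position of the first raw byte, which follows the ten characters of "And now it".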

/David

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: Protesilaos Stavrou <public@protesilaos.com>
>> From: "D. Schmudde" <d@schmud.de>
>> Date: Fri, 29 Mar 2024 16:44:48 +0100
>>
>> Starting with `emacs -Q`:
>>
>> (require 'xml)
>> (xml-escape-string "And now it\342\200\231s all this")
>>
>> The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
>>
>> I expect that the string will parse correctly with these escape
>> characters. Or is this expectation wrong?
>
> Your expectation is wrong, AFAIU: you are inserting a unibyte string
> (a string made out of raw bytes) instead of inserting a non-ASCII
> multibyte string, which is what XML expects.
>
> Why did you need to insert those bytes, and where did they come from?


--
w: http://schmud.de
e: d@schmud.de
t: @dschmudde


Thread overview: 5+ messages
2024-03-29 15:44 bug#70076: 28.3; xml-escape-string parse issue D. Schmudde
2024-03-29 18:08 ` Eli Zaretskii
2024-03-31 11:15   ` D. Schmudde [this message]
2024-03-31 13:21     ` Eli Zaretskii
2024-06-30  6:11       ` Stefan Kangas
