* bug#70076: 28.3; xml-escape-string parse issue
@ 2024-03-29 15:44 D. Schmudde
  2024-03-29 18:08 ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: D. Schmudde @ 2024-03-29 15:44 UTC (permalink / raw)
  To: 70076; +Cc: Protesilaos Stavrou

Starting with `emacs -Q`:

(require 'xml)
(xml-escape-string "And now it\342\200\231s all this")

The result is: `xml-escape-string: Invalid XML character: 4194274, 11`

I expect that the string will parse correctly with these escape 
characters. Or is this expectation wrong?

In GNU Emacs 28.3 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33,
cairo version 1.16.0) of 2023-08-25 built on pop-os
Repository revision: dec958258b133b4c21224c594da433919d852800
Repository branch: emacs-28
System Description: Pop!_OS 22.04 LTS

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix



-- 
w: http://schmud.de
e: d@schmud.de
t: @dschmudde






* bug#70076: 28.3; xml-escape-string parse issue
  2024-03-29 15:44 bug#70076: 28.3; xml-escape-string parse issue D. Schmudde
@ 2024-03-29 18:08 ` Eli Zaretskii
  2024-03-31 11:15   ` D. Schmudde
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2024-03-29 18:08 UTC (permalink / raw)
  To: D. Schmudde; +Cc: public, 70076

> Cc: Protesilaos Stavrou <public@protesilaos.com>
> From: "D. Schmudde" <d@schmud.de>
> Date: Fri, 29 Mar 2024 16:44:48 +0100
> 
> Starting with `emacs -Q`:
> 
> (require 'xml)
> (xml-escape-string "And now it\342\200\231s all this")
> 
> The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
> 
> I expect that the string will parse correctly with these escape 
> characters. Or is this expectation wrong?

Your expectation is wrong, AFAIU: you are inserting a unibyte string
(a string made out of raw bytes) instead of inserting a non-ASCII
multibyte string, which is what XML expects.

Why did you need to insert those bytes, and where did they come from?
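
The unibyte/multibyte distinction can be checked directly. The sketch
below assumes the three raw bytes \342\200\231 are the UTF-8 encoding of
U+2019 (right single quotation mark), which the thread does not confirm.
`my/xml-escape-utf8` is a hypothetical helper written for illustration;
it is not part of xml.el or Elfeed.

```elisp
(require 'xml)

;; Hypothetical helper, not part of xml.el or Elfeed.
;; ASSUMPTION: a unibyte STRING holds UTF-8 encoded bytes.
(defun my/xml-escape-utf8 (string)
  "Escape STRING for XML, decoding it as UTF-8 first if it is unibyte."
  (xml-escape-string
   (if (multibyte-string-p string)
       string
     (decode-coding-string string 'utf-8))))

(let ((raw "And now it\342\200\231s all this"))
  (list
   ;; Octal escapes in an otherwise-ASCII literal yield a unibyte string.
   (multibyte-string-p raw)
   ;; After decoding, the three bytes become one character and the
   ;; string passes through `xml-escape-string' without an error.
   (my/xml-escape-utf8 raw)))
```

If the bytes came from a source in some other encoding, the coding
system passed to `decode-coding-string` would have to change to match.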






* bug#70076: 28.3; xml-escape-string parse issue
  2024-03-29 18:08 ` Eli Zaretskii
@ 2024-03-31 11:15   ` D. Schmudde
  2024-03-31 13:21     ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: D. Schmudde @ 2024-03-31 11:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: public, 70076

Okay, good to know. Thanks for taking a look.

Here is some additional context. It occurs when using Elfeed's 
~elfeed-export-opml~ on my list of RSS feeds. It seems the library 
relies on ~xml-escape-string~ to parse each element. It's worth 
noting that this happens on several feeds, not just the feed for 
leancrew.com listed below.

I can file a bug with the package maintainers but I wasn't sure if 
the XML parser was a better place to start. Here is the specific 
backtrace, if it's useful:

Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
  signal(xml-invalid-character (4194274 11))
  xml-escape-string("And now it\342\200\231s all this")
  xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) "    ")
  ...
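
The two numbers in the error data can be reproduced outside Elfeed. As a
rough sketch: 4194274 is the character code Emacs uses internally for
raw byte \342, and 11 is the position of that byte in the string.
`my/xml-escape-error-data` is a hypothetical helper used only for this
illustration.

```elisp
(require 'xml)
(require 'cl-lib)

;; Hypothetical helper for illustration; not part of xml.el.
(defun my/xml-escape-error-data (string)
  "Return the `xml-invalid-character' error data for STRING, or nil."
  (condition-case err
      (progn (xml-escape-string string) nil)
    (xml-invalid-character (cdr err))))

(let ((raw "And now it\342\200\231s all this"))
  (list
   ;; Character code Emacs uses for raw byte #o342 (decimal 226).
   (unibyte-char-to-multibyte #o342)
   ;; 1-based index of the first byte above 127 in RAW.
   (1+ (cl-position-if (lambda (byte) (> byte 127)) raw))
   ;; Error data signaled by `xml-escape-string', caught here
   ;; instead of entering the debugger.
   (my/xml-escape-error-data raw)))
```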

/David



--
w: http://schmud.de
e: d@schmud.de
t: @dschmudde






* bug#70076: 28.3; xml-escape-string parse issue
  2024-03-31 11:15   ` D. Schmudde
@ 2024-03-31 13:21     ` Eli Zaretskii
  0 siblings, 0 replies; 4+ messages in thread
From: Eli Zaretskii @ 2024-03-31 13:21 UTC (permalink / raw)
  To: D. Schmudde; +Cc: public, 70076

> From: "D. Schmudde" <d@schmud.de>
> Cc: 70076@debbugs.gnu.org, public@protesilaos.com
> Date: Sun, 31 Mar 2024 13:15:29 +0200
> 
> Okay, good to know. Thanks for taking a look.
> 
> Here is some additional context. It occurs when using Elfeed's 
> ~elfeed-export-opml~ on my list of RSS feeds. It seems the library 
> relies on ~xml-escape-string~ to parse each element. It's worth 
> noting that this happens on several feeds, not just the feed for 
> leancrew.com listed below.

OK, but still: how did you get to that point?  Where did the
problematic string originate from?  Was it something that you typed or
copy/pasted, or something else?

> I can file a bug with the package maintainers but I wasn't sure if 
> the XML parser was a better place to start.

Yes, I think it is best to start by reporting this to package
maintainers.




