unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
@ 2024-12-17  6:08 Evgeny Kurnevsky
  2024-12-17 13:18 ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Evgeny Kurnevsky @ 2024-12-17  6:08 UTC (permalink / raw)
  To: 74922

According to the docs and the comment inside module_copy_string_contents,
it should always produce a valid UTF-8 string that can be used in dynamic
modules, but that does not always seem to be the case. I encountered an
Emacs crash when using emacs-module-rs, because it always expects strings
to be valid UTF-8. To reproduce, you can call:

(some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))

The file is
https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age

See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional
details.

* bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
  2024-12-17  6:08 bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8 Evgeny Kurnevsky
@ 2024-12-17 13:18 ` Eli Zaretskii
       [not found]   ` <CAOEHfojGKXoUKbf1-5N=973OURs==BQTXejLFd8cLhsR1DWh+g@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2024-12-17 13:18 UTC (permalink / raw)
  To: Evgeny Kurnevsky; +Cc: 74922

> From: Evgeny Kurnevsky <kurnevsky@gmail.com>
> Date: Tue, 17 Dec 2024 06:08:30 +0000
> 
> According to the docs and the comment inside module_copy_string_contents, it should always produce a valid
> UTF-8 string that can be used in dynamic modules, but that does not always seem to be the case. I encountered an
> Emacs crash when using emacs-module-rs, because it always expects strings to be valid UTF-8. To reproduce,
> you can call:
> 
> (some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))
> 
> The file is
> https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age

This string includes raw bytes, it isn't a text string, as far as I
could see.  It definitely isn't UTF-8 encoded text.  What did you
expect to happen with it when you copy such a string from Emacs?

> See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional details.

Can't say there are too many details there...

* bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
       [not found]   ` <CAOEHfojGKXoUKbf1-5N=973OURs==BQTXejLFd8cLhsR1DWh+g@mail.gmail.com>
@ 2024-12-17 13:31     ` Evgeny Kurnevsky
  2024-12-17 14:24       ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Evgeny Kurnevsky @ 2024-12-17 13:31 UTC (permalink / raw)
  To: 74922

Yes, that's a binary file, not a UTF-8 string. From the comment in the
module_copy_string_contents implementation, I guessed that Emacs should
signal an error in such cases, but instead it just passes this invalid
string to the dynamic library, which caused this bug in emacs-module-rs (see
https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings
). So if this is expected, then maybe it should be stated explicitly in the
docs of copy_string_contents here:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html
? It just says that it stores the UTF-8 encoded text, which gives the
impression that the result is always a valid UTF-8 string.

* bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
  2024-12-17 13:31     ` bug#74922: Fwd: " Evgeny Kurnevsky
@ 2024-12-17 14:24       ` Eli Zaretskii
  2024-12-17 14:46         ` Evgeny Kurnevsky
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2024-12-17 14:24 UTC (permalink / raw)
  To: Evgeny Kurnevsky; +Cc: 74922

> From: Evgeny Kurnevsky <kurnevsky@gmail.com>
> Date: Tue, 17 Dec 2024 13:31:57 +0000
> 
> Yes, that's a binary file, not a UTF-8 string. From the comment in the module_copy_string_contents
> implementation, I guessed that Emacs should signal an error in such cases, but instead it just passes this
> invalid string to the dynamic library, which caused this bug in emacs-module-rs (see
> https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings ). So if this is expected, then
> maybe it should be stated explicitly in the docs of copy_string_contents here:
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html ? It just says that it stores
> the UTF-8 encoded text, which gives the impression that the result is always a valid UTF-8 string.

I could look into the internals, but I actually wonder why the module
doesn't check the text before relying on such subtle behaviors.  We
didn't document the fact that it signals an error for a reason.

So: why cannot the module code or the application which uses it test
up front that the string it copies is human-readable text, not some
binary junk?
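
[Editorial sketch] The up-front check asked about here could look roughly
like the following in Rust, the language of emacs-module-rs. It assumes
that `buf` holds the bytes a module received from copy_string_contents,
including the terminating null byte. `checked_text` is a hypothetical
helper, not part of emacs-module-rs or of the Emacs module API.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: validate bytes copied out of Emacs before
/// treating them as text. `buf` is assumed to end with the NUL
/// terminator written by copy_string_contents.
fn checked_text(buf: &[u8]) -> Result<&str, Utf8Error> {
    // Drop the trailing NUL terminator if it is present.
    let bytes = match buf.split_last() {
        Some((&0, rest)) => rest,
        _ => buf,
    };
    // One linear scan; returns Err instead of invoking undefined
    // behavior when the bytes are not valid UTF-8.
    std::str::from_utf8(bytes)
}
```

With this shape the caller decides what to do with the `Err` case, for
example reporting it to Lisp as an error instead of crashing.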





* bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
  2024-12-17 14:24       ` Eli Zaretskii
@ 2024-12-17 14:46         ` Evgeny Kurnevsky
  2024-12-17 15:10           ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Evgeny Kurnevsky @ 2024-12-17 14:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 74922

It can definitely do that, but I guess emacs-module-rs doesn't do it by
default because of the performance implications: checking every string
might be quite costly in some cases, and it wasn't really clear whether
Emacs can pass an invalid string. So currently this case causes undefined
behavior there, which results in an Emacs crash.
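
[Editorial sketch] One possible middle ground between skipping validation
and rejecting the string is lossy decoding from the Rust standard
library. The sketch below assumes the module ends up holding a raw byte
buffer; `decode_lossy` is a hypothetical name. It still scans every byte,
so it does not remove the cost mentioned above, but valid input is
borrowed without a new allocation.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode bytes as UTF-8, replacing each invalid
/// sequence with U+FFFD instead of failing or invoking undefined
/// behavior. (The unchecked alternative,
/// `std::str::from_utf8_unchecked`, is `unsafe` and is undefined
/// behavior on invalid input.)
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    // Returns Cow::Borrowed for valid UTF-8 and Cow::Owned otherwise.
    String::from_utf8_lossy(bytes)
}
```

Whether silently replacing bytes is acceptable depends on the module; for
binary payloads such as the file in this report it would corrupt the data.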

-- 
Best regards, Evgeny Kurnevsky.

* bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
  2024-12-17 14:46         ` Evgeny Kurnevsky
@ 2024-12-17 15:10           ` Eli Zaretskii
  0 siblings, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2024-12-17 15:10 UTC (permalink / raw)
  To: Evgeny Kurnevsky; +Cc: 74922

> From: Evgeny Kurnevsky <kurnevsky@gmail.com>
> Date: Tue, 17 Dec 2024 14:46:28 +0000
> Cc: 74922@debbugs.gnu.org
> 
> It can definitely do that, but I guess emacs-module-rs doesn't do it by default because of the performance
> implications: checking every string might be quite costly in some cases, and it wasn't really clear whether Emacs
> can pass an invalid string. So currently this case causes undefined behavior there, which results in an Emacs
> crash.

What do Rust programs do when they are told to read random files?
This is the same situation, basically.
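
[Editorial sketch] For reference, the Rust standard library answers this
by making the conversion fallible: `String::from_utf8` returns an error
for bytes that are not valid UTF-8, and `std::fs::read_to_string` reports
an I/O error in the same situation. The sketch below shows the usual
pattern of keeping the raw bytes when they are not text. `Contents` and
`classify` are hypothetical names used only for illustration.

```rust
/// Hypothetical result type: either decoded text or the untouched bytes.
#[derive(Debug, PartialEq)]
enum Contents {
    Text(String),
    Binary(Vec<u8>),
}

/// Try to interpret `bytes` as UTF-8 text; fall back to raw bytes.
fn classify(bytes: Vec<u8>) -> Contents {
    match String::from_utf8(bytes) {
        Ok(text) => Contents::Text(text),
        // FromUtf8Error hands the original buffer back without copying.
        Err(err) => Contents::Binary(err.into_bytes()),
    }
}
```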

And what would the module do if copy_string_contents *did* signal an
error?