From: Eli Zaretskii
Subject: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
Date: Sat, 21 Dec 2024 14:09:24 +0200
Message-ID: <86ttax6xvv.fsf@gnu.org>
References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> <86zfkumjkj.fsf@gnu.org>
To: kurnevsky@gmail.com
Cc: 74922@debbugs.gnu.org
In-Reply-To: <86zfkumjkj.fsf@gnu.org> (message from Eli Zaretskii on Tue, 17 Dec 2024 17:10:36 +0200)

> Cc: 74922@debbugs.gnu.org
> Date: Tue, 17 Dec 2024 17:10:36 +0200
> From: Eli Zaretskii
>
> > From: Evgeny Kurnevsky
> > Date: Tue, 17 Dec 2024 14:46:28 +0000
> > Cc: 74922@debbugs.gnu.org
> >
> > It can definitely do it, but I guess in emacs-module-rs it's not
> > done by default because of performance implications - it might be
> > quite costly to check every string in some cases, and it wasn't
> > really clear if emacs can pass an invalid string. So currently this
> > case causes undefined behavior there which results in emacs crash.
>
> What do Rust programs do when they are told to read random files?
> This is the same situation, basically.
>
> And what would the module do if copy_string_contents *did* signal an
> error?

I think I know what happened: you called copy_string_contents with a
unibyte string.  In that case, copy_string_contents returns the
original string unchanged, without encoding anything.  The code in
copy_string_contents that signals an error relies on the fact that
encoding the input string yields nil if the input includes
non-Unicode characters.
But that cannot be established with unibyte strings, because a
unibyte string doesn't hold characters, it holds raw bytes.

What you should do is make sure the string passed to
copy_string_contents is a multibyte string.  If I do that, i.e.

  (switch-to-buffer "foo")
  (set-buffer-multibyte t)
  (insert-file-contents "/path/to/wg-private-pc.age")
  (setq str1 (buffer-string))

and then call copy_string_contents with the resulting string str1, I
get the result you expected.

You need to realize that copy_string_contents is a variant of the
text-encoding routines: it encodes the input multibyte string in
UTF-8.  The encoding routines in Emacs always return unibyte strings
unchanged, because a unibyte string is already encoded, or at least
is supposed to be.

And before you ask: no, copy_string_contents cannot by itself signal
an error if passed a unibyte string, because a unibyte string can
legitimately be a valid UTF-8 string.  So in this case,
copy_string_contents relies on the caller to make sure the input is
valid UTF-8.
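
To illustrate the caller-side check on the Rust end, here is a minimal
sketch using only the Rust standard library.  The helper name
checked_string is hypothetical and is not part of emacs-module-rs; the
sketch assumes the module has already copied the bytes written by
copy_string_contents (including the terminating null byte) into a
Vec<u8>.  Validation costs one linear pass over the bytes, and it
turns invalid input into an ordinary error instead of undefined
behavior.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not an emacs-module-rs API): validate the bytes
/// obtained from copy_string_contents before treating them as a String.
fn checked_string(mut buf: Vec<u8>) -> Result<String, Utf8Error> {
    // Assumption: the buffer still carries the terminating null byte
    // that copy_string_contents writes; drop it if present.
    if buf.last() == Some(&0) {
        buf.pop();
    }
    // String::from_utf8 checks every byte sequence and reports the
    // first invalid one instead of trusting the input.
    String::from_utf8(buf).map_err(|e| e.utf8_error())
}

fn main() {
    // "é" encoded in UTF-8 (two bytes), followed by the null terminator.
    assert_eq!(checked_string(vec![0xC3, 0xA9, 0]).unwrap(), "é");

    // Raw bytes, as from a unibyte string holding binary file contents,
    // are rejected with an error.
    assert!(checked_string(vec![0xFF, 0xFE, 0]).is_err());
}
```

A module that wants to skip this cost for strings it already knows to
be multibyte could keep the unchecked path for those and use the
checked conversion only where the origin of the string is unknown.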