From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
Date: Sat, 04 Jan 2025 13:39:25 +0200
Message-ID: <86o70merki.fsf@gnu.org>
References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org> <86zfkumjkj.fsf@gnu.org> <86ttax6xvv.fsf@gnu.org>
In-Reply-To: <86ttax6xvv.fsf@gnu.org> (message from Eli Zaretskii on Sat, 21 Dec 2024 14:09:24 +0200)
To: kurnevsky@gmail.com
Cc: 74922-done@debbugs.gnu.org

> Cc: 74922@debbugs.gnu.org
> Date: Sat, 21 Dec 2024 14:09:24 +0200
> From: Eli Zaretskii
>
> > Cc: 74922@debbugs.gnu.org
> > Date: Tue, 17 Dec 2024 17:10:36 +0200
> > From: Eli Zaretskii
> >
> > > From: Evgeny Kurnevsky
> > > Date: Tue, 17 Dec 2024 14:46:28 +0000
> > > Cc: 74922@debbugs.gnu.org
> > >
> > > It can definitely do it, but I guess in emacs-module-rs it's not done by default because of performance
> > > implications - it might be quite costly to check every string in some cases, and it wasn't really clear if emacs
> > > can pass an invalid string. So currently this case causes undefined behavior there which results in emacs
> > > crash.
> >
> > What do Rust programs do when they are told to read random files?
> > This is the same situation, basically.
> >
> > And what would the module do if copy_string_contents *did* signal an
> > error?
>
> I think I know what happened: you called copy_string_contents with a
> unibyte string.  In that case, copy_string_contents will return you
> the original string without doing anything.
> The code in copy_string_contents that signals an error relies on the
> fact that encoding the input string yields nil if the input includes
> non-Unicode characters.  But that cannot be established with unibyte
> strings, because a unibyte string doesn't hold characters, it holds
> raw bytes.
>
> What you should do is make sure the string passed to
> copy_string_contents is a multibyte string.  If I do that, i.e.
>
>   (switch-to-buffer "foo")
>   (set-buffer-multibyte t)
>   (insert-file-contents "/path/to/wg-private-pc.age")
>   (setq str1 (buffer-string))
>
> and then call copy_string_contents with the resulting string str1, I
> get the result you expected.
>
> You need to realize that copy_string_contents is a variant of
> text-encoding routines: it encodes the input multibyte string in
> UTF-8.  The encoding routines in Emacs always return unibyte strings
> without doing anything, because a unibyte string is already encoded,
> or at least is supposed to be encoded.
>
> And before you ask: no, copy_string_contents cannot by itself signal
> an error if passed a unibyte string, because a unibyte string can
> legitimately be a valid UTF-8 string.  So in this case,
> copy_string_contents relies on the caller to make sure the input is
> valid UTF-8.

I believe the above explains the problem and the solution, so I'm now
closing this bug.
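
For the module side, here is a minimal sketch of what "the caller makes sure
the input is valid UTF-8" could look like in Rust.  It is not the
emacs-module-rs API: the helper name checked_module_string is hypothetical,
and its argument merely stands in for the buffer that copy_string_contents
filled, assumed here to include the terminating NUL byte.  The point is only
that checked validation returns an error for raw bytes instead of invoking
undefined behavior, at the cost of one pass over the data.

```rust
use std::str::Utf8Error;

/// Hypothetical helper, not part of emacs-module-rs.
/// `buf` stands in for the bytes a module received from
/// copy_string_contents (assumed to end with a NUL terminator).
fn checked_module_string(buf: &[u8]) -> Result<&str, Utf8Error> {
    // Drop the terminating NUL byte, if one is present.
    let bytes = match buf.split_last() {
        Some((&0, rest)) => rest,
        _ => buf,
    };
    // Checked conversion: returns Err on bytes that are not valid
    // UTF-8, e.g. when a unibyte string of raw bytes was passed in.
    std::str::from_utf8(bytes)
}
```

A module could call this on every string it receives and signal a Lisp error
on Err, or reserve the unchecked conversion for strings it has already
verified to be multibyte on the Lisp side.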