From mboxrd@z Thu Jan 1 00:00:00 1970
From: Evgeny Kurnevsky
Newsgroups: gmane.emacs.bugs
Subject: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
Date: Tue, 17 Dec 2024 13:31:57 +0000
References: <86msguo3cg.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000de5eba0629775208"
To: 74922@debbugs.gnu.org
Xref: news.gmane.io gmane.emacs.bugs:297291

--000000000000de5eba0629775208
Content-Type: text/plain; charset="UTF-8"

Yes, that's a binary file that is not a UTF-8 string. From the comment
in the module_copy_string_contents implementation I guessed that in such
cases Emacs should signal an error, but instead it just passes this
invalid string to the dynamic library, which caused this bug in
emacs-module-rs (see
https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings
). So if this is expected, then maybe it should be stated explicitly in
the docs of copy_string_contents here:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html
? They just say that it stores the UTF-8 encoded text, which gives the
impression that the result is always a valid UTF-8 string.

On Tue, Dec 17, 2024 at 1:18 PM Eli Zaretskii wrote:

> > From: Evgeny Kurnevsky
> > Date: Tue, 17 Dec 2024 06:08:30 +0000
> >
> > According to the docs and comment inside module_copy_string_contents it should always produce a valid
> > utf-8 string that can be used in dynamic modules, but it seems it's not always the case. I encountered an
> > emacs crash when using emacs-module-rs because it always expects a valid utf-8 for strings. To reproduce
> > you can call:
> >
> > (some-function-from-dynamic-library (encode-coding-string (f-read-text "wg-private-pc.age") 'utf-8 t))
> >
> > The file is
> > https://github.com/kurnevsky/nixfiles/raw/0b3de016dac551398627a55788b80d4809afcbf9/secrets/wg-private-pc.age
>
> This string includes raw bytes, it isn't a text string, as far as I
> could see.  It definitely isn't UTF-8 encoded text.  What did you
> expect to happen with it when you copy such a string from Emacs?
>
> > See https://github.com/ubolonton/emacs-module-rs/issues/58 for additional details.
>
> Can't say there are too many details there...
--000000000000de5eba0629775208--
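
The thread above suggests that a module should not assume the bytes it
receives from copy_string_contents are well-formed UTF-8. Below is a
minimal sketch, in C, of a check a module could run on the copied buffer
before treating it as text. The name is_valid_utf8 is a hypothetical
helper, not part of emacs-module.h, and the sketch assumes the buffer was
already filled by copy_string_contents and that len excludes the trailing
null byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of emacs-module.h.
   Returns true if buf[0..len) is well-formed UTF-8 (RFC 3629):
   no stray or missing continuation bytes, no overlong forms,
   no surrogates, nothing above U+10FFFF.  */
static bool
is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;           /* number of continuation bytes expected */
      unsigned long min;  /* smallest code point valid for this length */
      unsigned long cp;   /* code point being decoded */

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; min = 0x80;    cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { n = 2; min = 0x800;   cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { n = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i <= n)
        return false;     /* sequence is truncated */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char cc = buf[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return true;
}
```

A module could call this on the copied bytes and signal an error, or fall
back to treating the data as raw bytes, when it returns false. Whether
Emacs itself should perform such a check is the open question in this
report; the sketch takes no position on that.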