From mboxrd@z Thu Jan 1 00:00:00 1970
From: Evgeny Kurnevsky
Newsgroups: gmane.emacs.bugs
Subject: bug#74922: Fwd: bug#74922: 29.4; copy_string_contents doesn't always produce a valid utf-8
Date: Tue, 17 Dec 2024 14:46:28 +0000
References: <86msguo3cg.fsf@gnu.org> <8634imo0aa.fsf@gnu.org>
In-Reply-To: <8634imo0aa.fsf@gnu.org>
To: Eli Zaretskii
Cc: 74922@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000063de2e0629785d3e"

--00000000000063de2e0629785d3e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
It can definitely do that, but I guess emacs-module-rs doesn't do it by
default because of the performance implications: checking every string
might be quite costly in some cases, and it wasn't really clear whether
Emacs can pass an invalid string. So currently this case causes
undefined behavior there, which results in an Emacs crash.
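
To make the trade-off concrete, here is a minimal Rust sketch. It is not
the actual emacs-module-rs code: the helper names strip_nul,
checked_string and lossy_string are hypothetical, and it assumes the
buffer was filled by copy_string_contents and ends with the terminating
null byte. The checked variant pays for one linear scan per string and
returns an error on invalid input, where an unchecked conversion would
be undefined behavior.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: drop one trailing NUL byte, if present.
/// Assumes the buffer came from copy_string_contents.
fn strip_nul(buf: &[u8]) -> &[u8] {
    match buf.split_last() {
        Some((&0, rest)) => rest,
        _ => buf,
    }
}

/// Checked conversion: one O(n) validation pass, and an error
/// (rather than undefined behavior) when the bytes are not UTF-8.
fn checked_string(buf: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(strip_nul(buf)).map(String::from)
}

/// Lossy conversion: invalid sequences become U+FFFD, so the
/// result is always valid UTF-8, but the original bytes are lost.
fn lossy_string(buf: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(strip_nul(buf))
}

fn main() {
    // Ordinary text followed by the terminating NUL.
    let text: &[u8] = b"hello\0";
    // Bytes such as those of a binary file: 0xff and 0xfe never
    // occur in valid UTF-8.
    let binary: &[u8] = &[0xff, 0xfe, b'A', 0x00];

    println!("{:?}", checked_string(text));
    println!("{:?}", checked_string(binary).is_err());
    println!("{:?}", lossy_string(binary));
}
```

Whether the check is on by default is a design choice: the scan is
linear in the string length, so it mostly matters for modules that
copy many large strings.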

On Tue, Dec 17, 2024 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Evgeny Kurnevsky <kurnevsky@gmail.com>
> > Date: Tue, 17 Dec 2024 13:31:57 +0000
> >
> > Yes, that's a binary file that is not a UTF-8 string. From the comment
> > in the module_copy_string_contents implementation I guessed that in
> > such cases Emacs should signal an error, but instead it just passes
> > this invalid string to the dynamic library, which caused this bug in
> > emacs-module-rs (see
> > https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings
> > ). So if this is expected, then maybe it should be stated explicitly
> > in the docs of copy_string_contents here:
> > https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html
> > It just says that it stores the UTF-8 encoded text, which gives the
> > impression that it is always a valid UTF-8 string.
>
> I could look into the internals, but I actually wonder why the module
> doesn't check the text before relying on such subtle behaviors. We
> didn't document the fact that it signals an error for a reason.
>
> So: why cannot the module code or the application which uses it test
> up front that the string it copies is human-readable text, not some
> binary junk?


--
Best regards, Evgeny Kurnevsky.
--00000000000063de2e0629785d3e--