From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: master 49e243c0c85: Avoid resizing mutation in subst-char-in-string, take two
Date: Tue, 14 May 2024 09:06:54 +0300
Message-ID: <861q65x6yp.fsf@gnu.org>
References: <865xvhy4wn.fsf@gnu.org> <8AF4F364-9030-4634-91C5-79E297E5335B@gmail.com>
In-Reply-To: <8AF4F364-9030-4634-91C5-79E297E5335B@gmail.com>
To: Mattias Engdegård
Cc: emacs-devel@gnu.org

> From: Mattias Engdegård
> Date: Mon, 13 May 2024 21:20:24 +0200
> Cc: emacs-devel@gnu.org
>
> On 13 May 2024 at 19:53, Eli Zaretskii wrote:
> >
> >> +    (if (and (not inplace)
> >> +             (if (multibyte-string-p string)
> >> +                 (> (max fromchar tochar) 127)
> >> +                 (> tochar 255)))
> >
> > Is the above condition correct? My reading of it is that if INPLACE
> > is non-nil, we use aset (which will resize a string) even if TOCHAR
> > needs more bytes than FROMCHAR. Which seems to be in contradiction
> > with the goal of the change, as advertised by the log message: "avoid
> > resizing mutation".
>
> I agree that it does look a bit odd, but it's intentional. First of all, the aim is to insulate non-mutating calls to the function from issues arising from mutation in the implementation. If we don't have to mutate and it's faster and/or safer not to, then we shouldn't.
>
> Second, the function is documented to change the string in-place if INPLACE is non-nil, so in that case we have no choice but to mutate, or we might silently break reasonable code. So I guess the log message doesn't describe this intent clearly enough.
>
> > why, in the case of a multibyte STRING, does the code look at the
> > codepoints of FROMCHAR and TOCHAR and not at the number of bytes they
> > take in the internal Emacs representation of the characters?
>
> It's a conservative approximation that is much simpler than computing the size of the internal representation. (It's also the condition proposed in bug#70784.)

Which part of bug#70784 suggested that? (It's a very long discussion,
and the suggestion at the beginning talks only about the unibyte
case.)

More to the point, the number of bytes a character takes in a
multibyte string depends deterministically on its codepoint, so I
don't really understand why you say it's "much simpler". We could
have a primitive, say, char-bytes, to do that even faster, if we want
this to be as efficient as possible.

Testing byte lengths rather than codepoints would allow a large subset
of calls (those without INPLACE = t) to be much faster than they are
now, without resizing the string. IOW, we will be able to "avoid
resizing mutation" in many more cases.
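
To make the suggestion concrete, here is a minimal sketch in Lisp of what such a test could look like. It is an illustration under stated assumptions, not the committed code: `my-char-bytes' and `my-subst-would-resize-p' are hypothetical names (no `char-bytes' primitive exists at the time of writing), and the thresholds are meant to mirror CHAR_BYTES in src/character.h.

```elisp
;; Sketch only: `my-char-bytes' and `my-subst-would-resize-p' are
;; hypothetical helpers, not existing Emacs functions.

(defun my-char-bytes (char)
  "Return the number of bytes CHAR takes in the internal multibyte encoding.
The thresholds follow CHAR_BYTES in src/character.h."
  (cond ((<= char #x7f) 1)
        ((<= char #x7ff) 2)
        ((<= char #xffff) 3)
        ((<= char #x1fffff) 4)
        ((<= char #x3fff7f) 5)
        (t 2)))                 ; raw 8-bit bytes, #x3fff80..#x3fffff

(defun my-subst-would-resize-p (string fromchar tochar)
  "Return non-nil if `aset' could resize STRING when FROMCHAR becomes TOCHAR."
  (if (multibyte-string-p string)
      ;; Multibyte: the byte size changes only if the two characters
      ;; have different encoded lengths.
      (/= (my-char-bytes fromchar) (my-char-bytes tochar))
    ;; Unibyte: same test as in the patch under discussion; a TOCHAR
    ;; above 255 cannot be stored without converting STRING.
    (> tochar 255)))
```

For example, replacing #xe5 with #xe4 in a multibyte string swaps one 2-byte character for another, so `my-subst-would-resize-p' returns nil there, whereas the codepoint test `(> (max fromchar tochar) 127)' treats the same call as potentially resizing.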