From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: `aset` on strings, changing the size in bytes
Date: Fri, 07 Sep 2018 15:52:57 -0400
To: emacs-devel@gnu.org

[ As some of you may know, I like my strings to be immutable.
  But having tried it, my conclusion is that making Elisp strings
  immutable doesn't bring significant benefits because, while strings
  are rarely modified in place w.r.t. their sequence of characters,
  they are often modified in terms of their text properties (although
  the `propertize` function has reduced the occurrence of such
  modifications to some extent).  ]

One of the ugliest parts of string mutation is that the `aset` operation
on a string can take time proportional to the size of the string instead
of being a constant-time operation.  There are two causes:

- conversion between char positions and byte positions may need to scan
  the string (for strings which contain non-ASCII chars).
- the `aset` operation may change the size of the string in bytes, so it
  may require allocating a whole new chunk of memory, copying the old
  string's bytes there, and placing the new char at its proper position.

This second cause is rather hypothetical: it occurs very, very rarely.
But it has far-reaching consequences in the implementation of strings,
making it necessary to be able to relocate a string's bytes and hence
requiring an additional indirection.
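
To make the second cause concrete, here is a minimal sketch that measures
a string's size in bytes before and after an `aset`.  The helper name
`my-aset-byte-sizes` is made up for illustration (it is not an Emacs
function); it only relies on `string-bytes`, `string-to-multibyte`,
`copy-sequence` and `aset`, and it assumes an Emacs where `aset` is still
allowed to change a string's byte size.

```elisp
;; Sketch only: `my-aset-byte-sizes' is a hypothetical helper.
(defun my-aset-byte-sizes (text idx char)
  "Return (BEFORE . AFTER) byte sizes around an `aset' on a copy of TEXT.
The copy is made multibyte, so CHAR is stored in Emacs's internal
multibyte representation.  TEXT itself is never modified."
  (let* ((s (string-to-multibyte (copy-sequence text)))
         (before (string-bytes s)))
    (aset s idx char)                   ; may have to relocate the bytes
    (cons before (string-bytes s))))

;; ASCII char replaced by another ASCII char: same size in bytes.
(my-aset-byte-sizes "hello" 1 ?a)

;; ASCII char replaced by a non-ASCII char: the string needs more bytes.
(my-aset-byte-sizes "hello" 1 ?é)
```

I would expect the first call to report an unchanged size and the second
one to report a size that grew by one byte, since `é` takes two bytes in
the internal representation while `e` takes one.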
Currently, this indirection comes "for free", since we use that same
indirection to let the GC compact the set of string-data-bytes objects
to try to reduce memory fragmentation.  But I think we should not have
our high-level API impose such an indirection at the lower level,
especially since this (mis)feature is virtually never used.

So here's my request: could we declare that we deprecate the use of
`aset` on strings when it causes the string's length in bytes to change?

In my experience, all the code I found which could trigger this behavior
was easily changed without loss of efficiency (e.g. by asking
subst-char-in-string not to work in place, or by using a vector instead
of a string and converting the vector into a string once all the
modifications are done, ...).

This means it's still perfectly OK to use `aset` to replace an ASCII
char with another ASCII char, and to use `aset` on any unibyte string.

Of course, such a backward-incompatible change would need to be
introduced gradually, especially since it's virtually impossible to find
offending chunks of code other than by runtime testing.  First we'd
declare the practice deprecated; then we'd start emitting warnings when
it happens, conditional on a flag that's disabled by default; then we'd
change the default of the flag.


        Stefan
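
As an illustration of the vector-based alternative mentioned above, here
is one possible shape for it.  The helper `my-replace-chars` and its
(INDEX . CHAR) argument format are assumptions made up for this sketch,
not code from any package; it uses only `vconcat`, `aset` and `concat`.

```elisp
;; Sketch only: `my-replace-chars' is a hypothetical helper.
(defun my-replace-chars (text replacements)
  "Return a new string like TEXT with REPLACEMENTS applied.
REPLACEMENTS is a list of (INDEX . CHAR) pairs.  TEXT is not modified."
  (let ((chars (vconcat text)))         ; one vector slot per character
    (dolist (r replacements)
      (aset chars (car r) (cdr r)))     ; vector `aset': no byte resizing
    (concat chars)))                    ; build the string once, at the end

;; Several non-ASCII replacements, but only one string allocation.
(my-replace-chars "hello" '((1 . ?é) (4 . ?ö)))

;; Without the optional INPLACE argument, `subst-char-in-string'
;; returns a modified copy and leaves its argument alone.
(subst-char-in-string ?l ?L "hello")
```

Note that going through a vector drops any text properties of the
original string, so this shape only fits code that cares about the
characters alone.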