From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: emacs-devel@gnu.org
Subject: `aset` on strings, changing the size in bytes
Date: Fri, 07 Sep 2018 15:52:57 -0400
Message-ID: <jwvmustqf4c.fsf-monnier+emacs@gnu.org>

[ As some of you may know, I like my strings to be immutable.
  But having tried it, my conclusion is that making Elisp strings
  immutable doesn't bring significant benefits because, while strings are
  rarely modified in-place w.r.t their sequence of characters, they are
  often modified in terms of the text-properties (although the
  `propertize` function has reduced the occurrence of such modifications
  to some extent).  ]

One of the ugliest parts of string mutation is that the `aset` operation
on a string can take time proportional to the size of the string instead
of being a constant-time operation.

There are two causes:
- conversion between char-positions and byte-positions may need to scan
  the string (for strings which contain non-ASCII chars).
- the `aset` operation may change the size of the string in bytes, so
  it may require allocating a whole new chunk of memory, copying the old
  string's bytes there, and placing the new char at its proper position.
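
To make the second cause concrete, here is a minimal sketch of an `aset`
that changes a string's size in bytes.  It assumes an Emacs in which this
operation is still permitted; the function name `my/aset-size-demo` is
made up for the illustration.

```elisp
(defun my/aset-size-demo ()
  "Return (CHARS BYTES-BEFORE BYTES-AFTER) for a size-changing `aset'.
Hypothetical helper, for illustration only."
  (let* ((s (copy-sequence "abc"))     ; fresh, mutable ASCII string
         (before (string-bytes s)))    ; ASCII only: 1 byte per char
    ;; U+03BB (Greek small lambda) takes 2 bytes in Emacs's internal
    ;; representation, while the ?b it replaces takes 1, so the string's
    ;; bytes have to be relocated to make room.
    (aset s 1 ?\u03bb)
    (list (length s) before (string-bytes s))))

(my/aset-size-demo)
```

The character count stays the same across the call while the byte count
grows, which is exactly the case that forces a reallocation.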

This second cause is rather hypothetical: it occurs very rarely.
But it has far-reaching consequences in the implementation of strings,
making it necessary to be able to relocate a string's bytes and hence
requiring an additional indirection.

Currently, this indirection comes "for free" since we use that same
indirection to let the GC compact the set of string-data-bytes objects
to try and reduce memory fragmentation.  But I think we should not have
our high-level API impose such an indirection at the lower level,
especially since this (mis)feature is virtually never used.

So here's my request: could we deprecate the use of `aset` on strings
when it causes the string's length in bytes to change?  In my
experience, all the code I found which could trigger this behavior was
easily changed without loss of efficiency (e.g. by asking
subst-char-in-string not to work in-place, or by using a vector instead
of a string and converting the vector into a string once all the
modifications are done, ...).
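
As a rough sketch of the vector-based alternative (the helper name
`my/replace-char-via-vector` is hypothetical, not an existing function):

```elisp
(defun my/replace-char-via-vector (string idx char)
  "Return a copy of STRING with the char at IDX replaced by CHAR.
STRING itself is not modified.  Hypothetical helper, for illustration."
  (let ((v (vconcat string)))   ; vector holding the string's characters
    (aset v idx char)           ; vector slots have a fixed size
    (concat v)))                ; build the resulting string in one pass

(my/replace-char-via-vector "abc" 1 ?\u03bb)
```

Code that performs many replacements would do all of its `aset` calls on
the vector and call `concat` only once at the end.  Note that going
through a vector drops any text properties of the original string.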

This means it's still perfectly OK to use `aset` to replace an ASCII
char with another ASCII char, and to use `aset` on any unibyte string.

Of course, such a backward incompatible change would need to be
introduced gradually, especially since it's virtually impossible to find
offending chunks of code other than by runtime testing.  First we'd
declare the practice deprecated; then we'd start emitting warnings when
it happens, conditional on a flag that's disabled by default; then we'd
change the default of the flag.
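
Purely as an illustration of the warning stage, the check could look
something like the Lisp wrapper below.  The real check would presumably
live in the C implementation of `aset`; the flag
`my/warn-on-resizing-aset` and the function `my/checked-aset` are
hypothetical names invented for this sketch.

```elisp
(defvar my/warn-on-resizing-aset nil
  "Hypothetical flag: if non-nil, warn when `aset' resizes a string.")

(defun my/checked-aset (string idx char)
  "Store CHAR at IDX in STRING like `aset' and return CHAR.
If `my/warn-on-resizing-aset' is non-nil, emit a warning when the
string's size in bytes changed as a result."
  (let ((before (string-bytes string)))
    (aset string idx char)
    (when (and my/warn-on-resizing-aset
               (/= before (string-bytes string)))
      (display-warning 'aset
                       "aset changed the size in bytes of a string"))
    char))
```

With the flag left at nil the wrapper behaves like plain `aset`; flipping
the default later would correspond to the last step of the transition.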


        Stefan




