unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* `aset` on strings, changing the size in bytes
@ 2018-09-07 19:52 Stefan Monnier
  2018-09-07 21:33 ` Johan Bockgård
  2018-09-08  6:03 ` Helmut Eller
  0 siblings, 2 replies; 28+ messages in thread
From: Stefan Monnier @ 2018-09-07 19:52 UTC (permalink / raw)
  To: emacs-devel

[ As some of you may know, I like my strings to be immutable.
  But having tried it, my conclusion is that making Elisp strings
  immutable doesn't bring significant benefits because, while strings are
  rarely modified in-place w.r.t their sequence of characters, they are
  often modified in terms of the text-properties (although the
  `propertize` function has reduced the occurrence of such modifications
  to some extent).  ]

One of the ugliest part of string mutation is that the `aset` operation
on a string can take time proportional to the size of the string instead
of being a constant-time operation.

There are two causes:
- conversion between char-positions and byte-positions may need to scan
  the string (for strings which contain non-ASCII chars).
- the `aset` operation may change the size of the strings in bytes, so
  it may require allocating a whole new chunk of memory, copying the old
  string's bytes there, placing the new char at its proper position.

This second cause is rather hypothetical: it occurs very very rarely.
But it has far reaching consequences in the implementation of strings,
making it necessary to be able to relocate a string's bytes and hence
requiring an additional indirection.

Currently, this indirection comes "for free" since we use that same
indirection to let the GC compact the set of string-data-bytes objects
to try and reduce memory fragmentation.  But I think we should not have
our high-level API impose such an indirection at the lower level,
especially since this (mis)feature is virtually never used.

So here's my request: could we declare that we deprecate the use `aset`
on strings when it causes the string's length in bytes to change?  In my
experience, all the code I found which could trigger this behavior was
easily changed without loss of efficiency (e.g. by asking
subst-char-in-string not to work in-place, or by using a vector instead
of a string and converting the vector into a string once all the
modifications are done, ...).

This means, it's still perfectly OK to use `aset` to replace an ASCII
char with another ASCII char, and to use `aset` on any unibyte string.

Of course, such a backward incompatible change would need to be
introduced gradually, especially since it's virtually impossible to find
offending chunks of code other than by runtime testing.  First we'd
declare the practice deprecated; then we'd start emitting warnings when
it happens, conditional on a flag that's disabled by default; then we'd
change the default of the flag.


        Stefan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 19:52 `aset` on strings, changing the size in bytes Stefan Monnier
@ 2018-09-07 21:33 ` Johan Bockgård
  2018-09-07 23:12   ` Paul Eggert
  2018-09-08  6:03 ` Helmut Eller
  1 sibling, 1 reply; 28+ messages in thread
From: Johan Bockgård @ 2018-09-07 21:33 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> So here's my request: could we declare that we deprecate the use `aset`
> on strings when it causes the string's length in bytes to change?

According to the documentation, that is already an error...

(info "(elisp) Modifying Strings")



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 21:33 ` Johan Bockgård
@ 2018-09-07 23:12   ` Paul Eggert
  2018-09-07 23:41     ` John Wiegley
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Paul Eggert @ 2018-09-07 23:12 UTC (permalink / raw)
  To: Johan Bockgård, Stefan Monnier; +Cc: emacs-devel

Johan Bockgård wrote:
> According to the documentation, that is already an error...
> 
> (info "(elisp) Modifying Strings")

Cool! That means we can do Stefan's request simply by reverting Kenichi Handa's 
patch that introduced the ability to change the byte length of a string (commit 
3c9de1afcde82a99137721436c822059cce79b5b dated 2000-07-21 06:45:30 UTC), since 
that patch made the code explicitly disagree with the documentation. Though this 
leaves open the question of why Handa made that change in the first place.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 23:12   ` Paul Eggert
@ 2018-09-07 23:41     ` John Wiegley
  2018-09-08  5:17       ` Richard Stallman
  2018-09-08  6:34       ` Eli Zaretskii
  2018-09-08  2:04     ` Stefan Monnier
  2018-09-08  6:41     ` Eli Zaretskii
  2 siblings, 2 replies; 28+ messages in thread
From: John Wiegley @ 2018-09-07 23:41 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, Johan Bockgård, emacs-devel

>>>>> "PE" == Paul Eggert <eggert@cs.ucla.edu> writes:

PE> Cool! That means we can do Stefan's request simply by reverting Kenichi
PE> Handa's patch that introduced the ability to change the byte length of a
PE> string (commit 3c9de1afcde82a99137721436c822059cce79b5b dated 2000-07-21
PE> 06:45:30 UTC), since that patch made the code explicitly disagree with the
PE> documentation. Though this leaves open the question of why Handa made that
PE> change in the first place.

I'd like to hear Handa's rationale, but otherwise am +1 for reverting.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 23:12   ` Paul Eggert
  2018-09-07 23:41     ` John Wiegley
@ 2018-09-08  2:04     ` Stefan Monnier
  2018-09-08  2:17       ` Paul Eggert
  2018-09-08  6:41     ` Eli Zaretskii
  2 siblings, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2018-09-08  2:04 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Johan Bockgård, emacs-devel

>> According to the documentation, that is already an error...
>> (info "(elisp) Modifying Strings")
> Cool! That means we can do Stefan's request simply by reverting Kenichi
> Handa's patch that introduced the ability to change the byte length
> of a string (commit 3c9de1afcde82a99137721436c822059cce79b5b dated
> 2000-07-21 06:45:30 UTC), since that patch made the code explicitly disagree
> with the documentation.

It's very easy to accidentally use that feature, so just because it was
in some doc somewhere doesn't mean no code makes use of it, or that any
code that does deserves to get an error.

So we can't just revert that patch.  We should first announce the
reversal of stance (if not compared to the doc, at least compared to
the code), and then introduce some test&warning.

> Though this leaves open the question of why Handa made that change in
> the first place.

I remember being disappointed by the change, but I can't remember
exactly what was the original justification.  I think the stance was
simply that it was reasonably easy to implement (because of the
pre-existing string-compaction code) and it would satisfy
a user request.


        Stefan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08  2:04     ` Stefan Monnier
@ 2018-09-08  2:17       ` Paul Eggert
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Eggert @ 2018-09-08  2:17 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Johan Bockgård, emacs-devel

Stefan Monnier wrote:
> So we can't just revert that patch.  We should first announce the
> reversal of stance (if not compared to the doc, at least compared to
> the code), and then introduce some test&warning.

Sounds good to me.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 23:41     ` John Wiegley
@ 2018-09-08  5:17       ` Richard Stallman
  2018-09-08  6:34       ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Stallman @ 2018-09-08  5:17 UTC (permalink / raw)
  To: John Wiegley; +Cc: eggert, monnier, bojohan, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > PE> Cool! That means we can do Stefan's request simply by reverting Kenichi
  > PE> Handa's patch that introduced the ability to change the byte length of a
  > PE> string (commit 3c9de1afcde82a99137721436c822059cce79b5b dated 2000-07-21
  > PE> 06:45:30 UTC), since that patch made the code explicitly disagree with the
  > PE> documentation. Though this leaves open the question of why Handa made that
  > PE> change in the first place.

This case used to work in the past, before support for non-ASCII
characters.  Since it is part of the natural domain of aset, the
failure to support it feels like a bug.  It made sense to add support
for that case, to make the function more complete.

I guess the documentation was not updated.

Normally I would say, let's update the manual now.  Slower support is
better than failing.

The point about perhaps changing GC is significant.  Perhaps that is
a good reason to leave that case unsupported.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 19:52 `aset` on strings, changing the size in bytes Stefan Monnier
  2018-09-07 21:33 ` Johan Bockgård
@ 2018-09-08  6:03 ` Helmut Eller
  1 sibling, 0 replies; 28+ messages in thread
From: Helmut Eller @ 2018-09-08  6:03 UTC (permalink / raw)
  To: emacs-devel

On Fri, Sep 07 2018, Stefan Monnier wrote:

> Currently, this indirection comes "for free" since we use that same
> indirection to let the GC compact the set of string-data-bytes objects
> to try and reduce memory fragmentation.  But I think we should not have
> our high-level API impose such an indirection at the lower level,
> especially since this (mis)feature is virtually never used.

Would it be hard to have two internal string types, where the "normal"
string type has no indirection but can be changed (by updating the type
header) to the other type that has the indirection?

Helmut




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 23:41     ` John Wiegley
  2018-09-08  5:17       ` Richard Stallman
@ 2018-09-08  6:34       ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-08  6:34 UTC (permalink / raw)
  To: John Wiegley, Kenichi Handa; +Cc: eggert, monnier, bojohan, emacs-devel

> From: "John Wiegley" <johnw@gnu.org>
> Date: Fri, 07 Sep 2018 16:41:04 -0700
> Cc: Stefan Monnier <monnier@IRO.UMontreal.CA>,
> 	Johan Bockgård <bojohan@gnu.org>, emacs-devel@gnu.org
> 
> >>>>> "PE" == Paul Eggert <eggert@cs.ucla.edu> writes:
> 
> PE> Cool! That means we can do Stefan's request simply by reverting Kenichi
> PE> Handa's patch that introduced the ability to change the byte length of a
> PE> string (commit 3c9de1afcde82a99137721436c822059cce79b5b dated 2000-07-21
> PE> 06:45:30 UTC), since that patch made the code explicitly disagree with the
> PE> documentation. Though this leaves open the question of why Handa made that
> PE> change in the first place.
> 
> I'd like to hear Handa's rationale, but otherwise am +1 for reverting.

Won't work without CC'ing Handa-san ;-)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-07 23:12   ` Paul Eggert
  2018-09-07 23:41     ` John Wiegley
  2018-09-08  2:04     ` Stefan Monnier
@ 2018-09-08  6:41     ` Eli Zaretskii
  2018-09-08 18:03       ` Stefan Monnier
  2 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-08  6:41 UTC (permalink / raw)
  To: Paul Eggert, Kenichi Handa; +Cc: monnier, bojohan, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 7 Sep 2018 16:12:07 -0700
> Cc: emacs-devel@gnu.org
> 
> Johan Bockgård wrote:
> > According to the documentation, that is already an error...
> > 
> > (info "(elisp) Modifying Strings")
> 
> Cool! That means we can do Stefan's request simply by reverting Kenichi Handa's 
> patch that introduced the ability to change the byte length of a string (commit 
> 3c9de1afcde82a99137721436c822059cce79b5b dated 2000-07-21 06:45:30 UTC), since 
> that patch made the code explicitly disagree with the documentation. Though this 
> leaves open the question of why Handa made that change in the first place.

What's the justification for such an incompatible change, though?
This feature _is_ used, e.g. see ruler-mode.el.

I understand that the effect of the change will be that whenever one
wants to mutate a string by replacing a character, they would need to
cons a new string, with the likes of

  (setq str (concat (substring str ...) new-char (substring str ...)))

is that right?  Which means in practice that 'aset' will need to
generally disappear from string-processing code, because in practice
it is impossible to know when the byte length will change without
complicated calculations, so robust code will need to drop use of
'aset' for strings, except in a small set of specialized situations.
Is that what we want?

Or maybe the proposal is to modify 'aset' to do the above under the
hood?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08  6:41     ` Eli Zaretskii
@ 2018-09-08 18:03       ` Stefan Monnier
  2018-09-08 18:20         ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2018-09-08 18:03 UTC (permalink / raw)
  To: emacs-devel

> What's the justification for such an incompatible change, though?
> This feature _is_ used, e.g. see ruler-mode.el.

Indeed, ruler-mode is the main user I know of this feature.

> I understand that the effect of the change will be that whenever one
> wants to mutate a string by replacing a character, they would need to
> cons a new string, with the likes of
>
>   (setq str (concat (substring str ...) new-char (substring str ...)))
>
> is that right?

There are several alternatives.  The ones I'm familiar with are:
- the `concat` option you mention
- use a unibyte string
- use a vector rather than a string
- use a (temp)buffer rather than a string
For ruler-mode, I've found the temp-buffer to be the better option.

> Which means in practice that 'aset' will need to generally disappear
> from string-processing code, because in practice it is impossible to
> know when the byte length will change without complicated
> calculations, so robust code will need to drop use of 'aset' for
> strings, except in a small set of specialized situations.

In the general case, indeed.  But in my experience, `aset` is used very
rarely on strings and many of those cases are known to only involve
ASCII chars or to work on unibyte strings.

> Or maybe the proposal is to modify 'aset' to do the above under the
> hood?

No, "'aset' do the above under the hood?" is what we have now.


        Stefan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 18:03       ` Stefan Monnier
@ 2018-09-08 18:20         ` Eli Zaretskii
  2018-09-08 18:36           ` Stefan Monnier
  2018-09-09  6:07           ` Richard Stallman
  0 siblings, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-08 18:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 08 Sep 2018 14:03:10 -0400
> 
> > Or maybe the proposal is to modify 'aset' to do the above under the
> > hood?
> 
> No, "'aset' do the above under the hood?" is what we have now.

That's not my reading of the code.  It doesn't seem to cons a new
string.

But if I'm mistaken, and the current implementation does cons a new
string, then what is your problem with it?  The original string is not
mutable, it's just replaced wit ha new one.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 18:20         ` Eli Zaretskii
@ 2018-09-08 18:36           ` Stefan Monnier
  2018-09-08 20:59             ` Eli Zaretskii
  2018-09-09  6:07           ` Richard Stallman
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2018-09-08 18:36 UTC (permalink / raw)
  To: emacs-devel

> That's not my reading of the code.  It doesn't seem to cons a new
> string.

It does not allocate a new Lisp_String object, but it does allocate
a new sdata object.  Allocating a new Lisp_Object is not really an
option, because `aset` couldn't return that new object (it has to work
in-place).

> But if I'm mistaken, and the current implementation does cons a new
> string, then what is your problem with it?

The need for an indirection (a String_Object has to hold a pointer to an
sdata object rather than being able to keep its payload directly in the
Lisp_String object (using FLEXIBLE_ARRAY_MEMBER)) and the unexpected
memory allocation behavior (users who aren't privy to the underlying
implementation would never expect a primitive like `aset` to internally
perform a copy of the string's bytes).


        Stefan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 18:36           ` Stefan Monnier
@ 2018-09-08 20:59             ` Eli Zaretskii
  2018-09-08 22:09               ` Stefan Monnier
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-08 20:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 08 Sep 2018 14:36:46 -0400
> 
> The need for an indirection (a String_Object has to hold a pointer to an
> sdata object rather than being able to keep its payload directly in the
> Lisp_String object (using FLEXIBLE_ARRAY_MEMBER))

Why is that a problem?

> and the unexpected memory allocation behavior (users who aren't
> privy to the underlying implementation would never expect a
> primitive like `aset` to internally perform a copy of the string's
> bytes).

Only C-level programming sees that, and it will never see it if
strings are treated like buffers, i.e. not passed as C pointers to
text.  So I don't see why this would be a problem, given the small
number of people who work at that level, and an even smaller number of
people who make changes in the low-level code which depends on that.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 20:59             ` Eli Zaretskii
@ 2018-09-08 22:09               ` Stefan Monnier
  2018-09-09  5:22                 ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2018-09-08 22:09 UTC (permalink / raw)
  To: emacs-devel

>> The need for an indirection (a String_Object has to hold a pointer to an
>> sdata object rather than being able to keep its payload directly in the
>> Lisp_String object (using FLEXIBLE_ARRAY_MEMBER))
> Why is that a problem?

It slows down every string access, and increases the heap size of every
string (currently they're something like N bytes of payload plus
5 words where 2 of those 5 words are due to the extra indirection).

For a feature that's almost never used, I think it's pretty costly.


        Stefan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 22:09               ` Stefan Monnier
@ 2018-09-09  5:22                 ` Eli Zaretskii
  2018-09-10  0:18                   ` Stefan Monnier
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-09  5:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 08 Sep 2018 18:09:41 -0400
> 
> >> The need for an indirection (a String_Object has to hold a pointer to an
> >> sdata object rather than being able to keep its payload directly in the
> >> Lisp_String object (using FLEXIBLE_ARRAY_MEMBER))
> > Why is that a problem?
> 
> It slows down every string access, and increases the heap size of every
> string (currently they're something like N bytes of payload plus
> 5 words where 2 of those 5 words are due to the extra indirection).
> 
> For a feature that's almost never used, I think it's pretty costly.

It can be (and was) used by memory-allocation infrastructure,
especially with very large strings.  Are we sure we want to lose it
for slowdown that should be hardly perceptible?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-08 18:20         ` Eli Zaretskii
  2018-09-08 18:36           ` Stefan Monnier
@ 2018-09-09  6:07           ` Richard Stallman
  2018-09-09  6:26             ` Eli Zaretskii
  2018-09-10  3:02             ` Stefan Monnier
  1 sibling, 2 replies; 28+ messages in thread
From: Richard Stallman @ 2018-09-09  6:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > But if I'm mistaken, and the current implementation does cons a new
  > string, then what is your problem with it?  The original string is not
  > mutable, it's just replaced wit ha new one.

"Replaced with the new one" requires relocatable strings.

But suppose we made a function like aset which returned
the string.  That way, it could return a new string when necessary.
It would be used like

   (setq s (sset s idx newchar))

Or we could define it to replace a substring:

   (setq s (srep s from to newsubstring))

It could also accept a regexp as FROM
and a match-string number as TO.  srep would always
make a new string, and nsrep would modify a string in place
if that is possible.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09  6:07           ` Richard Stallman
@ 2018-09-09  6:26             ` Eli Zaretskii
  2018-09-09 14:44               ` Noam Postavsky
  2018-09-10  5:48               ` Richard Stallman
  2018-09-10  3:02             ` Stefan Monnier
  1 sibling, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-09  6:26 UTC (permalink / raw)
  To: rms; +Cc: monnier, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sun, 09 Sep 2018 02:07:15 -0400
> 
>   > But if I'm mistaken, and the current implementation does cons a new
>   > string, then what is your problem with it?  The original string is not
>   > mutable, it's just replaced wit ha new one.
> 
> "Replaced with the new one" requires relocatable strings.

Not necessarily, not in the example I've shown:

  (setq str (do-something-with str))

> But suppose we made a function like aset which returned
> the string.  That way, it could return a new string when necessary.
> It would be used like
> 
>    (setq s (sset s idx newchar))
> 
> Or we could define it to replace a substring:
> 
>    (setq s (srep s from to newsubstring))

Why not have aset do this under the hood?  Then we won't need to ask
people to change code that worked for decades.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09  6:26             ` Eli Zaretskii
@ 2018-09-09 14:44               ` Noam Postavsky
  2018-09-09 15:17                 ` Eli Zaretskii
  2018-09-10  5:47                 ` Richard Stallman
  2018-09-10  5:48               ` Richard Stallman
  1 sibling, 2 replies; 28+ messages in thread
From: Noam Postavsky @ 2018-09-09 14:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emacs developers, Richard Stallman, Stefan Monnier

On 9 September 2018 at 02:26, Eli Zaretskii <eliz@gnu.org> wrote:

>> Or we could define it to replace a substring:
>>
>>    (setq s (srep s from to newsubstring))
>
> Why not have aset do this under the hood?  Then we won't need to ask
> people to change code that worked for decades.

It's unclear what you mean by that, the semantics of aset is to modify
the original string, changing that would surely break code that worked
for decades.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 14:44               ` Noam Postavsky
@ 2018-09-09 15:17                 ` Eli Zaretskii
  2018-09-09 16:27                   ` Noam Postavsky
                                     ` (2 more replies)
  2018-09-10  5:47                 ` Richard Stallman
  1 sibling, 3 replies; 28+ messages in thread
From: Eli Zaretskii @ 2018-09-09 15:17 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: emacs-devel, rms, monnier

> From: Noam Postavsky <npostavs@gmail.com>
> Date: Sun, 9 Sep 2018 10:44:25 -0400
> Cc: Richard Stallman <rms@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>, 
> 	Emacs developers <emacs-devel@gnu.org>
> 
> On 9 September 2018 at 02:26, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> >> Or we could define it to replace a substring:
> >>
> >>    (setq s (srep s from to newsubstring))
> >
> > Why not have aset do this under the hood?  Then we won't need to ask
> > people to change code that worked for decades.
> 
> It's unclear what you mean by that

I meant to make aset cons a new string when it turns out the original
one's data is too small to include the new contents.

> the semantics of aset is to modify the original string, changing
> that would surely break code that worked for decades.

But if we are going to tell people aset won't work in those cases,
that code will be broken as well, no?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 15:17                 ` Eli Zaretskii
@ 2018-09-09 16:27                   ` Noam Postavsky
  2018-09-10  5:48                     ` Richard Stallman
  2018-09-10  3:03                   ` Stefan Monnier
  2018-10-16 21:05                   ` Stefan Monnier
  2 siblings, 1 reply; 28+ messages in thread
From: Noam Postavsky @ 2018-09-09 16:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emacs developers, Richard Stallman, Stefan Monnier

On 9 September 2018 at 11:17, Eli Zaretskii <eliz@gnu.org> wrote:

> I meant to make aset cons a new string when it turns out the original
> one's data is too small to include the new contents.
>
>> the semantics of aset is to modify the original string, changing
>> that would surely break code that worked for decades.
>
> But if we are going to tell people aset won't work in those cases,
> that code will be broken as well, no?

Ah, I think I see the disconnect. You mean aset should return a new
string only in the case where the string needs to be resized (rather
than signalling an error), but otherwise continue to modify the string
in-place and return NEWELT as normal.

A conditional return value like that seems too awkward to be useful,
IMO. Instead of

(setq s (sset s idx newchar)) ; as in [1]

You would need:

(let ((new-s (aset s idx newchar)))
  (when (stringp new-s)
    (setq s new-s)))

In practice, I suspect people wouldn't bother a lot of the time, so in
the rare case where a string is resized there will just be confusingly
wrong answer instead of a clear error.

[1]: https://lists.gnu.org/archive/html/emacs-devel/2018-09/msg00387.html



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09  5:22                 ` Eli Zaretskii
@ 2018-09-10  0:18                   ` Stefan Monnier
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2018-09-10  0:18 UTC (permalink / raw)
  To: emacs-devel

>> >> The need for an indirection (a String_Object has to hold a pointer to an
>> >> sdata object rather than being able to keep its payload directly in the
>> >> Lisp_String object (using FLEXIBLE_ARRAY_MEMBER))
>> > Why is that a problem?
>> It slows down every string access, and increases the heap size of every
>> string (currently they're something like N bytes of payload plus
>> 5 words where 2 of those 5 words are due to the extra indirection).
>> For a feature that's almost never used, I think it's pretty costly.
> It can be (and was) used by memory-allocation infrastructure,
> especially with very large strings.  Are we sure we want to lose it
> for slowdown that should be hardly perceptible?

As mentioned in my original message, this indirection is used currently
for 2 purposes:
1- to implement `aset`.
2- to implement string compaction in the GC.

For this reason, either one of the two may be considered to "come for
free" if you presume the other one as a given.

Point nb 2 is a purely internal implementation detail, which we
could change whenever we feel like this decision gets in the way of
something preferable.

Point nb 1 OTOH cannot be changed so easily because it would be
a backward-incompatible change.


        Stefan




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09  6:07           ` Richard Stallman
  2018-09-09  6:26             ` Eli Zaretskii
@ 2018-09-10  3:02             ` Stefan Monnier
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2018-09-10  3:02 UTC (permalink / raw)
  To: Richard Stallman; +Cc: Eli Zaretskii, emacs-devel

> But suppose we made a function like aset which returned
> the string.  That way, it could return a new string when necessary.
> It would be used like
>
>    (setq s (sset s idx newchar))

Sounds good,


        Stefan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 15:17                 ` Eli Zaretskii
  2018-09-09 16:27                   ` Noam Postavsky
@ 2018-09-10  3:03                   ` Stefan Monnier
  2018-10-16 21:05                   ` Stefan Monnier
  2 siblings, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2018-09-10  3:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Noam Postavsky, rms

> I meant to make aset cons a new string when it turns out the original
> one's data is too small to include the new contents.

`aset` is documented to return NEWELT, so what could it do with the
newly cons'd string?
We'd have to change `aset` so it returns this newly cons'd string, but
that would also be a backward incompatible change.  Better introduce
a new function `sset` for it.


        Stefan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 14:44               ` Noam Postavsky
  2018-09-09 15:17                 ` Eli Zaretskii
@ 2018-09-10  5:47                 ` Richard Stallman
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Stallman @ 2018-09-10  5:47 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: eliz, monnier, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >> Or we could define it to replace a substring:
  > >>
  > >>    (setq s (srep s from to newsubstring))
  > >
  > > Why not have aset do this under the hood?  Then we won't need to ask
  > > people to change code that worked for decades.

  > It's unclear what you mean by that, the semantics of aset is to modify
  > the original string, changing that would surely break code that worked
  > for decades.

I propose new functions sset and srep.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09  6:26             ` Eli Zaretskii
  2018-09-09 14:44               ` Noam Postavsky
@ 2018-09-10  5:48               ` Richard Stallman
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Stallman @ 2018-09-10  5:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > >    (setq s (srep s from to newsubstring))

  > Why not have aset do this under the hood?

aset, with its current specification, cannot do this
because there is no way to replace the old string (as a Lisp object)
with a new string.

Currently a string object points to a separate string text.  That enables
aset to replace the old string text with a new string text.

If we wanted to change the string representation, that might become
impossible.  That might cause problems in places that uses aset.

I propose srep because some of those places might use srep instead.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 16:27                   ` Noam Postavsky
@ 2018-09-10  5:48                     ` Richard Stallman
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Stallman @ 2018-09-10  5:48 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: eliz, monnier, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > (let ((new-s (aset s idx newchar)))
  >   (when (stringp new-s)
  >     (setq s new-s)))

This does not work.  aset does not return the string.
It returns the character that was stored.

Changing the code to use my proposed srep is simpler
than changing it to do this.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: `aset` on strings, changing the size in bytes
  2018-09-09 15:17                 ` Eli Zaretskii
  2018-09-09 16:27                   ` Noam Postavsky
  2018-09-10  3:03                   ` Stefan Monnier
@ 2018-10-16 21:05                   ` Stefan Monnier
  2 siblings, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2018-10-16 21:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Noam Postavsky, rms

>> >> Or we could define it to replace a substring:
>> >>    (setq s (srep s from to newsubstring))
>> > Why not have aset do this under the hood?  Then we won't need to ask
>> > people to change code that worked for decades.
>> It's unclear what you mean by that
> I meant to make aset cons a new string when it turns out the original
> one's data is too small to include the new contents.
>> the semantics of aset is to modify the original string, changing
>> that would surely break code that worked for decades.
> But if we are going to tell people aset won't work in those cases,
> that code will be broken as well, no?

So, IIUC you're suggesting to change `aset` so that when the string
needs to be resized, it does not modify the provided string but instead
returns a new string (with the requested modification).

The idea being that my suggestion would cause `aset` to treat such
a case as an error, so instead of signaling an error we could introduce
this new behavior.

Right?

If so, I agree that it's an option, but I think it'll be problematic in
practice:
- most callers of `aset` don't look at the return value, so if they
  expect the old behavior they'll just go on blissfully unaware that
  their `aset` did not happen.
- given the fact that we're breaking backward compatibility, it's
  important to introduce the change gradually, with warnings emitted
  during the transition to bring attention to possible troubles ahead.
  But with your suggested final behavior, I'm not sure how we could
  detect problematic uses.

IOW, the benefit of sset is that once we introduce sset we can:
- first change aset so that it emits a warning when it needs to allocate
  a new string.
- later turn this warning into an error.


        Stefan



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-10-16 21:05 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-07 19:52 `aset` on strings, changing the size in bytes Stefan Monnier
2018-09-07 21:33 ` Johan Bockgård
2018-09-07 23:12   ` Paul Eggert
2018-09-07 23:41     ` John Wiegley
2018-09-08  5:17       ` Richard Stallman
2018-09-08  6:34       ` Eli Zaretskii
2018-09-08  2:04     ` Stefan Monnier
2018-09-08  2:17       ` Paul Eggert
2018-09-08  6:41     ` Eli Zaretskii
2018-09-08 18:03       ` Stefan Monnier
2018-09-08 18:20         ` Eli Zaretskii
2018-09-08 18:36           ` Stefan Monnier
2018-09-08 20:59             ` Eli Zaretskii
2018-09-08 22:09               ` Stefan Monnier
2018-09-09  5:22                 ` Eli Zaretskii
2018-09-10  0:18                   ` Stefan Monnier
2018-09-09  6:07           ` Richard Stallman
2018-09-09  6:26             ` Eli Zaretskii
2018-09-09 14:44               ` Noam Postavsky
2018-09-09 15:17                 ` Eli Zaretskii
2018-09-09 16:27                   ` Noam Postavsky
2018-09-10  5:48                     ` Richard Stallman
2018-09-10  3:03                   ` Stefan Monnier
2018-10-16 21:05                   ` Stefan Monnier
2018-09-10  5:47                 ` Richard Stallman
2018-09-10  5:48               ` Richard Stallman
2018-09-10  3:02             ` Stefan Monnier
2018-09-08  6:03 ` Helmut Eller

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).