unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Dmitry Gutov <dmitry@gutov.dev>
Cc: "Mattias Engdegård" <mattias.engdegard@gmail.com>,
	68244@debbugs.gnu.org, "Eli Zaretskii" <eliz@gnu.org>
Subject: bug#68244: hash-table improvements
Date: Sun, 07 Jan 2024 00:26:56 -0500
Message-ID: <jwv4jfp4t3a.fsf-monnier+emacs@gnu.org>
In-Reply-To: <d91c1625-56e8-4e50-9b0a-e041d1acf716@gutov.dev> (Dmitry Gutov's message of "Sun, 7 Jan 2024 05:13:39 +0200")

Side note: I think reasoning won't get us out of this: either we decide
the choice is important, and then we try and do some
profiling/benchmarking, or we decide it's probably not worth the effort
and make an arbitrary choice under the expectation that it probably
won't make any difference anyway.

Using memory allocation to decide when to do the next GC is a crude
tool, which can often result in bad GC decisions (typically during long
periods of initialization, where we allocate many objects but generate
almost no garbage).

We sadly don't have a better alternative, but being crude means that the
details usually don't matter.
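
To make the initialization case concrete, here is a toy model in C of an
allocation-driven GC trigger. It is only a sketch under stated
assumptions, not Emacs code: the struct, the function names and the
numbers are all hypothetical, and the only idea borrowed from the thread
is a countdown in the spirit of `consing_until_gc`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an allocation-driven GC trigger.  Hypothetical names;
   this is not how alloc.c is written.  */
struct gc_clock {
  long threshold;    /* bytes to allocate between collections */
  long until_gc;     /* bytes left before the next collection */
  long gc_runs;      /* collections triggered so far */
  long garbage;      /* dead bytes waiting to be reclaimed */
  long bytes_freed;  /* garbage reclaimed by all collections so far */
};

static void gc_clock_init(struct gc_clock *c, long threshold)
{
  assert(threshold > 0);
  c->threshold = threshold;
  c->until_gc = threshold;
  c->gc_runs = 0;
  c->garbage = 0;
  c->bytes_freed = 0;
}

/* Record an allocation of NBYTES during which DEAD bytes of older data
   become garbage.  Run a simulated collection when the clock expires.  */
static void gc_clock_alloc(struct gc_clock *c, long nbytes, long dead)
{
  c->garbage += dead;
  c->until_gc -= nbytes;
  if (c->until_gc <= 0) {
    c->gc_runs++;
    c->bytes_freed += c->garbage;
    c->garbage = 0;
    c->until_gc = c->threshold;
  }
}

/* Allocate COUNT objects of SIZE bytes each, with DEAD_PER_ALLOC bytes
   dying per allocation.  Return the number of collections and store the
   total reclaimed bytes in *FREED.  */
static long simulate(long threshold, long count, long size,
                     long dead_per_alloc, long *freed)
{
  struct gc_clock c;
  gc_clock_init(&c, threshold);
  for (long i = 0; i < count; i++)
    gc_clock_alloc(&c, size, dead_per_alloc);
  if (freed)
    *freed = c.bytes_freed;
  return c.gc_runs;
}
```

Calling `simulate` with `dead_per_alloc` set to 0 models an
initialization phase: the clock fires exactly as often as it does for a
workload that produces garbage, but every one of those collections
reclaims nothing.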


        Stefan


Dmitry Gutov [2024-01-07 05:13:39] wrote:

> On 06/01/2024 13:34, Mattias Engdegård wrote:
>> 5 jan. 2024 kl. 16.41 skrev Dmitry Gutov <dmitry@gutov.dev>:
>> 
>>>> That's a good question and it all comes down to how we interpret
>>>> `consing_until_gc`. Here we take the view that it should encompass all
>>>> parts of an allocation and this seems to be consistent with
>>>> existing code.
>>>
>>> But the existing code used objects that would need to be collected by GC,
>>> right? And the new one, seemingly, does not.
>> But it does, in the same way that we deal with string data.
>
> Actually, vectors might be a better comparison. And we do increase the tally
> when creating a vector (inside 'allocate_vectorlike').
>
>>> So I don't quite see the advantage of increasing consing_until_gc
>>> then. It's like the difference between creating new strings and inserting
>>> strings into a buffer: new memory is used either way, but the latter
>>> doesn't increase consing.
>> Since we don't know exactly when objects die, we use object allocation as
>> a proxy, assuming that on average A bytes die for every B bytes allocated
>> and make an informed (and adjusted) guess as to what the A/B ratio might
>> be. That is the basis for the GC clock.
>> Buffer memory is indeed treated differently and does not advance the GC
>> clock as far as I can tell. Presumably the reasoning is that buffer size
>> changes make a poor proxy for object deaths.
>
> Perhaps we could look at it differently: what are the failure modes for not
> increasing the tally?
>
> For strings, one could allocate a handful of very long strings, taking up
> a lot of memory, and if the consing tally did not take into account the
> lengths of the strings, the GC might never start, and we die of OOM.
>
> For vectors, it almost looks different (the contained values are already
> counted, and they'd usually be larger than the memory taken by one cell),
> but then you could put many copies of the same value (could even be nil)
> into a large vector, and we're back to the same problem.
>
> Could we do something like that with a hash-table? Probably not - the
> hashing should at least guarantee 'eq' uniqueness. But then I suppose
> someone could create an empty hash-table of a very large size. If the
> internal vectors are pre-allocated, that could have the same effect as
> the above.
>
> The same reasoning could work for buffers too, but are they actually
> garbage-collected?
>
>> Of course we could reason that growing an existing hash table is also
>> a bad proxy for object deaths, but the evidence for that is weak so I used
>> the same metric as for other data structures just to be on the safe side.
>>
>> This reminds me that the `gcstat` bookkeeping should probably include the
>> hash-table ancillary arrays as well, since those counters are used to
>> adjust the GC clock (see total_bytes_of_live_objects and
>> consing_threshold). Will fix!
>> 
>>> It's great that the new hash tables are garbage-collected more easily and
>>> produce less garbage overall, but in a real program any GC cycle will
>>> have to traverse the other data structures anyway. So we might be leaving
>>> free performance gains on the table when we induce GC cycles while no
>>> managed allocations are done. I could be missing something, of course.
>> So could I, and please know that your questions are much appreciated. Are
>> you satisfied by my replies above, or did I misunderstand your concerns?
>
> Thank you. I hope I'm not too far off the mark with my reasoning.
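
The failure mode described above for strings (a handful of very large
allocations that never advance the tally) can be sketched with a small
C model. This is an illustration under assumptions, not Emacs code:
`HEADER_BYTES`, the `policy` enum and `bytes_before_first_gc` are
hypothetical names, and the sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical ways of advancing the GC clock on allocation.  */
enum policy {
  COUNT_HEADER_ONLY,  /* charge only a fixed per-object cost */
  COUNT_ALL_BYTES     /* charge the header plus the payload */
};

#define HEADER_BYTES 16  /* assumed fixed per-object cost */

/* Repeatedly allocate objects carrying PAYLOAD bytes each, up to
   MAX_OBJECTS of them.  Return the total bytes allocated when the
   clock first expires, or after MAX_OBJECTS allocations if it never
   does.  *FIRED is set to 1 if the clock expired, else 0.  */
static long long bytes_before_first_gc(enum policy p, long long threshold,
                                       long long payload, long max_objects,
                                       int *fired)
{
  assert(threshold > 0);
  long long until_gc = threshold;
  long long total = 0;
  *fired = 0;
  for (long i = 0; i < max_objects; i++) {
    total += HEADER_BYTES + payload;
    until_gc -= (p == COUNT_ALL_BYTES) ? HEADER_BYTES + payload
                                       : HEADER_BYTES;
    if (until_gc <= 0) {
      *fired = 1;
      return total;
    }
  }
  return total;
}
```

With a large payload, `COUNT_ALL_BYTES` expires the clock on the first
allocation, while `COUNT_HEADER_ONLY` lets memory use grow by the full
payload on every allocation without the clock noticing. The same
argument applies to a large vector filled with `nil` or to a
preallocated, empty hash table.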






