From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#68244: hash-table improvements Date: Sun, 7 Jan 2024 05:13:39 +0200 Message-ID: References: <170438379722.3921.9312235725296561206@vcs2.savannah.gnu.org> <20240104155642.B4A99C00344@vcs2.savannah.gnu.org> <8d49ebdc-9da7-4e70-a080-d8e892b980b6@gutov.dev> <08314177-5AE9-4352-94A0-641830B4094D@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="37001"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla Thunderbird Cc: Eli Zaretskii , 68244@debbugs.gnu.org, Stefan Monnier To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sun Jan 07 04:14:31 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1rMJcL-0009OC-Ua for geb-bug-gnu-emacs@m.gmane-mx.org; Sun, 07 Jan 2024 04:14:31 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rMJbq-000206-Ol; Sat, 06 Jan 2024 22:13:58 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rMJbq-0001zu-1x for bug-gnu-emacs@gnu.org; Sat, 06 Jan 2024 22:13:58 -0500 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rMJbp-0008GT-Pr for bug-gnu-emacs@gnu.org; Sat, 06 Jan 2024 22:13:57 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1rMJbu-0004DL-IJ for bug-gnu-emacs@gnu.org; Sat, 06 Jan 2024 22:14:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 07 Jan 2024 03:14:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 68244 X-GNU-PR-Package: emacs Original-Received: via spool by 68244-submit@debbugs.gnu.org id=B68244.170459723716181 (code B ref 68244); Sun, 07 Jan 2024 03:14:02 +0000 Original-Received: (at 68244) by debbugs.gnu.org; 7 Jan 2024 03:13:57 +0000 Original-Received: from localhost ([127.0.0.1]:60520 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rMJbo-0004Cv-LG for submit@debbugs.gnu.org; Sat, 06 Jan 2024 22:13:57 -0500 Original-Received: from out3-smtp.messagingengine.com ([66.111.4.27]:59251) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rMJbn-0004Cf-50 for 68244@debbugs.gnu.org; Sat, 06 Jan 2024 22:13:55 -0500 Original-Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id B7CA45C00F2; Sat, 6 Jan 2024 22:13:43 -0500 (EST) Original-Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Sat, 06 Jan 2024 22:13:43 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1704597223; x=1704683623; bh=X8KhohXpGG63SsObzQ3/kqpS4hQhvINabVG3v8g2evY=; b= eD9MdR8Gukm/idrbJdKpEuSnUNgkYw5+IXzzQz5kJkIzAmBrF3gt+x5izqis+J/L NAg63MSBSc4Zat3j4UkmnQWNbgNjVLWMVjASOziPF3O82nmeLdhUrnOQVwnnc1Ns jGFN6cpcpIw14o4ZJ3kIvaWusgXpmklcFCdEZjJYxBlqd7ZFhbLlR92edPK8n/zy fhfPWK70qGG+dt9iLs6rFRM4Ws3s1extwve4vjXAOZw1RzmOjQJhqmUQjYiNQyYI o5ThvNuqKJjKS1mGRiAdxQZ9THSDP/t7U8xzZKRzynwyvqJlzEY9ZW2+z7EGvYzQ QMI/qtTrWQIZK/2mRn0dXg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t=1704597223; x= 1704683623; bh=X8KhohXpGG63SsObzQ3/kqpS4hQhvINabVG3v8g2evY=; b=N E7kw6ICllUpt2jh6EiZX4Wuo4KNmd0LR+Smro5GqNeXwl5tSPbjsDW5NZ1SUFMaw rv9qLmP94tr3y8iRGK1sLUMEv+JAMUECHjDKwGmQYTVn1p4PGHtfxQ7BtGs/eAzz 0N4roAPv/lhyIRSMwuDQXKUOzdKW+sPywx22IBVtKJQZlEBd8B9Eg1KnjeErkVQh UNXT+vwOAHjM6ja26J7VOJM1XOpLJ5uPSzUwFhJID1OxNkC1KrVCv+4qoxiHmf2F 2OgUibOgoCH/oWuVXYqVAiIcTjlaaNLbLnFJFBqMtvawB0wK/HU7Sz4AcqHqZYZi tznuFh4YQNT89Ro1l2dOA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdehvddgheekucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpeffmhhi thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth htvghrnhepgeelfeetkefghfdvhfdtgeevveevteetgeetveegtedthefhudekteehffeu keeknecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug hmihhtrhihsehguhhtohhvrdguvghv X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 6 Jan 2024 22:13:41 -0500 (EST) Content-Language: en-US In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:277487 Archived-At: On 06/01/2024 13:34, Mattias EngdegÄrd wrote: > 5 jan. 2024 kl. 16.41 skrev Dmitry Gutov : > >>> That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code. >> >> But the existing code used objects that would need to be collected by GC, right? And the new one, seemingly, does not. > > But it does, similar to the same way that we deal with string data. Actually, vectors might be a better comparison. And we do increase the tally when creating a vector (inside 'allocate_vectorlike'). >> So I don't quite see the advantage of increasing consing_until_gc then. It's like the difference between creating new strings and inserting strings into a buffer: new memory is used either way, but the latter doesn't increase consing. > > Since we don't know exactly when objects die, we use object allocation as a proxy, assuming that on average A bytes die for every B bytes allocated and make an informed (and adjusted) guess as to what the A/B ratio might be. That is the basis for the GC clock. > > Buffer memory is indeed treated differently and does not advance the GC clock as far as I can tell. Presumably the reasoning is that buffer size changes make a poor proxy for object deaths. Perhaps we could look at it differently: what are the failure modes for not increasing the tally. For strings, one could allocate a handful of very long strings, taking up a lot of memory, and if the consing tally did not take into account the lengths of the strings, the GC might never start, and we die of OOM. For vectors, it almost looks different (the contained values are already counted, and they'd usually be larger than the memory taken by one cell), but then you could put many copies of the same value (could even be nil) into a large vector, and we're back to the same problem. Could we do something like that with a hash-table? Probably not - the hashing should at least guarantee 'eq' uniqueness. But then I suppose someone could create an empty hash-table of a very large size. If the internal vectors are pre-allocated, that could have the same effect as the above. The same reasoning could work for buffers too, but are they actually garbage-collected? > Of course we could reason that growing an existing hash table is also a bad proxy for object deaths, but the evidence for that is weak so I used the same metric as for other data structures just to be on the safe side. > > This reminds me that the `gcstat` bookkeeping should probably include the hash-table ancillary arrays as well, since those counters are used to adjust the GC clock (see total_bytes_of_live_objects and consing_threshold). Will fix! > >> It's great that the new hash tables are garbage-collected more easily and produce less garbage overall, but in a real program any GC cycle will have to traverse the other data structures anyway. So we might be leaving free performance gains on the table when we induce GC cycles while no managed allocations are done. I could be missing something, of course. > > So could I, and please know that your questions are much appreciated. Are you satisfied by my replies above, or did I misunderstand your concerns? Thank you. I hope I'm not too off mark with my reasoning.