From: Mattias Engdegård
Subject: bug#68244: hash-table improvements
Date: Tue, 9 Jan 2024 11:26:05 +0100
To: Stefan Monnier
Cc: Dmitry Gutov, Eli Zaretskii, 68244@debbugs.gnu.org

On 9 Jan 2024, at 01:33, Stefan Monnier wrote:

> BTW, what is the "per-entry byte-size" of your new code?
> The old code had about 6 words per entry, IIRC.

With tables filled to capacity, the old code was about 5.25 (assuming an index vector size 1.25 times the table size). Now it's 1.625 words less, thus 3.625 words per entry. I haven't done the maths for what the average per-entry size would be if we take growth space into account.

The index vector can be shrunk further if we use a narrower index for smaller tables. This is a fairly common optimisation and usually the lower memory usage is worth an extra branch or two.

The hash-table object size is also down from 16 words to 10. 8 is actually quite achievable: consolidate the key_and_value, hash and next vectors into a single allocated block. It's just a matter of benchmarking to see what memory arrangement is the most beneficial.

>> We could try switching to a high-quality hash function (or family thereof),
>> like Murmur or Jenkins. Then range reduction is just a matter of masking off
>> the required number of bits.
>
> I don't see a strong need for it.

Maybe not, but I wouldn't discount it out of hand. A few cheap ALU ops could easily pay for themselves if they lead to fewer collisions.

> BTW, I see in the Knuth reduction you extract the bits 32..32+N of
> the multiplication.

It's supposed to be bits [32-N,32) actually (hope I got that right).

> Any reason not to use the top N bits instead (so
> we're not limited to 32 bits, for example)?

I thought about writing a clever expression that would work for other widths as well but it seemed like a waste of time given the current data structures.
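
To make the narrower-index idea above concrete, here is a minimal sketch of an index vector that stores 16-bit slots for small tables and 32-bit slots otherwise. It is not the Emacs implementation: the names (struct index_vec, index_vec_init, index_get, index_set, INDEX_EMPTY) and the 16/32-bit split are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker for an unused slot, as seen by callers.  */
#define INDEX_EMPTY UINT32_MAX

/* Hypothetical index vector: each slot holds an entry number.  */
struct index_vec
{
  size_t size;   /* number of slots */
  int is_wide;   /* nonzero if slots are 32 bits, zero if 16 bits */
  void *slots;   /* uint16_t[size] or uint32_t[size] */
};

/* Allocate SIZE empty slots for a table of at most MAX_ENTRIES entries.
   Return 0 on success, -1 if allocation fails.  */
static int
index_vec_init (struct index_vec *v, size_t size, uint32_t max_entries)
{
  assert (size > 0);
  v->size = size;
  /* Entries are numbered 0..max_entries-1; UINT16_MAX is reserved as
     the narrow "empty" value, so narrow slots need max_entries <= 65535.  */
  v->is_wide = max_entries > UINT16_MAX;
  size_t bytes = size * (v->is_wide ? sizeof (uint32_t) : sizeof (uint16_t));
  v->slots = malloc (bytes);
  if (!v->slots)
    return -1;
  memset (v->slots, 0xFF, bytes);   /* all-ones marks every slot empty */
  return 0;
}

/* Read slot I; this is the "extra branch" paid on each access.  */
static uint32_t
index_get (const struct index_vec *v, size_t i)
{
  assert (i < v->size);
  if (v->is_wide)
    return ((const uint32_t *) v->slots)[i];
  uint16_t x = ((const uint16_t *) v->slots)[i];
  return x == UINT16_MAX ? INDEX_EMPTY : x;
}

/* Store ENTRY (or INDEX_EMPTY) in slot I.  */
static void
index_set (struct index_vec *v, size_t i, uint32_t entry)
{
  assert (i < v->size);
  if (v->is_wide)
    ((uint32_t *) v->slots)[i] = entry;
  else
    ((uint16_t *) v->slots)[i]
      = entry == INDEX_EMPTY ? UINT16_MAX : (uint16_t) entry;
}
```

In this sketch a narrow table spends 2 bytes per index slot instead of 4, at the cost of one predictable branch per access; whether that pays off is the benchmarking question raised above.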
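
For the Knuth reduction discussed above, the following sketch shows multiplicative hashing that keeps bits [32-N,32) of a 32-bit product, next to a 64-bit variant that keeps the top N bits. It is an illustration under assumptions, not the code in the Emacs tree: the function names are hypothetical, and the multipliers are the commonly used golden-ratio constants, which may differ from what Emacs uses.

```c
#include <assert.h>
#include <stdint.h>

/* Commonly used golden-ratio multipliers (assumed here, not taken
   from the Emacs sources).  */
#define KNUTH_MULT_32 UINT32_C (0x9E3779B9)
#define KNUTH_MULT_64 UINT64_C (0x9E3779B97F4A7C15)

/* Reduce HASH to BITS bits (1..32): multiply modulo 2^32 and keep
   bits [32-BITS, 32) of the product, i.e. its top BITS bits.  */
static inline uint32_t
knuth_reduce32 (uint32_t hash, int bits)
{
  assert (bits >= 1 && bits <= 32);
  /* Widen before multiplying, then truncate to the low 32 bits.  */
  uint32_t product = (uint32_t) ((uint64_t) hash * KNUTH_MULT_32);
  return product >> (32 - bits);
}

/* Same idea for a 64-bit hash: multiply modulo 2^64 and keep the
   top BITS bits (1..64), so the result is not limited to 32 bits.  */
static inline uint64_t
knuth_reduce64 (uint64_t hash, int bits)
{
  assert (bits >= 1 && bits <= 64);
  return (hash * KNUTH_MULT_64) >> (64 - bits);
}
```

The high bits of the product depend on every bit of the input, which is why this scheme takes them rather than masking off the low bits; masking alone is only reasonable when the hash function already mixes well, as with the Murmur or Jenkins suggestion quoted above.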