all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattias.engdegard@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Dmitry Gutov <dmitry@gutov.dev>, Eli Zaretskii <eliz@gnu.org>,
	68244@debbugs.gnu.org
Subject: bug#68244: hash-table improvements
Date: Mon, 8 Jan 2024 19:26:21 +0100	[thread overview]
Message-ID: <19265EA5-E6F3-446C-AD9B-763693CF0A48@gmail.com> (raw)
In-Reply-To: <jwvv885dktt.fsf-monnier+emacs@gnu.org>

7 jan. 2024 kl. 20.10 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> The change gives good results for small tables but less so for big ones.
> I don't have a good intuition for why that would be: none of the
> operations directly involved seem to be more costly for large tables, so
> my best guess is that it leads to more collisions somehow, tho I don't
> have a good idea about why that would be.

Yes, I wondered about that too. It could simply be bad sampling. The benchmarks with bad results used sequential numbers (0, 1, ...) as keys so let's start with that:

|  count |   size idxsiz  1st%  avg max |   size idxsiz  1st%  avg max
|--------+------------------------------+-----------------------------
|  10000 |  12466  15343 100.0 1.00   1 |  12288  16384 100.0 1.00   1
|  20000 |  28048  34523 100.0 1.00   1 |  24576  32768  98.8 1.01   2
|  30000 |  42072  51781 100.0 1.00   1 |  49152  65536  99.1 1.01   2
|  40000 |  42072  51781 100.0 1.00   1 |  49152  65536  94.5 1.06   2
|  50000 |  63108  77671 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  60000 |  63108  77671 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  70000 |  94662 116507 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  80000 |  94662 116507 100.0 1.00   1 |  98304 131072  96.0 1.04   2
|  90000 |  94662 116507 100.0 1.00   1 |  98304 131072  89.3 1.11   2
| 100000 | 141993 174761 100.0 1.00   1 | 196608 262144  92.9 1.07   2
| 110000 | 141993 174761 100.0 1.00   1 | 196608 262144  90.9 1.09   2
| 120000 | 141993 174761 100.0 1.00   1 | 196608 262144  89.3 1.11   2
| 130000 | 141993 174761 100.0 1.00   1 | 196608 262144  87.9 1.12   2
| 140000 | 141993 174761 100.0 1.00   1 | 196608 262144  86.7 1.13   2
| 150000 | 212989 262141 100.0 1.00   1 | 196608 262144  85.7 1.14   2
| 160000 | 212989 262141 100.0 1.00   1 | 196608 262144  84.8 1.15   2
| 170000 | 212989 262141 100.0 1.00   1 | 196608 262144  84.0 1.16   2
| 180000 | 212989 262141 100.0 1.00   1 | 196608 262144  83.3 1.17   2
| 190000 | 212989 262141 100.0 1.00   1 | 196608 262144  82.7 1.17   2
| 200000 | 212989 262141 100.0 1.00   1 | 393216 524288 100.0 1.00   1

The left columns are for the standard hash tables with remainder-based range reduction, the right ones with Knuth reduction.
'size' is the table size, 'idxsiz' the index vector size, '1st%' the portion of entries accessed with a single lookup, 'avg' the average number of accesses and 'max' the maximum number of accesses required.

The old code looks perfect (no collisions!), but the new shiny code is a disappointment.

Then again, sequential numbers are best-case for remainder-based indexing: Emacs hashes (smallish) fixnums to themselves so we are guaranteed a minimum number of collisions, actually zero since the index vector is always larger than the number of entries.

But sequential keys would be a somewhat unusual use of hash tables; a vector is a lot more efficient. Let's try with random fixnums instead, using the same seed for each table:

|  count |   size idxsiz 1st%  avg max |   size idxsiz 1st%  avg max
|--------+-----------------------------+----------------------------
|  10000 |  12466  15343 73.5 1.32   6 |  12288  16384 75.4 1.30   6
|  20000 |  28048  34523 75.7 1.29   6 |  24576  32768 75.2 1.30   6
|  30000 |  42072  51781 75.8 1.29   5 |  49152  65536 80.6 1.22   5
|  40000 |  42072  51781 69.6 1.39   7 |  49152  65536 75.0 1.30   6
|  50000 |  63108  77671 73.6 1.32   6 |  98304 131072 83.5 1.19   6
|  60000 |  63108  77671 69.6 1.39   7 |  98304 131072 80.5 1.23   6
|  70000 |  94662 116507 75.2 1.30   6 |  98304 131072 77.5 1.27   6
|  80000 |  94662 116507 72.4 1.34   7 |  98304 131072 75.0 1.30   7
|  90000 |  94662 116507 69.8 1.38   7 |  98304 131072 72.6 1.34   7
| 100000 | 141993 174761 76.2 1.29   6 | 196608 262144 83.3 1.19   6
| 110000 | 141993 174761 74.2 1.32   6 | 196608 262144 81.8 1.21   6
| 120000 | 141993 174761 72.4 1.34   7 | 196608 262144 80.3 1.23   6
| 130000 | 141993 174761 70.6 1.37   7 | 196608 262144 78.8 1.25   6
| 140000 | 141993 174761 68.8 1.40   7 | 196608 262144 77.6 1.27   6
| 150000 | 212989 262141 76.1 1.29   6 | 196608 262144 76.2 1.29   6
| 160000 | 212989 262141 74.8 1.31   6 | 196608 262144 74.9 1.30   6
| 170000 | 212989 262141 73.5 1.33   6 | 196608 262144 73.6 1.32   6
| 180000 | 212989 262141 72.3 1.35   7 | 196608 262144 72.3 1.34   6
| 190000 | 212989 262141 71.1 1.36   7 | 196608 262144 71.1 1.36   7
| 200000 | 212989 262141 69.9 1.38   7 | 393216 524288 83.1 1.19   5

Here the new code looks a bit better, but it could just be the bigger index vectors.
Not sure what to make of this.

We could try switching to a high-quality hash function (or family thereof), like Murmur or Jenkins. Then range reduction is just a matter of masking off the required number of bits.






  reply	other threads:[~2024-01-08 18:26 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <170438379722.3921.9312235725296561206@vcs2.savannah.gnu.org>
     [not found] ` <20240104155642.B4A99C00344@vcs2.savannah.gnu.org>
2024-01-04 17:32   ` scratch/hash-table-perf 2d28042f56a 19/35: Use non-Lisp allocation for internal hash-table vectors Dmitry Gutov
2024-01-05 10:33     ` bug#68244: hash-table improvements Mattias Engdegård
2024-01-05 15:41       ` Dmitry Gutov
2024-01-06 11:34         ` Mattias Engdegård
2024-01-06 11:51           ` Eli Zaretskii
2024-01-07  3:13           ` Dmitry Gutov
2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-07 15:39               ` Dmitry
2024-01-07 18:36               ` Mattias Engdegård
2024-01-07 19:10                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-08 18:26                   ` Mattias Engdegård [this message]
2024-01-09  0:33                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-09 10:26                       ` Mattias Engdegård
2024-01-13 20:06                         ` Mattias Engdegård
2024-01-04 16:27                           ` Mattias Engdegård
2024-01-04 16:39                             ` Eli Zaretskii
2024-01-04 17:02                               ` Mattias Engdegård
2024-01-04 17:45                                 ` Eli Zaretskii
2024-01-05 11:34                                   ` Mattias Engdegård
2024-01-05 17:14                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-06 11:46                                       ` Mattias Engdegård
2024-01-09 21:51                             ` Stefan Kangas
2024-01-12 15:42                               ` Mattias Engdegård
2024-01-14 22:08                             ` Andy Moreton
2024-01-15 12:31                               ` Eli Zaretskii
2024-01-15 13:26                                 ` Mattias Engdegård
2024-01-18 18:13                                   ` Mattias Engdegård
2024-01-15 20:01                             ` Andy Moreton
2024-01-15 20:21                               ` Eli Zaretskii
2024-01-16 21:57                             ` Andy Moreton
2024-01-17  3:31                               ` Eli Zaretskii
2024-01-18 20:29                             ` Andy Moreton
2024-01-19  6:37                               ` Eli Zaretskii
2024-01-20 20:20                             ` Andy Moreton
2024-01-21  5:11                               ` Eli Zaretskii
2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-22  9:17                               ` João Távora
2024-01-22  9:18                                 ` João Távora
2024-01-23  9:44                                 ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-14  5:25                           ` Gerd Möllmann
2024-01-14 14:42                             ` Mattias Engdegård
2024-01-21 12:41                           ` Stefan Kangas
2024-02-08  9:46                           ` Mattias Engdegård
2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-08 14:36                               ` Gerd Möllmann
2024-02-08 14:42                               ` Mattias Engdegård
2024-02-08 15:13                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-08 17:29                                   ` Mattias Engdegård
2024-02-08 17:49                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-12 12:16                                       ` Mattias Engdegård
2024-02-12 13:36                                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-13  9:05                                         ` Gerd Möllmann
2024-02-13 10:12                                           ` Mattias Engdegård
2024-02-13 12:12                                             ` Gerd Möllmann
2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-14 12:37                                               ` Mattias Engdegård
2024-02-14 13:05                                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-14 13:21                                                   ` Mattias Engdegård
2024-02-17 19:50                                                     ` Mattias Engdegård
2024-02-20 10:21                                                       ` Mattias Engdegård
2024-02-20 14:00                                                         ` Eli Zaretskii
2024-02-20 16:11                                                           ` Mattias Engdegård
2024-02-20 17:12                                                             ` Eli Zaretskii
2024-02-21 12:59                                                               ` Eli Zaretskii
2024-02-21 20:13                                                                 ` Andrea Corallo
2024-02-23 12:16                                                                 ` Mattias Engdegård
2024-02-24  9:45                                                                   ` Mattias Engdegård
2024-02-24 10:30                                                                     ` Eli Zaretskii
2024-02-24 10:53                                                                       ` Mattias Engdegård
2024-02-24 11:03                                                                         ` Eli Zaretskii
2024-02-24 17:20                                                                           ` Dmitry Gutov
2024-02-24 17:43                                                                             ` Mattias Engdegård
2024-02-24 17:48                                                                               ` Dmitry Gutov
2024-02-24 17:53                                                                                 ` Mattias Engdegård
2024-02-24 18:08                                                                                   ` Eli Zaretskii
2024-02-24 18:31                                                                                     ` Dmitry Gutov
2024-02-24 17:54                                                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-24 17:14                                                                         ` Dmitry Gutov
2024-02-24  2:46                                           ` Dmitry Gutov
     [not found] ` <20240104155642.6326FC00344@vcs2.savannah.gnu.org>
2024-01-04 18:52   ` scratch/hash-table-perf 3e9e68333ae 16/35: Remove rehash-threshold and rehash-size struct members Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19265EA5-E6F3-446C-AD9B-763693CF0A48@gmail.com \
    --to=mattias.engdegard@gmail.com \
    --cc=68244@debbugs.gnu.org \
    --cc=dmitry@gutov.dev \
    --cc=eliz@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.