all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#68244: hash-table improvements
@ 2024-01-04 16:27                           ` Mattias Engdegård
  2024-01-04 16:39                             ` Eli Zaretskii
                                               ` (7 more replies)
  0 siblings, 8 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-04 16:27 UTC (permalink / raw)
  To: 68244

It's the hash-table's turn to be subjected to some performance tuning. The implementation has changed very little over the years and it's not so much a matter of a single big thing to fix as many small ones, so the list of changes is fairly long but ever little helps.

A working patch series can be found in the scratch/hash-table-perf branch. Although it should all be satisfactory and an improvement on what was before, there are a couple of details that I'd like to do better, which is why this hasn't been merged yet: the way shared hash_table_test structs are cached isn't very elegant, nor is the way we deal with Qunbound in pdumper.c.







^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
@ 2024-01-04 16:39                             ` Eli Zaretskii
  2024-01-04 17:02                               ` Mattias Engdegård
  2024-01-09 21:51                             ` Stefan Kangas
                                               ` (6 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-04 16:39 UTC (permalink / raw)
  To: Mattias Engdegård, Stefan Monnier; +Cc: 68244

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Thu, 4 Jan 2024 17:27:53 +0100
> 
> It's the hash-table's turn to be subjected to some performance tuning. The implementation has changed very little over the years and it's not so much a matter of a single big thing to fix as many small ones, so the list of changes is fairly long but ever little helps.
> 
> A working patch series can be found in the scratch/hash-table-perf branch. Although it should all be satisfactory and an improvement on what was before, there are a couple of details that I'd like to do better, which is why this hasn't been merged yet: the way shared hash_table_test structs are cached isn't very elegant, nor is the way we deal with Qunbound in pdumper.c.

Thanks.  Any data about the actual performance gains?

Adding Stefan to the discussion, in case he has comments and
suggestions.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:39                             ` Eli Zaretskii
@ 2024-01-04 17:02                               ` Mattias Engdegård
  2024-01-04 17:45                                 ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-04 17:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68244, Stefan Monnier

4 jan. 2024 kl. 17.39 skrev Eli Zaretskii <eliz@gnu.org>:

> Any data about the actual performance gains?

Lots, naturally, but hash tables have many moving parts and performance is many-dimensional which is why I didn't include the raw data; it's voluminous and not always easy to interpret. Giving a single number as in 'now 20 % faster' is impossible if we want to stay honest, because there are so many different aspects, contexts and operations.

The reforms so far actually don't change the basic algorithms or data structures much but concentrate on slimming the internal and external representation, both of which are basically guaranteed to provide performance gains (and they do).

External representation improvements means that printing and reading is substantially faster, especially for small tables (otherwise most of the time is just spent printing and reading the data), but many tables are small.

Internal representation means that we now use much less memory for the basic hash table objects and for the tables, and even more importantly avoid expensive GC involvement where not absolutely necessary. This gives some measurable speed-up itself, but provides benefits for other code.

I do want to change some of the hashing and range reduction code, though, but this requires even more careful benchmarking to be sure that we don't end up with any surprises.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: scratch/hash-table-perf 2d28042f56a 19/35: Use non-Lisp allocation for internal hash-table vectors
       [not found] ` <20240104155642.B4A99C00344@vcs2.savannah.gnu.org>
@ 2024-01-04 17:32   ` Dmitry Gutov
  2024-01-05 10:33     ` bug#68244: hash-table improvements Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Dmitry Gutov @ 2024-01-04 17:32 UTC (permalink / raw)
  To: emacs-devel, Mattias Engdegård

On 04/01/2024 17:56, Mattias Engdegård wrote:
> +/* Like xmalloc, but makes allocation count toward the total consing.
> +   Return NULL for a zero-sized allocation.  */
> +void *
> +hash_table_alloc_bytes (ptrdiff_t nbytes)
> +{
> +  if (nbytes == 0)
> +    return NULL;
> +  tally_consing (nbytes);
> +  return xmalloc (nbytes);
> +}

Sorry if it's a stupid question, but if the operation doesn't add any 
Lisp "garbage", why increase the consing counter? That is likely 
triggers more GCs earlier which otherwise might not run, and if there 
are no slots to GC, it seems like they would run in vain.



^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 17:02                               ` Mattias Engdegård
@ 2024-01-04 17:45                                 ` Eli Zaretskii
  2024-01-05 11:34                                   ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-04 17:45 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 68244, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Thu, 4 Jan 2024 18:02:08 +0100
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  68244@debbugs.gnu.org
> 
> 4 jan. 2024 kl. 17.39 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Any data about the actual performance gains?
> 
> Lots, naturally, but hash tables have many moving parts and performance is many-dimensional which is why I didn't include the raw data; it's voluminous and not always easy to interpret. Giving a single number as in 'now 20 % faster' is impossible if we want to stay honest, because there are so many different aspects, contexts and operations.

It doesn't have to be a single number.  Showing a series of separate
benchmarks is also fine.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: scratch/hash-table-perf 3e9e68333ae 16/35: Remove rehash-threshold and rehash-size struct members
       [not found] ` <20240104155642.6326FC00344@vcs2.savannah.gnu.org>
@ 2024-01-04 18:52   ` Dmitry Gutov
  0 siblings, 0 replies; 80+ messages in thread
From: Dmitry Gutov @ 2024-01-04 18:52 UTC (permalink / raw)
  To: emacs-devel, Mattias Engdegård

On 04/01/2024 17:56, Mattias Engdegård wrote:
> branch: scratch/hash-table-perf
> commit 3e9e68333ae15a6544e490851f80da3a8f9ef343
> Author: Mattias Engdegård<mattiase@acm.org>
> Commit: Mattias Engdegård<mattiase@acm.org>
> 
>      Remove rehash-threshold and rehash-size struct members
>      
>      These parameters have no visible semantics and are hardly ever used,
>      so just use the default values for all hash tables.  This saves
>      memory, shrinks the external representation, and will improve
>      performance.

I'm curious whether any of these callers have tested with different 
values and found a difference in performance:

https://github.com/search?q=%3Arehash-threshold+language%3A%22Emacs+Lisp%22&type=code&l=Emacs+Lisp



^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 17:32   ` scratch/hash-table-perf 2d28042f56a 19/35: Use non-Lisp allocation for internal hash-table vectors Dmitry Gutov
@ 2024-01-05 10:33     ` Mattias Engdegård
  2024-01-05 15:41       ` Dmitry Gutov
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-05 10:33 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, 68244, Stefan Monnier

[ replies to questions asked elsewhere ]

4 jan. 2024 kl. 18.32 skrev Dmitry Gutov <dmitry@gutov.dev>:

>> +hash_table_alloc_bytes (ptrdiff_t nbytes)
>> +{
>> +  if (nbytes == 0)
>> +    return NULL;
>> +  tally_consing (nbytes);
>> +  return xmalloc (nbytes);
>> +}
> 
> Sorry if it's a stupid question, but if the operation doesn't add any Lisp "garbage", why increase the consing counter? That is likely triggers more GCs earlier which otherwise might not run, and if there are no slots to GC, it seems like they would run in vain.

That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code.

These ancillary allocations are parts of the hash-table object: they are allocated and freed at the same time as the mother object, and subject to the same `consing_until_gc` accounting.

The new code is radically more efficient when growing tables, because we can free the old vectors immediately (and adjust consing_until_gc accordingly) instead of leaving it as garbage waiting to be collected. This provides several benefits in itself (GCs are made less often, we can re-use hot memory). Not having to traverse them in either the mark or sweep phases is another big gain (the key_and_value parts still have to be marked, of course).

> I'm curious whether any of these callers have tested with different values and found a difference in performance:
> 
> https://github.com/search?q=%3Arehash-threshold+language%3A%22Emacs+Lisp%22&type=code&l=Emacs+Lisp

I don't think they did, but it's easy to picture what was going on:

  Manual: Here is a Knob. It can be Turned to improve Performance.
  User:   Knob! Performance!

Some trials indicate that it's almost impossible for a user to use those knobs effectively, nor should they have to. Moreover, by removing them we not only give users fewer things to worry about, we are also able to improve performance for everyone.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 17:45                                 ` Eli Zaretskii
@ 2024-01-05 11:34                                   ` Mattias Engdegård
  2024-01-05 17:14                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-05 11:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, 68244, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 666 bytes --]

4 jan. 2024 kl. 18.45 skrev Eli Zaretskii <eliz@gnu.org>:

> It doesn't have to be a single number.  Showing a series of separate
> benchmarks is also fine.

Unfortunately it's not quite that simple: those numbers have to be interpreted and understood as well, which requires about as much work as making the benchmarks in the first place.

But let's try anyway: here is a run of one of the main suites I've been using. The numbers indicate relative changes in percent of elapsed time from the baseline to the tip of the scratch/hash-table-perf branch, negative numbers being better. For example, -50 means that speed has doubled, +100 that it has halved.


[-- Attachment #2: pct.org --]
[-- Type: application/octet-stream, Size: 2992 bytes --]

|                           | 681a2877cc2 |
|---------------------------+-------------|
| make empty                |       -88.9 |
| make 4, dead              |       -88.8 |
| make 16, dead             |       -62.3 |
| make 64, dead             |       +25.1 |
| make 256, dead            |       -77.7 |
| make 1024, dead           |       -76.7 |
| make 4096, dead           |       -81.4 |
| make 100000, dead         |       -73.2 |
|---------------------------+-------------|
| make 4, alive             |       -93.3 |
| make 16, alive            |       -75.6 |
| make 64, alive            |       +11.5 |
| make 256, alive           |       -82.4 |
| make 1024, alive          |       -79.7 |
| make 2048, alive          |       -83.3 |
| make 4096, alive          |       -84.4 |
| make 100000, alive        |       -75.1 |
|---------------------------+-------------|
| gethash seq 4             |        -4.7 |
| gethash seq 16            |        -5.7 |
| gethash seq 64            |        -4.9 |
| gethash seq 256           |        -4.7 |
| gethash seq 1024          |        -4.7 |
| gethash seq 4096          |        -4.5 |
| gethash seq 100000        |        -4.5 |
|---------------------------+-------------|
| gethash rnd 4             |        -5.1 |
| gethash rnd 16            |        -4.5 |
| gethash rnd 64            |        -8.9 |
| gethash rnd 256           |        -3.0 |
| gethash rnd 1024          |        -4.8 |
| gethash rnd 4096          |        -1.9 |
| gethash rnd 100000        |        -6.5 |
|---------------------------+-------------|
| gethash sym 1059          |       -15.2 |
|---------------------------+-------------|
| gethash 20000 tbls 5 elts |       -76.8 |
| gethash 5263 tbls 19 elts |       -64.0 |
| gethash 1538 tbls 65 elts |       -21.8 |
| gethash 571 tbls 175 elts |       -34.6 |
|---------------------------+-------------|
| insert/remove all 10      |        -5.4 |
| insert/remove all 100     |        -5.7 |
| insert/remove all 1000    |        -4.7 |
| insert/remove all 10000   |        -5.1 |
| insert/remove all 100000  |        -5.5 |
|---------------------------+-------------|
| insert/remove 10          |       +10.2 |
| insert/remove 100         |        -7.7 |
| insert/remove 1000        |       -12.4 |
| insert/remove 10000       |        -5.1 |
| insert/remove 100000      |       -10.2 |
|---------------------------+-------------|
| maphash 10                |        -6.0 |
| maphash 100               |        -0.4 |
| maphash 1000              |        -0.2 |
| maphash 10000             |        +0.8 |
| maphash 100000            |        -0.4 |
|---------------------------+-------------|
| print empty eql           |       -42.1 |
| print empty eq            |       -39.0 |
|---------------------------+-------------|
| read empty eql            |       -96.5 |
| read empty eq             |       -94.9 |
| read 100                  |       -36.0 |
| read 10000                |        -6.7 |

[-- Attachment #3: Type: text/plain, Size: 575 bytes --]



Again, these are micro-benchmarks designed to amplify the signal in various respects, and the numbers don't tell everything -- far from it, in fact.

As mentioned before, one of the most important aspects of a data structure is not just how well it performs itself but how good neighbour it is: the less memory it uses and the better its locality of reference, the less negative impact it has on other objects and data structures. However, these effects can be difficult to measure correctly, so a lot of the tuning and assessment have to be done by indirect means.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-05 10:33     ` bug#68244: hash-table improvements Mattias Engdegård
@ 2024-01-05 15:41       ` Dmitry Gutov
  2024-01-06 11:34         ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Dmitry Gutov @ 2024-01-05 15:41 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Eli Zaretskii, 68244, Stefan Monnier

On 05/01/2024 12:33, Mattias Engdegård wrote:
> [ replies to questions asked elsewhere ]
> 
> 4 jan. 2024 kl. 18.32 skrev Dmitry Gutov <dmitry@gutov.dev>:
> 
>>> +hash_table_alloc_bytes (ptrdiff_t nbytes)
>>> +{
>>> +  if (nbytes == 0)
>>> +    return NULL;
>>> +  tally_consing (nbytes);
>>> +  return xmalloc (nbytes);
>>> +}
>>
>> Sorry if it's a stupid question, but if the operation doesn't add any Lisp "garbage", why increase the consing counter? That is likely triggers more GCs earlier which otherwise might not run, and if there are no slots to GC, it seems like they would run in vain.
> 
> That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code.

But the existing code used objects that would need to be collected by 
GC, right? And the new one, seemingly, does not.

So I don't quite see the advantage of increasing consing_until_gc then. 
It's like the difference between creating new strings and inserting 
strings into a buffer: new memory is used either way, but the latter 
doesn't increase consing.

> These ancillary allocations are parts of the hash-table object: they are allocated and freed at the same time as the mother object, and subject to the same `consing_until_gc` accounting.
> 
> The new code is radically more efficient when growing tables, because we can free the old vectors immediately (and adjust consing_until_gc accordingly) instead of leaving it as garbage waiting to be collected. This provides several benefits in itself (GCs are made less often, we can re-use hot memory). Not having to traverse them in either the mark or sweep phases is another big gain (the key_and_value parts still have to be marked, of course).

It's great that the new hash tables are garbage-collected more easily 
and produce less garbage overall, but in a real program any GC cycle 
will have to traverse the other data structures anyway. So we might be 
leaving free performance gains on the table when we induce GC cycles 
while no managed allocations are done. I could be missing something, of 
course.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-05 11:34                                   ` Mattias Engdegård
@ 2024-01-05 17:14                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-06 11:46                                       ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-05 17:14 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

> But let's try anyway: here is a run of one of the main suites I've been
> using. The numbers indicate relative changes in percent of elapsed time from
> the baseline to the tip of the scratch/hash-table-perf branch, negative
> numbers being better. For example, -50 means that speed has doubled, +100
> that it has halved.
>
> [2. application/vnd.lotus-organizer; pct.org]

Lotus, eh?

> |                           | 681a2877cc2 |
> |---------------------------+-------------|
> | make empty                |       -88.9 |
> | make 4, dead              |       -88.8 |
> | make 16, dead             |       -62.3 |
> | make 64, dead             |       +25.1 |
> | make 256, dead            |       -77.7 |
> | make 1024, dead           |       -76.7 |
> | make 4096, dead           |       -81.4 |
> | make 100000, dead         |       -73.2 |
> |---------------------------+-------------|
> | make 4, alive             |       -93.3 |
> | make 16, alive            |       -75.6 |
> | make 64, alive            |       +11.5 |
> | make 256, alive           |       -82.4 |
> | make 1024, alive          |       -79.7 |
> | make 2048, alive          |       -83.3 |
> | make 4096, alive          |       -84.4 |
> | make 100000, alive        |       -75.1 |

What's up with 64?


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-05 15:41       ` Dmitry Gutov
@ 2024-01-06 11:34         ` Mattias Engdegård
  2024-01-06 11:51           ` Eli Zaretskii
  2024-01-07  3:13           ` Dmitry Gutov
  0 siblings, 2 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-06 11:34 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, 68244, Stefan Monnier

5 jan. 2024 kl. 16.41 skrev Dmitry Gutov <dmitry@gutov.dev>:

>> That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code.
> 
> But the existing code used objects that would need to be collected by GC, right? And the new one, seemingly, does not.

But it does, similar to the same way that we deal with string data.

> So I don't quite see the advantage of increasing consing_until_gc then. It's like the difference between creating new strings and inserting strings into a buffer: new memory is used either way, but the latter doesn't increase consing.

Since we don't know exactly when objects die, we use object allocation as a proxy, assuming that on average A bytes die for every B bytes allocated and make an informed (and adjusted) guess as to what the A/B ratio might be. That is the basis for the GC clock.

Buffer memory is indeed treated differently and does not advance the GC clock as far as I can tell. Presumably the reasoning is that buffer size changes make a poor proxy for object deaths.

Of course we could reason that growing an existing hash table is also a bad proxy for object deaths, but the evidence for that is weak so I used the same metric as for other data structures just to be on the safe side.

This reminds me that the `gcstat` bookkeeping should probably include the hash-table ancillary arrays as well, since those counters are used to adjust the GC clock (see total_bytes_of_live_objects and consing_threshold). Will fix!

> It's great that the new hash tables are garbage-collected more easily and produce less garbage overall, but in a real program any GC cycle will have to traverse the other data structures anyway. So we might be leaving free performance gains on the table when we induce GC cycles while no managed allocations are done. I could be missing something, of course.

So could I, and please know that your questions are much appreciated. Are you satisfied by my replies above, or did I misunderstand your concerns?






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-05 17:14                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-06 11:46                                       ` Mattias Engdegård
  0 siblings, 0 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-06 11:46 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

5 jan. 2024 kl. 18.14 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

>> [2. application/vnd.lotus-organizer; pct.org]
> 
> Lotus, eh?

For those having trouble reading the file, I may have some Organiser licenses left for sale.

> What's up with 64?

The old (current) hash tables have a default size of 65 which is very convenient if you happen to fill it with that many entries. The new code lowers the default substantially which is generally beneficial as most tables are small, and growing them is much cheaper than before.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-06 11:34         ` Mattias Engdegård
@ 2024-01-06 11:51           ` Eli Zaretskii
  2024-01-07  3:13           ` Dmitry Gutov
  1 sibling, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-06 11:51 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: dmitry, 68244, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 6 Jan 2024 12:34:05 +0100
> Cc: 68244@debbugs.gnu.org,
>  Eli Zaretskii <eliz@gnu.org>,
>  Stefan Monnier <monnier@iro.umontreal.ca>
> 
> Buffer memory is indeed treated differently and does not advance the GC clock as far as I can tell. Presumably the reasoning is that buffer size changes make a poor proxy for object deaths.

Buffer memory is usually allocated via mmap (nowadays, implicitly by
libc), which is why it is treated differently.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-06 11:34         ` Mattias Engdegård
  2024-01-06 11:51           ` Eli Zaretskii
@ 2024-01-07  3:13           ` Dmitry Gutov
  2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 80+ messages in thread
From: Dmitry Gutov @ 2024-01-07  3:13 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Eli Zaretskii, 68244, Stefan Monnier

On 06/01/2024 13:34, Mattias Engdegård wrote:
> 5 jan. 2024 kl. 16.41 skrev Dmitry Gutov <dmitry@gutov.dev>:
> 
>>> That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code.
>>
>> But the existing code used objects that would need to be collected by GC, right? And the new one, seemingly, does not.
> 
> But it does, similar to the same way that we deal with string data.

Actually, vectors might be a better comparison. And we do increase the 
tally when creating a vector (inside 'allocate_vectorlike').

>> So I don't quite see the advantage of increasing consing_until_gc then. It's like the difference between creating new strings and inserting strings into a buffer: new memory is used either way, but the latter doesn't increase consing.
> 
> Since we don't know exactly when objects die, we use object allocation as a proxy, assuming that on average A bytes die for every B bytes allocated and make an informed (and adjusted) guess as to what the A/B ratio might be. That is the basis for the GC clock.
> 
> Buffer memory is indeed treated differently and does not advance the GC clock as far as I can tell. Presumably the reasoning is that buffer size changes make a poor proxy for object deaths.

Perhaps we could look at it differently: what are the failure modes for 
not increasing the tally.

For strings, one could allocate a handful of very long strings, taking 
up a lot of memory, and if the consing tally did not take into account 
the lengths of the strings, the GC might never start, and we die of OOM.

For vectors, it almost looks different (the contained values are already 
counted, and they'd usually be larger than the memory taken by one 
cell), but then you could put many copies of the same value (could even 
be nil) into a large vector, and we're back to the same problem.

Could we do something like that with a hash-table? Probably not - the 
hashing should at least guarantee 'eq' uniqueness. But then I suppose 
someone could create an empty hash-table of a very large size. If the 
internal vectors are pre-allocated, that could have the same effect as 
the above.

The same reasoning could work for buffers too, but are they actually 
garbage-collected?

> Of course we could reason that growing an existing hash table is also a bad proxy for object deaths, but the evidence for that is weak so I used the same metric as for other data structures just to be on the safe side.
 >
> This reminds me that the `gcstat` bookkeeping should probably include the hash-table ancillary arrays as well, since those counters are used to adjust the GC clock (see total_bytes_of_live_objects and consing_threshold). Will fix!
> 
>> It's great that the new hash tables are garbage-collected more easily and produce less garbage overall, but in a real program any GC cycle will have to traverse the other data structures anyway. So we might be leaving free performance gains on the table when we induce GC cycles while no managed allocations are done. I could be missing something, of course.
> 
> So could I, and please know that your questions are much appreciated. Are you satisfied by my replies above, or did I misunderstand your concerns?

Thank you. I hope I'm not too off mark with my reasoning.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-07  3:13           ` Dmitry Gutov
@ 2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-07 15:39               ` Dmitry
  2024-01-07 18:36               ` Mattias Engdegård
  0 siblings, 2 replies; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-07  5:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Mattias Engdegård, 68244, Eli Zaretskii

Side note: I think reasoning won't get us out of this: either we decide
the choice is important, and then we try and do some
profiling/benchmarking, or we decide it's probably not worth the effort
and make an arbitrary choice under the expectation that it probably
won't make any difference anyway.

The use of memory allocation as a way to decide when to do the next GC
is just a crude tool anyway, which can often result in bad GC decisions,
anyway (e.g. typically during long periods of initialization where we
allocate many objects but don't generate almost any garbage).

We sadly don't have a better alternative, but being crude means that the
details usually don't matter anyway.


        Stefan


Dmitry Gutov [2024-01-07 05:13:39] wrote:

> On 06/01/2024 13:34, Mattias Engdegård wrote:
>> 5 jan. 2024 kl. 16.41 skrev Dmitry Gutov <dmitry@gutov.dev>:
>> 
>>>> That's a good question and it all comes down to how we interpret
>>> `consing_until_gc`. Here we take the view that it should encompass all
>>> parts of an allocation and this seems to be consistent with
>>> existing code.
>>>
>>> But the existing code used objects that would need to be collected by GC,
>>> right? And the new one, seemingly, does not.
>> But it does, similar to the same way that we deal with string data.
>
> Actually, vectors might be a better comparison. And we do increase the tally
> when creating a vector (inside 'allocate_vectorlike').
>
>>> So I don't quite see the advantage of increasing consing_until_gc
>>> then. It's like the difference between creating new strings and inserting
>>> strings into a buffer: new memory is used either way, but the latter
>>> doesn't increase consing.
>> Since we don't know exactly when objects die, we use object allocation as
>> a proxy, assuming that on average A bytes die for every B bytes allocated
>> and make an informed (and adjusted) guess as to what the A/B ratio might
>> be. That is the basis for the GC clock.
>> Buffer memory is indeed treated differently and does not advance the GC
>> clock as far as I can tell. Presumably the reasoning is that buffer size
>> changes make a poor proxy for object deaths.
>
> Perhaps we could look at it differently: what are the failure modes for not
> increasing the tally.
>
> For strings, one could allocate a handful of very long strings, taking up
> a lot of memory, and if the consing tally did not take into account the
> lengths of the strings, the GC might never start, and we die of OOM.
>
> For vectors, it almost looks different (the contained values are already
> counted, and they'd usually be larger than the memory taken by one cell),
> but then you could put many copies of the same value (could even be nil)
> into a large vector, and we're back to the same problem.
>
> Could we do something like that with a hash-table? Probably not - the
> hashing should at least guarantee 'eq' uniqueness. But then I suppose
> someone could create an empty hash-table of a very large size. If the
> internal vectors are pre-allocated, that could have the same effect as
> the above.
>
> The same reasoning could work for buffers too, but are they actually
> garbage-collected?
>
>> Of course we could reason that growing an existing hash table is also
>> a bad proxy for object deaths, but the evidence for that is weak so I used
>> the same metric as for other data structures just to be on the safe side.
>>
>> This reminds me that the `gcstat` bookkeeping should probably include the
>> hash-table ancillary arrays as well, since those counters are used to
>> adjust the GC clock (see total_bytes_of_live_objects and
>> consing_threshold). Will fix!
>> 
>>> It's great that the new hash tables are garbage-collected more easily and
>>> produce less garbage overall, but in a real program any GC cycle will
>>> have to traverse the other data structures anyway. So we might be leaving
>>> free performance gains on the table when we induce GC cycles while no
>>> managed allocations are done. I could be missing something, of course.
>> So could I, and please know that your questions are much appreciated. Are
>> you satisfied by my replies above, or did I misunderstand your concerns?
>
> Thank you. I hope I'm not too off mark with my reasoning.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-07 15:39               ` Dmitry
  2024-01-07 18:36               ` Mattias Engdegård
  1 sibling, 0 replies; 80+ messages in thread
From: Dmitry @ 2024-01-07 15:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Mattias Engdegård, 68244, Eli Zaretskii

[-- Attachment #1: Type: text/plain, Size: 812 bytes --]

On Sun, Jan 7, 2024, at 7:26 AM, Stefan Monnier wrote:
> Side note: I think reasoning won't get us out of this: either we decide
> the choice is important, and then we try and do some
> profiling/benchmarking
That's what I was going to suggest, but concluded that dropping the tallying increments in any of these constructors can lead to risky behavior in degenerate cases.
> The use of memory allocation as a way to decide when to do the next GC
> is just a crude tool anyway, which can often result in bad GC decisions,
> anyway (e.g. typically during long periods of initialization where we
> allocate many objects but don't generate almost any garbage).
Indeed, we have a latency/throughput tradeoff here with the current system, so it would be tempting to reduce the frequency of GCs at least in some cases.

[-- Attachment #2: Type: text/html, Size: 1171 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-07 15:39               ` Dmitry
@ 2024-01-07 18:36               ` Mattias Engdegård
  2024-01-07 19:10                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-07 18:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

[-- Attachment #1: Type: text/plain, Size: 1217 bytes --]

7 jan. 2024 kl. 06.26 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> The use of memory allocation as a way to decide when to do the next GC
> is just a crude tool anyway, which can often result in bad GC decisions,
> anyway (e.g. typically during long periods of initialization where we
> allocate many objects but don't generate almost any garbage).

In any case the changes to GC heuristics and policy that have been proposed aren't specific to hash-tables and while interesting are outside the scope of my work at hand. This means that I'll keep using the same kind of bookkeeping as before, so that the GC is reasonably well informed if we want to change it in the future.

I've pushed two new changes: a correction to the GC accounting for the ancillary hash-table vectors, and a rather more interesting change to the hash table range reduction. It now uses a Knuth multiplicative method instead of the expensive remainder, so the index is now always a power of 2 in size.

Benchmark results attached: the first column is the same as before, the middle after the accounting fix (which turned out not to be noticeable at all), and the rightmost what we got from Knuth. I'll probably keep both.


[-- Attachment #2: pct.org --]
[-- Type: application/octet-stream, Size: 4284 bytes --]

|                           | 681a2877 | e366ae38 | b9c9539d |
|---------------------------+----------+----------+----------|
| make empty                |    -88.8 |    -88.8 |    -89.9 |
| make 4, dead              |    -88.8 |    -88.9 |    -90.0 |
| make 16, dead             |    -62.3 |    -62.5 |    -70.7 |
| make 64, dead             |    +26.8 |    +24.7 |     -3.1 |
| make 256, dead            |    -78.4 |    -78.0 |    -67.7 |
| make 1024, dead           |    -76.7 |    -76.7 |    -67.0 |
| make 4096, dead           |    -81.4 |    -81.4 |    -74.6 |
| make 100000, dead         |    -73.2 |    -73.3 |    -59.0 |
|---------------------------+----------+----------+----------|
| make 4, alive             |    -93.3 |    -93.3 |    -95.4 |
| make 16, alive            |    -75.6 |    -75.6 |    -82.8 |
| make 64, alive            |    +11.4 |    +12.6 |    -22.6 |
| make 256, alive           |    -82.4 |    -82.4 |    -72.5 |
| make 1024, alive          |    -79.7 |    -79.7 |    -68.1 |
| make 2048, alive          |    -83.3 |    -83.7 |    -73.3 |
| make 4096, alive          |    -84.4 |    -84.4 |    -74.7 |
| make 100000, alive        |    -75.3 |    -75.3 |    -61.2 |
|---------------------------+----------+----------+----------|
| gethash seq 4             |     -4.7 |     -4.7 |    -11.2 |
| gethash seq 16            |     -5.7 |     -5.7 |     -7.6 |
| gethash seq 64            |     -5.5 |     -4.9 |     -6.8 |
| gethash seq 256           |     -5.3 |     -4.7 |     -6.4 |
| gethash seq 1024          |     -5.1 |     -4.9 |     -6.6 |
| gethash seq 4096          |     -5.1 |     -4.5 |     -6.6 |
| gethash seq 100000        |     -5.1 |     -4.7 |     +5.3 |
|---------------------------+----------+----------+----------|
| gethash rnd 4             |     -5.0 |     -5.0 |     -8.7 |
| gethash rnd 16            |     -4.6 |     -4.6 |    -10.0 |
| gethash rnd 64            |     -8.9 |     -8.9 |    -13.6 |
| gethash rnd 256           |     -3.2 |     -3.2 |    -10.8 |
| gethash rnd 1024          |     -4.8 |     -4.8 |    -13.2 |
| gethash rnd 4096          |     -2.3 |     -2.1 |    -12.0 |
| gethash rnd 100000        |     -4.0 |     -6.4 |    -17.6 |
|---------------------------+----------+----------+----------|
| gethash sym 1059          |    -15.4 |    -15.4 |    -18.0 |
|---------------------------+----------+----------+----------|
| gethash 20000 tbls 5 elts |    -76.5 |    -77.0 |    -77.4 |
| gethash 5263 tbls 19 elts |    -63.5 |    -63.8 |    -67.4 |
| gethash 1538 tbls 65 elts |    -28.8 |    -22.8 |    -35.5 |
| gethash 571 tbls 175 elts |    -33.7 |    -34.1 |    -40.1 |
|---------------------------+----------+----------+----------|
| insert/remove all 10      |     -5.4 |     -5.0 |    -10.8 |
| insert/remove all 100     |     -5.9 |     -5.9 |    -10.1 |
| insert/remove all 1000    |     -4.9 |     -5.1 |     -9.1 |
| insert/remove all 10000   |     -4.9 |     -5.1 |     -9.5 |
| insert/remove all 100000  |     -5.5 |     -5.5 |     +2.2 |
|---------------------------+----------+----------+----------|
| insert/remove 10          |    +10.2 |    +10.2 |     -5.2 |
| insert/remove 100         |     -7.7 |     -7.7 |    -17.5 |
| insert/remove 1000        |    -12.4 |    -12.4 |    -16.1 |
| insert/remove 10000       |     -5.1 |     -5.1 |    -10.4 |
| insert/remove 100000      |     -9.9 |     -9.9 |     -6.9 |
|---------------------------+----------+----------+----------|
| maphash 10                |     -6.1 |     -6.0 |     -7.1 |
| maphash 100               |     -0.4 |     -0.4 |     +0.8 |
| maphash 1000              |     -0.2 |     -0.2 |     +0.8 |
| maphash 10000             |     +0.8 |     +0.8 |     +0.0 |
| maphash 100000            |     -0.2 |     -0.4 |     +1.8 |
|---------------------------+----------+----------+----------|
| print empty eql           |    -43.2 |    -41.8 |    -42.6 |
| print empty eq            |    -39.1 |    -38.3 |    -38.8 |
|---------------------------+----------+----------+----------|
| read empty eql            |    -96.5 |    -96.5 |    -96.5 |
| read empty eq             |    -94.9 |    -94.9 |    -94.9 |
| read 100                  |    -36.2 |    -36.1 |    -36.8 |
| read 10000                |     -6.8 |     -7.0 |     -7.8 |

^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-07 18:36               ` Mattias Engdegård
@ 2024-01-07 19:10                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-08 18:26                   ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-07 19:10 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

> I've pushed two new changes: a correction to the GC accounting for the
> ancillary hash-table vectors, and a rather more interesting change to the
> hash table range reduction. It now uses a Knuth multiplicative method
> instead of the expensive remainder, so the index is now always a power of
> 2 in size.

The change gives good results for small tables but less so for big ones.
I don't have a good intuition for why that would be: none of the
operations directly involved seem to be more costly for large tables, so
my best guess is that it leads to more collisions somehow, tho I don't
have a good idea about why that would be.


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-07 19:10                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-08 18:26                   ` Mattias Engdegård
  2024-01-09  0:33                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-08 18:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

7 jan. 2024 kl. 20.10 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> The change gives good results for small tables but less so for big ones.
> I don't have a good intuition for why that would be: none of the
> operations directly involved seem to be more costly for large tables, so
> my best guess is that it leads to more collisions somehow, tho I don't
> have a good idea about why that would be.

Yes, I wondered about that too. It could simply be bad sampling. The benchmarks with bad results used sequential numbers (0, 1, ...) as keys so let's start with that:

|  count |   size idxsiz  1st%  avg max |   size idxsiz  1st%  avg max
|--------+------------------------------+-----------------------------
|  10000 |  12466  15343 100.0 1.00   1 |  12288  16384 100.0 1.00   1
|  20000 |  28048  34523 100.0 1.00   1 |  24576  32768  98.8 1.01   2
|  30000 |  42072  51781 100.0 1.00   1 |  49152  65536  99.1 1.01   2
|  40000 |  42072  51781 100.0 1.00   1 |  49152  65536  94.5 1.06   2
|  50000 |  63108  77671 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  60000 |  63108  77671 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  70000 |  94662 116507 100.0 1.00   1 |  98304 131072 100.0 1.00   1
|  80000 |  94662 116507 100.0 1.00   1 |  98304 131072  96.0 1.04   2
|  90000 |  94662 116507 100.0 1.00   1 |  98304 131072  89.3 1.11   2
| 100000 | 141993 174761 100.0 1.00   1 | 196608 262144  92.9 1.07   2
| 110000 | 141993 174761 100.0 1.00   1 | 196608 262144  90.9 1.09   2
| 120000 | 141993 174761 100.0 1.00   1 | 196608 262144  89.3 1.11   2
| 130000 | 141993 174761 100.0 1.00   1 | 196608 262144  87.9 1.12   2
| 140000 | 141993 174761 100.0 1.00   1 | 196608 262144  86.7 1.13   2
| 150000 | 212989 262141 100.0 1.00   1 | 196608 262144  85.7 1.14   2
| 160000 | 212989 262141 100.0 1.00   1 | 196608 262144  84.8 1.15   2
| 170000 | 212989 262141 100.0 1.00   1 | 196608 262144  84.0 1.16   2
| 180000 | 212989 262141 100.0 1.00   1 | 196608 262144  83.3 1.17   2
| 190000 | 212989 262141 100.0 1.00   1 | 196608 262144  82.7 1.17   2
| 200000 | 212989 262141 100.0 1.00   1 | 393216 524288 100.0 1.00   1

The left columns are for the standard hash tables with remainder-based range reduction, the right ones with Knuth reduction.
'size' is the table size, 'idxsiz' the index vector size, '1st%' the portion of entries accessed with a single lookup, 'avg' the average number of accesses and 'max' the maximum number of accesses required.

The old code looks perfect (no collisions!), but the new shiny code is a disappointment.

Then again, sequential numbers are best-case for remainder-based indexing: Emacs hashes (smallish) fixnums to themselves so we are guaranteed a minimum number of collisions, actually zero since the index vector is always larger than the number of entries.

But sequential keys would be a somewhat unusual use of hash tables; a vector is a lot more efficient. Let's try with random fixnums instead, using the same seed for each table:

|  count |   size idxsiz 1st%  avg max |   size idxsiz 1st%  avg max
|--------+-----------------------------+----------------------------
|  10000 |  12466  15343 73.5 1.32   6 |  12288  16384 75.4 1.30   6
|  20000 |  28048  34523 75.7 1.29   6 |  24576  32768 75.2 1.30   6
|  30000 |  42072  51781 75.8 1.29   5 |  49152  65536 80.6 1.22   5
|  40000 |  42072  51781 69.6 1.39   7 |  49152  65536 75.0 1.30   6
|  50000 |  63108  77671 73.6 1.32   6 |  98304 131072 83.5 1.19   6
|  60000 |  63108  77671 69.6 1.39   7 |  98304 131072 80.5 1.23   6
|  70000 |  94662 116507 75.2 1.30   6 |  98304 131072 77.5 1.27   6
|  80000 |  94662 116507 72.4 1.34   7 |  98304 131072 75.0 1.30   7
|  90000 |  94662 116507 69.8 1.38   7 |  98304 131072 72.6 1.34   7
| 100000 | 141993 174761 76.2 1.29   6 | 196608 262144 83.3 1.19   6
| 110000 | 141993 174761 74.2 1.32   6 | 196608 262144 81.8 1.21   6
| 120000 | 141993 174761 72.4 1.34   7 | 196608 262144 80.3 1.23   6
| 130000 | 141993 174761 70.6 1.37   7 | 196608 262144 78.8 1.25   6
| 140000 | 141993 174761 68.8 1.40   7 | 196608 262144 77.6 1.27   6
| 150000 | 212989 262141 76.1 1.29   6 | 196608 262144 76.2 1.29   6
| 160000 | 212989 262141 74.8 1.31   6 | 196608 262144 74.9 1.30   6
| 170000 | 212989 262141 73.5 1.33   6 | 196608 262144 73.6 1.32   6
| 180000 | 212989 262141 72.3 1.35   7 | 196608 262144 72.3 1.34   6
| 190000 | 212989 262141 71.1 1.36   7 | 196608 262144 71.1 1.36   7
| 200000 | 212989 262141 69.9 1.38   7 | 393216 524288 83.1 1.19   5

Here the new code looks a bit better, but it could just be the bigger index vectors.
Not sure what to make of this.

We could try switching to a high-quality hash function (or family thereof), like Murmur or Jenkins. Then range reduction is just a matter of masking off the required number of bits.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-08 18:26                   ` Mattias Engdegård
@ 2024-01-09  0:33                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-09 10:26                       ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-09  0:33 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

> The left columns are for the standard hash tables with remainder-based range
> reduction, the right ones with Knuth reduction.
> 'size' is the table size, 'idxsiz' the index vector size, '1st%' the portion
> of entries accessed with a single lookup, 'avg' the average number of
> accesses and 'max' the maximum number of accesses required.
>
> The old code looks perfect (no collisions!), but the new shiny code is a disappointment.

Ah, that could explain it, indeed.

BTW, what is the "per-entry byte-size" of your new code?
The old code had about 6 words per entry, IIRC.

> We could try switching to a high-quality hash function (or family thereof),
> like Murmur or Jenkins. Then range reduction is just a matter of masking off
> the required number of bits.

I don't see a strong need for it.

BTW, I see in the Knuth reduction you extract the bits 32..32+N of
the multiplication.  Any reason not to use the top N bits instead (so
we're not limited to 32 bits, for example)?


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-09  0:33                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-09 10:26                       ` Mattias Engdegård
  2024-01-13 20:06                         ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-09 10:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

9 jan. 2024 kl. 01.33 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> BTW, what is the "per-entry byte-size" of your new code?
> The old code had about 6 words per entry, IIRC.

With tables filled to capacity, the old code was about 5.25 (assuming an index vector size 1.25 times the table size). Now it's 1.625 words less, thus 3.625 words per entry. I haven't done the maths for what the average per-entry size would be if we take growth space into account.

The index vector can be shrunk further if we use a narrower index for smaller tables. This is a fairly common optimisation and usually the lower memory usage is worth an extra branch or two.

The hash-table object size is also down from 16 words to 10. 8 is actually quite achievable: consolidate the key_and_value, hash and next vectors into a single allocated block. It's just a matter of benchmarking to see what memory arrangement is the most beneficial.

>> We could try switching to a high-quality hash function (or family thereof),
>> like Murmur or Jenkins. Then range reduction is just a matter of masking off
>> the required number of bits.
> 
> I don't see a strong need for it.

Maybe not, but I wouldn't discount it out of hand. A few cheap ALU ops could easily pay for themselves if they lead to fewer collisions.

> BTW, I see in the Knuth reduction you extract the bits 32..32+N of
> the multiplication.

It's supposed to be bits [32-N,32) actually (hope I got that right).

>  Any reason not to use the top N bits instead (so
> we're not limited to 32 bits, for example)?

I thought about writing a clever expression that would work for other widths as well but it seemed like a waste of time given the current data structures.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
  2024-01-04 16:39                             ` Eli Zaretskii
@ 2024-01-09 21:51                             ` Stefan Kangas
  2024-01-12 15:42                               ` Mattias Engdegård
  2024-01-14 22:08                             ` Andy Moreton
                                               ` (5 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Stefan Kangas @ 2024-01-09 21:51 UTC (permalink / raw)
  To: Mattias Engdegård, 68244

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> It's the hash-table's turn to be subjected to some performance
> tuning. The implementation has changed very little over the years and
> it's not so much a matter of a single big thing to fix as many small
> ones, so the list of changes is fairly long but ever little helps.
>
> A working patch series can be found in the scratch/hash-table-perf
> branch. Although it should all be satisfactory and an improvement on
> what was before, there are a couple of details that I'd like to do
> better, which is why this hasn't been merged yet: the way shared
> hash_table_test structs are cached isn't very elegant, nor is the way
> we deal with Qunbound in pdumper.c.

Thanks, these changes make sense to me both in terms of the performance
gains and the cleaner implementation.

The improved usability (de-emphasizing `hash-table-size', etc.) and
printed representation are certainly appreciated too.

Maybe lift `hash-table-rehash-size' and `hash-table-rehash-threshold' to
Lisp and mark them obsolete.  The hash table implementation is complex
enough for it to be worth avoiding cruft lying around.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-09 21:51                             ` Stefan Kangas
@ 2024-01-12 15:42                               ` Mattias Engdegård
  0 siblings, 0 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-12 15:42 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 68244

9 jan. 2024 kl. 22.51 skrev Stefan Kangas <stefankangas@gmail.com>:

> Maybe lift `hash-table-rehash-size' and `hash-table-rehash-threshold' to
> Lisp and mark them obsolete.

Yes, I thought about that. It's probably a good idea, and it won't inconvenience a lot of people.
(Warning about the corresponding keywords in make-hash-table is probably not worth it at this point.)






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-09 10:26                       ` Mattias Engdegård
@ 2024-01-13 20:06                         ` Mattias Engdegård
  2024-01-04 16:27                           ` Mattias Engdegård
                                             ` (3 more replies)
  0 siblings, 4 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-13 20:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Kangas

All right, I've pushed the parts that I have little doubt about being a definite improvement: a batch of refactoring steps, then the hash-table printing reforms, the representation and growth algorithm improvements, and finally documentation updates, all now on master.

Not part of this yet:

- unifying hash and next vectors (or hash, next and key_and_value) in a single array: pending benchmarks.

- Knuth range reduction: probably a gain as described earlier, but I want to study the behaviour in more of an apples-to-apples comparison since the table and index sizes were changed at the same time.

- turning hash_table_test structures into Lisp objects so that they can be referenced from a symbol property: seemed to worsen performance slightly so putting this on hold for the time being.

- adaptive index vector width: measurements needed.

- fancier hashing: to be implemented.

Maybe we should promote equal-including-properties to a first-class hash table test? It's defined as a user test in  two places in the Emacs tree, and user tests are much slower than built-in ones.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-13 20:06                         ` Mattias Engdegård
  2024-01-04 16:27                           ` Mattias Engdegård
@ 2024-01-14  5:25                           ` Gerd Möllmann
  2024-01-14 14:42                             ` Mattias Engdegård
  2024-01-21 12:41                           ` Stefan Kangas
  2024-02-08  9:46                           ` Mattias Engdegård
  3 siblings, 1 reply; 80+ messages in thread
From: Gerd Möllmann @ 2024-01-14  5:25 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> All right, I've pushed the parts that I have little doubt about being
> a definite improvement

These enums look a bit strange: Test_equal, Weak_Key_Or_Value.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-14  5:25                           ` Gerd Möllmann
@ 2024-01-14 14:42                             ` Mattias Engdegård
  0 siblings, 0 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-14 14:42 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

14 jan. 2024 kl. 06.25 skrev Gerd Möllmann <gerd.moellmann@gmail.com>:

> These enums look a bit strange: Test_equal, Weak_Key_Or_Value.

Thank you for reading my changes! Yes, it probably should have been Test_Equal but I thought that it was more important to evoke the name of the symbol (`equal`) that that value encodes. But these are just names and can be altered as we see fit.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
  2024-01-04 16:39                             ` Eli Zaretskii
  2024-01-09 21:51                             ` Stefan Kangas
@ 2024-01-14 22:08                             ` Andy Moreton
  2024-01-15 12:31                               ` Eli Zaretskii
  2024-01-15 20:01                             ` Andy Moreton
                                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Andy Moreton @ 2024-01-14 22:08 UTC (permalink / raw)
  To: 68244

On Sat 13 Jan 2024, Mattias Engdegård wrote:

> All right, I've pushed the parts that I have little doubt about being a
> definite improvement: a batch of refactoring steps, then the hash-table
> printing reforms, the representation and growth algorithm improvements, and
> finally documentation updates, all now on master.

Something in this patch series causes a build failure on Windows using
the Mingw64 toolchain. I've tried bisecting this, and at each step doing
"git clean -xdf" and then autogen, configure and make for a native build.

The last good commit on master is:
  c3d0cc50faf5 ("Remove rehash-threshold and rehash-size struct members")

Bisect identifies this as the bad commit:
  d3cefd3e9835 ("Leaner hash table dumping and thawing")

I have included the end of the log from the failing build below.

    AndyM

                           -----o0o-----

cp -f temacs.exe bootstrap-emacs.exe
rm -f bootstrap-emacs.pdmp
./temacs --batch  -l loadup --temacs=pbootstrap \
	--bin-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/bin/ --eln-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/lib/emacs/30.0.50/
Loading loadup.el (source)...
Dump mode: pbootstrap
Using load-path (c:/emacs/git/emacs/master/lisp c:/emacs/git/emacs/master/lisp/emacs-lisp c:/emacs/git/emacs/master/lisp/progmodes c:/emacs/git/emacs/master/lisp/language c:/emacs/git/emacs/master/lisp/international c:/emacs/git/emacs/master/lisp/textmodes c:/emacs/git/emacs/master/lisp/vc)
Loading emacs-lisp/debug-early (source)...

...[snipped]...

Loading c:/emacs/git/emacs/master/lisp/emacs-lisp/rmc.el (source)...
Finding pointers to doc strings...
Finding pointers to doc strings...done
Dumping under the name bootstrap-emacs.pdmp
Dumping fingerprint: 9f59e49a3ba95f3d7451327de9cf2863b9ddb392220c5f88f384f16245a619c2
Dump complete
Byte counts: header=100 hot=16147196 discardable=207320 cold=11360080
Reloc counts: hot=1123689 discardable=5662
ANCIENT=yes make -C ../lisp compile-first EMACS="../src/bootstrap-emacs.exe"
make[3]: Entering directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/lisp'
  ELC      /c/emacs/git/emacs/master/lisp/emacs-lisp/macroexp.elc

C:/emacs/git/emacs/master/src/lisp.h:2564: Emacs fatal error: assertion failed: 0 < size

Backtrace:
00007ff722cebc3e
00007ff722b674b1
00007ff722be90dc
00007ff722c33077
00007ff722c26f8a
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c271e1
00007ff722c27515
00007ff722c28e7a
00007ff722c267a1
00007ff722c271e1
00007ff722c297a7
00007ff722c26dc8
00007ff722c271e1
00007ff722c27515
00007ff722c28e7a
00007ff722c267a1
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c28358
00007ff722c26dc8
00007ff722c5d820
00007ff722c66167
00007ff722c67cd1
00007ff722c26f36
00007ff722c271e1
00007ff722c26dc8
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c29a80
00007ff722c26dc8
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c271e1
00007ff722c26dc8
00007ff722c271e1
00007ff722c27515
00007ff722c2269e
00007ff722c67708
00007ff722c67898
00007ff722c22382
00007ff722c3dde8
00007ff722c26f72
00007ff722c5d820
00007ff722c66167
00007ff722c67cd1
00007ff722c26f36
00007ff722c271e1
00007ff722c26dc8
00007ff722c271e1
00007ff722c28a62
00007ff722c26dc8
00007ff722c271e1
00007ff722c28a62
...
make[3]: *** [Makefile:325: /c/emacs/git/emacs/master/lisp/emacs-lisp/macroexp.elc] Error 3
make[3]: Leaving directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/lisp'
make[2]: *** [Makefile:1017: bootstrap-emacs.pdmp] Error 2
make[2]: Leaving directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/src'
make[1]: *** [Makefile:554: src] Error 2
make[1]: Leaving directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native'
make[1]: Entering directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native'






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-14 22:08                             ` Andy Moreton
@ 2024-01-15 12:31                               ` Eli Zaretskii
  2024-01-15 13:26                                 ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-15 12:31 UTC (permalink / raw)
  To: Andy Moreton, Mattias Engdegård; +Cc: 68244

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sun, 14 Jan 2024 22:08:14 +0000
> 
> On Sat 13 Jan 2024, Mattias Engdegård wrote:
> 
> > All right, I've pushed the parts that I have little doubt about being a
> > definite improvement: a batch of refactoring steps, then the hash-table
> > printing reforms, the representation and growth algorithm improvements, and
> > finally documentation updates, all now on master.
> 
> Something in this patch series causes a build failure on Windows using
> the Mingw64 toolchain. I've tried bisecting this, and at each step doing
> "git clean -xdf" and then autogen, configure and make for a native build.
> 
> The last good commit on master is:
>   c3d0cc50faf5 ("Remove rehash-threshold and rehash-size struct members")
> 
> Bisect identifies this as the bad commit:
>   d3cefd3e9835 ("Leaner hash table dumping and thawing")
> 
> I have included the end of the log from the failing build below.
> 
>     AndyM
> 
>                            -----o0o-----
> 
> cp -f temacs.exe bootstrap-emacs.exe
> rm -f bootstrap-emacs.pdmp
> ./temacs --batch  -l loadup --temacs=pbootstrap \
> 	--bin-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/bin/ --eln-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/lib/emacs/30.0.50/
> Loading loadup.el (source)...
> Dump mode: pbootstrap
> Using load-path (c:/emacs/git/emacs/master/lisp c:/emacs/git/emacs/master/lisp/emacs-lisp c:/emacs/git/emacs/master/lisp/progmodes c:/emacs/git/emacs/master/lisp/language c:/emacs/git/emacs/master/lisp/international c:/emacs/git/emacs/master/lisp/textmodes c:/emacs/git/emacs/master/lisp/vc)
> Loading emacs-lisp/debug-early (source)...
> 
> ...[snipped]...
> 
> Loading c:/emacs/git/emacs/master/lisp/emacs-lisp/rmc.el (source)...
> Finding pointers to doc strings...
> Finding pointers to doc strings...done
> Dumping under the name bootstrap-emacs.pdmp
> Dumping fingerprint: 9f59e49a3ba95f3d7451327de9cf2863b9ddb392220c5f88f384f16245a619c2
> Dump complete
> Byte counts: header=100 hot=16147196 discardable=207320 cold=11360080
> Reloc counts: hot=1123689 discardable=5662
> ANCIENT=yes make -C ../lisp compile-first EMACS="../src/bootstrap-emacs.exe"
> make[3]: Entering directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/lisp'
>   ELC      /c/emacs/git/emacs/master/lisp/emacs-lisp/macroexp.elc
> 
> C:/emacs/git/emacs/master/src/lisp.h:2564: Emacs fatal error: assertion failed: 0 < size

Thanks, but I see no assertion 0 < size on line 2564 of lisp.h, or
anywhere else in lisp.h.  Can you tell which assertion fails?

Adding Mattias to the discussion.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-15 12:31                               ` Eli Zaretskii
@ 2024-01-15 13:26                                 ` Mattias Engdegård
  2024-01-18 18:13                                   ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-15 13:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68244, Andy Moreton

15 jan. 2024 kl. 13.31 skrev Eli Zaretskii <eliz@gnu.org>:

>> Bisect identifies this as the bad commit:
>>  d3cefd3e9835 ("Leaner hash table dumping and thawing")

Thank you, I can reproduce the (or at least an) assertion failure in this commit. However it's probably a ghost error: although I've tried to make sure that there is a correct build at each point, there have been a lot of work going on rearranging things for correctness and clarity, and apparently I never built this particular one with checking enabled the last time.

However this assertion should be removed by the next commit so the error is very short-lived (and it's just an over-zealous assertion, not an actual flaw). What problems do you observe when building master?

(And yes, I should update some hashes in pdumper.c; it's been a while.)






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
                                               ` (2 preceding siblings ...)
  2024-01-14 22:08                             ` Andy Moreton
@ 2024-01-15 20:01                             ` Andy Moreton
  2024-01-15 20:21                               ` Eli Zaretskii
  2024-01-16 21:57                             ` Andy Moreton
                                               ` (3 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Andy Moreton @ 2024-01-15 20:01 UTC (permalink / raw)
  To: 68244

On Mon 15 Jan 2024, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Sun, 14 Jan 2024 22:08:14 +0000
>> 
>> On Sat 13 Jan 2024, Mattias Engdegård wrote:
>> 
>> > All right, I've pushed the parts that I have little doubt about being a
>> > definite improvement: a batch of refactoring steps, then the hash-table
>> > printing reforms, the representation and growth algorithm improvements, and
>> > finally documentation updates, all now on master.
>> 
>> Something in this patch series causes a build failure on Windows using
>> the Mingw64 toolchain. I've tried bisecting this, and at each step doing
>> "git clean -xdf" and then autogen, configure and make for a native build.
>> 
>> The last good commit on master is:
>>   c3d0cc50faf5 ("Remove rehash-threshold and rehash-size struct members")
>> 
>> Bisect identifies this as the bad commit:
>>   d3cefd3e9835 ("Leaner hash table dumping and thawing")
>> 
>> I have included the end of the log from the failing build below.
>> 
>>     AndyM
>> 
>>                            -----o0o-----
>> 
>> cp -f temacs.exe bootstrap-emacs.exe
>> rm -f bootstrap-emacs.pdmp
>> ./temacs --batch  -l loadup --temacs=pbootstrap \
>> 	--bin-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/bin/ --eln-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/lib/emacs/30.0.50/
>> Loading loadup.el (source)...
>> Dump mode: pbootstrap
>> Using load-path (c:/emacs/git/emacs/master/lisp c:/emacs/git/emacs/master/lisp/emacs-lisp c:/emacs/git/emacs/master/lisp/progmodes c:/emacs/git/emacs/master/lisp/language c:/emacs/git/emacs/master/lisp/international c:/emacs/git/emacs/master/lisp/textmodes c:/emacs/git/emacs/master/lisp/vc)
>> Loading emacs-lisp/debug-early (source)...
>> 
>> ...[snipped]...
>> 
>> Loading c:/emacs/git/emacs/master/lisp/emacs-lisp/rmc.el (source)...
>> Finding pointers to doc strings...
>> Finding pointers to doc strings...done
>> Dumping under the name bootstrap-emacs.pdmp
>> Dumping fingerprint: 9f59e49a3ba95f3d7451327de9cf2863b9ddb392220c5f88f384f16245a619c2
>> Dump complete
>> Byte counts: header=100 hot=16147196 discardable=207320 cold=11360080
>> Reloc counts: hot=1123689 discardable=5662
>> ANCIENT=yes make -C ../lisp compile-first EMACS="../src/bootstrap-emacs.exe"
>> make[3]: Entering directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/lisp'
>>   ELC      /c/emacs/git/emacs/master/lisp/emacs-lisp/macroexp.elc
>> 
>> C:/emacs/git/emacs/master/src/lisp.h:2564: Emacs fatal error: assertion failed: 0 < size
>
> Thanks, but I see no assertion 0 < size on line 2564 of lisp.h, or
> anywhere else in lisp.h.  Can you tell which assertion fails?

In commit d3cefd3e9835, line 2564 of lisp.h has "eassume (0 < size);" in
HASH_TABLE_SIZE().

If the hash table changes now allow zero sized tables then that would
trigger in a build with ENABLE_CHECKING. My build script had checking
enabled, which would explain the failure. Sorry for not mentioning that
earlier (I had forgotten that it was enabled).

With ENABLE_CHECKING disabled the build succeeds. Thus the invariants
need looking at to allow a checked build to work.

    AndyM






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-15 20:01                             ` Andy Moreton
@ 2024-01-15 20:21                               ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-15 20:21 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 68244

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Mon, 15 Jan 2024 20:01:28 +0000
> 
> On Mon 15 Jan 2024, Eli Zaretskii wrote:
> 
> >> From: Andy Moreton <andrewjmoreton@gmail.com>
> >> Date: Sun, 14 Jan 2024 22:08:14 +0000
> >> 
> >> On Sat 13 Jan 2024, Mattias Engdegård wrote:
> >> 
> >> > All right, I've pushed the parts that I have little doubt about being a
> >> > definite improvement: a batch of refactoring steps, then the hash-table
> >> > printing reforms, the representation and growth algorithm improvements, and
> >> > finally documentation updates, all now on master.
> >> 
> >> Something in this patch series causes a build failure on Windows using
> >> the Mingw64 toolchain. I've tried bisecting this, and at each step doing
> >> "git clean -xdf" and then autogen, configure and make for a native build.
> >> 
> >> The last good commit on master is:
> >>   c3d0cc50faf5 ("Remove rehash-threshold and rehash-size struct members")
> >> 
> >> Bisect identifies this as the bad commit:
> >>   d3cefd3e9835 ("Leaner hash table dumping and thawing")
> >> 
> >> I have included the end of the log from the failing build below.
> >> 
> >>     AndyM
> >> 
> >>                            -----o0o-----
> >> 
> >> cp -f temacs.exe bootstrap-emacs.exe
> >> rm -f bootstrap-emacs.pdmp
> >> ./temacs --batch  -l loadup --temacs=pbootstrap \
> >> 	--bin-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/bin/ --eln-dest c:/emacs/30.0.50/mingw64-x86_64-O2-native/lib/emacs/30.0.50/
> >> Loading loadup.el (source)...
> >> Dump mode: pbootstrap
> >> Using load-path (c:/emacs/git/emacs/master/lisp c:/emacs/git/emacs/master/lisp/emacs-lisp c:/emacs/git/emacs/master/lisp/progmodes c:/emacs/git/emacs/master/lisp/language c:/emacs/git/emacs/master/lisp/international c:/emacs/git/emacs/master/lisp/textmodes c:/emacs/git/emacs/master/lisp/vc)
> >> Loading emacs-lisp/debug-early (source)...
> >> 
> >> ...[snipped]...
> >> 
> >> Loading c:/emacs/git/emacs/master/lisp/emacs-lisp/rmc.el (source)...
> >> Finding pointers to doc strings...
> >> Finding pointers to doc strings...done
> >> Dumping under the name bootstrap-emacs.pdmp
> >> Dumping fingerprint: 9f59e49a3ba95f3d7451327de9cf2863b9ddb392220c5f88f384f16245a619c2
> >> Dump complete
> >> Byte counts: header=100 hot=16147196 discardable=207320 cold=11360080
> >> Reloc counts: hot=1123689 discardable=5662
> >> ANCIENT=yes make -C ../lisp compile-first EMACS="../src/bootstrap-emacs.exe"
> >> make[3]: Entering directory '/c/emacs/git/emacs/master/build/mingw64-x86_64-O2-native/lisp'
> >>   ELC      /c/emacs/git/emacs/master/lisp/emacs-lisp/macroexp.elc
> >> 
> >> C:/emacs/git/emacs/master/src/lisp.h:2564: Emacs fatal error: assertion failed: 0 < size
> >
> > Thanks, but I see no assertion 0 < size on line 2564 of lisp.h, or
> > anywhere else in lisp.h.  Can you tell which assertion fails?
> 
> In commit d3cefd3e9835, line 2564 of lisp.h has "eassume (0 < size);" in
> HASH_TABLE_SIZE().
> 
> If the hash table changes now allow zero sized tables then that would
> trigger in a build with ENABLE_CHECKING. My build script had checking
> enabled, which would explain the failure. Sorry for not mentioning that
> earlier (I had forgotten that it was enabled).
> 
> With ENABLE_CHECKING disabled the build succeeds. Thus the invariants
> need looking at to allow a checked build to work.

I always build with ENABLE_CHECKING, so I still don't understand why I
don't see the problem.  Can you suggest a way that avoids a full
bootstrap, but just compiles a (small) bunch of Lisp files?





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
                                               ` (3 preceding siblings ...)
  2024-01-15 20:01                             ` Andy Moreton
@ 2024-01-16 21:57                             ` Andy Moreton
  2024-01-17  3:31                               ` Eli Zaretskii
  2024-01-18 20:29                             ` Andy Moreton
                                               ` (2 subsequent siblings)
  7 siblings, 1 reply; 80+ messages in thread
From: Andy Moreton @ 2024-01-16 21:57 UTC (permalink / raw)
  To: 68244

On Mon 15 Jan 2024, Mattias Engdegård wrote:

> 15 jan. 2024 kl. 13.31 skrev Eli Zaretskii <eliz@gnu.org>:
>
>>> Bisect identifies this as the bad commit:
>>>  d3cefd3e9835 ("Leaner hash table dumping and thawing")
>
> Thank you, I can reproduce the (or at least an) assertion failure in this
> commit. However it's probably a ghost error: although I've tried to make sure
> that there is a correct build at each point, there have been a lot of work
> going on rearranging things for correctness and clarity, and apparently I
> never built this particular one with checking enabled the last time.
>
> However this assertion should be removed by the next commit so the error is
> very short-lived (and it's just an over-zealous assertion, not an actual
> flaw). What problems do you observe when building master?
>
> (And yes, I should update some hashes in pdumper.c; it's been a while.)

I bisected from newer master to get to that point. Bootstrapping master
commit f19f5604deb7 ("Update pdumper hashes for buffer and Lisp_Hash_Table")
is still failing, and trying to continue the buid by retrying after that
is glacially slow, so something is badly broken at that point.

Note that the same commit bootstrapped after "git clean -xdf" still has
multiple backtraces when building without ENABLE_CHECKING.

    AndyM






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-16 21:57                             ` Andy Moreton
@ 2024-01-17  3:31                               ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-17  3:31 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 68244

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Tue, 16 Jan 2024 21:57:04 +0000
> 
> On Mon 15 Jan 2024, Mattias Engdegård wrote:
> 
> > 15 jan. 2024 kl. 13.31 skrev Eli Zaretskii <eliz@gnu.org>:
> >
> >>> Bisect identifies this as the bad commit:
> >>>  d3cefd3e9835 ("Leaner hash table dumping and thawing")
> >
> > Thank you, I can reproduce the (or at least an) assertion failure in this
> > commit. However it's probably a ghost error: although I've tried to make sure
> > that there is a correct build at each point, there have been a lot of work
> > going on rearranging things for correctness and clarity, and apparently I
> > never built this particular one with checking enabled the last time.
> >
> > However this assertion should be removed by the next commit so the error is
> > very short-lived (and it's just an over-zealous assertion, not an actual
> > flaw). What problems do you observe when building master?
> >
> > (And yes, I should update some hashes in pdumper.c; it's been a while.)
> 
> I bisected from newer master to get to that point. Bootstrapping master
> commit f19f5604deb7 ("Update pdumper hashes for buffer and Lisp_Hash_Table")
> is still failing, and trying to continue the buid by retrying after that
> is glacially slow, so something is badly broken at that point.
> 
> Note that the same commit bootstrapped after "git clean -xdf" still has
> multiple backtraces when building without ENABLE_CHECKING.

Can you show the first crash/assertion violation when building with
ENABLE_CHECKING?  It is important to see the error message, and if you
can run the failing command under GDB and show a backtrace, it will be
even more useful.

Thanks.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-15 13:26                                 ` Mattias Engdegård
@ 2024-01-18 18:13                                   ` Mattias Engdegård
  0 siblings, 0 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-01-18 18:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 68244, Andy Moreton

I still don't know what problems you observed but have now committed fixes for two other assertion failures to master, ef01250e and e7a6ce84. Andy, did they help?






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
                                               ` (4 preceding siblings ...)
  2024-01-16 21:57                             ` Andy Moreton
@ 2024-01-18 20:29                             ` Andy Moreton
  2024-01-19  6:37                               ` Eli Zaretskii
  2024-01-20 20:20                             ` Andy Moreton
  2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  7 siblings, 1 reply; 80+ messages in thread
From: Andy Moreton @ 2024-01-18 20:29 UTC (permalink / raw)
  To: 68244

On Thu 18 Jan 2024, Mattias Engdegård wrote:

> I still don't know what problems you observed but have now committed fixes for
> two other assertion failures to master, ef01250e and e7a6ce84. Andy, did they
> help?

I am begining to suspect a toolchain problem, as I see backtraces when
bootstrapping emacs-28 and emacs-29 native (.eln) builds, but not when
bootstrapping a non-native (.elc) build.

    AndyM







^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-18 20:29                             ` Andy Moreton
@ 2024-01-19  6:37                               ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-19  6:37 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 68244

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Thu, 18 Jan 2024 20:29:52 +0000
> 
> On Thu 18 Jan 2024, Mattias Engdegård wrote:
> 
> > I still don't know what problems you observed but have now committed fixes for
> > two other assertion failures to master, ef01250e and e7a6ce84. Andy, did they
> > help?
> 
> I am begining to suspect a toolchain problem, as I see backtraces when
> bootstrapping emacs-28 and emacs-29 native (.eln) builds, but not when
> bootstrapping a non-native (.elc) build.

Do the backtraces follow an assertion violation?  Or are those crashes
(as opposed to aborts)?

Can you show a couple of examples, please?





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
                                               ` (5 preceding siblings ...)
  2024-01-18 20:29                             ` Andy Moreton
@ 2024-01-20 20:20                             ` Andy Moreton
  2024-01-21  5:11                               ` Eli Zaretskii
  2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  7 siblings, 1 reply; 80+ messages in thread
From: Andy Moreton @ 2024-01-20 20:20 UTC (permalink / raw)
  To: 68244

On Fri 19 Jan 2024, Eli Zaretskii wrote:

>> From: Andy Moreton <andrewjmoreton@gmail.com>
>> Date: Thu, 18 Jan 2024 20:29:52 +0000
>> 
>> On Thu 18 Jan 2024, Mattias Engdegård wrote:
>> 
>> > I still don't know what problems you observed but have now committed fixes for
>> > two other assertion failures to master, ef01250e and e7a6ce84. Andy, did they
>> > help?
>> 
>> I am begining to suspect a toolchain problem, as I see backtraces when
>> bootstrapping emacs-28 and emacs-29 native (.eln) builds, but not when
>> bootstrapping a non-native (.elc) build.
>
> Do the backtraces follow an assertion violation?  Or are those crashes
> (as opposed to aborts)?
>
> Can you show a couple of examples, please?

I can no longer reproduce the problems on master or on emacs-29 with
ENABLE_CHECKING.

I still get a few backtraces repeatably when bootstrapping a native
build on the emacs-28 branch, but I suspect that is not of much interest
any more.

    AndyM






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-20 20:20                             ` Andy Moreton
@ 2024-01-21  5:11                               ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-01-21  5:11 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 68244

> From: Andy Moreton <andrewjmoreton@gmail.com>
> Date: Sat, 20 Jan 2024 20:20:57 +0000
> 
> I can no longer reproduce the problems on master or on emacs-29 with
> ENABLE_CHECKING.

OK, thanks.

> I still get a few backtraces repeatably when bootstrapping a native
> build on the emacs-28 branch, but I suspect that is not of much interest
> any more.

Right.  If you want to be able to compile that branch anyway, my
suggestion is to lower native-comp-speed to 1.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-13 20:06                         ` Mattias Engdegård
  2024-01-04 16:27                           ` Mattias Engdegård
  2024-01-14  5:25                           ` Gerd Möllmann
@ 2024-01-21 12:41                           ` Stefan Kangas
  2024-02-08  9:46                           ` Mattias Engdegård
  3 siblings, 0 replies; 80+ messages in thread
From: Stefan Kangas @ 2024-01-21 12:41 UTC (permalink / raw)
  To: Mattias Engdegård, Stefan Monnier; +Cc: Dmitry Gutov, Eli Zaretskii, 68244

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> Maybe we should promote equal-including-properties to a first-class
> hash table test? It's defined as a user test in two places in the
> Emacs tree, and user tests are much slower than built-in ones.

It might be worth a try.  I see that it's used in the native compiler,
for example, and we're obviously very happy if we can make that stuff
more performant.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-04 16:27                           ` Mattias Engdegård
                                               ` (6 preceding siblings ...)
  2024-01-20 20:20                             ` Andy Moreton
@ 2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-01-22  9:17                               ` João Távora
  7 siblings, 1 reply; 80+ messages in thread
From: Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-21 13:03 UTC (permalink / raw)
  To: João Távora; +Cc: Mattias Engdegård, 68244

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

Mattias Engdegård [2024-01-04 17:27 +0100] wrote:

> It's the hash-table's turn to be subjected to some performance tuning.

Thanks!

> The implementation has changed very little over the years and it's not
> so much a matter of a single big thing to fix as many small ones, so
> the list of changes is fairly long but ever little helps.

João, is this tiny additional straw okay for the camel's back?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Further-shrink-eglot.patch --]
[-- Type: text/x-diff, Size: 1002 bytes --]

From 359e5d5b90754373ff8a2db6431421d7838c113b Mon Sep 17 00:00:00 2001
From: "Basil L. Contovounesios" <contovob@tcd.ie>
Date: Fri, 19 Jan 2024 13:50:29 +0100
Subject: [PATCH] Further shrink eglot--{}

Up to and including Emacs 29, :size 0 was an alias for :size 1.
Emacs 30 gained support for :size 0 hash tables (bug#68244).

* lisp/progmodes/eglot.el (eglot--{}): Define as truly zero-sized.
---
 lisp/progmodes/eglot.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index c5cfdd3cedd..511000927cf 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -575,7 +575,7 @@ eglot--tag-faces
 
 (defvaralias 'eglot-{} 'eglot--{})
 
-(defconst eglot--{} (make-hash-table :size 1) "The empty JSON object.")
+(defconst eglot--{} (make-hash-table :size 0) "The empty JSON object.")
 
 (defun eglot--executable-find (command &optional remote)
   "Like Emacs 27's `executable-find', ignore REMOTE on Emacs 26."
-- 
2.43.0


[-- Attachment #3: Type: text/plain, Size: 11 bytes --]


-- 
Basil

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-01-22  9:17                               ` João Távora
  2024-01-22  9:18                                 ` João Távora
  2024-01-23  9:44                                 ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 80+ messages in thread
From: João Távora @ 2024-01-22  9:17 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: Mattias Engdegård, 68244

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

On Sun, Jan 21, 2024, 13:03 Basil L. Contovounesios <contovob@tcd.ie> wrote:

> Mattias Engdegård [2024-01-04 17:27 +0100] wrote:
>
> > It's the hash-table's turn to be subjected to some performance tuning.
>
> Thanks!
>
> > The implementation has changed very little over the years and it's not
> > so much a matter of a single big thing to fix as many small ones, so
> > the list of changes is fairly long but ever little helps.
>
> João, is this tiny additional straw okay for the camel's back?
>

Dine, I guess.  But this is about a singleton hash table object right?
Will it work with Emacs 28 and 29 via GNU Elpa?

João

>

[-- Attachment #2: Type: text/html, Size: 1453 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-22  9:17                               ` João Távora
@ 2024-01-22  9:18                                 ` João Távora
  2024-01-23  9:44                                 ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 80+ messages in thread
From: João Távora @ 2024-01-22  9:18 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: Mattias Engdegård, 68244

On Mon, Jan 22, 2024 at 9:17 AM João Távora <joaotavora@gmail.com> wrote:

> Dine, I guess.  But this is about a singleton hash table object right?

Yes, you should probably "dine" regularly as much as possible.  But
this change is also "fine".





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-22  9:17                               ` João Távora
  2024-01-22  9:18                                 ` João Távora
@ 2024-01-23  9:44                                 ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 80+ messages in thread
From: Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-01-23  9:44 UTC (permalink / raw)
  To: João Távora; +Cc: 68244

João Távora [2024-01-22 09:17 +0000] wrote:

> Dine, I guess.

Pushed, and thanks!  I'll send you the receipt separately.

> But this is about a singleton hash table object right?

Yes, specifically an empty one.

> Will it work with Emacs 28 and 29 via GNU Elpa?

Should do:
$ for v in 24.5 25.3 26.3 27.2 28.2 29.1; do
    emacs-$v -Q -batch \
      -eval '(message "%.2s %s" emacs-version (make-hash-table :size 0))'
  done
24 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8 data ())
25 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8 data ())
26 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
27 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
28 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
29 #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())

-- 
Basil





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-01-13 20:06                         ` Mattias Engdegård
                                             ` (2 preceding siblings ...)
  2024-01-21 12:41                           ` Stefan Kangas
@ 2024-02-08  9:46                           ` Mattias Engdegård
  2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  3 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-08  9:46 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

I stopped dithering and pushed the Knuth range reduction change to master.

We should probably do the same for obarrays but those have more low-hanging fruit. I'll be back with some changes.








^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08  9:46                           ` Mattias Engdegård
@ 2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-08 14:36                               ` Gerd Möllmann
  2024-02-08 14:42                               ` Mattias Engdegård
  0 siblings, 2 replies; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-08 14:19 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> We should probably do the same for obarrays but those have more low-hanging
> fruit. I'll be back with some changes.

For me, the more important aspect of obarrays is that we should have
a separate obarray type (if it can share code with hash-table, that's
gravy).
Of course, we'd still have to *also* accept plain vectors, sadly.

Some years ago I proposed a patch which changed nothing to the
implementation of obarrays but introduced a new PVEC_OBARRAY which was
used instead of the "PVEC_VECTOR" (and given the variable size of
vectors *and* of those obarrays, these two tags were treated specially).
It suffered from the lack of bits in the word that holds the PVEC_*
tag and the vector's size limiting the size of obarrays to
too-small value.

But I think we can do something much simpler: vectors are accepted by
simply storing in the first field a reference to the actual obarray
object, so there's no need to play any games and/or share representation
between vectors and obarray.


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-08 14:36                               ` Gerd Möllmann
  2024-02-08 14:42                               ` Mattias Engdegård
  1 sibling, 0 replies; 80+ messages in thread
From: Gerd Möllmann @ 2024-02-08 14:36 UTC (permalink / raw)
  To: 68244; +Cc: mattias.engdegard, dmitry, monnier, eliz, stefankangas

Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> But I think we can do something much simpler: vectors are accepted by
> simply storing in the first field a reference to the actual obarray
> object, so there's no need to play any games and/or share representation
> between vectors and obarray.

Fun fact:

  static Lisp_Object
  pkg_fake_me_an_obarray (Lisp_Object vector)
  {
    eassert (VECTORP (vector));
    Lisp_Object package = Faref (vector, make_fixnum (0));
    if (!PACKAGEP (package))
      {
        package = pkg_make_package (build_string ("obarray"),
                                    Flength (vector));
        Faset (vector, make_fixnum (0), package);
      }
    return package;
  }

Works like a charm :-). And the next pointer in Lisp_Symbol is also
gone.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-08 14:36                               ` Gerd Möllmann
@ 2024-02-08 14:42                               ` Mattias Engdegård
  2024-02-08 15:13                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-08 14:42 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

8 feb. 2024 kl. 15.19 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> For me, the more important aspect of obarrays is that we should have
> a separate obarray type

Oh yes. This is necessary for automatic growth if nothing else.

> Of course, we'd still have to *also* accept plain vectors, sadly.

In some contexts such as `try-complete` perhaps, but I don't think we're saddled with it forever in places that actually use an obarray (like `intern`).

>  we can do something much simpler: vectors are accepted by
> simply storing in the first field a reference to the actual obarray
> object, so there's no need to play any games and/or share representation
> between vectors and obarray.

Why is that simpler than having `intern` accept both obarray objects and plain vectors, during a transitional period?






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 14:42                               ` Mattias Engdegård
@ 2024-02-08 15:13                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-08 17:29                                   ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-08 15:13 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

>>  we can do something much simpler: vectors are accepted by
>> simply storing in the first field a reference to the actual obarray
>> object, so there's no need to play any games and/or share representation
>> between vectors and obarray.
>
> Why is that simpler than having `intern` accept both obarray objects and
> plain vectors, during a transitional period?

I don't know in which sense what I propose is different from what you
mean by "accept both obarray objects and plain vectors": what I propose
is a way to "accept both obarray objects and plain vectors".


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 15:13                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-08 17:29                                   ` Mattias Engdegård
  2024-02-08 17:49                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-08 17:29 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> I don't know in which sense what I propose is different from what you
> mean by "accept both obarray objects and plain vectors": what I propose
> is a way to "accept both obarray objects and plain vectors".

Of course, sorry about that.

And you are right about the need for vector compatibility, just discovered that the hard way. The practice to use `make-vector` to create obarrays is widespread. (I have patch for the Emacs tree.)







^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 17:29                                   ` Mattias Engdegård
@ 2024-02-08 17:49                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-12 12:16                                       ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-08 17:49 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> And you are right about the need for vector compatibility, just discovered
> that the hard way.

Welcome to the club :-)


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-08 17:49                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-12 12:16                                       ` Mattias Engdegård
  2024-02-12 13:36                                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-13  9:05                                         ` Gerd Möllmann
  0 siblings, 2 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-12 12:16 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

8 feb. 2024 kl. 18.49 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

>> And you are right about the need for vector compatibility, just discovered
>> that the hard way.
> 
> Welcome to the club :-)

And to show that I'm repeating all your mistakes and ignoring your hard-won wisdom, I made a patch for PVEC_OBARRAY.

* It does give a nice speed-up, not only in microbenchmarks. Relint on the Emacs directory tree sees a speed-up of 5 % which is not bad considering how much work that code does.

* Contrary to your suggestion I went with an entirely new type, which means that:

- obarray objects grow automatically and use the same faster hashing (Knuth) as hash tables
- `obarrayp` is now true for both obarray objects and vectors; `obarray-object-p` detects the new type only.
- `obarray-make` now produces an obarray object.
- All old built-in functions that take an obarray now accept both vectors and obarray objects
- New function `obarray-clear` to replace code that filled vectors with 0

* Compatibility with existing code is excellent except for some places that used `obarray-make` but then assumed the result to be vectors (using `vectorp` instead of `obarrayp` etc).

Your suggestion to represent obarrays as a single-element vector containing an actual object (PVEC_OBARRAY or a hash table) would kind of help here but it still seems like a half-measure and perpetuates some problems that we would like to avoid with a new type.

Maybe a different kind of compromise would be better: `obarray-make` is kept unchanged as make-vector (but deprecated), and a new `make-obarray` (for example) creates the new objects.

Oh, and there is a small glitch in the existing code: when a symbol is uninterned, its status is set to SYMBOL_UNINTERNED, but if its containing obarray is GCed away or manually cleared then this doesn't happen. This is probably not worth fixing but at least with PVEC_OBARRAY we could if we wanted to.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-12 12:16                                       ` Mattias Engdegård
@ 2024-02-12 13:36                                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-13  9:05                                         ` Gerd Möllmann
  1 sibling, 0 replies; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-12 13:36 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> - `obarrayp` is now true for both obarray objects and vectors;
> `obarray-object-p` detects the new type only.

Is this really necessary?


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-12 12:16                                       ` Mattias Engdegård
  2024-02-12 13:36                                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-13  9:05                                         ` Gerd Möllmann
  2024-02-13 10:12                                           ` Mattias Engdegård
  2024-02-24  2:46                                           ` Dmitry Gutov
  1 sibling, 2 replies; 80+ messages in thread
From: Gerd Möllmann @ 2024-02-13  9:05 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> * Compatibility with existing code is excellent except for some places
> that used `obarray-make` but then assumed the result to be vectors
> (using `vectorp` instead of `obarrayp` etc).

I wonder, what does your code do for the places using (make-vector n 0)
create obarrays? I'm not 100% sure, but I think Cedet was an example of
that in Emacs itself.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-13  9:05                                         ` Gerd Möllmann
@ 2024-02-13 10:12                                           ` Mattias Engdegård
  2024-02-13 12:12                                             ` Gerd Möllmann
  2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-24  2:46                                           ` Dmitry Gutov
  1 sibling, 2 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-13 10:12 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

13 feb. 2024 kl. 10.05 skrev Gerd Möllmann <gerd.moellmann@gmail.com>:

> I wonder, what does your code do for the places using (make-vector n 0)
> create obarrays? I'm not 100% sure, but I think Cedet was an example of
> that in Emacs itself.

That code would still work, but just for completeness I changed all those instances to using `obarray-make`, including cedet. (See the branch scratch/obarray.)

A more worrying problem is code that uses `obarray-make` but assumes the result to be vectors. It may be unsafe to let obarray-make return an obarray object for that reason. With Stefan's suggestion, if I understood it right, we'd have `obarray-make` return (vector 0). I'll give it a try.







^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-13 10:12                                           ` Mattias Engdegård
@ 2024-02-13 12:12                                             ` Gerd Möllmann
  2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 80+ messages in thread
From: Gerd Möllmann @ 2024-02-13 12:12 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Dmitry Gutov, Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 13 feb. 2024 kl. 10.05 skrev Gerd Möllmann <gerd.moellmann@gmail.com>:
>
>> I wonder, what does your code do for the places using (make-vector n 0)
>> create obarrays? I'm not 100% sure, but I think Cedet was an example of
>> that in Emacs itself.
>
> That code would still work, but just for completeness I changed all
> those instances to using `obarray-make`, including cedet. (See the
> branch scratch/obarray.)
>
> A more worrying problem is code that uses `obarray-make` but assumes
> the result to be vectors. It may be unsafe to let obarray-make return
> an obarray object for that reason. With Stefan's suggestion, if I
> understood it right, we'd have `obarray-make` return (vector 0). I'll
> give it a try.

Thanks for the pointer.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-13 10:12                                           ` Mattias Engdegård
  2024-02-13 12:12                                             ` Gerd Möllmann
@ 2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-14 12:37                                               ` Mattias Engdegård
  1 sibling, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-13 12:43 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> A more worrying problem is code that uses `obarray-make` but assumes the
> result to be vectors.  It may be unsafe to let obarray-make return an obarray
> object for that reason.

I don't think it'd be "unsafe": it just introduces a bit
of incompatibility.
But `obarrayp` has been with us since Emacs-25 and it's trivial to
change code using `vectorp`, so I wouldn't worry about it.

> With Stefan's suggestion, if I understood it right,
> we'd have `obarray-make` return (vector 0).

I did not suggest such a thing.


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-14 12:37                                               ` Mattias Engdegård
  2024-02-14 13:05                                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-14 12:37 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

13 feb. 2024 kl. 13.43 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

>> A more worrying problem is code that uses `obarray-make` but assumes the
>> result to be vectors.  It may be unsafe to let obarray-make return an obarray
>> object for that reason.
> 
> I don't think it'd be "unsafe": it just introduces a bit
> of incompatibility.
> But `obarrayp` has been with us since Emacs-25 and it's trivial to
> change code using `vectorp`, so I wouldn't worry about it.

If we can do that, it's naturally preferable. Let's see how it goes.

>> With Stefan's suggestion, if I understood it right,
>> we'd have `obarray-make` return (vector 0).
> 
> I did not suggest such a thing.

Sorry, didn't mean to misrepresent. I should have said that it is one way of implementing `obarray-make` if compatibility with vector-assuming code really is a serious concern.

> I think it's worth introducing a bit of incompatibility, for the benefit
> of a cleaner API.  Such incompatibility should be very easy to fix while
> still maintaining compatibility with old Emacsen (at least back to
> Emacs-25, where the `obarray-make` and `obarrayp` were introduced).

I'm all for it but those obarray functions weren't mentioned in NEWS at the time nor in the manual (which even today recommends, even mandates, use of make-vector) so perhaps we need a bridge?






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-14 12:37                                               ` Mattias Engdegård
@ 2024-02-14 13:05                                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-02-14 13:21                                                   ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-14 13:05 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

>>> With Stefan's suggestion, if I understood it right,
>>> we'd have `obarray-make` return (vector 0).
>> I did not suggest such a thing.
> Sorry, didn't mean to misrepresent. I should have said that it is one way of
> implementing `obarray-make` if compatibility with vector-assuming code
> really is a serious concern.

BTW, my idea adjusted for the kind of compatibility you're after would
have been to define `obarray-make` as (vector (internal-make-real-obarray))

>> I think it's worth introducing a bit of incompatibility, for the benefit
>> of a cleaner API.  Such incompatibility should be very easy to fix while
>> still maintaining compatibility with old Emacsen (at least back to
>> Emacs-25, where the `obarray-make` and `obarrayp` were introduced).
> I'm all for it but those obarray functions weren't mentioned in NEWS at the
> time nor in the manual (which even today recommends, even mandates, use of
> make-vector) so perhaps we need a bridge?

You might be right.  I guess time will tell :-)


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-14 13:05                                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-02-14 13:21                                                   ` Mattias Engdegård
  2024-02-17 19:50                                                     ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-14 13:21 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

14 feb. 2024 kl. 14.05 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> BTW, my idea adjusted for the kind of compatibility you're after would
> have been to define `obarray-make` as (vector (internal-make-real-obarray))

Yes, but since most primitives would need to accept a zero-filled vector anyway (and replace the first element with an obarray object), `obarray-make` might just as well just return [0]. Essentially:

  (intern X [0 ...]) -> (intern X [NEW-OBARRAY ...]) -> (intern X NEW-OBARRAY)

  (intern-soft X [0 ...]) -> nil   ;ie, treat as empty
  (intern-soft X [OBARRAY ...]) -> (intern-soft X OBARRAY)






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-14 13:21                                                   ` Mattias Engdegård
@ 2024-02-17 19:50                                                     ` Mattias Engdegård
  2024-02-20 10:21                                                       ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-17 19:50 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

The scratch/obarray branch has now been updated. Now obarrayp detects obarray objects only.
Some added polish: manual updates, the obarray type can be used in cl-defmethod, iterators for pretties internal code, etc.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-17 19:50                                                     ` Mattias Engdegård
@ 2024-02-20 10:21                                                       ` Mattias Engdegård
  2024-02-20 14:00                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-20 10:21 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Gerd Möllmann, Dmitry Gutov, Eli Zaretskii, 68244,
	Stefan Kangas

> The scratch/obarray branch has now been updated. Now obarrayp detects obarray objects only.

Except for a few cosmetic changes that branch is what I will push to master in a few days unless anyone objects. Getting the new type in place is the important part here. (It's really a reform long overdue.)






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-20 10:21                                                       ` Mattias Engdegård
@ 2024-02-20 14:00                                                         ` Eli Zaretskii
  2024-02-20 16:11                                                           ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-20 14:00 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Tue, 20 Feb 2024 11:21:41 +0100
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,
>  Dmitry Gutov <dmitry@gutov.dev>,
>  Eli Zaretskii <eliz@gnu.org>,
>  68244@debbugs.gnu.org,
>  Stefan Kangas <stefankangas@gmail.com>
> 
> > The scratch/obarray branch has now been updated. Now obarrayp detects obarray objects only.
> 
> Except for a few cosmetic changes that branch is what I will push to master in a few days unless anyone objects. Getting the new type in place is the important part here. (It's really a reform long overdue.)

How many different platforms and configurations tried to build and use
the branch?  For such a change, I think I'd like to see all the major
platforms be able to compile and run it, before we land it.

Thanks.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-20 14:00                                                         ` Eli Zaretskii
@ 2024-02-20 16:11                                                           ` Mattias Engdegård
  2024-02-20 17:12                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-20 16:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

20 feb. 2024 kl. 15.00 skrev Eli Zaretskii <eliz@gnu.org>:

> How many different platforms and configurations tried to build and use
> the branch?  For such a change, I think I'd like to see all the major
> platforms be able to compile and run it, before we land it.

In this case the risk for platform-specific bugs is not high (no system-dependent code nor, from what I can tell, any architecture or configuration-dependent code(*)).

Of course accidents can always happen. If someone would like to try building the scratch/obarray branch on a non-64-bit machine then I'd be very grateful.

(*) Building with --enable-checking=structs is certain to fail because there's no point in updating the pdumper struct hash until it's actually merged.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-20 16:11                                                           ` Mattias Engdegård
@ 2024-02-20 17:12                                                             ` Eli Zaretskii
  2024-02-21 12:59                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-20 17:12 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Tue, 20 Feb 2024 17:11:23 +0100
> Cc: monnier@iro.umontreal.ca,
>  gerd.moellmann@gmail.com,
>  dmitry@gutov.dev,
>  68244@debbugs.gnu.org,
>  stefankangas@gmail.com
> 
> 20 feb. 2024 kl. 15.00 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > How many different platforms and configurations tried to build and use
> > the branch?  For such a change, I think I'd like to see all the major
> > platforms be able to compile and run it, before we land it.
> 
> In this case the risk for platform-specific bugs is not high (no system-dependent code nor, from what I can tell, any architecture or configuration-dependent code(*)).

Famous last words ;-)

> Of course accidents can always happen. If someone would like to try building the scratch/obarray branch on a non-64-bit machine then I'd be very grateful.

Please ask on emacs-devel that the branch be tried on platforms you
didn't try it, and let's please wait for a few days to give people
chance to do that and report back.

Thanks.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-20 17:12                                                             ` Eli Zaretskii
@ 2024-02-21 12:59                                                               ` Eli Zaretskii
  2024-02-21 20:13                                                                 ` Andrea Corallo
  2024-02-23 12:16                                                                 ` Mattias Engdegård
  0 siblings, 2 replies; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-21 12:59 UTC (permalink / raw)
  To: mattias.engdegard; +Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

> Cc: gerd.moellmann@gmail.com, dmitry@gutov.dev, 68244@debbugs.gnu.org,
>  monnier@iro.umontreal.ca, stefankangas@gmail.com
> Date: Tue, 20 Feb 2024 19:12:06 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> 
> Please ask on emacs-devel that the branch be tried on platforms you
> didn't try it, and let's please wait for a few days to give people
> chance to do that and report back.

I've built the branch on this platform and configuration:

  In GNU Emacs 30.0.50 (build 1, i686-pc-mingw32) of 2024-02-21 built on
   ELIZ-PC
  Windowing system distributor 'Microsoft Corp.', version 10.0.22631
  System Description: Microsoft Windows 10 Enterprise (v10.0.2009.22631.3155)

  Configured using:
   'configure -C --prefix=/d/usr --with-wide-int
   --without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
   -gdwarf-4 -g3''

and it builds cleanly and passes a few tests related to obarrays.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-21 12:59                                                               ` Eli Zaretskii
@ 2024-02-21 20:13                                                                 ` Andrea Corallo
  2024-02-23 12:16                                                                 ` Mattias Engdegård
  1 sibling, 0 replies; 80+ messages in thread
From: Andrea Corallo @ 2024-02-21 20:13 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: mattias.engdegard, gerd.moellmann, dmitry, monnier, 68244,
	stefankangas

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: gerd.moellmann@gmail.com, dmitry@gutov.dev, 68244@debbugs.gnu.org,
>>  monnier@iro.umontreal.ca, stefankangas@gmail.com
>> Date: Tue, 20 Feb 2024 19:12:06 +0200
>> From: Eli Zaretskii <eliz@gnu.org>
>> 
>> Please ask on emacs-devel that the branch be tried on platforms you
>> didn't try it, and let's please wait for a few days to give people
>> chance to do that and report back.
>
> I've built the branch on this platform and configuration:
>
>   In GNU Emacs 30.0.50 (build 1, i686-pc-mingw32) of 2024-02-21 built on
>    ELIZ-PC
>   Windowing system distributor 'Microsoft Corp.', version 10.0.22631
>   System Description: Microsoft Windows 10 Enterprise (v10.0.2009.22631.3155)
>
>   Configured using:
>    'configure -C --prefix=/d/usr --with-wide-int
>    --without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
>    -gdwarf-4 -g3''
>
> and it builds cleanly and passes a few tests related to obarrays.

Hi all,

I've built as well the branch on aarch64-unknown-linux-gnu configured
as:

 'configure --with-native-compilation --with-x-toolkit=no
 --prefix=/home/andcor03 --with-xpm=ifavailable --with-jpeg=ifavailable
 --with-gif=ifavailable --with-tiff=ifavailable
 --with-gnutls=ifavailable'


I can't see regressions for make check over 20997aa2072.

Bests

  Andrea





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-21 12:59                                                               ` Eli Zaretskii
  2024-02-21 20:13                                                                 ` Andrea Corallo
@ 2024-02-23 12:16                                                                 ` Mattias Engdegård
  2024-02-24  9:45                                                                   ` Mattias Engdegård
  1 sibling, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-23 12:16 UTC (permalink / raw)
  To: Eli Zaretskii, Andrea Corallo
  Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

21 feb. 2024 kl. 13.59 skrev Eli Zaretskii <eliz@gnu.org>:

> I've built the branch on this platform and configuration:
> 
>  In GNU Emacs 30.0.50 (build 1, i686-pc-mingw32) of 2024-02-21 built on
>  [...] --with-wide-int

21 feb. 2024 kl. 21.13 skrev Andrea Corallo <acorallo@gnu.org>:

> I've built as well the branch on aarch64-unknown-linux-gnu

Eli and Andrea, thank you very much! I think this spans a sufficient space of relevant builds (in particular since both you and I regularly commit changes with more platform-dependent stuff).

Now pushed to master, so that it can get some real testing.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-13  9:05                                         ` Gerd Möllmann
  2024-02-13 10:12                                           ` Mattias Engdegård
@ 2024-02-24  2:46                                           ` Dmitry Gutov
  1 sibling, 0 replies; 80+ messages in thread
From: Dmitry Gutov @ 2024-02-24  2:46 UTC (permalink / raw)
  To: Gerd Möllmann, Mattias Engdegård
  Cc: Eli Zaretskii, 68244, Stefan Monnier, Stefan Kangas

On 13/02/2024 11:05, Gerd Möllmann wrote:
> Mattias Engdegård<mattias.engdegard@gmail.com>  writes:
> 
>> * Compatibility with existing code is excellent except for some places
>> that used `obarray-make` but then assumed the result to be vectors
>> (using `vectorp` instead of `obarrayp` etc).
> I wonder, what does your code do for the places using (make-vector n 0)
> create obarrays? I'm not 100% sure, but I think Cedet was an example of
> that in Emacs itself.

FWIW, there is this old-ish package called smartrep, which uses

   (make-vector n nil)

to create an obarray value which it later calls (intern ...) with.

The above usage broke with the latest master (signaling 
wrong-type-argument obarrayp). Changing nil to 0 made it work again.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-23 12:16                                                                 ` Mattias Engdegård
@ 2024-02-24  9:45                                                                   ` Mattias Engdegård
  2024-02-24 10:30                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-24  9:45 UTC (permalink / raw)
  To: Eli Zaretskii, Andrea Corallo
  Cc: gerd.moellmann, dmitry, 68244, monnier, stefankangas

15b6d72599b9 on master:

> +If you have code which creates obarrays as a simple Lisp vector:
> +
> +   (make-vector N nil)
> +
> +and then calls 'intern' using such an obarray as second argument, this
> +will now signal a wrong-type-argument error; replace nil with zero to
> +make it work again.

Thank you Eli, this might be useful, and you are absolutely right that we should help users fix their code.

But this cannot have come from nowhere. Did anyone actually tried to create obarrays in this broken way? It goes against all previous documentation and violates many invariants at once: it would have filled an obarray with multiple copies of the nil symbol which itself is interned in a different obarray.

Running `mapatoms` would have given very surprising results (what does nil have in its .u.s.next pointer? what happens if we remove nil from this pseudo-obarray?) and it must have been a recipe for accidents. It cannot have been a widespread usage.

And if anyone did try to create an obarray in that incorrect way, better recommend using `obarray-make` which is more future-safe than an already obsolete form.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24  9:45                                                                   ` Mattias Engdegård
@ 2024-02-24 10:30                                                                     ` Eli Zaretskii
  2024-02-24 10:53                                                                       ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-24 10:30 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 68244, gerd.moellmann, dmitry, stefankangas, acorallo, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 24 Feb 2024 10:45:03 +0100
> Cc: gerd.moellmann@gmail.com,
>  dmitry@gutov.dev,
>  68244@debbugs.gnu.org,
>  monnier@iro.umontreal.ca,
>  stefankangas@gmail.com
> 
> 15b6d72599b9 on master:
> 
> > +If you have code which creates obarrays as a simple Lisp vector:
> > +
> > +   (make-vector N nil)
> > +
> > +and then calls 'intern' using such an obarray as second argument, this
> > +will now signal a wrong-type-argument error; replace nil with zero to
> > +make it work again.
> 
> Thank you Eli, this might be useful, and you are absolutely right that we should help users fix their code.
> 
> But this cannot have come from nowhere. Did anyone actually tried to create obarrays in this broken way?

Yes, see Dmitry's message up-thread, which named the offending
package.  I've looked at it, and verified that it really does use this
paradigm.

> And if anyone did try to create an obarray in that incorrect way, better recommend using `obarray-make` which is more future-safe than an already obsolete form.

I'm fine with adding to NEWS the text that suggests obarray-make, but
the text I added has to stay, since the make-vector method evidently
exists in the wild.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 10:30                                                                     ` Eli Zaretskii
@ 2024-02-24 10:53                                                                       ` Mattias Engdegård
  2024-02-24 11:03                                                                         ` Eli Zaretskii
  2024-02-24 17:14                                                                         ` Dmitry Gutov
  0 siblings, 2 replies; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-24 10:53 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: 68244, gerd.moellmann, dmitry, stefankangas, acorallo, monnier

24 feb. 2024 kl. 11.30 skrev Eli Zaretskii <eliz@gnu.org>:

> Yes, see Dmitry's message up-thread, which named the offending
> package.  I've looked at it, and verified that it really does use this
> paradigm.

Thank you, somehow I didn't get that mail. Perhaps he fell out of grace with Gmail.

> I'm fine with adding to NEWS the text that suggests obarray-make, but
> the text I added has to stay, since the make-vector method evidently
> exists in the wild.

Right, it now suggests obarray-make and makes it clear that anything other than 0-filled vectors already was wrong.






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 10:53                                                                       ` Mattias Engdegård
@ 2024-02-24 11:03                                                                         ` Eli Zaretskii
  2024-02-24 17:20                                                                           ` Dmitry Gutov
  2024-02-24 17:14                                                                         ` Dmitry Gutov
  1 sibling, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-24 11:03 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 68244, gerd.moellmann, dmitry, stefankangas, acorallo, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 24 Feb 2024 11:53:32 +0100
> Cc: acorallo@gnu.org,
>  gerd.moellmann@gmail.com,
>  dmitry@gutov.dev,
>  68244@debbugs.gnu.org,
>  monnier@iro.umontreal.ca,
>  stefankangas@gmail.com
> 
> 24 feb. 2024 kl. 11.30 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Yes, see Dmitry's message up-thread, which named the offending
> > package.  I've looked at it, and verified that it really does use this
> > paradigm.
> 
> Thank you, somehow I didn't get that mail. Perhaps he fell out of grace with Gmail.
> 
> > I'm fine with adding to NEWS the text that suggests obarray-make, but
> > the text I added has to stay, since the make-vector method evidently
> > exists in the wild.
> 
> Right, it now suggests obarray-make and makes it clear that anything other than 0-filled vectors already was wrong.

Not exactly what I asked to do; fixed.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 10:53                                                                       ` Mattias Engdegård
  2024-02-24 11:03                                                                         ` Eli Zaretskii
@ 2024-02-24 17:14                                                                         ` Dmitry Gutov
  1 sibling, 0 replies; 80+ messages in thread
From: Dmitry Gutov @ 2024-02-24 17:14 UTC (permalink / raw)
  To: Mattias Engdegård, Eli Zaretskii
  Cc: gerd.moellmann, acorallo, stefankangas, 68244, monnier

On 24/02/2024 12:53, Mattias Engdegård wrote:
> 24 feb. 2024 kl. 11.30 skrev Eli Zaretskii<eliz@gnu.org>:
> 
>> Yes, see Dmitry's message up-thread, which named the offending
>> package.  I've looked at it, and verified that it really does use this
>> paradigm.
> Thank you, somehow I didn't get that mail. Perhaps he fell out of grace with Gmail.

Yes: gmail has been bouncing some of my emails lately (coming from 
fastmail).

 From what I hear from support, they should ultimately arrive, though 
(after some retries).





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 11:03                                                                         ` Eli Zaretskii
@ 2024-02-24 17:20                                                                           ` Dmitry Gutov
  2024-02-24 17:43                                                                             ` Mattias Engdegård
  2024-02-24 17:54                                                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 80+ messages in thread
From: Dmitry Gutov @ 2024-02-24 17:20 UTC (permalink / raw)
  To: Eli Zaretskii, Mattias Engdegård
  Cc: gerd.moellmann, acorallo, stefankangas, 68244, monnier

On 24/02/2024 13:03, Eli Zaretskii wrote:
>>> I'm fine with adding to NEWS the text that suggests obarray-make, but
>>> the text I added has to stay, since the make-vector method evidently
>>> exists in the wild.
>> Right, it now suggests obarray-make and makes it clear that anything other than 0-filled vectors already was wrong.
> Not exactly what I asked to do; fixed.

Thanks.

We can't simply recommend using obarray-make instead to third-party code 
because it usually targets some older Emacs releases as well.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 17:20                                                                           ` Dmitry Gutov
@ 2024-02-24 17:43                                                                             ` Mattias Engdegård
  2024-02-24 17:48                                                                               ` Dmitry Gutov
  2024-02-24 17:54                                                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-24 17:43 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: 68244, gerd.moellmann, stefankangas, Eli Zaretskii, acorallo,
	monnier

24 feb. 2024 kl. 18.20 skrev Dmitry Gutov <dmitry@gutov.dev>:

> We can't simply recommend using obarray-make instead to third-party code because it usually targets some older Emacs releases as well.

We can certainly recommend functions that are available in the current version, and definitely ones that have been around since Emacs 25 (which is when obarray-make arrived).






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 17:43                                                                             ` Mattias Engdegård
@ 2024-02-24 17:48                                                                               ` Dmitry Gutov
  2024-02-24 17:53                                                                                 ` Mattias Engdegård
  0 siblings, 1 reply; 80+ messages in thread
From: Dmitry Gutov @ 2024-02-24 17:48 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 68244, gerd.moellmann, stefankangas, Eli Zaretskii, acorallo,
	monnier

On 24/02/2024 19:43, Mattias Engdegård wrote:
> 24 feb. 2024 kl. 18.20 skrev Dmitry Gutov<dmitry@gutov.dev>:
> 
>> We can't simply recommend using obarray-make instead to third-party code because it usually targets some older Emacs releases as well.
> We can certainly recommend functions that are available in the current version, and definitely ones that have been around since Emacs 25 (which is when obarray-make arrived).

Ah, sorry. I just looked at the output of describe-function.

obarray-make is not mentioned in NEWS.25, so its help currently says

   Probably introduced at or before Emacs version 30.1.

FWIW, though, smartrep far predates Emacs 25, and doesn't declare a 
minimum Emacs version. OTOH it hasn't been updated for 9 years and 
probably doesn't have too many users left.





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 17:48                                                                               ` Dmitry Gutov
@ 2024-02-24 17:53                                                                                 ` Mattias Engdegård
  2024-02-24 18:08                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Mattias Engdegård @ 2024-02-24 17:53 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: 68244, gerd.moellmann, stefankangas, Eli Zaretskii, acorallo,
	monnier

24 feb. 2024 kl. 18.48 skrev Dmitry Gutov <dmitry@gutov.dev>:

> Ah, sorry. I just looked at the output of describe-function.
> 
> obarray-make is not mentioned in NEWS.25, so its help currently says
> 
>  Probably introduced at or before Emacs version 30.1.

That's a good point, thank you. Would anyone object to my adding a terse notice to NEWS.25 about the obarray functions added in that release?







^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 17:20                                                                           ` Dmitry Gutov
  2024-02-24 17:43                                                                             ` Mattias Engdegård
@ 2024-02-24 17:54                                                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 80+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-02-24 17:54 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: 68244, Mattias Engdegård, gerd.moellmann, stefankangas,
	Eli Zaretskii, acorallo

> We can't simply recommend using obarray-make instead to third-party code
> because it usually targets some older Emacs releases as well.

`obarray-make` was added in Emacs-25.1, so most packages can make use of
it nowadays (but yes, there are still some which want to support older
releases).


        Stefan






^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 17:53                                                                                 ` Mattias Engdegård
@ 2024-02-24 18:08                                                                                   ` Eli Zaretskii
  2024-02-24 18:31                                                                                     ` Dmitry Gutov
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2024-02-24 18:08 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 68244, gerd.moellmann, dmitry, stefankangas, acorallo, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 24 Feb 2024 18:53:06 +0100
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  acorallo@gnu.org,
>  gerd.moellmann@gmail.com,
>  68244@debbugs.gnu.org,
>  monnier@iro.umontreal.ca,
>  stefankangas@gmail.com
> 
> 24 feb. 2024 kl. 18.48 skrev Dmitry Gutov <dmitry@gutov.dev>:
> 
> > Ah, sorry. I just looked at the output of describe-function.
> > 
> > obarray-make is not mentioned in NEWS.25, so its help currently says
> > 
> >  Probably introduced at or before Emacs version 30.1.
> 
> That's a good point, thank you. Would anyone object to my adding a terse notice to NEWS.25 about the obarray functions added in that release?

Instead of trying to change history, wouldn't it be better to ass some
mechanism for specifying the first release explicitly?  Or maybe we
already have it?





^ permalink raw reply	[flat|nested] 80+ messages in thread

* bug#68244: hash-table improvements
  2024-02-24 18:08                                                                                   ` Eli Zaretskii
@ 2024-02-24 18:31                                                                                     ` Dmitry Gutov
  0 siblings, 0 replies; 80+ messages in thread
From: Dmitry Gutov @ 2024-02-24 18:31 UTC (permalink / raw)
  To: Eli Zaretskii, Mattias Engdegård
  Cc: gerd.moellmann, acorallo, stefankangas, 68244, monnier

On 24/02/2024 20:08, Eli Zaretskii wrote:
> Instead of trying to change history, wouldn't it be better to ass some
> mechanism for specifying the first release explicitly?  Or maybe we
> already have it?

I think editing the older NEWS.xx is that method.

The Help for obarray-make shows the right version now.





^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2024-02-24 18:31 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <170438379722.3921.9312235725296561206@vcs2.savannah.gnu.org>
     [not found] ` <20240104155642.B4A99C00344@vcs2.savannah.gnu.org>
2024-01-04 17:32   ` scratch/hash-table-perf 2d28042f56a 19/35: Use non-Lisp allocation for internal hash-table vectors Dmitry Gutov
2024-01-05 10:33     ` bug#68244: hash-table improvements Mattias Engdegård
2024-01-05 15:41       ` Dmitry Gutov
2024-01-06 11:34         ` Mattias Engdegård
2024-01-06 11:51           ` Eli Zaretskii
2024-01-07  3:13           ` Dmitry Gutov
2024-01-07  5:26             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-07 15:39               ` Dmitry
2024-01-07 18:36               ` Mattias Engdegård
2024-01-07 19:10                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-08 18:26                   ` Mattias Engdegård
2024-01-09  0:33                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-09 10:26                       ` Mattias Engdegård
2024-01-13 20:06                         ` Mattias Engdegård
2024-01-04 16:27                           ` Mattias Engdegård
2024-01-04 16:39                             ` Eli Zaretskii
2024-01-04 17:02                               ` Mattias Engdegård
2024-01-04 17:45                                 ` Eli Zaretskii
2024-01-05 11:34                                   ` Mattias Engdegård
2024-01-05 17:14                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-06 11:46                                       ` Mattias Engdegård
2024-01-09 21:51                             ` Stefan Kangas
2024-01-12 15:42                               ` Mattias Engdegård
2024-01-14 22:08                             ` Andy Moreton
2024-01-15 12:31                               ` Eli Zaretskii
2024-01-15 13:26                                 ` Mattias Engdegård
2024-01-18 18:13                                   ` Mattias Engdegård
2024-01-15 20:01                             ` Andy Moreton
2024-01-15 20:21                               ` Eli Zaretskii
2024-01-16 21:57                             ` Andy Moreton
2024-01-17  3:31                               ` Eli Zaretskii
2024-01-18 20:29                             ` Andy Moreton
2024-01-19  6:37                               ` Eli Zaretskii
2024-01-20 20:20                             ` Andy Moreton
2024-01-21  5:11                               ` Eli Zaretskii
2024-01-21 13:03                             ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-22  9:17                               ` João Távora
2024-01-22  9:18                                 ` João Távora
2024-01-23  9:44                                 ` Basil L. Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-01-14  5:25                           ` Gerd Möllmann
2024-01-14 14:42                             ` Mattias Engdegård
2024-01-21 12:41                           ` Stefan Kangas
2024-02-08  9:46                           ` Mattias Engdegård
2024-02-08 14:19                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-08 14:36                               ` Gerd Möllmann
2024-02-08 14:42                               ` Mattias Engdegård
2024-02-08 15:13                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-08 17:29                                   ` Mattias Engdegård
2024-02-08 17:49                                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-12 12:16                                       ` Mattias Engdegård
2024-02-12 13:36                                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-13  9:05                                         ` Gerd Möllmann
2024-02-13 10:12                                           ` Mattias Engdegård
2024-02-13 12:12                                             ` Gerd Möllmann
2024-02-13 12:43                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-14 12:37                                               ` Mattias Engdegård
2024-02-14 13:05                                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-14 13:21                                                   ` Mattias Engdegård
2024-02-17 19:50                                                     ` Mattias Engdegård
2024-02-20 10:21                                                       ` Mattias Engdegård
2024-02-20 14:00                                                         ` Eli Zaretskii
2024-02-20 16:11                                                           ` Mattias Engdegård
2024-02-20 17:12                                                             ` Eli Zaretskii
2024-02-21 12:59                                                               ` Eli Zaretskii
2024-02-21 20:13                                                                 ` Andrea Corallo
2024-02-23 12:16                                                                 ` Mattias Engdegård
2024-02-24  9:45                                                                   ` Mattias Engdegård
2024-02-24 10:30                                                                     ` Eli Zaretskii
2024-02-24 10:53                                                                       ` Mattias Engdegård
2024-02-24 11:03                                                                         ` Eli Zaretskii
2024-02-24 17:20                                                                           ` Dmitry Gutov
2024-02-24 17:43                                                                             ` Mattias Engdegård
2024-02-24 17:48                                                                               ` Dmitry Gutov
2024-02-24 17:53                                                                                 ` Mattias Engdegård
2024-02-24 18:08                                                                                   ` Eli Zaretskii
2024-02-24 18:31                                                                                     ` Dmitry Gutov
2024-02-24 17:54                                                                             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-02-24 17:14                                                                         ` Dmitry Gutov
2024-02-24  2:46                                           ` Dmitry Gutov
     [not found] ` <20240104155642.6326FC00344@vcs2.savannah.gnu.org>
2024-01-04 18:52   ` scratch/hash-table-perf 3e9e68333ae 16/35: Remove rehash-threshold and rehash-size struct members Dmitry Gutov

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.