unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Hatables are slow
@ 2022-02-21 13:18 Stefan Israelsson Tampe
  2022-02-21 22:17 ` Mikael Djurfeldt
  2022-02-23  7:40 ` Linus Björnstam
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Israelsson Tampe @ 2022-02-21 13:18 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]

A datastructure I fancy is hash tables. But I found out that hashtables in
guile are really slow, How? First of all we make a hash table

(define h (make-hash-table))

Then add values
(for-each (lambda (i) (hash-set! h i i)) (iota 20000000))

Then the following operation cost say 5s
(hash-fold (lambda (k v s) (+ k v s)) 0 h)

It is possible with the foreign interface to speedt this up to 2s using
guiles internal interface. But this is slow for such a simple application.
Now let's change focus. Assume the in stead an assoc,

(define l (map (lambda (i) (cons i i)) (iota 20000000)))

Then
ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
$5 = 199999990000000
;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.

That's 20X faster. What have happened?, Well hashmaps has terrible memory
layout for scanning. So essentially keeping a list of the created values
consed on a list not only get you an ordered hashmap, you also have 20X
increase in speed, you sacrifice memory, say about 25-50% extra. The
problem actually more that when you remove elements updating the ordered
list is very expensive. In python-on-guile I have solved this by moving to
a doubly linked list when people start's to delete single elements. For
small hashmap things are different.

I suggest that guile should have a proper faster standard hashmap
implemention of such kind in scheme.

Stefan

[-- Attachment #2: Type: text/html, Size: 2125 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hatables are slow
  2022-02-21 13:18 Hatables are slow Stefan Israelsson Tampe
@ 2022-02-21 22:17 ` Mikael Djurfeldt
  2022-02-21 22:18   ` Mikael Djurfeldt
  2022-02-23  7:40 ` Linus Björnstam
  1 sibling, 1 reply; 6+ messages in thread
From: Mikael Djurfeldt @ 2022-02-21 22:17 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2168 bytes --]

The hash table in Guile is rather standard (at least according to what was
standard in the old ages :). (I *have* some vague memory that I might have
implemented a simpler/faster table at some point, but that is not in the
code base now.)

The structure is a vector of alists. It's of course important that the
alists don't get too long, so there's some resizing going on. If you call
(make-hash-table), the size of the table starts out at 31, so in your use
case, there will be several resizing steps.

What happens with speed if you do (make-hash-table 5000000) instead?

Best regards,
Mikael

On Mon, Feb 21, 2022 at 2:55 PM Stefan Israelsson Tampe <
stefan.itampe@gmail.com> wrote:

> A datastructure I fancy is hash tables. But I found out that hashtables in
> guile are really slow, How? First of all we make a hash table
>
> (define h (make-hash-table))
>
> Then add values
> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
>
> Then the following operation cost say 5s
> (hash-fold (lambda (k v s) (+ k v s)) 0 h)
>
> It is possible with the foreign interface to speedt this up to 2s using
> guiles internal interface. But this is slow for such a simple application.
> Now let's change focus. Assume the in stead an assoc,
>
> (define l (map (lambda (i) (cons i i)) (iota 20000000)))
>
> Then
> ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
> $5 = 199999990000000
> ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
>
> That's 20X faster. What have happened?, Well hashmaps has terrible memory
> layout for scanning. So essentially keeping a list of the created values
> consed on a list not only get you an ordered hashmap, you also have 20X
> increase in speed, you sacrifice memory, say about 25-50% extra. The
> problem actually more that when you remove elements updating the ordered
> list is very expensive. In python-on-guile I have solved this by moving to
> a doubly linked list when people start's to delete single elements. For
> small hashmap things are different.
>
> I suggest that guile should have a proper faster standard hashmap
> implemention of such kind in scheme.
>
> Stefan
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 3203 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hatables are slow
  2022-02-21 22:17 ` Mikael Djurfeldt
@ 2022-02-21 22:18   ` Mikael Djurfeldt
  0 siblings, 0 replies; 6+ messages in thread
From: Mikael Djurfeldt @ 2022-02-21 22:18 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2380 bytes --]

(Note that the resizing means *rehashing* of all elements.)

On Mon, Feb 21, 2022 at 11:17 PM Mikael Djurfeldt <mikael@djurfeldt.com>
wrote:

> The hash table in Guile is rather standard (at least according to what was
> standard in the old ages :). (I *have* some vague memory that I might have
> implemented a simpler/faster table at some point, but that is not in the
> code base now.)
>
> The structure is a vector of alists. It's of course important that the
> alists don't get too long, so there's some resizing going on. If you call
> (make-hash-table), the size of the table starts out at 31, so in your use
> case, there will be several resizing steps.
>
> What happens with speed if you do (make-hash-table 5000000) instead?
>
> Best regards,
> Mikael
>
> On Mon, Feb 21, 2022 at 2:55 PM Stefan Israelsson Tampe <
> stefan.itampe@gmail.com> wrote:
>
>> A datastructure I fancy is hash tables. But I found out that hashtables
>> in guile are really slow, How? First of all we make a hash table
>>
>> (define h (make-hash-table))
>>
>> Then add values
>> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
>>
>> Then the following operation cost say 5s
>> (hash-fold (lambda (k v s) (+ k v s)) 0 h)
>>
>> It is possible with the foreign interface to speedt this up to 2s using
>> guiles internal interface. But this is slow for such a simple application.
>> Now let's change focus. Assume the in stead an assoc,
>>
>> (define l (map (lambda (i) (cons i i)) (iota 20000000)))
>>
>> Then
>> ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
>> $5 = 199999990000000
>> ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
>>
>> That's 20X faster. What have happened?, Well hashmaps has terrible memory
>> layout for scanning. So essentially keeping a list of the created values
>> consed on a list not only get you an ordered hashmap, you also have 20X
>> increase in speed, you sacrifice memory, say about 25-50% extra. The
>> problem actually more that when you remove elements updating the ordered
>> list is very expensive. In python-on-guile I have solved this by moving to
>> a doubly linked list when people start's to delete single elements. For
>> small hashmap things are different.
>>
>> I suggest that guile should have a proper faster standard hashmap
>> implemention of such kind in scheme.
>>
>> Stefan
>>
>>
>>
>>

[-- Attachment #2: Type: text/html, Size: 3645 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hatables are slow
  2022-02-21 13:18 Hatables are slow Stefan Israelsson Tampe
  2022-02-21 22:17 ` Mikael Djurfeldt
@ 2022-02-23  7:40 ` Linus Björnstam
  2022-02-23  8:02   ` Stefan Israelsson Tampe
  2022-02-23 13:46   ` Mikael Djurfeldt
  1 sibling, 2 replies; 6+ messages in thread
From: Linus Björnstam @ 2022-02-23  7:40 UTC (permalink / raw)
  To: Stefan Israelsson Tampe, guile-devel

Hej!

I would also propose a hash table based on a more sane interface. The equality and hash procedures should be associated with the hash table at creation rather than every time the hash table is used. Like in R6RS, srfi-69, or srfi-12X (intermediate hash tables). 

Maybe the current HT could be relegated to some kind of compat or deprecated library to be removed in 3.4... I am no maintainer, but I think we can all agree that the current API, while fine in the context of guile 1.6, is somewhat clunky by today's standards. It is also commonplace enough that regular deprecation might become rough.

Just the simple fact that hash-set! and hashq-set! can be used interchangeably while you at the same time NEVER EVER should mix them is somewhat unnerving.

I would say a hash table that specifies everything at creation time (with maybe an opportunity to use something like the hashx-* functions for daredevils and for future srfi needs) is the way to go.

Best regards
  Linus Björnstam

On Mon, 21 Feb 2022, at 14:18, Stefan Israelsson Tampe wrote:
> A datastructure I fancy is hash tables. But I found out that hashtables 
> in guile are really slow, How? First of all we make a hash table
>
> (define h (make-hash-table))
>
> Then add values
> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
>
> Then the following operation cost say 5s
> (hash-fold (lambda (k v s) (+ k v s)) 0 h)
>
> It is possible with the foreign interface to speedt this up to 2s using 
> guiles internal interface. But this is slow for such a simple 
> application. Now let's change focus. Assume the in stead an assoc,
>
> (define l (map (lambda (i) (cons i i)) (iota 20000000)))
>
> Then
> ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s)) 
> $5 = 199999990000000 
> ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
>
> That's 20X faster. What have happened?, Well hashmaps has terrible 
> memory layout for scanning. So essentially keeping a list of the 
> created values consed on a list not only get you an ordered hashmap, 
> you also have 20X increase in speed, you sacrifice memory, say about 
> 25-50% extra. The problem actually more that when you remove elements 
> updating the ordered list is very expensive. In python-on-guile I have 
> solved this by moving to a doubly linked list when people start's to 
> delete single elements. For small hashmap things are different.
>
> I suggest that guile should have a proper faster standard hashmap 
> implemention of such kind in scheme.
>
> Stefan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hatables are slow
  2022-02-23  7:40 ` Linus Björnstam
@ 2022-02-23  8:02   ` Stefan Israelsson Tampe
  2022-02-23 13:46   ` Mikael Djurfeldt
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Israelsson Tampe @ 2022-02-23  8:02 UTC (permalink / raw)
  To: Linus Björnstam; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2925 bytes --]

yep that's a working solution, currently I make a proper goops class and
the methods will dispatch to the right version.

On Wed, Feb 23, 2022 at 8:41 AM Linus Björnstam <
linus.bjornstam@veryfast.biz> wrote:

> Hej!
>
> I would also propose a hash table based on a more sane interface. The
> equality and hash procedures should be associated with the hash table at
> creation rather than every time the hash table is used. Like in R6RS,
> srfi-69, or srfi-12X (intermediate hash tables).
>
> Maybe the current HT could be relegated to some kind of compat or
> deprecated library to be removed in 3.4... I am no maintainer, but I think
> we can all agree that the current API, while fine in the context of guile
> 1.6, is somewhat clunky by today's standards. It is also commonplace enough
> that regular deprecation might become rough.
>
> Just the simple fact that hash-set! and hashq-set! can be used
> interchangeably while you at the same time NEVER EVER should mix them is
> somewhat unnerving.
>
> I would say a hash table that specifies everything at creation time (with
> maybe an opportunity to use something like the hashx-* functions for
> daredevils and for future srfi needs) is the way to go.
>
> Best regards
>   Linus Björnstam
>
> On Mon, 21 Feb 2022, at 14:18, Stefan Israelsson Tampe wrote:
> > A datastructure I fancy is hash tables. But I found out that hashtables
> > in guile are really slow, How? First of all we make a hash table
> >
> > (define h (make-hash-table))
> >
> > Then add values
> > (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
> >
> > Then the following operation cost say 5s
> > (hash-fold (lambda (k v s) (+ k v s)) 0 h)
> >
> > It is possible with the foreign interface to speedt this up to 2s using
> > guiles internal interface. But this is slow for such a simple
> > application. Now let's change focus. Assume the in stead an assoc,
> >
> > (define l (map (lambda (i) (cons i i)) (iota 20000000)))
> >
> > Then
> > ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
> > $5 = 199999990000000
> > ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
> >
> > That's 20X faster. What have happened?, Well hashmaps has terrible
> > memory layout for scanning. So essentially keeping a list of the
> > created values consed on a list not only get you an ordered hashmap,
> > you also have 20X increase in speed, you sacrifice memory, say about
> > 25-50% extra. The problem actually more that when you remove elements
> > updating the ordered list is very expensive. In python-on-guile I have
> > solved this by moving to a doubly linked list when people start's to
> > delete single elements. For small hashmap things are different.
> >
> > I suggest that guile should have a proper faster standard hashmap
> > implemention of such kind in scheme.
> >
> > Stefan
>

[-- Attachment #2: Type: text/html, Size: 3417 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hatables are slow
  2022-02-23  7:40 ` Linus Björnstam
  2022-02-23  8:02   ` Stefan Israelsson Tampe
@ 2022-02-23 13:46   ` Mikael Djurfeldt
  1 sibling, 0 replies; 6+ messages in thread
From: Mikael Djurfeldt @ 2022-02-23 13:46 UTC (permalink / raw)
  To: Linus Björnstam; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2933 bytes --]

I agree. Also, if there is no strong reason to deviate from RnRS, that
would be a good choice. (But, I'm also no maintainer.)

On Wed, Feb 23, 2022 at 8:42 AM Linus Björnstam <
linus.bjornstam@veryfast.biz> wrote:

> Hej!
>
> I would also propose a hash table based on a more sane interface. The
> equality and hash procedures should be associated with the hash table at
> creation rather than every time the hash table is used. Like in R6RS,
> srfi-69, or srfi-12X (intermediate hash tables).
>
> Maybe the current HT could be relegated to some kind of compat or
> deprecated library to be removed in 3.4... I am no maintainer, but I think
> we can all agree that the current API, while fine in the context of guile
> 1.6, is somewhat clunky by today's standards. It is also commonplace enough
> that regular deprecation might become rough.
>
> Just the simple fact that hash-set! and hashq-set! can be used
> interchangeably while you at the same time NEVER EVER should mix them is
> somewhat unnerving.
>
> I would say a hash table that specifies everything at creation time (with
> maybe an opportunity to use something like the hashx-* functions for
> daredevils and for future srfi needs) is the way to go.
>
> Best regards
>   Linus Björnstam
>
> On Mon, 21 Feb 2022, at 14:18, Stefan Israelsson Tampe wrote:
> > A datastructure I fancy is hash tables. But I found out that hashtables
> > in guile are really slow, How? First of all we make a hash table
> >
> > (define h (make-hash-table))
> >
> > Then add values
> > (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
> >
> > Then the following operation cost say 5s
> > (hash-fold (lambda (k v s) (+ k v s)) 0 h)
> >
> > It is possible with the foreign interface to speedt this up to 2s using
> > guiles internal interface. But this is slow for such a simple
> > application. Now let's change focus. Assume the in stead an assoc,
> >
> > (define l (map (lambda (i) (cons i i)) (iota 20000000)))
> >
> > Then
> > ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
> > $5 = 199999990000000
> > ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
> >
> > That's 20X faster. What have happened?, Well hashmaps has terrible
> > memory layout for scanning. So essentially keeping a list of the
> > created values consed on a list not only get you an ordered hashmap,
> > you also have 20X increase in speed, you sacrifice memory, say about
> > 25-50% extra. The problem actually more that when you remove elements
> > updating the ordered list is very expensive. In python-on-guile I have
> > solved this by moving to a doubly linked list when people start's to
> > delete single elements. For small hashmap things are different.
> >
> > I suggest that guile should have a proper faster standard hashmap
> > implemention of such kind in scheme.
> >
> > Stefan
>
>

[-- Attachment #2: Type: text/html, Size: 3430 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-02-23 13:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-21 13:18 Hatables are slow Stefan Israelsson Tampe
2022-02-21 22:17 ` Mikael Djurfeldt
2022-02-21 22:18   ` Mikael Djurfeldt
2022-02-23  7:40 ` Linus Björnstam
2022-02-23  8:02   ` Stefan Israelsson Tampe
2022-02-23 13:46   ` Mikael Djurfeldt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).