unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: "Linus Björnstam" <linus.bjornstam@veryfast.biz>
To: "Stefan Israelsson Tampe" <stefan.itampe@gmail.com>,
	guile-devel <guile-devel@gnu.org>
Subject: Re: Hatables are slow
Date: Wed, 23 Feb 2022 08:40:57 +0100	[thread overview]
Message-ID: <4dccd80b-18f2-40e3-b6b2-c1d97bd91224@www.fastmail.com> (raw)
In-Reply-To: <CAGua6m0tEEQ7qkiqPfpuH6VYTQmv_fTJvcz7tEa5Hb8yb=MuMQ@mail.gmail.com>

Hej!

I would also propose a hash table based on a more sane interface. The equality and hash procedures should be associated with the hash table at creation rather than every time the hash table is used. Like in R6RS, srfi-69, or srfi-12X (intermediate hash tables). 

Maybe the current HT could be relegated to some kind of compat or deprecated library to be removed in 3.4... I am no maintainer, but I think we can all agree that the current API, while fine in the context of guile 1.6, is somewhat clunky by today's standards. It is also commonplace enough that regular deprecation might become rough.

Just the simple fact that hash-set! and hashq-set! can be used interchangeably while you at the same time NEVER EVER should mix them is somewhat unnerving.

I would say a hash table that specifies everything at creation time (with maybe an opportunity to use something like the hashx-* functions for daredevils and for future srfi needs) is the way to go.

Best regards
  Linus Björnstam

On Mon, 21 Feb 2022, at 14:18, Stefan Israelsson Tampe wrote:
> A datastructure I fancy is hash tables. But I found out that hashtables 
> in guile are really slow, How? First of all we make a hash table
>
> (define h (make-hash-table))
>
> Then add values
> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
>
> Then the following operation cost say 5s
> (hash-fold (lambda (k v s) (+ k v s)) 0 h)
>
> It is possible with the foreign interface to speedt this up to 2s using 
> guiles internal interface. But this is slow for such a simple 
> application. Now let's change focus. Assume the in stead an assoc,
>
> (define l (map (lambda (i) (cons i i)) (iota 20000000)))
>
> Then
> ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s)) 
> $5 = 199999990000000 
> ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
>
> That's 20X faster. What have happened?, Well hashmaps has terrible 
> memory layout for scanning. So essentially keeping a list of the 
> created values consed on a list not only get you an ordered hashmap, 
> you also have 20X increase in speed, you sacrifice memory, say about 
> 25-50% extra. The problem actually more that when you remove elements 
> updating the ordered list is very expensive. In python-on-guile I have 
> solved this by moving to a doubly linked list when people start's to 
> delete single elements. For small hashmap things are different.
>
> I suggest that guile should have a proper faster standard hashmap 
> implemention of such kind in scheme.
>
> Stefan



  parent reply	other threads:[~2022-02-23  7:40 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21 13:18 Hatables are slow Stefan Israelsson Tampe
2022-02-21 22:17 ` Mikael Djurfeldt
2022-02-21 22:18   ` Mikael Djurfeldt
2022-02-23  7:40 ` Linus Björnstam [this message]
2022-02-23  8:02   ` Stefan Israelsson Tampe
2022-02-23 13:46   ` Mikael Djurfeldt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4dccd80b-18f2-40e3-b6b2-c1d97bd91224@www.fastmail.com \
    --to=linus.bjornstam@veryfast.biz \
    --cc=guile-devel@gnu.org \
    --cc=stefan.itampe@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).