From: Mikael Djurfeldt
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Hatables are slow
Date: Wed, 23 Feb 2022 14:46:28 +0100
In-Reply-To: <4dccd80b-18f2-40e3-b6b2-c1d97bd91224@www.fastmail.com>
Reply-To: mikael@djurfeldt.com
To: Linus Björnstam
Cc: guile-devel

I agree. Also, unless there is a strong reason to deviate from RnRS, following it would be a good choice. (But I'm no maintainer either.)

On Wed, Feb 23, 2022 at 8:42 AM Linus Björnstam <linus.bjornstam@veryfast.biz> wrote:
Hej!

I would also propose a hash table based on a saner interface. The equality and hash procedures should be associated with the hash table at creation rather than supplied every time the hash table is used, as in R6RS, srfi-69, or srfi-12X (intermediate hash tables).
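
As a minimal sketch of the creation-time style, assuming Guile's (rnrs hashtables) module is available: the hash and equivalence procedures are passed once to make-hashtable and never again. The table name `counts` and its keys are made up for illustration.

```scheme
(use-modules (rnrs hashtables))

;; The hash and equivalence procedures are given once, at creation.
(define counts (make-hashtable equal-hash equal?))

;; Every later operation uses the procedures stored in the table,
;; so there is no per-call choice between eq?, eqv? and equal?.
(hashtable-set! counts "guile" 1)
;; A freshly built string with the same contents is the same key.
(hashtable-set! counts (string-append "gu" "ile") 2)

;; The third argument of hashtable-ref is the default for missing keys.
(display (hashtable-ref counts "guile" #f))
(newline)
(display (hashtable-size counts))
(newline)
```

Because the second hashtable-set! overwrites the first, the table ends up with a single entry.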

Maybe the current hash table could be relegated to some kind of compat or deprecated library, to be removed in 3.4... I am no maintainer, but I think we can all agree that the current API, while fine in the context of Guile 1.6, is somewhat clunky by today's standards. It is also commonplace enough that regular deprecation might become rough.

The simple fact that hash-set! and hashq-set! can be called interchangeably on the same table, while you should NEVER EVER mix them, is somewhat unnerving.
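
A small sketch of that hazard, using only Guile's core hash table procedures. The key strings are copied so that they are equal? but guaranteed not eq?; the names `h`, `k1` and `k2` are illustrative.

```scheme
(define h (make-hash-table))

;; Two distinct string objects with the same contents:
;; equal? but not eq?.
(define k1 (string-copy "key"))
(define k2 (string-copy "key"))

(hash-set! h k1 'first)            ; stored using equal?-based hashing

(display (hash-ref h k2))          ; lookup by equal? finds the entry
(newline)
(display (hashq-ref h k2))         ; lookup by eq? does not, gives #f
(newline)

;; Nothing stops a hashq-set! on the same table. It adds a second
;; entry for what equal? considers the same key.
(hashq-set! h k2 'second)
(display (hash-count (const #t) h)) ; number of entries in the table
(newline)
```

After the hashq-set! the table holds two entries whose keys are equal?, and which one a later hash-ref returns depends on how the two hash functions happen to distribute them.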

I would say a hash table that specifies everything at creation time (with maybe an opportunity to use something like the hashx-* functions for daredevils and for future srfi needs) is the way to go.

Best regards
  Linus Björnstam

On Mon, 21 Feb 2022, at 14:18, Stefan Israelsson Tampe wrote:
> A data structure I fancy is the hash table. But I found out that hash tables
> in Guile are really slow. How? First of all, we make a hash table:
>
> (define h (make-hash-table))
>
> Then add values
> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000))
>
> Then the following operation costs about 5 s:
> (hash-fold (lambda (k v s) (+ k v s)) 0 h)
>
> With the foreign interface it is possible to speed this up to 2 s using
> Guile's internal interface. But that is still slow for such a simple
> application. Now let's change focus. Assume instead an association list:
>
> (define l (map (lambda (i) (cons i i)) (iota 20000000)))
>
> Then
> ,time (let lp ((l l) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
> $5 = 199999990000000
> ;; 0.114530s real time, 0.114391s run time.  0.000000s spent in GC.
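
The comparison above can be reproduced outside the REPL with a rough script like the following. It is only a sketch: `timed` and `report` are hypothetical helpers, n is smaller than in the message so that it runs quickly, and both traversals sum the keys only so that they compute the same value. No timings are claimed here; they depend on the machine and the Guile version.

```scheme
;; Smaller n than in the message, so the sketch runs quickly.
(define n 1000000)

(define h (make-hash-table))
(for-each (lambda (i) (hash-set! h i i)) (iota n))

(define l (map (lambda (i) (cons i i)) (iota n)))

;; Hypothetical helper: run THUNK, return its result and the
;; elapsed wall-clock time in seconds.
(define (timed thunk)
  (let* ((start (get-internal-real-time))
         (result (thunk))
         (stop (get-internal-real-time)))
    (values result
            (exact->inexact (/ (- stop start)
                               internal-time-units-per-second)))))

;; Hypothetical helper: print a label, the result and the time.
(define (report label thunk)
  (call-with-values (lambda () (timed thunk))
    (lambda (result seconds)
      (format #t "~a: ~a in ~as~%" label result seconds))))

;; Sum of all keys by folding over the hash table.
(define (sum-table)
  (hash-fold (lambda (k v s) (+ k s)) 0 h))

;; Sum of all keys by scanning the association list.
(define (sum-list)
  (let lp ((l l) (s 0))
    (if (pair? l) (lp (cdr l) (+ s (caar l))) s)))

(report "hash-fold" sum-table)
(report "list scan" sum-list)
```

Both lines should report the same sum; only the elapsed times differ.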
>
> That's 20X faster. What happened? Well, hash maps have a terrible
> memory layout for scanning. So essentially, keeping the created
> values consed onto a list not only gets you an ordered hash map,
> you also get a 20X increase in speed; you sacrifice memory, say about
> 25-50% extra. The actual problem is that when you remove elements,
> updating the ordered list is very expensive. In python-on-guile I have
> solved this by moving to a doubly linked list when people start to
> delete single elements. For small hash maps things are different.
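
One way to picture the idea is the sketch below. It is not the python-on-guile implementation; the record type and the `ot-` procedures are hypothetical names. The hash table maps each key to a (key . value) pair, and the same pairs are also consed onto a list, so a fold can walk the list instead of the table's buckets.

```scheme
(use-modules (srfi srfi-9))

;; index:   hash table from key to its (key . value) cell
;; entries: list of the same cells, newest first
(define-record-type <ordered-table>
  (%make-ordered-table index entries)
  ordered-table?
  (index ot-index)
  (entries ot-entries set-ot-entries!))

(define (make-ordered-table)
  (%make-ordered-table (make-hash-table) '()))

(define (ot-set! t key value)
  (let ((cell (hash-ref (ot-index t) key)))
    (if cell
        ;; Existing key: update in place, position is unchanged.
        (set-cdr! cell value)
        ;; New key: share one cell between the table and the list.
        (let ((cell (cons key value)))
          (hash-set! (ot-index t) key cell)
          (set-ot-entries! t (cons cell (ot-entries t)))))))

(define (ot-ref t key default)
  (let ((cell (hash-ref (ot-index t) key)))
    (if cell (cdr cell) default)))

;; Fold by scanning the list, not the hash table.
(define (ot-fold proc init t)
  (let lp ((l (ot-entries t)) (acc init))
    (if (pair? l)
        (lp (cdr l) (proc (caar l) (cdar l) acc))
        acc)))

(define t (make-ordered-table))
(ot-set! t 'a 1)
(ot-set! t 'b 2)
(ot-set! t 'a 10)
(display (ot-fold (lambda (k v acc) (+ v acc)) 0 t))
(newline)
```

Deletion is left out on purpose: removing a cell from the singly linked list means scanning it, which is the cost the message describes and the reason for switching to a doubly linked list there.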
>
> I suggest that Guile should have a proper, faster standard hash map
> implementation of this kind, written in Scheme.
>
> Stefan
