From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Mikael Djurfeldt Newsgroups: gmane.lisp.guile.devel Subject: Re: Hatables are slow Date: Mon, 21 Feb 2022 23:18:26 +0100 Message-ID: References: Reply-To: mikael@djurfeldt.com Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="000000000000245c4e05d88e9d12" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="25389"; mail-complaints-to="usenet@ciao.gmane.io" Cc: guile-devel To: Stefan Israelsson Tampe Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Mon Feb 21 23:19:03 2022 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nMH1L-0006OG-EO for guile-devel@m.gmane-mx.org; Mon, 21 Feb 2022 23:19:03 +0100 Original-Received: from localhost ([::1]:59704 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nMH1K-0004EN-Cj for guile-devel@m.gmane-mx.org; Mon, 21 Feb 2022 17:19:02 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:43640) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nMH10-0004ED-54 for guile-devel@gnu.org; Mon, 21 Feb 2022 17:18:42 -0500 Original-Received: from mail-ua1-f45.google.com ([209.85.222.45]:39580) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nMH0w-00062N-4U for guile-devel@gnu.org; Mon, 21 Feb 2022 17:18:40 -0500 Original-Received: by mail-ua1-f45.google.com with SMTP id 102so6472428uag.6 for ; Mon, 21 Feb 2022 14:18:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:reply-to :from:date:message-id:subject:to:cc; bh=FVWUI8HIa2ONdGzyS4n/meY/ZIgQdEljVACdcKAPP1M=; b=uF5oQTCbhRhOv01TkdXmwo2qq61lBCldgQLO7L8+7OiO8UVp4nawmXq4JOtCukHkul YbkZc9OiTrX1OkHCXHa8/Eq9KHWCdjaIe/MCZvOky9sMcmwVVTZQeY1wnE0bcnBBJuFC TL+caiX/9W/Uihz8f/MyiI0TR3iCD1X5ZHtNsoJcVcbLZb4BEijQY+YJ9YX8IFMKfzMa PJugC4Rcmj+10mtnmJ4Simu2EFCOOqiC97h7qPkEjY63KczSxY6eIIhCm6E1+Fhjfzyd GA7dqUCNbLI/v1rAAk0ZsjkoJooYHWLMCdt4wiV3Ixyt7vUjfXB3bOGft0jEgFpq5FDL finQ== X-Gm-Message-State: AOAM532SyAQu5QQFmAYGs8UIoiRspoSQGuMwzURh3crNzjaJklanaPm+ JdH4EVe8OgrjrYN+ngS3Vds6jTfoIXwoh31u+eU= X-Google-Smtp-Source: ABdhPJzTZx6h4aOatAdhjX8jMrJwMSElyHRqiifrxfdkYkHpM/QKYX/TBySqL/oY5HaK2QD8ggIfeconFyHaIOhUYyk= X-Received: by 2002:a9f:3dc7:0:b0:33d:1812:15ac with SMTP id e7-20020a9f3dc7000000b0033d181215acmr8769550uaj.79.1645481916913; Mon, 21 Feb 2022 14:18:36 -0800 (PST) In-Reply-To: Received-SPF: pass client-ip=209.85.222.45; envelope-from=mdjurfeldt@gmail.com; helo=mail-ua1-f45.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:21139 Archived-At: --000000000000245c4e05d88e9d12 Content-Type: text/plain; charset="UTF-8" (Note that the resizing means *rehashing* of all elements.) On Mon, Feb 21, 2022 at 11:17 PM Mikael Djurfeldt wrote: > The hash table in Guile is rather standard (at least according to what was > standard in the old ages :). (I *have* some vague memory that I might have > implemented a simpler/faster table at some point, but that is not in the > code base now.) > > The structure is a vector of alists. It's of course important that the > alists don't get too long, so there's some resizing going on. If you call > (make-hash-table), the size of the table starts out at 31, so in your use > case, there will be several resizing steps. > > What happens with speed if you do (make-hash-table 5000000) instead? > > Best regards, > Mikael > > On Mon, Feb 21, 2022 at 2:55 PM Stefan Israelsson Tampe < > stefan.itampe@gmail.com> wrote: > >> A datastructure I fancy is hash tables. But I found out that hashtables >> in guile are really slow, How? First of all we make a hash table >> >> (define h (make-hash-table)) >> >> Then add values >> (for-each (lambda (i) (hash-set! h i i)) (iota 20000000)) >> >> Then the following operation cost say 5s >> (hash-fold (lambda (k v s) (+ k v s)) 0 h) >> >> It is possible with the foreign interface to speedt this up to 2s using >> guiles internal interface. But this is slow for such a simple application. >> Now let's change focus. Assume the in stead an assoc, >> >> (define l (map (lambda (i) (cons i i)) (iota 20000000))) >> >> Then >> ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s)) >> $5 = 199999990000000 >> ;; 0.114530s real time, 0.114391s run time. 0.000000s spent in GC. >> >> That's 20X faster. What have happened?, Well hashmaps has terrible memory >> layout for scanning. So essentially keeping a list of the created values >> consed on a list not only get you an ordered hashmap, you also have 20X >> increase in speed, you sacrifice memory, say about 25-50% extra. The >> problem actually more that when you remove elements updating the ordered >> list is very expensive. In python-on-guile I have solved this by moving to >> a doubly linked list when people start's to delete single elements. For >> small hashmap things are different. >> >> I suggest that guile should have a proper faster standard hashmap >> implemention of such kind in scheme. >> >> Stefan >> >> >> >> --000000000000245c4e05d88e9d12 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
(Note that the resizing means *rehashing* of all elements.= )

On Mon, Feb 21, 2022 at 11:17 PM Mikael Djurfeldt <mikael@djurfeldt.com> wrote:
The hash = table in Guile is rather standard (at least according to what was standard = in the old ages :). (I *have* some vague memory that I might have implement= ed a simpler/faster table at some point, but that is not in the code base n= ow.)

The structure is a vector of alists. It's= of course important that the alists don't get too long, so there's= some resizing going on. If you call (make-hash-table), the size of the tab= le starts out at 31, so in your use case, there will be several resizing st= eps.

What happens with speed if you do (make-hash-= table 5000000) instead?

Best regards,
Mi= kael

On Mon, Feb 21, 2022 at 2:55 PM Stefan Israelsson Tampe <= ;stefan.itampe= @gmail.com> wrote:
A datastructure I fancy is hash tables. But I f= ound out that hashtables in guile are really slow, How? First of all we mak= e a hash table

(define h (make-hash-table))
Then add values
(for-each (lambda (i) (hash-set! h i= i)) (iota 20000000))

Then the following operation= cost say 5s
(hash-fold (lambda (k v s) (+ k v s)) 0 h)

It is possible with the foreign interface to speedt this up= to 2s using guiles internal interface. But this is slow for such a simple = application. Now let's change focus. Assume the in stead an assoc,

(define l (map (lambda (i) (cons i i)) (iota 20000000)= ))

Then
ime (let lp ((l ll) (s 0)) (if (pair= ? l) (lp (cdr l) (+ s (caar l))) s))
$5 =3D 199999990000000
;; 0.114530s real time, 0.114391s run time. =C2=A00.000000s spent in GC= .

That's 20X = faster. What have happened?, Well hashmaps has terrible memory layout for s= canning. So essentially keeping a list of the created values consed on a li= st not only get you an ordered hashmap, you also have 20X increase in speed= , you sacrifice memory, say about 25-50% extra. The problem actually more t= hat when you remove elements updating the ordered list is very expensive. I= n python-on-guile I have solved this by moving to a doubly linked list when= people start's to delete single elements. For small hashmap things are= different.

I suggest that guile sh= ould have a proper faster standard hashmap implemention of such kind in sch= eme.

Stefan



--000000000000245c4e05d88e9d12--