From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Mikael Djurfeldt Newsgroups: gmane.lisp.guile.devel Subject: Re: Hatables are slow Date: Mon, 21 Feb 2022 23:17:18 +0100 Message-ID: References: Reply-To: mikael@djurfeldt.com Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="000000000000237d9005d88e9933" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="21183"; mail-complaints-to="usenet@ciao.gmane.io" Cc: guile-devel To: Stefan Israelsson Tampe Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Mon Feb 21 23:18:09 2022 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nMH0T-0005Ho-3X for guile-devel@m.gmane-mx.org; Mon, 21 Feb 2022 23:18:09 +0100 Original-Received: from localhost ([::1]:59168 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nMH0R-0003qq-LZ for guile-devel@m.gmane-mx.org; Mon, 21 Feb 2022 17:18:07 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:43470) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nMGzt-0003qT-Bh for guile-devel@gnu.org; Mon, 21 Feb 2022 17:17:33 -0500 Original-Received: from mail-vs1-f46.google.com ([209.85.217.46]:36673) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nMGzr-0005xK-2u for guile-devel@gnu.org; Mon, 21 Feb 2022 17:17:32 -0500 Original-Received: by mail-vs1-f46.google.com with SMTP id e26so19101762vso.3 for ; Mon, 21 Feb 2022 14:17:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:reply-to :from:date:message-id:subject:to:cc; bh=vR4y8LnqH/6Lcw6Bh8O58FSdNgkZr9W7yyKmSRkDsBY=; b=qaap2AC1cXFf0ZnljvBWykVkKH/uA0oZCmeSluo/UvJeSVmdyf93bgEJdehNe8gkOu GErffx9rUvx/xhneOFuLNi7rtL/2bupvgVkdOMcB5Q2XjKCxTUGGyCW/oRKHU3mBUSNr uZCaoy/Syt+wiMeMW0aStx0T6o/POhSfafvHoKty1CCpjj0S7neIlgZjaiSuS16UX2E9 3fCK3awOV+8S9rx4ZtkBi2m8lGPAfb41+PLKs5jAWH00MIjy9PEHIFIVgml85HrRbjr6 rZtcQKCOcrwRIAA1uqNUYiNJiWDNgCY3FUCfxSxSegdxXLmzs6cJ/mAqrsmmc4eu+vEn 6JhQ== X-Gm-Message-State: AOAM532yDBwbVBD8zq6UlR+wiXHwt102ORNIXFK1BBMC5VcvF2sBu8cH 01F85F3PSvKTzjq4iqCj+jdLJquXFTZYvgUtQVU17FRp X-Google-Smtp-Source: ABdhPJwYa9a5kHVqlLzi7ynb6PzW/D1YEkggWBk1nT0dgrmxcgvYLkvbLmEBFuPGijiFILH3Yq826eYey5i3Ktcl0Gw= X-Received: by 2002:a05:6102:7ba:b0:31b:f0c6:adf9 with SMTP id x26-20020a05610207ba00b0031bf0c6adf9mr8268863vsg.16.1645481849747; Mon, 21 Feb 2022 14:17:29 -0800 (PST) In-Reply-To: Received-SPF: pass client-ip=209.85.217.46; envelope-from=mdjurfeldt@gmail.com; helo=mail-vs1-f46.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:21138 Archived-At: --000000000000237d9005d88e9933 Content-Type: text/plain; charset="UTF-8" The hash table in Guile is rather standard (at least according to what was standard in the old ages :). (I *have* some vague memory that I might have implemented a simpler/faster table at some point, but that is not in the code base now.) The structure is a vector of alists. It's of course important that the alists don't get too long, so there's some resizing going on. If you call (make-hash-table), the size of the table starts out at 31, so in your use case, there will be several resizing steps. What happens with speed if you do (make-hash-table 5000000) instead? Best regards, Mikael On Mon, Feb 21, 2022 at 2:55 PM Stefan Israelsson Tampe < stefan.itampe@gmail.com> wrote: > A datastructure I fancy is hash tables. But I found out that hashtables in > guile are really slow, How? First of all we make a hash table > > (define h (make-hash-table)) > > Then add values > (for-each (lambda (i) (hash-set! h i i)) (iota 20000000)) > > Then the following operation cost say 5s > (hash-fold (lambda (k v s) (+ k v s)) 0 h) > > It is possible with the foreign interface to speedt this up to 2s using > guiles internal interface. But this is slow for such a simple application. > Now let's change focus. Assume the in stead an assoc, > > (define l (map (lambda (i) (cons i i)) (iota 20000000))) > > Then > ime (let lp ((l ll) (s 0)) (if (pair? l) (lp (cdr l) (+ s (caar l))) s)) > $5 = 199999990000000 > ;; 0.114530s real time, 0.114391s run time. 0.000000s spent in GC. > > That's 20X faster. What have happened?, Well hashmaps has terrible memory > layout for scanning. So essentially keeping a list of the created values > consed on a list not only get you an ordered hashmap, you also have 20X > increase in speed, you sacrifice memory, say about 25-50% extra. The > problem actually more that when you remove elements updating the ordered > list is very expensive. In python-on-guile I have solved this by moving to > a doubly linked list when people start's to delete single elements. For > small hashmap things are different. > > I suggest that guile should have a proper faster standard hashmap > implemention of such kind in scheme. > > Stefan > > > > --000000000000237d9005d88e9933 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
The hash table in Guile is rather standard (at least = according to what was standard in the old ages :). (I *have* some vague mem= ory that I might have implemented a simpler/faster table at some point, but= that is not in the code base now.)

The structure = is a vector of alists. It's of course important that the alists don'= ;t get too long, so there's some resizing going on. If you call (make-h= ash-table), the size of the table starts out at 31, so in your use case, th= ere will be several resizing steps.

What happens w= ith speed if you do (make-hash-table 5000000) instead?

=
Best regards,
Mikael

On Mon, Feb 21, 2022 at 2:55 P= M Stefan Israelsson Tampe <st= efan.itampe@gmail.com> wrote:
A datastructure I fancy is hash table= s. But I found out that hashtables in guile are really slow, How? First of = all we make a hash table

(define h (make-hash-table))

Then add values
(for-each (lambda (i) (has= h-set! h i i)) (iota 20000000))

Then the following= operation cost say 5s
(hash-fold (lambda (k v s) (+ k v s)) 0 h)=

It is possible with the foreign interface to spee= dt this up to 2s using guiles internal interface. But this is slow for such= a simple application. Now let's change focus. Assume the in stead an a= ssoc,

(define l (map (lambda (i) (cons i i)) (iota= 20000000)))

Then
ime (let lp ((l ll) (s 0))= (if (pair? l) (lp (cdr l) (+ s (caar l))) s))
$5 =3D 199999990000000
;; 0.114530s real time, 0.114391s run time. =C2=A00.000000s spent in GC= .

That's 20X = faster. What have happened?, Well hashmaps has terrible memory layout for s= canning. So essentially keeping a list of the created values consed on a li= st not only get you an ordered hashmap, you also have 20X increase in speed= , you sacrifice memory, say about 25-50% extra. The problem actually more t= hat when you remove elements updating the ordered list is very expensive. I= n python-on-guile I have solved this by moving to a doubly linked list when= people start's to delete single elements. For small hashmap things are= different.

I suggest that guile sh= ould have a proper faster standard hashmap implemention of such kind in sch= eme.

Stefan



--000000000000237d9005d88e9933--