From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: David Kastrup Newsgroups: gmane.lisp.guile.devel Subject: Re: Growable arrays? Date: Thu, 14 Jun 2012 19:15:31 +0200 Organization: Organization?!? Message-ID: <87fw9xvmq4.fsf@fencepost.gnu.org> References: <87hauku0mb.fsf@fencepost.gnu.org> <87sjdyyncv.fsf@netris.org> <87vciuuf12.fsf@fencepost.gnu.org> <87k3z9vrds.fsf@fencepost.gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: dough.gmane.org 1339694158 3738 80.91.229.3 (14 Jun 2012 17:15:58 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Thu, 14 Jun 2012 17:15:58 +0000 (UTC) To: guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Thu Jun 14 19:15:58 2012 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SfDeF-0003NP-0Q for guile-devel@m.gmane.org; Thu, 14 Jun 2012 19:15:55 +0200 Original-Received: from localhost ([::1]:43423 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SfDeE-0003Qy-Sc for guile-devel@m.gmane.org; Thu, 14 Jun 2012 13:15:54 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:54758) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SfDeB-0003Qe-VK for guile-devel@gnu.org; Thu, 14 Jun 2012 13:15:53 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SfDe9-0006F8-Ug for guile-devel@gnu.org; Thu, 14 Jun 2012 13:15:51 -0400 Original-Received: from plane.gmane.org ([80.91.229.3]:60785) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SfDe9-0006Ew-NO for guile-devel@gnu.org; Thu, 14 Jun 2012 13:15:49 -0400 Original-Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1SfDe4-0002pN-OH for guile-devel@gnu.org; Thu, 14 Jun 2012 19:15:44 +0200 Original-Received: from p508eb9b7.dip.t-dialin.net ([80.142.185.183]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 14 Jun 2012 19:15:44 +0200 Original-Received: from dak by p508eb9b7.dip.t-dialin.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 14 Jun 2012 19:15:44 +0200 X-Injected-Via-Gmane: http://gmane.org/ Original-Lines: 58 Original-X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: p508eb9b7.dip.t-dialin.net X-Face: 2FEFf>]>q>2iw=B6, xrUubRI>pR&Ml9=ao@P@i)L:\urd*t9M~y1^:+Y]'C0~{mAl`oQuAl \!3KEIp?*w`|bL5qr,H)LFO6Q=qx~iH4DN; i"; /yuIsqbLLCh/!U#X[S~(5eZ41to5f%E@'ELIi$t^ Vc\LWP@J5p^rst0+('>Er0=^1{]M9!p?&:\z]|;&=NP3AhB!B_bi^]Pfkw User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux) Cancel-Lock: sha1:C2fMrUYCMAGx/zjkVFrLr8GG6Js= X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 80.91.229.3 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:14624 Archived-At: Daniel Hartwig writes: > On 14 June 2012 23:34, David Kastrup wrote: > >> Huh?  When resizing a hash table by doubling, you need to recoalesce >> each bucket to two buckets, one of which is the original.  Doubling >> the size of the underlying vector is a reasonable start to do this >> operation as a half-in-place rather than a three-address variant. > > What is this half-in-place algorithm that makes this efficient? If > the table is to remain balanced, all items should be rehashed and > reallocated to a new bucket and there is no correlation between an > items old and new buckets (hash). Huh? Hashing computes a hash value, and the bucket is determined by hash % vectorsize. Double the vector size, and the new bucket locations are at bucket and bucket+vectorsize, so you need to coalesce the bucket at bucket into the buckets at bucket and bucket+vectorsize. Why would there be no correlation between old and new buckets when they are calculated based on the same hash code? > 1. allocate new vector > 2. assign each item to it's new bucket > 3. destroy old vector > > … when the table does not control it's own memory but uses an > underlying resizable type: > > 1. resize vector > 1a. allocate new vector > 1b. move data to head of new vector > 1c. destroy old vector > 2. assign each item to it's new bucket but this time watch out for > already populated buckets … I have no idea what you are talking about. Each bucket contains a list. > which to me /seems/ less efficient, and particularly bad if the > buckets are linked lists or open addressed. Complexity is pretty much the same if we don't bother about memory locality. Being able to process the buckets sequentially, however, is advantageous because they then stay mostly in cache. Splitting a single bucket list into two lists is easy to do in-place. > But I am curious if this can be done efficiently, and indeed if it is > more efficient than the table managing it's own memory. It can never be more efficient than "the table managing its own memory" since obviously it is then able to pick the best algorithm. Which in this case is simply identical with the general-purpose algorithm. So while it is not more efficient in computing resources (apart from the small impact due to duplicate code), it is more efficient in programmer resources: cost of writing, understanding, debugging, documenting. -- David Kastrup