* Resizing hash tables in Guile [not found] ` <xy74r7ckqmy.fsf@nada.kth.se> @ 2003-02-11 13:59 ` Mikael Djurfeldt 2003-02-11 17:34 ` Roland Orre ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Mikael Djurfeldt @ 2003-02-11 13:59 UTC (permalink / raw) Cc: Greg Troxel I've just committed resizing hash table functionality to the Guile core hashtable functions (in hashtab.c). Currently, the only thing which has changed in the API is that the size argument to the hash table constructors is now optional. If omitted, you get a resizing table. Please comment on this API and also take a look at the implementation in hashtab.c. Questions: Is this a good thing? Should we keep it? I made the hash tables thread-safe (locking/unlocking a mutex at hash table access and rehashing). Is that good? An alternativ is to require of the programmer to make sure the hash tables aren't accessed in parallel. The implementation isn't quite finished. Removal should be rewritten for the new tables. Also, weak hash tables need to be handled in hashtab.c in order to maintain a correct item count. M _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-11 13:59 ` Resizing hash tables in Guile Mikael Djurfeldt @ 2003-02-11 17:34 ` Roland Orre [not found] ` <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de> 2003-02-12 16:10 ` Marius Vollmer 2003-02-12 20:55 ` Rob Browning 2 siblings, 1 reply; 15+ messages in thread From: Roland Orre @ 2003-02-11 17:34 UTC (permalink / raw) Cc: Greg Troxel On Tue, 2003-02-11 at 14:59, Mikael Djurfeldt wrote: > I've just committed resizing hash table functionality to the Guile > core hashtable functions (in hashtab.c). > > Currently, the only thing which has changed in the API is that the > size argument to the hash table constructors is now optional. If > omitted, you get a resizing table. > > Please comment on this API and also take a look at the implementation > in hashtab.c. I've studied the code and consider this a perfect minimalistic solution. It works without disturbing any old code. I'll test drive the code tonight. > Questions: > > Is this a good thing? Should we keep it? It's definitely a good thing. In our application we sometimes need just a few tens of items hashed but often many millions of items without really knowing beforehand. It's definitely a good thing to not need to bother about analyzing optimal table sizes any longer. We hope it will stay and we will start using it in our application as soon as possible. > I made the hash tables thread-safe (locking/unlocking a mutex at hash > table access and rehashing). Is that good? An alternativ is to > require of the programmer to make sure the hash tables aren't accessed > in parallel. In our case I think we will most likely set up the different processes to use different hash tables but we have still not parallellized our applications. Could/should the mutex protection maybe be optional? Best regards Roland _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de>]
* Re: Resizing hash tables in Guile [not found] ` <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de> @ 2003-02-12 17:47 ` Mikael Djurfeldt 2003-02-12 20:44 ` Rob Browning 0 siblings, 1 reply; 15+ messages in thread From: Mikael Djurfeldt @ 2003-02-12 17:47 UTC (permalink / raw) Cc: guile-devel Marius Vollmer <marius.vollmer@uni-dortmund.de> writes: > Roland Orre <orre@nada.kth.se> writes: > >> > I made the hash tables thread-safe (locking/unlocking a mutex at hash >> > table access and rehashing). Is that good? An alternativ is to >> > require of the programmer to make sure the hash tables aren't accessed >> > in parallel. >> >> In our case I think we will most likely set up the different processes >> to use different hash tables but we have still not parallellized our >> applications. Could/should the mutex protection maybe be optional? > > Since users must consciously ask for resizable hash tables, we can > require them to do the locking themselves. Should we? I would say > yes, since people need to be aware of thread issues anyway, and maybe > they have a better scheme for ensuring thread safe access to a hash > table. I agree. When implementing the current thread safety I've encountered a cluster of very interesting problems. I'll write up a small text about this, put in the workbook and post on the list. I'd say I can do this within a few weeks. For example, I've discovered that the with-mutex construct we provide in threads.scm is of dubious utility. It also seems hard to second-guess the "users" needs when trying to provide high-level thread-safety constructs in general. I'll remove the thread protection code from hashtab.c. > Removing the locking from the core code should improve performance, > right? Yes, at least marginally. > However, the hash tables should be somewhat thread safe: they might > not work as advertised when multiple threads access them, but the > application might not crash. That is, adding an element from one > thread while another thread is doing the same thing might make one of > the elements disappear, but it must leave the hash table in a valid > state. > > Do the non-resizing hash tables behgave that way, incidentally? I'd say yes. > (Ahh, I just love cooperative threading. These things are so much > easier and efficient when there is only one running thread with > defined switch points... :-) I think the current restriction on signals only causing exceptions at well defined points takes us a good way towards that. I doubt that there are still many points in Guile which need to be modified to get the thread safety to where we want it. M _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 17:47 ` Mikael Djurfeldt @ 2003-02-12 20:44 ` Rob Browning 0 siblings, 0 replies; 15+ messages in thread From: Rob Browning @ 2003-02-12 20:44 UTC (permalink / raw) Cc: Marius Vollmer Mikael Djurfeldt <djurfeldt@nada.kth.se> writes: > I agree. When implementing the current thread safety I've > encountered a cluster of very interesting problems. I'll write up a > small text about this, put in the workbook and post on the list. > I'd say I can do this within a few weeks. For example, I've > discovered that the with-mutex construct we provide in threads.scm > is of dubious utility. It also seems hard to second-guess the > "users" needs when trying to provide high-level thread-safety > constructs in general. Sorry I haven't had a chance to comment sooner, but I also feel that we should probably let the user handle thread-safety. Way back when I first started programming heavily with threads, I had the naive impression that everything should "just be thread safe" :> After a lot of hacking, I would have to say I was definitively wrong. In my experience, the issues surrounding thread-safety and the tradeoffs involved are too complex to be easily managed in a blanket way. As a trivial example, it can be frustrating if you're trying to get high performance for a segment of code, to have no way to avoid hundreds or thousands of mutex grabs when you already *know* that the code segment in question is protected by some other means. -- Rob Browning rlb @defaultvalue.org, @linuxdevel.com, and @debian.org Previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-11 13:59 ` Resizing hash tables in Guile Mikael Djurfeldt 2003-02-11 17:34 ` Roland Orre @ 2003-02-12 16:10 ` Marius Vollmer 2003-02-12 17:53 ` Mikael Djurfeldt 2003-02-12 20:55 ` Rob Browning 2 siblings, 1 reply; 15+ messages in thread From: Marius Vollmer @ 2003-02-12 16:10 UTC (permalink / raw) Cc: Greg Troxel Mikael Djurfeldt <djurfeldt@nada.kth.se> writes: > Is this a good thing? Should we keep it? I think resizing hash tables are important. Thanks for writing them! > The implementation isn't quite finished. Removal should be rewritten > for the new tables. Also, weak hash tables need to be handled in > hashtab.c in order to maintain a correct item count. If you are not going to finish this soonish, please make a note in workbook/tasks/TODO about what needs to still be done. -- GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405 _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 16:10 ` Marius Vollmer @ 2003-02-12 17:53 ` Mikael Djurfeldt 2003-02-12 20:17 ` Roland Orre 0 siblings, 1 reply; 15+ messages in thread From: Mikael Djurfeldt @ 2003-02-12 17:53 UTC (permalink / raw) Cc: Greg Troxel Marius Vollmer <mvo@zagadka.de> writes: > Mikael Djurfeldt <djurfeldt@nada.kth.se> writes: > >> Is this a good thing? Should we keep it? > > I think resizing hash tables are important. I've been thinking that maybe we should continue the move and let the resizing tables entirely replace the fixed size ones. It seems a little silly to have to explain to Guile users that there are two (sorry, eight) different kinds of hash tables... Also, I think the opacity of the resizing table objects is an advantage rather than a disadvantage. If they are opaque, we can any time modify the underlying implementation (the well-known data abstraction argument). What do you say? >> The implementation isn't quite finished. Removal should be rewritten >> for the new tables. Also, weak hash tables need to be handled in >> hashtab.c in order to maintain a correct item count. > > If you are not going to finish this soonish, please make a note in > workbook/tasks/TODO about what needs to still be done. Right. Removal is fixed now. Weak tables could be fixed on a train ride during the weekend. :) (BTW, at that time it would be good if we have come to a decision with regard to dropping the fixed-size vector of alist tables.) M _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 17:53 ` Mikael Djurfeldt @ 2003-02-12 20:17 ` Roland Orre 2003-02-13 9:35 ` Mikael Djurfeldt 2003-02-13 9:52 ` Joris van der Hoeven 0 siblings, 2 replies; 15+ messages in thread From: Roland Orre @ 2003-02-12 20:17 UTC (permalink / raw) Cc: Greg Troxel On Wed, 2003-02-12 at 18:53, Mikael Djurfeldt wrote: > I've been thinking that maybe we should continue the move and let the > resizing tables entirely replace the fixed size ones. It seems a > little silly to have to explain to Guile users that there are two > (sorry, eight) different kinds of hash tables... Also, I think the > opacity of the resizing table objects is an advantage rather than a > disadvantage. If they are opaque, we can any time modify the > underlying implementation (the well-known data abstraction argument). > > What do you say? I think that this is a a good idea, but I would like the optional size argument to still be there as an initial guess about the number of items to expect to avoid resizing and to optimize performance when we have advance information about how many items to expect. Of course this additional level of abstraction is nice as it also gives a future option to use trees if the user e.g gives an optional comparision operator. By the way, Joris van der Hoeven mentioned that this type of resizing hash tables are faster than trees, which have a time complexity of O(log n)+reshuffling time if they are balanced. Do you have any numbers showing how much faster and if possible if there are any conditions? The reshuffling time will grow at least O(n) when the size of our linear tables increases if I have understood right. Best regards Roland Orre _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 20:17 ` Roland Orre @ 2003-02-13 9:35 ` Mikael Djurfeldt 2003-02-13 13:55 ` Harvey J. Stein 2003-02-13 9:52 ` Joris van der Hoeven 1 sibling, 1 reply; 15+ messages in thread From: Mikael Djurfeldt @ 2003-02-13 9:35 UTC (permalink / raw) Cc: Greg Troxel Roland Orre <orre@nada.kth.se> writes: > On Wed, 2003-02-12 at 18:53, Mikael Djurfeldt wrote: >> I've been thinking that maybe we should continue the move and let the >> resizing tables entirely replace the fixed size ones. [...] > I think that this is a a good idea, but I would like the optional size > argument to still be there as an initial guess about the number of > items to expect [...] Agreed. > Of course this additional level of abstraction is nice as it also gives > a future option to use trees if the user e.g gives an optional > comparision operator. > > By the way, Joris van der Hoeven mentioned that this type of resizing > hash tables are faster than trees, which have a time complexity > of O(log n)+reshuffling time if they are balanced. Do you have any > numbers showing how much faster and if possible if there are any > conditions? The reshuffling time will grow at least O(n) when the > size of our linear tables increases if I have understood right. I doubt that we'll ever use trees under the hash table API, the reason being that the user expects O(1) for hash table operations. Even though O(log n) grows slowly, it is substantially different for large enough data sets. Regarding reshuffling time: Yes, rehuffling means that every operation isn't O(1), but it *does* mean that they are O(1) on average. You can understand this by a trick sometimes used in algorithm run-time analysis called "amortization": The idea is that for every operation on the hash table you "pay" some extra virtual time, and the sum of this time can then be used to account for the reshuffling time. In my implementation, the hash table is roughly doubled for each rehash. Before rehashing occurs, you have inserted N elements. This has cost you less than cN seconds. Rehashing is O(2N) = O(N), so we can say it will cost us less than dN seconds. If we now pay off d seconds per operation in advance, and note that the argument above holds equally well for each rehashing point, we realize that each operation costs less than c + d seconds on average. This means that, on average, operations are O(1). Mikael _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-13 9:35 ` Mikael Djurfeldt @ 2003-02-13 13:55 ` Harvey J. Stein 2003-02-13 14:24 ` Joris van der Hoeven 0 siblings, 1 reply; 15+ messages in thread From: Harvey J. Stein @ 2003-02-13 13:55 UTC (permalink / raw) Cc: Greg Troxel Mikael Djurfeldt <djurfeldt@nada.kth.se> writes: > Regarding reshuffling time: Yes, rehuffling means that every operation > isn't O(1), but it *does* mean that they are O(1) on average. You can > understand this by a trick sometimes used in algorithm run-time > analysis called "amortization": > > The idea is that for every operation on the hash table you "pay" some > extra virtual time, and the sum of this time can then be used to > account for the reshuffling time. In my implementation, the hash > table is roughly doubled for each rehash. Before rehashing occurs, > you have inserted N elements. This has cost you less than cN seconds. > Rehashing is O(2N) = O(N), so we can say it will cost us less than dN > seconds. If we now pay off d seconds per operation in advance, and > note that the argument above holds equally well for each rehashing > point, we realize that each operation costs less than c + d seconds on > average. This means that, on average, operations are O(1). Inserts are, but lookups aren't necessarily. Lookups being O(1) requires uniformity of bucket sizes. Worst case hash table lookup time is still O(N). And good hashing functions are still hard to write. People overestimate log(N) and overuse O(). When comparing an O(1) algorithm to an O(log(N)) algorithm, it really comes down to the actual functions involved, and actual problem size, not just the asymptotic behavior. 2^32 is over 4,000,000,000. With this many items, log(N) is still just 32, so an O(log(N)) algorithm will still beat an O(1) algorithm if it's really log_2(N) vs 32. And you can't have 2^32 items in your hash table unless this is on a 64bit machine. And with this many items in your hash table, you'd need 16gb (32gb on a 64bit machine) just to store the addresses, let alone the keys and the items themselves, so by this time you're probably hitting the swap so hard that the whole question is moot. Also, if a person's relying on O(1) for hash table performance, it might be not because they need that on average, but because they need an upper bound on the operation time, in which case automatic resizing would also violate this, even though it maintains O(1) on average. Trees also sort the data for you, which hash tables don't give you. So, ideally, one would have a hash table object with & without resizing, and various sorts of tree (AVL, red/black, B*, etc) objects. insert and delete and map would be methods that work on all of the above, with map on trees returning the data in sorted order. For that matter, insert & delete might as well also work on lists... BTW, the hash table implementation in Tcl seems to be quite good. -- Harvey Stein Bloomberg LP hjstein@bloomberg.com _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-13 13:55 ` Harvey J. Stein @ 2003-02-13 14:24 ` Joris van der Hoeven 2003-02-13 18:30 ` Harvey J. Stein 0 siblings, 1 reply; 15+ messages in thread From: Joris van der Hoeven @ 2003-02-13 14:24 UTC (permalink / raw) Cc: Greg Troxel > > Regarding reshuffling time: Yes, rehuffling means that every operation > > isn't O(1), but it *does* mean that they are O(1) on average. You can > > understand this by a trick sometimes used in algorithm run-time > > analysis called "amortization": > > > > The idea is that for every operation on the hash table you "pay" some > > extra virtual time, and the sum of this time can then be used to > > account for the reshuffling time. In my implementation, the hash > > table is roughly doubled for each rehash. Before rehashing occurs, > > you have inserted N elements. This has cost you less than cN seconds. > > Rehashing is O(2N) = O(N), so we can say it will cost us less than dN > > seconds. If we now pay off d seconds per operation in advance, and > > note that the argument above holds equally well for each rehashing > > point, we realize that each operation costs less than c + d seconds on > > average. This means that, on average, operations are O(1). > > Inserts are, but lookups aren't necessarily. Both aren't necessarily, because inserting requires looking up too. > Lookups being O(1) requires uniformity of bucket sizes. > Worst case hash table lookup time is still O(N). You can also store a binary search tree in each of the buckets, if you think that your hash function is bad. > And good hashing functions are still hard to write. I do not really agree. A good hash algorithm for lists (or strings), which I use in TeXmacs, is to rotate the 32 bit integer hash values of each of the members by a prime number like 3, 5, 7 or 11 and progressively take the exclusive or. This seems to lead to bucket sizes as predicted by probability theory, even for hash tables of size 2^p. > People overestimate log(N) and overuse O(). When comparing an O(1) > algorithm to an O(log(N)) algorithm, it really comes down to the > actual functions involved, and actual problem size, not just the > asymptotic behavior. 2^32 is over 4,000,000,000. A factor 10 is still a factor 10 though. (2^10 ~~ 1000). > With this many > items, log(N) is still just 32, so an O(log(N)) algorithm will still > beat an O(1) algorithm if it's really log_2(N) vs 32. Yes, but the O(1) is really *table lookup* multiplied by a small constant here, so this is *fast*. You may adjust the small constant by choosing an appropriate threshold for "size/nr buckets". > Also, if a person's relying on O(1) for hash table performance, it might be > not because they need that on average, but because they need an upper > bound on the operation time, in which case automatic resizing would > also violate this, even though it maintains O(1) on average. This is a more serious drawback of standard hash tables, but, as I said before, we already have garbage collection in Guile anyway... > Trees also sort the data for you, which hash tables don't give you. But you need a compairison operation for that, which may be even less natural than a hash function. > So, ideally, one would have a hash table object with & without > resizing, and various sorts of tree (AVL, red/black, B*, etc) objects. > insert and delete and map would be methods that work on all of the > above, with map on trees returning the data in sorted order. For that > matter, insert & delete might as well also work on lists... Agreed: ideally, we have everything :^) _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-13 14:24 ` Joris van der Hoeven @ 2003-02-13 18:30 ` Harvey J. Stein 2003-02-13 20:02 ` Paul Jarc 0 siblings, 1 reply; 15+ messages in thread From: Harvey J. Stein @ 2003-02-13 18:30 UTC (permalink / raw) Cc: Greg Troxel Joris van der Hoeven <TeXmacs@math.u-psud.fr> writes: > > > average. This means that, on average, operations are O(1). > > > > Inserts are, but lookups aren't necessarily. > > Both aren't necessarily, because inserting requires looking up too. Yes, if you want to return an error for inserting 2 items with the same key. No, if you allow multiple items with the same key. > > And good hashing functions are still hard to write. > > I do not really agree. A good hash algorithm for lists (or strings), > which I use in TeXmacs, is ... I guess "hard" isn't quite the word for it. What I meant was that the obvious ones don't work well. You basically need to look up a good one to get good behavior. Tcl's fcn works pretty well too. I forget what it does. > > With this many items, log(N) is still just 32, so an O(log(N)) > > algorithm will still beat an O(1) algorithm if it's really > > log_2(N) vs 32. > > Yes, but the O(1) is really *table lookup* multiplied by a small > constant here, so this is *fast*. You may adjust the small constant > by choosing an appropriate threshold for "size/nr buckets". Not just table lookup. Also hash calculation from the key, which can be more time consuming than a key comparison, and key comparisons for everything in the bucket. > > Trees also sort the data for you, which hash tables don't give you. > > But you need a compairison operation for that, > which may be even less natural than a hash function. Yes. For hash tables you just need a key equality test. For trees you need to be able to order the keys. > > So, ideally, one would have a hash table object with & without > > resizing, and various sorts of tree (AVL, red/black, B*, etc) objects. > > insert and delete and map would be methods that work on all of the > > above, with map on trees returning the data in sorted order. For that > > matter, insert & delete might as well also work on lists... > > Agreed: ideally, we have everything :^) I think we're on the same page here... STk had a nice scheme interface for hash tables. The code uses the Tcl hash tables, they autoresize and if I recall correctly, many of the important parameters can be modified. You might want to model the guile stuff on this, or utilize this work. I advocated this when the discussion came up years ago on the guile mailing list. BTW, there was a long discussion of hash tables on the guile list some number of years ago. At the time I didn't like liked the guile (aka scm) hash tables. They were a very minimal implementation. The user had to do too much to use them. I don't know what it's like these days, but it'd certainly be nice to have a good implementation available which includes autosizing. -- Harvey Stein Bloomberg LP hjstein@bloomberg.com _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-13 18:30 ` Harvey J. Stein @ 2003-02-13 20:02 ` Paul Jarc 0 siblings, 0 replies; 15+ messages in thread From: Paul Jarc @ 2003-02-13 20:02 UTC (permalink / raw) HJSTEIN@bloomberg.net (Harvey J. Stein) wrote: > Joris van der Hoeven <TeXmacs@math.u-psud.fr> writes: >> inserting requires looking up too. > > Yes, if you want to return an error for inserting 2 items with the > same key. Or if you want to replace existing entries, as Guile's current hash tables do. >> But you need a compairison operation for that, >> which may be even less natural than a hash function. > > Yes. For hash tables you just need a key equality test. For trees > you need to be able to order the keys. It's easy enough to support this: (< (hash key1 0) (hash key2 0)) But this is probably only useful in the general case. It should be possible to substitute a less expensive ordering function when one is available. paul _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 20:17 ` Roland Orre 2003-02-13 9:35 ` Mikael Djurfeldt @ 2003-02-13 9:52 ` Joris van der Hoeven 1 sibling, 0 replies; 15+ messages in thread From: Joris van der Hoeven @ 2003-02-13 9:52 UTC (permalink / raw) Cc: Greg Troxel > > I've been thinking that maybe we should continue the move and let the > > resizing tables entirely replace the fixed size ones. It seems a > > little silly to have to explain to Guile users that there are two > > (sorry, eight) different kinds of hash tables... Also, I think the > > opacity of the resizing table objects is an advantage rather than a > > disadvantage. If they are opaque, we can any time modify the > > underlying implementation (the well-known data abstraction argument). > > > > What do you say? > > I think that this is a a good idea, but I would like the optional size > argument to still be there as an initial guess about the number of > items to expect to avoid resizing and to optimize performance when > we have advance information about how many items to expect. In fact, it is even better to specify the resizing behaviour using optional arguments. For instance: up-ratio : size up when size/slots >= up-ratio up-factor : new nr slots := old nr slots * up-factor down-ratio : size up when size/slots < down-ratio down-factor: new nr slots := old nr slots / down-factor Using a small up-ratio will improve the constant factor in the lookup speed, but require the usage of a larger number of slots. > Of course this additional level of abstraction is nice as it also gives > a future option to use trees if the user e.g gives an optional > comparision operator. Yes. > By the way, Joris van der Hoeven mentioned that this type of resizing > hash tables are faster than trees, which have a time complexity > of O(log n)+reshuffling time if they are balanced. Do you have any > numbers showing how much faster and if possible if there are any > conditions? The reshuffling time will grow at least O(n) when the > size of our linear tables increases if I have understood right. Well: lookup time in a balanced search tree is O(log n), while lookup time in a table is O(1)... _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-11 13:59 ` Resizing hash tables in Guile Mikael Djurfeldt 2003-02-11 17:34 ` Roland Orre 2003-02-12 16:10 ` Marius Vollmer @ 2003-02-12 20:55 ` Rob Browning 2003-02-13 10:43 ` Mikael Djurfeldt 2 siblings, 1 reply; 15+ messages in thread From: Rob Browning @ 2003-02-12 20:55 UTC (permalink / raw) Cc: Greg Troxel Mikael Djurfeldt <djurfeldt@nada.kth.se> writes: > I've just committed resizing hash table functionality to the Guile > core hashtable functions (in hashtab.c). > > Currently, the only thing which has changed in the API is that the > size argument to the hash table constructors is now optional. If > omitted, you get a resizing table. Looks good to me. If you have time, could you (very) briefly summarize the behavior? Mostly I mean wrt how the implementation handles resizes (i.e. when does it decide to resize, how does it pick the new size, etc.). That'd be nice to have in comments in the file so that you can get an overview without having to read the code in detail first. Thanks -- Rob Browning rlb @defaultvalue.org, @linuxdevel.com, and @debian.org Previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-user mailing list Guile-user@gnu.org http://mail.gnu.org/mailman/listinfo/guile-user ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Resizing hash tables in Guile 2003-02-12 20:55 ` Rob Browning @ 2003-02-13 10:43 ` Mikael Djurfeldt 0 siblings, 0 replies; 15+ messages in thread From: Mikael Djurfeldt @ 2003-02-13 10:43 UTC (permalink / raw) Cc: Greg Troxel Rob Browning <rlb@defaultvalue.org> writes: > If you have time, could you (very) briefly summarize the behavior? > Mostly I mean wrt how the implementation handles resizes (i.e. when > does it decide to resize, how does it pick the new size, etc.). Good idea. Done. M _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2003-02-13 20:02 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <Pine.GSO.3.96.1030208160743.22945E-100000@anh> [not found] ` <87lm0o7951.fsf@alice.rotty.yi.org> [not found] ` <rmiu1fcnrhj.fsf@fnord.ir.bbn.com> [not found] ` <1044889242.1033.310.camel@localhost> [not found] ` <xy77kc8krhr.fsf@nada.kth.se> [not found] ` <xy74r7ckqmy.fsf@nada.kth.se> 2003-02-11 13:59 ` Resizing hash tables in Guile Mikael Djurfeldt 2003-02-11 17:34 ` Roland Orre [not found] ` <ljy94lhgkb.fsf@burns.dt.e-technik.uni-dortmund.de> 2003-02-12 17:47 ` Mikael Djurfeldt 2003-02-12 20:44 ` Rob Browning 2003-02-12 16:10 ` Marius Vollmer 2003-02-12 17:53 ` Mikael Djurfeldt 2003-02-12 20:17 ` Roland Orre 2003-02-13 9:35 ` Mikael Djurfeldt 2003-02-13 13:55 ` Harvey J. Stein 2003-02-13 14:24 ` Joris van der Hoeven 2003-02-13 18:30 ` Harvey J. Stein 2003-02-13 20:02 ` Paul Jarc 2003-02-13 9:52 ` Joris van der Hoeven 2003-02-12 20:55 ` Rob Browning 2003-02-13 10:43 ` Mikael Djurfeldt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).