* Comparing two hash tables for equality? @ 2018-08-26 10:01 Aleksandar Sandic 2018-08-26 22:49 ` Mark H Weaver 0 siblings, 1 reply; 8+ messages in thread From: Aleksandar Sandic @ 2018-08-26 10:01 UTC (permalink / raw) To: guile-user Hello everyone, I have been looking through the reference manual, but there does not seem to be a procedure for comparing two hash tables for equality. The procedure 'equal?' only returns true if the two tables are also 'eq?': scheme@(guile-user)> (define h1 (make-hash-table)) scheme@(guile-user)> (define h2 (make-hash-table)) scheme@(guile-user)> (equal? h1 h2) $1 = #f scheme@(guile-user)> (equal? h1 h1) $2 = #t Is this the intended behaviour? If so, how can I compare two hash tables for equality then? I have rolled my own function for the time being: (define (hash-table-equal? h1 h2) "- Scheme procedure: hash-table-equal? h1 h2 Compare two hash tables for entry-wise equality. The arguments 'h1' and 'h2' are not equal if and only if either: - One of them is not a hash table - They do not have the same number of entries - Values for the same key are not 'equal?' or 'hash-equal?' Two hash tables are always 'hash-equal?' if they are 'eq?'." ;; First take care of the trivial cases. Then compare the length of the ;; tables; if the lengths don't match the tables not equal. Finally, loop ;; over the keys of one table and compare the values with those in the other ;; table. ;; ;; This algorithm traverses the first tables thrice and the second table ;; twice in the worst case: Once each to count the keys, once the first one ;; to collect the keys, and once both to compare all values. The loop will ;; terminate the instant a mismatch is found. ;; ;; If the first value is a hash table we call this function recursively ;; because hash tables are never 'equal?' (unless they are 'eq?') (cond ((not (hash-table? h1)) #f) ((not (hash-table? h2)) #f) ((eq? h1 h2) #t) (else (let ((n-keys1 (hash-fold (λ (k v keys) (1+ keys)) 0 h1)) (n-keys2 (hash-fold (λ (k v keys) (1+ keys)) 0 h2))) (if (not (= n-keys1 n-keys2)) #f (do ((keys (hash-fold (λ (k v keys) (cons k keys)) '() h1) (cdr keys)) (same? #t)) ((or (not same?) (null? keys)) same?) (let* ((key (car keys)) (v1 (hash-ref h1 key)) (v2 (hash-get-handle h2 key))) (cond ((not v2) (set! same? #f)) ((hash-table? v1) (set! same? (hash-table-equal? v1 v2))) ((not (equal? v1 (cdr v2))) (set! same? #f)))))))))) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-26 10:01 Comparing two hash tables for equality? Aleksandar Sandic @ 2018-08-26 22:49 ` Mark H Weaver 2018-08-26 23:37 ` John Cowan ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Mark H Weaver @ 2018-08-26 22:49 UTC (permalink / raw) To: Aleksandar Sandic; +Cc: guile-user Hi Aleksandar, Aleksandar Sandic <asflin@gmx.de> writes: > I have been looking through the reference manual, but there does not seem to > be a procedure for comparing two hash tables for equality. The procedure > 'equal?' only returns true if the two tables are also 'eq?': > > scheme@(guile-user)> (define h1 (make-hash-table)) > scheme@(guile-user)> (define h2 (make-hash-table)) > scheme@(guile-user)> (equal? h1 h2) > $1 = #f > scheme@(guile-user)> (equal? h1 h1) > $2 = #t > > Is this the intended behaviour? Yes. That might seem surprising. I have a few things to say on that, in no particular order: An equality test on hash tables needs to know how to compare the keys and how to compare the values. There's no way to pass those additional arguments to 'equal?', so it can't do that job. Hash table implementations typically don't offer an equality test on the hash tables themselves, and I don't recall anyone ever asking for such an operation before now. I guess that's because in the use case where hash tables are beneficial -- when the tables are very large -- comparing them for equality is expensive. While hash tables are indispensible for certain use cases, they also have very significant downsides, and I tend to avoid them whenever possible. Their most severe downside, in my opinion, is that they are fundamentally incompatible with a functional programming style. They invariably force all code that uses them to be written in an imperative style. I cannot stress enough how important this is. It's also not possible to efficiently create a new hash table based on an existing one, but with one or more additional entries. In Scheme, you can efficiently 'cons' or 'append' some entries onto the front of an existing list without modifying the original list. That includes association lists. To do the same with a hash table, you must make a copy of the entire hash table and then add the new entries to the copy. There can be no sharing of storage between multiple hash tables, as can be done with lists, association lists, or balanced trees. Even if you don't need sharing, hash tables also use more space than those other data structures. So, I would ask yourself whether the benefits of hash tables truly outweigh the downsides in your particular use case. If you're comfortable sharing details, I'd be curious to hear what you're trying to accomplish here, at a higher level. Maybe there's a better way. Anyway, if you really need to compare hash tables for equality, here's some suggested code that's simpler than the code you wrote, and should be quite a bit more efficient as well. In particular, this code performs no heap allocation: --8<---------------cut here---------------start------------->8--- (use-modules (ice-9 match)) (define (hash-table-subset? val=? h1 h2) "H1 and H2 must be hash tables which use equal? as the equality test for keys. Return #t if and only if for every entry (K . V1) in H1, there exists an entry (K . V2) in H2 such that (VAL=? V1 V2), otherwise return #f." (hash-fold (lambda (k v1 equal-so-far) (and equal-so-far (match (hash-get-handle h2 k) ((_ . v2) (val=? v1 v2)) (#f #f)))) #t h1)) (define (hash-table=? val=? h1 h2) "H1 and H2 must be hash tables which use equal? as the equality test for keys. Return #t if H1 and H2 have the same keys (in the sense of equal?) and each key has the same value (in the sense of val=?), otherwise return #f." (and (hash-table-subset? val=? h1 h2) (hash-table-subset? val=? h2 h1))) (define (equal*? a b) (or (equal? a b) (and (hash-table? a) (hash-table? b) (hash-table=? equal*? a b)))) --8<---------------cut here---------------end--------------->8--- You might not need 'equal*?' here. I included it only to match more closely the behavior of your code. For most purposes, 'hash-table=?' is actually more flexible than 'equal*?', as long as the "type" of values stored in a given hash table are known. If the values stored in the hash table are numbers, then 'val=?' should be '='. If the values are themselves hash tables, then 'val=?' should be (lambda (a b) (hash-table=? foo=? a b)) for some value of 'foo=?'. In some other cases, 'val=?' should be 'equal?'. If you truly need hash tables whose values could contain either hash tables or some other type, then something like 'equal*?' might be the right tool. However, keep in mind that 'equal?' is usually not the desired equality test for numbers, because (equal? 1 1.0) => #false and (equal? +0.0 -0.0) => #false. 'equal?' tests for _operational_ equivalence, which makes distinctions important for memoization and some other purposes, whereas '=' tests for mathematical equality and is usually what you want for numbers. Regards, Mark ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-26 22:49 ` Mark H Weaver @ 2018-08-26 23:37 ` John Cowan 2018-08-27 22:14 ` Aleksandar Sandic 2018-08-27 20:25 ` Arne Babenhauserheide 2018-08-27 21:53 ` Aleksandar Sandic 2 siblings, 1 reply; 8+ messages in thread From: John Cowan @ 2018-08-26 23:37 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-user, asflin On Sun, Aug 26, 2018 at 6:51 PM Mark H Weaver <mhw@netris.org> wrote: > An equality test on hash tables needs to know how to compare the keys > and how to compare the values. There's no way to pass those additional > arguments to 'equal?', so it can't do that job. > Correct. However, it's possible given either SRFI 69 based or R6RS based hash tables to retrieve the key equality predicate. Therefore, a minimal hash-table equality function requires three arguments, the hash tables to be compared (which must have the same key equality predicate) and a value equality predicate. > Hash table implementations typically don't offer an equality test on the > hash tables themselves, SRFI 125, which is part of R7RS-large, does provide such a procedure exactly as described above, except that instead of a predicate you pass a comparator (a record containing a type test, an equality predicate, an optional ordering predicate, and a hash function), a data structure pervasive in R7RS-large and described in SRFI 128. > and I don't recall anyone ever asking for such > an operation before now. I guess that's because in the use case where > hash tables are beneficial -- when the tables are very large -- > comparing them for equality is expensive. > It's O(n), no worse than comparing lists, vectors, or strings for equality. > While hash tables are indispensible for certain use cases, they also > have very significant downsides, and I tend to avoid them whenever > possible. Their most severe downside, in my opinion, is that they are > fundamentally incompatible with a functional programming style. That's true for standard hash tables, such as are provided by Perl, Python, MRI Ruby, Common Lisp, R6RS, etc. But it is not true of hash array mapped tries (HAMTs), which are functional hash tables <https://en.wikipedia.org/wiki/Hash_array_mapped_trie>. They are standard in Haskell, Scala, Erlang, and Rubinius. Although SRFI 146 does not require the use of HAMTs to implement its hashmaps, the sample implementation does in fact provide them. > It's also not possible to efficiently create a new hash table based on > an existing one, but with one or more additional entries. > Again, this is not true of HAMTs. > There can be no sharing of storage between multiple hash tables, as can > be done with lists, association lists, or balanced trees. But not with vectors or strings. (SRFI 135 texts, which are immutable, can and do share.) In addition, lists can only safely share if you restrict yourself by never mutating them. -- John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org It's like if you meet an really old, really rich guy covered in liver spots and breathing with an oxygen tank, and you say, "I want to be rich, too, so I'm going to start walking with a cane and I'm going to act crotchety and I'm going to get liver disease. --Wil Shipley ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-26 23:37 ` John Cowan @ 2018-08-27 22:14 ` Aleksandar Sandic 2018-08-29 4:46 ` Mark H Weaver 0 siblings, 1 reply; 8+ messages in thread From: Aleksandar Sandic @ 2018-08-27 22:14 UTC (permalink / raw) To: John Cowan; +Cc: guile-user I did not know about HAMTs, looks interesting. It's not relevant to my use- case, but it's an interesting topic to look into. Thanks. On Monday, 27th August 2018 01:37:54 CEST you wrote: > ... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-27 22:14 ` Aleksandar Sandic @ 2018-08-29 4:46 ` Mark H Weaver 0 siblings, 0 replies; 8+ messages in thread From: Mark H Weaver @ 2018-08-29 4:46 UTC (permalink / raw) To: Aleksandar Sandic; +Cc: guile-user Aleksandar Sandic <asflin@gmx.de> writes: > I did not know about HAMTs, looks interesting. It's not relevant to my use- > case, but it's an interesting topic to look into. Thanks. There's an implementation of HAMTs that works with Guile in the following "Purely Functional Data Structures in Scheme" library: https://github.com/ijp/pfds I'm not sure off-hand how well optimized they are. Mark ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-26 22:49 ` Mark H Weaver 2018-08-26 23:37 ` John Cowan @ 2018-08-27 20:25 ` Arne Babenhauserheide 2018-08-27 21:53 ` Aleksandar Sandic 2 siblings, 0 replies; 8+ messages in thread From: Arne Babenhauserheide @ 2018-08-27 20:25 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-user, Aleksandar Sandic [-- Attachment #1: Type: text/plain, Size: 581 bytes --] Mark H Weaver <mhw@netris.org> writes: > Hi Aleksandar, > Hash table implementations typically don't offer an equality test on the > hash tables themselves, and I don't recall anyone ever asking for such > an operation before now. I already wrote diff-algorithms for Python dictionaries, which is close to implementing (equal? h H). The main advantage of hash-tables over other dictionary types is that they are very, very fast. I tried finding anything faster, but I did not. Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1076 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-26 22:49 ` Mark H Weaver 2018-08-26 23:37 ` John Cowan 2018-08-27 20:25 ` Arne Babenhauserheide @ 2018-08-27 21:53 ` Aleksandar Sandic 2018-08-28 7:37 ` Mark H Weaver 2 siblings, 1 reply; 8+ messages in thread From: Aleksandar Sandic @ 2018-08-27 21:53 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-user > An equality test on hash tables needs to know how to compare the keys > and how to compare the values. There's no way to pass those additional > arguments to 'equal?', so it can't do that job. It has to compare the values, but not the keys. If you take a look at my code, what it does is first compare the size of both tables; if it differs then the tables are not equal. If the number of entries is the same we can loop over the keys of one table and compare the corresponding values. If a key from table A is not present in table B, then the tables differ. Finally, if two values for a key are not 'equal?', then the tables differ as well. > Hash table implementations typically don't offer an equality test on the > hash tables themselves, and I don't recall anyone ever asking for such > an operation before now. I guess that's because in the use case where > hash tables are beneficial -- when the tables are very large -- > comparing them for equality is expensive. I was mostly interested in equality for writing unit tests, so in my case an equality test that's good enough for the job is adequate. It is not intended to be used in production code. I was just wondering if there might be a standard way instead of rolling my own. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Comparing two hash tables for equality? 2018-08-27 21:53 ` Aleksandar Sandic @ 2018-08-28 7:37 ` Mark H Weaver 0 siblings, 0 replies; 8+ messages in thread From: Mark H Weaver @ 2018-08-28 7:37 UTC (permalink / raw) To: Aleksandar Sandic; +Cc: guile-user Hi Aleksandar, Aleksandar Sandic <asflin@gmx.de> writes: >> An equality test on hash tables needs to know how to compare the keys >> and how to compare the values. There's no way to pass those additional >> arguments to 'equal?', so it can't do that job. > It has to compare the values, but not the keys. There's an implicit equality test on keys every time you perform a hash table lookup. By using 'hash-ref' and 'hash-get-handle', you are implicitly using 'equal?' to compare the keys in the hash table with the key that you're asking to look up. Guile's native hash tables are unusual in that they do not internally keep track of which equality test on keys to use. Instead, it is your responsibility to consistently use the functions corresponding to same equality test in all accesses to a given hash table. As the Guile manual states: Like the association list functions, the hash table functions come in several varieties, according to the equality test used for the keys. Plain ‘hash-’ functions use ‘equal?’, ‘hashq-’ functions use ‘eq?’, ‘hashv-’ functions use ‘eqv?’, and the ‘hashx-’ functions use an application supplied test. A single ‘make-hash-table’ creates a hash table suitable for use with any set of functions, but it’s imperative that just one set is then used consistently, or results will be unpredictable. <https://www.gnu.org/software/guile/manual/html_node/Hash-Table-Reference.html> So, your 'hash-table-equal?' predicate implicitly assumes that the hash tables passed to it were populated using 'hash-set!' or 'hash-create-handle!'. If it is applied to a hash table that was populated with the 'hashq-*!', 'hashv-*!', or 'hashx-*!' procedures, then the results will be unpredictable. My 'hash-table=?' predicate makes the same implicit assumption. Mark ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-08-29 4:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-08-26 10:01 Comparing two hash tables for equality? Aleksandar Sandic 2018-08-26 22:49 ` Mark H Weaver 2018-08-26 23:37 ` John Cowan 2018-08-27 22:14 ` Aleksandar Sandic 2018-08-29 4:46 ` Mark H Weaver 2018-08-27 20:25 ` Arne Babenhauserheide 2018-08-27 21:53 ` Aleksandar Sandic 2018-08-28 7:37 ` Mark H Weaver
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).