unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Comparing two hash tables for equality?
@ 2018-08-26 10:01 Aleksandar Sandic
  2018-08-26 22:49 ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Aleksandar Sandic @ 2018-08-26 10:01 UTC (permalink / raw)
  To: guile-user

Hello everyone,

I have been looking through the reference manual, but there does not seem to 
be a procedure for comparing two hash tables for equality. The procedure 
'equal?' only returns true if the two tables are also 'eq?':

    scheme@(guile-user)> (define h1 (make-hash-table))
    scheme@(guile-user)> (define h2 (make-hash-table))
    scheme@(guile-user)> (equal? h1 h2)
    $1 = #f
    scheme@(guile-user)> (equal? h1 h1)
    $2 = #t

Is this the intended behaviour? If so, how can I compare two hash tables for 
equality then? I have rolled my own function for the time being:

    (define (hash-table-equal? h1 h2)
      "- Scheme procedure: hash-table-equal? h1 h2
         Compare two hash tables for entry-wise equality.
         
         The arguments 'h1' and 'h2' are not equal if and only if either:
           - One of them is not a hash table
           - They do not have the same number of entries
           - Values for the same key are not 'equal?' or 'hash-equal?'
    
         Two hash tables are always 'hash-equal?' if they are 'eq?'."
      ;; First take care of the trivial cases. Then compare the length of the
      ;; tables; if the lengths don't match the tables not equal. Finally, 
loop
      ;; over the keys of one table and compare the values with those in the 
other
      ;; table.
      ;;
      ;; This algorithm traverses the first tables thrice and the second table
      ;; twice in the worst case: Once each to count the keys, once the first 
one
      ;; to collect the keys, and once both to compare all values. The loop 
will
      ;; terminate the instant a mismatch is found.
      ;;
      ;; If the first value is a hash table we call this function recursively
      ;; because hash tables are never 'equal?' (unless they are 'eq?')
      (cond
        ((not (hash-table? h1)) #f)
        ((not (hash-table? h2)) #f)
        ((eq? h1 h2) #t)
        (else
          (let ((n-keys1 (hash-fold (λ (k v keys) (1+ keys)) 0 h1))
                (n-keys2 (hash-fold (λ (k v keys) (1+ keys)) 0 h2)))
            (if (not (= n-keys1 n-keys2))
              #f
              (do ((keys (hash-fold (λ (k v keys) (cons k keys)) '() h1) (cdr 
keys))
                   (same? #t))
                  ((or (not same?) (null? keys))
                   same?)
                (let* ((key (car keys))
                       (v1 (hash-ref h1 key))
                       (v2 (hash-get-handle h2 key)))
                  (cond
                    ((not v2)                   (set! same? #f))
                    ((hash-table? v1)           (set! same? (hash-table-equal? 
v1 v2)))
                    ((not (equal? v1 (cdr v2))) (set! same? #f))))))))))






^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-26 10:01 Comparing two hash tables for equality? Aleksandar Sandic
@ 2018-08-26 22:49 ` Mark H Weaver
  2018-08-26 23:37   ` John Cowan
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mark H Weaver @ 2018-08-26 22:49 UTC (permalink / raw)
  To: Aleksandar Sandic; +Cc: guile-user

Hi Aleksandar,

Aleksandar Sandic <asflin@gmx.de> writes:

> I have been looking through the reference manual, but there does not seem to 
> be a procedure for comparing two hash tables for equality. The procedure 
> 'equal?' only returns true if the two tables are also 'eq?':
>
>     scheme@(guile-user)> (define h1 (make-hash-table))
>     scheme@(guile-user)> (define h2 (make-hash-table))
>     scheme@(guile-user)> (equal? h1 h2)
>     $1 = #f
>     scheme@(guile-user)> (equal? h1 h1)
>     $2 = #t
>
> Is this the intended behaviour?

Yes.  That might seem surprising.  I have a few things to say on that,
in no particular order:

An equality test on hash tables needs to know how to compare the keys
and how to compare the values.  There's no way to pass those additional
arguments to 'equal?', so it can't do that job.

Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now.  I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.

While hash tables are indispensible for certain use cases, they also
have very significant downsides, and I tend to avoid them whenever
possible.  Their most severe downside, in my opinion, is that they are
fundamentally incompatible with a functional programming style.  They
invariably force all code that uses them to be written in an imperative
style.  I cannot stress enough how important this is.

It's also not possible to efficiently create a new hash table based on
an existing one, but with one or more additional entries.  In Scheme,
you can efficiently 'cons' or 'append' some entries onto the front of an
existing list without modifying the original list.  That includes
association lists.  To do the same with a hash table, you must make a
copy of the entire hash table and then add the new entries to the copy.

There can be no sharing of storage between multiple hash tables, as can
be done with lists, association lists, or balanced trees.  Even if you
don't need sharing, hash tables also use more space than those other
data structures.

So, I would ask yourself whether the benefits of hash tables truly
outweigh the downsides in your particular use case.

If you're comfortable sharing details, I'd be curious to hear what
you're trying to accomplish here, at a higher level.  Maybe there's a
better way.

Anyway, if you really need to compare hash tables for equality, here's
some suggested code that's simpler than the code you wrote, and should
be quite a bit more efficient as well.  In particular, this code
performs no heap allocation:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 match))

(define (hash-table-subset? val=? h1 h2)
  "H1 and H2 must be hash tables which use equal? as the equality test
for keys.  Return #t if and only if for every entry (K . V1) in H1,
there exists an entry (K . V2) in H2 such that (VAL=? V1 V2),
otherwise return #f."
  (hash-fold (lambda (k v1 equal-so-far)
               (and equal-so-far
                    (match (hash-get-handle h2 k)
                      ((_ . v2) (val=? v1 v2))
                      (#f       #f))))
             #t
             h1))

(define (hash-table=? val=? h1 h2)
  "H1 and H2 must be hash tables which use equal? as the equality test
for keys.  Return #t if H1 and H2 have the same keys (in the sense of
equal?) and each key has the same value (in the sense of val=?),
otherwise return #f."
  (and (hash-table-subset? val=? h1 h2)
       (hash-table-subset? val=? h2 h1)))

(define (equal*? a b)
  (or (equal? a b)
      (and (hash-table? a)
           (hash-table? b)
           (hash-table=? equal*? a b))))
--8<---------------cut here---------------end--------------->8---

You might not need 'equal*?' here.  I included it only to match more
closely the behavior of your code.

For most purposes, 'hash-table=?' is actually more flexible than
'equal*?', as long as the "type" of values stored in a given hash table
are known.  If the values stored in the hash table are numbers, then
'val=?' should be '='.  If the values are themselves hash tables, then
'val=?' should be (lambda (a b) (hash-table=? foo=? a b)) for some value
of 'foo=?'.  In some other cases, 'val=?' should be 'equal?'.

If you truly need hash tables whose values could contain either hash
tables or some other type, then something like 'equal*?' might be the
right tool.  However, keep in mind that 'equal?' is usually not the
desired equality test for numbers, because (equal? 1 1.0) => #false and
(equal? +0.0 -0.0) => #false.  'equal?' tests for _operational_
equivalence, which makes distinctions important for memoization and some
other purposes, whereas '=' tests for mathematical equality and is
usually what you want for numbers.

     Regards,
       Mark



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-26 22:49 ` Mark H Weaver
@ 2018-08-26 23:37   ` John Cowan
  2018-08-27 22:14     ` Aleksandar Sandic
  2018-08-27 20:25   ` Arne Babenhauserheide
  2018-08-27 21:53   ` Aleksandar Sandic
  2 siblings, 1 reply; 8+ messages in thread
From: John Cowan @ 2018-08-26 23:37 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user, asflin

On Sun, Aug 26, 2018 at 6:51 PM Mark H Weaver <mhw@netris.org> wrote:


> An equality test on hash tables needs to know how to compare the keys
> and how to compare the values.  There's no way to pass those additional
> arguments to 'equal?', so it can't do that job.
>

Correct.  However, it's possible given either SRFI 69 based or R6RS based
hash tables to retrieve the key equality predicate.  Therefore, a minimal
hash-table equality function requires three arguments, the hash tables
to be compared (which must have the same key equality predicate) and
a value equality predicate.


> Hash table implementations typically don't offer an equality test on the
> hash tables themselves,


SRFI 125, which is part of R7RS-large, does provide such a procedure
exactly as described above, except that instead of a predicate you pass
a comparator (a record containing a type test, an equality predicate,
an optional ordering predicate, and a hash function), a data structure
pervasive in R7RS-large and described in SRFI 128.


> and I don't recall anyone ever asking for such
> an operation before now.  I guess that's because in the use case where
> hash tables are beneficial -- when the tables are very large --
> comparing them for equality is expensive.
>

It's O(n), no worse than comparing lists, vectors, or strings for equality.


> While hash tables are indispensible for certain use cases, they also
> have very significant downsides, and I tend to avoid them whenever
> possible.  Their most severe downside, in my opinion, is that they are
> fundamentally incompatible with a functional programming style.


That's true for standard hash tables, such as are provided by Perl,
Python, MRI Ruby, Common Lisp, R6RS, etc.  But it is not true of
hash array mapped tries (HAMTs), which are functional hash tables
<https://en.wikipedia.org/wiki/Hash_array_mapped_trie>.
They are standard in Haskell, Scala, Erlang, and Rubinius.
Although SRFI 146 does not require the use of HAMTs to implement
its hashmaps, the sample implementation does in fact provide them.


> It's also not possible to efficiently create a new hash table based on
> an existing one, but with one or more additional entries.
>

Again, this is not true of HAMTs.


> There can be no sharing of storage between multiple hash tables, as can
> be done with lists, association lists, or balanced trees.


But not with vectors or strings.  (SRFI 135 texts, which are immutable,
can and do share.)  In addition, lists can only safely share if you restrict
yourself by never mutating them.

-- 
John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
It's like if you meet an really old, really rich guy covered in liver
spots and breathing with an oxygen tank, and you say, "I want to be
rich, too, so I'm going to start walking with a cane and I'm going to
act crotchety and I'm going to get liver disease. --Wil Shipley


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-26 22:49 ` Mark H Weaver
  2018-08-26 23:37   ` John Cowan
@ 2018-08-27 20:25   ` Arne Babenhauserheide
  2018-08-27 21:53   ` Aleksandar Sandic
  2 siblings, 0 replies; 8+ messages in thread
From: Arne Babenhauserheide @ 2018-08-27 20:25 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user, Aleksandar Sandic

[-- Attachment #1: Type: text/plain, Size: 581 bytes --]


Mark H Weaver <mhw@netris.org> writes:

> Hi Aleksandar,
> Hash table implementations typically don't offer an equality test on the
> hash tables themselves, and I don't recall anyone ever asking for such
> an operation before now.

I already wrote diff-algorithms for Python dictionaries, which is close
to implementing (equal? h H).

The main advantage of hash-tables over other dictionary types is that
they are very, very fast. I tried finding anything faster, but I did not.

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1076 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-26 22:49 ` Mark H Weaver
  2018-08-26 23:37   ` John Cowan
  2018-08-27 20:25   ` Arne Babenhauserheide
@ 2018-08-27 21:53   ` Aleksandar Sandic
  2018-08-28  7:37     ` Mark H Weaver
  2 siblings, 1 reply; 8+ messages in thread
From: Aleksandar Sandic @ 2018-08-27 21:53 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

> An equality test on hash tables needs to know how to compare the keys
> and how to compare the values.  There's no way to pass those additional
> arguments to 'equal?', so it can't do that job.
It has to compare the values, but not the keys. If you take a look at my code, 
what it does is first compare the size of both tables; if it differs then the 
tables are not equal. If the number of entries is the same we can loop over 
the keys of one table and compare the corresponding values. If a key from 
table A is not present in table B, then the tables differ. Finally, if two 
values for a key are not 'equal?', then the tables differ as well.

> Hash table implementations typically don't offer an equality test on the
> hash tables themselves, and I don't recall anyone ever asking for such
> an operation before now.  I guess that's because in the use case where
> hash tables are beneficial -- when the tables are very large --
> comparing them for equality is expensive.
I was mostly interested in equality for writing unit tests, so in my case an 
equality test that's good enough for the job is adequate. It is not intended 
to be used in production code. I was just wondering if there might be a 
standard way instead of rolling my own.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-26 23:37   ` John Cowan
@ 2018-08-27 22:14     ` Aleksandar Sandic
  2018-08-29  4:46       ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Aleksandar Sandic @ 2018-08-27 22:14 UTC (permalink / raw)
  To: John Cowan; +Cc: guile-user

I did not know about HAMTs, looks interesting. It's not relevant to my use-
case, but it's an interesting topic to look into. Thanks.


On Monday, 27th August 2018 01:37:54 CEST you wrote:
> ...





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-27 21:53   ` Aleksandar Sandic
@ 2018-08-28  7:37     ` Mark H Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Mark H Weaver @ 2018-08-28  7:37 UTC (permalink / raw)
  To: Aleksandar Sandic; +Cc: guile-user

Hi Aleksandar,

Aleksandar Sandic <asflin@gmx.de> writes:

>> An equality test on hash tables needs to know how to compare the keys
>> and how to compare the values.  There's no way to pass those additional
>> arguments to 'equal?', so it can't do that job.
> It has to compare the values, but not the keys.

There's an implicit equality test on keys every time you perform a hash
table lookup.  By using 'hash-ref' and 'hash-get-handle', you are
implicitly using 'equal?' to compare the keys in the hash table with the
key that you're asking to look up.

Guile's native hash tables are unusual in that they do not internally
keep track of which equality test on keys to use.  Instead, it is your
responsibility to consistently use the functions corresponding to same
equality test in all accesses to a given hash table.  As the Guile
manual states:

  Like the association list functions, the hash table functions come in
  several varieties, according to the equality test used for the keys.
  Plain ‘hash-’ functions use ‘equal?’, ‘hashq-’ functions use ‘eq?’,
  ‘hashv-’ functions use ‘eqv?’, and the ‘hashx-’ functions use an
  application supplied test.

     A single ‘make-hash-table’ creates a hash table suitable for use
  with any set of functions, but it’s imperative that just one set is
  then used consistently, or results will be unpredictable.

  <https://www.gnu.org/software/guile/manual/html_node/Hash-Table-Reference.html>

So, your 'hash-table-equal?' predicate implicitly assumes that the hash
tables passed to it were populated using 'hash-set!' or
'hash-create-handle!'.  If it is applied to a hash table that was
populated with the 'hashq-*!', 'hashv-*!', or 'hashx-*!' procedures,
then the results will be unpredictable.

My 'hash-table=?' predicate makes the same implicit assumption.

      Mark



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Comparing two hash tables for equality?
  2018-08-27 22:14     ` Aleksandar Sandic
@ 2018-08-29  4:46       ` Mark H Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Mark H Weaver @ 2018-08-29  4:46 UTC (permalink / raw)
  To: Aleksandar Sandic; +Cc: guile-user

Aleksandar Sandic <asflin@gmx.de> writes:
> I did not know about HAMTs, looks interesting. It's not relevant to my use-
> case, but it's an interesting topic to look into. Thanks.

There's an implementation of HAMTs that works with Guile in the
following "Purely Functional Data Structures in Scheme" library:

  https://github.com/ijp/pfds

I'm not sure off-hand how well optimized they are.

      Mark



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-08-29  4:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-26 10:01 Comparing two hash tables for equality? Aleksandar Sandic
2018-08-26 22:49 ` Mark H Weaver
2018-08-26 23:37   ` John Cowan
2018-08-27 22:14     ` Aleksandar Sandic
2018-08-29  4:46       ` Mark H Weaver
2018-08-27 20:25   ` Arne Babenhauserheide
2018-08-27 21:53   ` Aleksandar Sandic
2018-08-28  7:37     ` Mark H Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).