unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: Aleksandar Sandic <asflin@gmx.de>
Cc: guile-user@gnu.org
Subject: Re: Comparing two hash tables for equality?
Date: Sun, 26 Aug 2018 18:49:00 -0400	[thread overview]
Message-ID: <87o9dovj4z.fsf@netris.org> (raw)
In-Reply-To: <2418985.EcWt8OQBW1@aleksandar-ixtreme-m5740> (Aleksandar Sandic's message of "Sun, 26 Aug 2018 12:01:17 +0200")

Hi Aleksandar,

Aleksandar Sandic <asflin@gmx.de> writes:

> I have been looking through the reference manual, but there does not seem to 
> be a procedure for comparing two hash tables for equality. The procedure 
> 'equal?' only returns true if the two tables are also 'eq?':
>
>     scheme@(guile-user)> (define h1 (make-hash-table))
>     scheme@(guile-user)> (define h2 (make-hash-table))
>     scheme@(guile-user)> (equal? h1 h2)
>     $1 = #f
>     scheme@(guile-user)> (equal? h1 h1)
>     $2 = #t
>
> Is this the intended behaviour?

Yes.  That might seem surprising.  I have a few things to say on that,
in no particular order:

An equality test on hash tables needs to know how to compare the keys
and how to compare the values.  There's no way to pass those additional
arguments to 'equal?', so it can't do that job.

Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now.  I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.

While hash tables are indispensible for certain use cases, they also
have very significant downsides, and I tend to avoid them whenever
possible.  Their most severe downside, in my opinion, is that they are
fundamentally incompatible with a functional programming style.  They
invariably force all code that uses them to be written in an imperative
style.  I cannot stress enough how important this is.

It's also not possible to efficiently create a new hash table based on
an existing one, but with one or more additional entries.  In Scheme,
you can efficiently 'cons' or 'append' some entries onto the front of an
existing list without modifying the original list.  That includes
association lists.  To do the same with a hash table, you must make a
copy of the entire hash table and then add the new entries to the copy.

There can be no sharing of storage between multiple hash tables, as can
be done with lists, association lists, or balanced trees.  Even if you
don't need sharing, hash tables also use more space than those other
data structures.

So, I would ask yourself whether the benefits of hash tables truly
outweigh the downsides in your particular use case.

If you're comfortable sharing details, I'd be curious to hear what
you're trying to accomplish here, at a higher level.  Maybe there's a
better way.

Anyway, if you really need to compare hash tables for equality, here's
some suggested code that's simpler than the code you wrote, and should
be quite a bit more efficient as well.  In particular, this code
performs no heap allocation:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 match))

(define (hash-table-subset? val=? h1 h2)
  "H1 and H2 must be hash tables which use equal? as the equality test
for keys.  Return #t if and only if for every entry (K . V1) in H1,
there exists an entry (K . V2) in H2 such that (VAL=? V1 V2),
otherwise return #f."
  (hash-fold (lambda (k v1 equal-so-far)
               (and equal-so-far
                    (match (hash-get-handle h2 k)
                      ((_ . v2) (val=? v1 v2))
                      (#f       #f))))
             #t
             h1))

(define (hash-table=? val=? h1 h2)
  "H1 and H2 must be hash tables which use equal? as the equality test
for keys.  Return #t if H1 and H2 have the same keys (in the sense of
equal?) and each key has the same value (in the sense of val=?),
otherwise return #f."
  (and (hash-table-subset? val=? h1 h2)
       (hash-table-subset? val=? h2 h1)))

(define (equal*? a b)
  (or (equal? a b)
      (and (hash-table? a)
           (hash-table? b)
           (hash-table=? equal*? a b))))
--8<---------------cut here---------------end--------------->8---

You might not need 'equal*?' here.  I included it only to match more
closely the behavior of your code.

For most purposes, 'hash-table=?' is actually more flexible than
'equal*?', as long as the "type" of values stored in a given hash table
are known.  If the values stored in the hash table are numbers, then
'val=?' should be '='.  If the values are themselves hash tables, then
'val=?' should be (lambda (a b) (hash-table=? foo=? a b)) for some value
of 'foo=?'.  In some other cases, 'val=?' should be 'equal?'.

If you truly need hash tables whose values could contain either hash
tables or some other type, then something like 'equal*?' might be the
right tool.  However, keep in mind that 'equal?' is usually not the
desired equality test for numbers, because (equal? 1 1.0) => #false and
(equal? +0.0 -0.0) => #false.  'equal?' tests for _operational_
equivalence, which makes distinctions important for memoization and some
other purposes, whereas '=' tests for mathematical equality and is
usually what you want for numbers.

     Regards,
       Mark



  reply	other threads:[~2018-08-26 22:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-26 10:01 Comparing two hash tables for equality? Aleksandar Sandic
2018-08-26 22:49 ` Mark H Weaver [this message]
2018-08-26 23:37   ` John Cowan
2018-08-27 22:14     ` Aleksandar Sandic
2018-08-29  4:46       ` Mark H Weaver
2018-08-27 20:25   ` Arne Babenhauserheide
2018-08-27 21:53   ` Aleksandar Sandic
2018-08-28  7:37     ` Mark H Weaver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o9dovj4z.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=asflin@gmx.de \
    --cc=guile-user@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).