unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Hash table read syntax in guile 2.0
@ 2013-02-17  3:55 Hengqing Hu
  2013-02-17 11:59 ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-17  3:55 UTC (permalink / raw)
  To: guile-user

Dear list,

It seems the read syntax of a hash table has been silently changed.

Now (make-hash-table) produces
#<hash-table 172f1c0 0/31>
instead what was in 1.8
#<hash-table 0/31>

But the document is not updated yet?

I used to use the read syntax to get the length of the hash table.
It would be more convinient if a hash-length procedure could be provided.

Best Regards, Hengqing Hu



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-17  3:55 Hash table read syntax in guile 2.0 Hengqing Hu
@ 2013-02-17 11:59 ` Daniel Hartwig
  2013-02-17 13:41   ` Ludovic Courtès
  2013-02-17 14:15   ` Hengqing Hu
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-17 11:59 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

On 17 February 2013 11:55, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> Dear list,
>
> It seems the read syntax of a hash table has been silently changed.
>
> Now (make-hash-table) produces
> #<hash-table 172f1c0 0/31>
> instead what was in 1.8
> #<hash-table 0/31>
>
> But the document is not updated yet?
>

Hi

That's how hash-table values are printed, true, but it is not _read
syntax_.  Neither is it a documented format (although it does appear
in the example output of one command).  You can not rely on the format
of that string to be anything, it is just a convenience. :-)

The manual recommends to use hash-fold for counting the elements:
  (hash-fold (lambda (key value count) (1+ count)) 0 table)

You could also adapt your parser to be more flexible, looking only at
the tail of the string after the last whitespace.  Although such hacks
work, it is not advisable.


> It would be more convinient if a hash-length procedure could be provided.

Right.

The two modules (rnrs hashtables) and (srfi srfi-69) both provide
*-size.  Their interfaces are slightly different to the native
hash-tables, and the objects produced are not compatible with each
other.

Not sure why the native tables don't have an equivalent, perhaps because:
- the terms ‘size’ and ‘length’ are not so good for hash tables [1];
- it's very easy to do with hash-fold;
- usually more interesting to count particular items rather than everything; and
- there may not be any guarantees on uniqueness of keys amongst the
elements stored in hash table, so what is meant by “number of
elements” can vary between applications [2]?

Hence, hash-fold is always preferred to determine this.

However, for quick and dirty case it's simple enough to write -count,
so I attach a patch for this with total count in constant time:

 -- Scheme Procedure: hash-count pred table
 -- C Function: scm_hash_count (pred, table)
     Return the number of elements in the given hash TABLE for
     which `(PRED KEY VALUE)' returns true.  If PRED itself is
     `#f', return the total number of elements.


Regards


[1] Internally and in the documentation, ‘size’ is already associated
with the number of buckets.  This is consistent with how some academic
literature uses the term also.
[2] For example, a vhash can store multiple elements with the same key
and provides access to them.

[-- Attachment #2: 0001-add-hash-count-for-native-tables.patch --]
[-- Type: application/octet-stream, Size: 4876 bytes --]

From ec52416c2060cff11d0cb435df49207e5aa3dc7a Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Sun, 17 Feb 2013 16:38:31 +0800
Subject: [PATCH] add hash-count for native tables

* libguile/hashtab.c (scm_hash_count): New function.  Count the number
  of elements in a hash table.

* doc/ref/api-compound.texi (Hash Tables): Update examples and
  reference.

* test-suite/tests/hash.test (hash-count): New test.
---
 doc/ref/api-compound.texi  |   29 +++++++++++++++++++++++++++--
 libguile/hashtab.c         |   32 ++++++++++++++++++++++++++++++++
 libguile/hashtab.h         |    1 +
 test-suite/tests/hash.test |   16 ++++++++++++++++
 4 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-compound.texi b/doc/ref/api-compound.texi
index be3d65f..4fcee29 100644
--- a/doc/ref/api-compound.texi
+++ b/doc/ref/api-compound.texi
@@ -3796,8 +3796,9 @@ key is not found.
 #f
 @end lisp
 
-There is no procedure for calculating the number of key/value-pairs in
-a hash table, but @code{hash-fold} can be used for doing exactly that.
+Interesting results can be computed by using @code{hash-fold} to work
+through each element.  This example will count the total number of
+elements:
 
 @lisp
 (hash-fold (lambda (key value seed) (+ 1 seed)) 0 h)
@@ -3805,6 +3806,24 @@ a hash table, but @code{hash-fold} can be used for doing exactly that.
 3
 @end lisp
 
+The same thing can be done with the procedure @code{hash-count}, which
+can also count the number of elements matching a particular predicate.
+For example, count the number of elements with string values:
+
+@lisp
+(hash-count (lambda (key value) (string? value)) h)
+@result{}
+2
+@end lisp
+
+To count all elements:
+
+@lisp
+(hash-count #f h)
+@result{}
+3
+@end lisp
+
 @node Hash Table Reference
 @subsubsection Hash Table Reference
 
@@ -4032,6 +4051,12 @@ For example, the following returns a count of how many keys in
 @end example
 @end deffn
 
+@deffn {Scheme Procedure} hash-count pred table
+@deffnx {C Function} scm_hash_count (pred, table)
+Return the number of elements in the given hash @var{table} that cause
+@code{(@var{pred} @var{key} @var{value})} to return true.  If @var{pred}
+itself is @code{#f}, return the total number of elements.
+@end deffn
 
 @c Local Variables:
 @c TeX-master: "guile.texi"
diff --git a/libguile/hashtab.c b/libguile/hashtab.c
index 0abc7dc..539971d 100644
--- a/libguile/hashtab.c
+++ b/libguile/hashtab.c
@@ -584,6 +584,7 @@ SCM_DEFINE (scm_doubly_weak_hash_table_p, "doubly-weak-hash-table?", 1, 0, 0,
 }
 #undef FUNC_NAME
 
+
 \f
 /* Accessing hash table entries.  */
 
@@ -1371,6 +1372,37 @@ SCM_DEFINE (scm_hash_map_to_list, "hash-map->list", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+static SCM
+count_proc (void *pred, SCM key, SCM data, SCM value)
+{
+  if (scm_is_false (scm_call_2 (SCM_PACK (pred), key, data)))
+    return value;
+  else
+    return scm_oneplus(value);
+}
+
+SCM_DEFINE (scm_hash_count, "hash-count", 2, 0, 0,
+            (SCM pred, SCM table),
+            "Return the number of elements in the given hash TABLE that\n"
+            "cause `(PRED KEY VALUE)' to return true.  If PRED itself is\n"
+            "`#f', return the total number of elements.")
+#define FUNC_NAME s_scm_hash_count
+{
+  SCM init;
+
+  SCM_VALIDATE_HASHTABLE (2, table);
+
+  if (scm_is_false (pred))
+    return scm_from_ulong (SCM_HASHTABLE_N_ITEMS(table));
+
+  SCM_VALIDATE_PROC (1, pred);
+
+  init = scm_from_int (0);
+  return scm_internal_hash_fold ((scm_t_hash_fold_fn) count_proc,
+				 (void *) SCM_UNPACK (pred), init, table);
+}
+#undef FUNC_NAME
+
 \f
 
 SCM
diff --git a/libguile/hashtab.h b/libguile/hashtab.h
index 3149946..dcebcb8 100644
--- a/libguile/hashtab.h
+++ b/libguile/hashtab.h
@@ -164,6 +164,7 @@ SCM_API SCM scm_hash_fold (SCM proc, SCM init, SCM hash);
 SCM_API SCM scm_hash_for_each (SCM proc, SCM hash);
 SCM_API SCM scm_hash_for_each_handle (SCM proc, SCM hash);
 SCM_API SCM scm_hash_map_to_list (SCM proc, SCM hash);
+SCM_API SCM scm_hash_count (SCM hash, SCM pred);
 SCM_INTERNAL void scm_i_hashtable_print (SCM exp, SCM port, scm_print_state *pstate);
 SCM_INTERNAL void scm_init_hashtab (void);
 
diff --git a/test-suite/tests/hash.test b/test-suite/tests/hash.test
index bcdfe91..dfb23ac 100644
--- a/test-suite/tests/hash.test
+++ b/test-suite/tests/hash.test
@@ -292,3 +292,19 @@
    exception:wrong-type-arg
    (hashx-set! (lambda (k s) 1) (lambda (k al) #t) (make-hash-table) 'foo 'bar))
   )
+
+
+;;;
+;;; hash-count
+;;;
+
+(with-test-prefix "hash-count"
+  (let ((table (make-hash-table)))
+    (hashq-set! table 'foo "bar")
+    (hashq-set! table 'braz "zonk")
+    (hashq-create-handle! table 'frob #f)
+
+    (pass-if (equal? 3 (hash-count #f table)))
+
+    (pass-if (equal? 2 (hash-count (lambda (k v)
+                                     (string? v)) table)))))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-17 11:59 ` Daniel Hartwig
@ 2013-02-17 13:41   ` Ludovic Courtès
  2013-02-18  1:00     ` Daniel Hartwig
  2013-02-17 14:15   ` Hengqing Hu
  1 sibling, 1 reply; 16+ messages in thread
From: Ludovic Courtès @ 2013-02-17 13:41 UTC (permalink / raw)
  To: guile-user

Hi!

Daniel Hartwig <mandyke@gmail.com> skribis:

> From ec52416c2060cff11d0cb435df49207e5aa3dc7a Mon Sep 17 00:00:00 2001
> From: Daniel Hartwig <mandyke@gmail.com>
> Date: Sun, 17 Feb 2013 16:38:31 +0800
> Subject: [PATCH] add hash-count for native tables
>
> * libguile/hashtab.c (scm_hash_count): New function.  Count the number
>   of elements in a hash table.
>
> * doc/ref/api-compound.texi (Hash Tables): Update examples and
>   reference.
>
> * test-suite/tests/hash.test (hash-count): New test.

Looks like a worthy addition.  A couple of comments:

> +To count all elements:
> +
> +@lisp
> +(hash-count #f h)

I would instead recommend (hash-count (const #t) h), and remove the
special case.  WDYT?

> +    return scm_oneplus(value);

Space before parenthesis.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-17 11:59 ` Daniel Hartwig
  2013-02-17 13:41   ` Ludovic Courtès
@ 2013-02-17 14:15   ` Hengqing Hu
  2013-02-18  2:01     ` Daniel Hartwig
  1 sibling, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-17 14:15 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

I don't know it's intentional or a bug,
but now object->string returns something different.

My understanding of length and fold is the same for deep lists as for
hash table.

It's great you have provided a patch so a constant time way
to know no of key/value bindings in a hash table could be more accessible..

在 2013-2-17,19:59,Daniel Hartwig <mandyke@gmail.com> 写道:

> On 17 February 2013 11:55, Hengqing Hu <hengqing.hu@gmail.com> wrote:
>> Dear list,
>> 
>> It seems the read syntax of a hash table has been silently changed.
>> 
>> Now (make-hash-table) produces
>> #<hash-table 172f1c0 0/31>
>> instead what was in 1.8
>> #<hash-table 0/31>
>> 
>> But the document is not updated yet?
>> 
> 
> Hi
> 
> That's how hash-table values are printed, true, but it is not _read
> syntax_.  Neither is it a documented format (although it does appear
> in the example output of one command).  You can not rely on the format
> of that string to be anything, it is just a convenience. :-)
> 
> The manual recommends to use hash-fold for counting the elements:
>  (hash-fold (lambda (key value count) (1+ count)) 0 table)
> 
> You could also adapt your parser to be more flexible, looking only at
> the tail of the string after the last whitespace.  Although such hacks
> work, it is not advisable.
> 
> 
>> It would be more convinient if a hash-length procedure could be provided.
> 
> Right.
> 
> The two modules (rnrs hashtables) and (srfi srfi-69) both provide
> *-size.  Their interfaces are slightly different to the native
> hash-tables, and the objects produced are not compatible with each
> other.
> 
> Not sure why the native tables don't have an equivalent, perhaps because:
> - the terms ‘size’ and ‘length’ are not so good for hash tables [1];
> - it's very easy to do with hash-fold;
> - usually more interesting to count particular items rather than everything; and
> - there may not be any guarantees on uniqueness of keys amongst the
> elements stored in hash table, so what is meant by “number of
> elements” can vary between applications [2]?
> 
> Hence, hash-fold is always preferred to determine this.
> 
> However, for quick and dirty case it's simple enough to write -count,
> so I attach a patch for this with total count in constant time:
> 
> -- Scheme Procedure: hash-count pred table
> -- C Function: scm_hash_count (pred, table)
>     Return the number of elements in the given hash TABLE for
>     which `(PRED KEY VALUE)' returns true.  If PRED itself is
>     `#f', return the total number of elements.
> 
> 
> Regards
> 
> 
> [1] Internally and in the documentation, ‘size’ is already associated
> with the number of buckets.  This is consistent with how some academic
> literature uses the term also.
> [2] For example, a vhash can store multiple elements with the same key
> and provides access to them.
> <0001-add-hash-count-for-native-tables.patch>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-17 13:41   ` Ludovic Courtès
@ 2013-02-18  1:00     ` Daniel Hartwig
  2013-02-18  2:28       ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  1:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

On 17 February 2013 21:41, Ludovic Courtès <ludo@gnu.org> wrote:
> Hi!
>
> Daniel Hartwig <mandyke@gmail.com> skribis:
>> +To count all elements:
>> +
>> +@lisp
>> +(hash-count #f h)
>
> I would instead recommend (hash-count (const #t) h), and remove the
> special case.  WDYT?

A clearer example, sure.  I included the special case as it runs in
constant time, I suppose changing the order and making PRED optional
would be a neater interface.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-17 14:15   ` Hengqing Hu
@ 2013-02-18  2:01     ` Daniel Hartwig
  2013-02-18  2:36       ` Hengqing Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  2:01 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

On 17 February 2013 22:15, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> I don't know it's intentional or a bug,
> but now object->string returns something different.

It was an intentional change.  Previously you could not tell apart two
distinct hash tables that happened to have the same number of elements
and bucket size.  Now you can, because the output includes an unique
object id.

>
> My understanding of length and fold is the same for deep lists as for
> hash table.

Interesting.  By deep list do you mean a list that has lists as it's
elements, like:

  (define x '((1 2 3) (4 5 6)))

or ..?

>
> It's great you have provided a patch so a constant time way
> to know no of key/value bindings in a hash table could be more accessible..

Its not an operation fundamental to hash-tables (indeed, some
implementations can not determine this in constant time), and I think
that is the main reason for its absence.  What do you use the
information for?

Regards



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  1:00     ` Daniel Hartwig
@ 2013-02-18  2:28       ` Daniel Hartwig
  2013-02-18  8:21         ` Ludovic Courtès
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  2:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

On 18 February 2013 09:00, Daniel Hartwig <mandyke@gmail.com> wrote:
> I included the special case as it runs in
> constant time, I suppose changing the order and making PRED optional
> would be a neater interface.

Actually, lets not leak this implementation detail in the API, as it
is not a natural property of hash-tables [1].  Here is the patch with
your original suggestions applied.

Regards

[1] Perhaps a touchy issue.  Hash-tables already expose some
implementation details, but lets not continue down that path.

[-- Attachment #2: 0001-add-hash-count-for-native-tables.patch --]
[-- Type: application/octet-stream, Size: 4873 bytes --]

From 3330f00f54649cdd0914b6ff03c7b7bbc38ffa8d Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Sun, 17 Feb 2013 16:38:31 +0800
Subject: [PATCH] add hash-count for native tables

* libguile/hashtab.c (scm_hash_count): New function.  Count the number
  of elements in a hash table.

* doc/ref/api-compound.texi (Hash Tables): Update examples and
  reference.

* test-suite/tests/hash.test (hash-count): New test.
---
 doc/ref/api-compound.texi  |   30 ++++++++++++++++++++++++++++--
 libguile/hashtab.c         |   28 ++++++++++++++++++++++++++++
 libguile/hashtab.h         |    1 +
 test-suite/tests/hash.test |   16 ++++++++++++++++
 4 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-compound.texi b/doc/ref/api-compound.texi
index be3d65f..f7fa07d 100644
--- a/doc/ref/api-compound.texi
+++ b/doc/ref/api-compound.texi
@@ -3796,8 +3796,9 @@ key is not found.
 #f
 @end lisp
 
-There is no procedure for calculating the number of key/value-pairs in
-a hash table, but @code{hash-fold} can be used for doing exactly that.
+Interesting results can be computed by using @code{hash-fold} to work
+through each element.  This example will count the total number of
+elements:
 
 @lisp
 (hash-fold (lambda (key value seed) (+ 1 seed)) 0 h)
@@ -3805,6 +3806,24 @@ a hash table, but @code{hash-fold} can be used for doing exactly that.
 3
 @end lisp
 
+The same thing can be done with the procedure @code{hash-count}, which
+can also count the number of elements matching a particular predicate.
+For example, count the number of elements with string values:
+
+@lisp
+(hash-count (lambda (key value) (string? value)) h)
+@result{}
+2
+@end lisp
+
+Counting all the elements is a simple task using @code{const}:
+
+@lisp
+(hash-count (const #t) h)
+@result{}
+3
+@end lisp
+
 @node Hash Table Reference
 @subsubsection Hash Table Reference
 
@@ -4032,6 +4051,13 @@ For example, the following returns a count of how many keys in
 @end example
 @end deffn
 
+@deffn {Scheme Procedure} hash-count pred table
+@deffnx {C Function} scm_hash_count (pred, table)
+Return the number of elements in the given hash @var{table} that cause
+@code{(@var{pred} @var{key} @var{value})} to return true.  To quickly
+determine the total number of elements, use @code{(const #t)} for
+@var{pred}.
+@end deffn
 
 @c Local Variables:
 @c TeX-master: "guile.texi"
diff --git a/libguile/hashtab.c b/libguile/hashtab.c
index 0abc7dc..88cb199 100644
--- a/libguile/hashtab.c
+++ b/libguile/hashtab.c
@@ -584,6 +584,7 @@ SCM_DEFINE (scm_doubly_weak_hash_table_p, "doubly-weak-hash-table?", 1, 0, 0,
 }
 #undef FUNC_NAME
 
+
 \f
 /* Accessing hash table entries.  */
 
@@ -1371,6 +1372,33 @@ SCM_DEFINE (scm_hash_map_to_list, "hash-map->list", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+static SCM
+count_proc (void *pred, SCM key, SCM data, SCM value)
+{
+  if (scm_is_false (scm_call_2 (SCM_PACK (pred), key, data)))
+    return value;
+  else
+    return scm_oneplus(value);
+}
+
+SCM_DEFINE (scm_hash_count, "hash-count", 2, 0, 0,
+            (SCM pred, SCM table),
+            "Return the number of elements in the given hash TABLE that\n"
+            "cause `(PRED KEY VALUE)' to return true.  To quickly determine\n"
+            "the total number of elements, use `(const #t)' for PRED.")
+#define FUNC_NAME s_scm_hash_count
+{
+  SCM init;
+
+  SCM_VALIDATE_PROC (1, pred);
+  SCM_VALIDATE_HASHTABLE (2, table);
+
+  init = scm_from_int (0);
+  return scm_internal_hash_fold ((scm_t_hash_fold_fn) count_proc,
+				 (void *) SCM_UNPACK (pred), init, table);
+}
+#undef FUNC_NAME
+
 \f
 
 SCM
diff --git a/libguile/hashtab.h b/libguile/hashtab.h
index 3149946..dcebcb8 100644
--- a/libguile/hashtab.h
+++ b/libguile/hashtab.h
@@ -164,6 +164,7 @@ SCM_API SCM scm_hash_fold (SCM proc, SCM init, SCM hash);
 SCM_API SCM scm_hash_for_each (SCM proc, SCM hash);
 SCM_API SCM scm_hash_for_each_handle (SCM proc, SCM hash);
 SCM_API SCM scm_hash_map_to_list (SCM proc, SCM hash);
+SCM_API SCM scm_hash_count (SCM hash, SCM pred);
 SCM_INTERNAL void scm_i_hashtable_print (SCM exp, SCM port, scm_print_state *pstate);
 SCM_INTERNAL void scm_init_hashtab (void);
 
diff --git a/test-suite/tests/hash.test b/test-suite/tests/hash.test
index bcdfe91..72aa0c4 100644
--- a/test-suite/tests/hash.test
+++ b/test-suite/tests/hash.test
@@ -292,3 +292,19 @@
    exception:wrong-type-arg
    (hashx-set! (lambda (k s) 1) (lambda (k al) #t) (make-hash-table) 'foo 'bar))
   )
+
+
+;;;
+;;; hash-count
+;;;
+
+(with-test-prefix "hash-count"
+  (let ((table (make-hash-table)))
+    (hashq-set! table 'foo "bar")
+    (hashq-set! table 'braz "zonk")
+    (hashq-create-handle! table 'frob #f)
+
+    (pass-if (equal? 3 (hash-count (const #t) table)))
+
+    (pass-if (equal? 2 (hash-count (lambda (k v)
+                                     (string? v)) table)))))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  2:01     ` Daniel Hartwig
@ 2013-02-18  2:36       ` Hengqing Hu
  2013-02-18  2:47         ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-18  2:36 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

Thanks for the collaboration.

You are right, that's what I mean by a deep list.

One usage of knowing the information is to tell
whether the hash table is empty or not,
Since a hash-empty? procedure is not provided.

在 2013-2-18,10:01,Daniel Hartwig <mandyke@gmail.com> 写道:

> On 17 February 2013 22:15, Hengqing Hu <hengqing.hu@gmail.com> wrote:
>> I don't know it's intentional or a bug,
>> but now object->string returns something different.
> 
> It was an intentional change.  Previously you could not tell apart two
> distinct hash tables that happened to have the same number of elements
> and bucket size.  Now you can, because the output includes an unique
> object id.
> 
>> 
>> My understanding of length and fold is the same for deep lists as for
>> hash table.
> 
> Interesting.  By deep list do you mean a list that has lists as it's
> elements, like:
> 
>  (define x '((1 2 3) (4 5 6)))
> 
> or ..?
> 
>> 
>> It's great you have provided a patch so a constant time way
>> to know no of key/value bindings in a hash table could be more accessible..
> 
> Its not an operation fundamental to hash-tables (indeed, some
> implementations can not determine this in constant time), and I think
> that is the main reason for its absence.  What do you use the
> information for?
> 
> Regards



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  2:36       ` Hengqing Hu
@ 2013-02-18  2:47         ` Daniel Hartwig
  2013-02-18  3:14           ` Hengqing Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  2:47 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

On 18 February 2013 10:36, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> Thanks for the collaboration.
>
> You are right, that's what I mean by a deep list.

So what do you consider the length of the example deep list, is it
two, three, or six?  Length implies a particular dimension, which is
naturally the ordering of the outer list.

Scheme considers:
(length '((1 2 3) (4 5 6)))
=>
2

?

>
> One usage of knowing the information is to tell
> whether the hash table is empty or not,
> Since a hash-empty? procedure is not provided.

Yes but, why do you want to know that?  Isnt what matters whether a
particular key exists or not, and what is its value?  This is the
primary purpose of hash tables, no?

The number of active bindings in the hash table is really just a book
keeping detail.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  2:47         ` Daniel Hartwig
@ 2013-02-18  3:14           ` Hengqing Hu
  2013-02-18  3:24             ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-18  3:14 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

Though I didn't look into the implementation.

I suppose length is a constant time operation here,
if you use fold instead to get the same thing.
It would be linear time.

Knowing something is empty or not rises after I put something inside
and then take something out.
Then I dont't know whether there are still something left, so I need to examine it.
Or if something can tell me.

All is about to be able to have constant time access to specific information in
this hash table implementation.

在 2013-2-18,10:47,Daniel Hartwig <mandyke@gmail.com> 写道:

> On 18 February 2013 10:36, Hengqing Hu <hengqing.hu@gmail.com> wrote:
>> Thanks for the collaboration.
>> 
>> You are right, that's what I mean by a deep list.
> 
> So what do you consider the length of the example deep list, is it
> two, three, or six?  Length implies a particular dimension, which is
> naturally the ordering of the outer list.
> 
> Scheme considers:
> (length '((1 2 3) (4 5 6)))
> =>
> 2
> 
> ?
> 
>> 
>> One usage of knowing the information is to tell
>> whether the hash table is empty or not,
>> Since a hash-empty? procedure is not provided.
> 
> Yes but, why do you want to know that?  Isnt what matters whether a
> particular key exists or not, and what is its value?  This is the
> primary purpose of hash tables, no?
> 
> The number of active bindings in the hash table is really just a book
> keeping detail.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  3:14           ` Hengqing Hu
@ 2013-02-18  3:24             ` Daniel Hartwig
  2013-02-18  3:28               ` Hengqing Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  3:24 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

On 18 February 2013 11:14, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> Though I didn't look into the implementation.
>
> I suppose length is a constant time operation here,
> if you use fold instead to get the same thing.
> It would be linear time.

This is not guaranteed.

>
> Knowing something is empty or not rises after I put something inside
> and then take something out.
> Then I dont't know whether there are still something left, so I need to examine it.
> Or if something can tell me.

Although the point of your algorithm is vague, I believe you will find
a better one using hash-fold that does not involve checking for
emptiness.

Why do you need to know if some elements are left, and what happens if
there are?  Do you have some code I could look at?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  3:24             ` Daniel Hartwig
@ 2013-02-18  3:28               ` Hengqing Hu
  2013-02-18  5:11                 ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-18  3:28 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

Here it is:

https://github.com/hudayou/fib

Welcome your suggestions!

在 2013-2-18,11:24,Daniel Hartwig <mandyke@gmail.com> 写道:

> On 18 February 2013 11:14, Hengqing Hu <hengqing.hu@gmail.com> wrote:
>> Though I didn't look into the implementation.
>> 
>> I suppose length is a constant time operation here,
>> if you use fold instead to get the same thing.
>> It would be linear time.
> 
> This is not guaranteed.
> 
>> 
>> Knowing something is empty or not rises after I put something inside
>> and then take something out.
>> Then I dont't know whether there are still something left, so I need to examine it.
>> Or if something can tell me.
> 
> Although the point of your algorithm is vague, I believe you will find
> a better one using hash-fold that does not involve checking for
> emptiness.
> 
> Why do you need to know if some elements are left, and what happens if
> there are?  Do you have some code I could look at?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  3:28               ` Hengqing Hu
@ 2013-02-18  5:11                 ` Daniel Hartwig
  2013-02-18  5:39                   ` Hengqing Hu
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  5:11 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

[-- Attachment #1: Type: text/plain, Size: 689 bytes --]

On 18 February 2013 11:28, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> Here it is:
>
> https://github.com/hudayou/fib
>
> Welcome your suggestions!

Please see the attached patch.

For remove-unrelated-blocks, building the hash-tree may be suboptimal
and I suspect could be avoided altogether, but I have not investigated
the rest of the program enough to determine this.  At the final part
it is *important* to check every element in the hash-table in case of
processing bugs, assumptions about size are dangerous to make here,
and probably save very little processing time on average.

pretty-show would not output anything when given an empty hash table,
so no need to avoid it there.

[-- Attachment #2: 0001-remove-hash-length.patch --]
[-- Type: application/octet-stream, Size: 4464 bytes --]

From 30777031aff94fbbe17dc045a447a95a681b41e5 Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Mon, 18 Feb 2013 12:51:31 +0800
Subject: [PATCH] remove hash-length

* fib (hash-length, hash-empty?): Removed.

  (remove-unrelated-blocks): Minimal rework based on the assumption
  that since blocks are being removed, `hash-tree-size' will always be
  less than `hash-table-size'.  Also restructure the final removal
  from `hash-table'.

  (pretty-show): Use hash-fold and return the number of blocks.
  (display-blocks): Always call `pretty-show', using its return value
  to determine if any blocks are found.
---
 fib |   65 +++++++++++++++++++++++------------------------------------------
 1 file changed, 23 insertions(+), 42 deletions(-)

diff --git a/fib b/fib
index dedbf98..a1f2fd4 100755
--- a/fib
+++ b/fib
@@ -520,22 +520,6 @@
     hash-table
     (build-hash-table list-of-lines 1)))
 
-(define (hash-empty? hash-table)
-  (= (hash-length hash-table) 0))
-
-;; read syntax of a hash table is:
-;; #<hash-table 0/31>
-(define (hash-length hash-table)
-  (string->number
-    (car
-      (reverse
-        (string-split
-          (car
-            (string-split
-              (object->string hash-table)
-              #\/))
-          #\sp)))))
-
 (define (remove-sub-blocks hash-table)
   (define (remove-includes k1 v1)
     (for-each
@@ -569,7 +553,6 @@
 ;; remove blocks which unrelated with blocks in list-of-blocks from
 ;; hash-table
 (define (remove-unrelated-blocks list-of-blocks hash-table)
-  (define hash-table-size (hash-length hash-table))
   (define (build-interval-tree-table hash-table)
     (let ((interval-tree-table (make-hash-table)))
       (hash-for-each
@@ -599,36 +582,33 @@
         (let* ((b (car list-of-blocks))
                (hash-of-path (string-hash (block-path b)))
                (interval-tree (hash-ref interval-tree-table
-                                        hash-of-path))
-               (hash-tree-size (hash-size hash-tree)))
-          (if (< hash-tree-size hash-table-size)
-            (begin
-              (if interval-tree
-                (set! hash-tree
+                                        hash-of-path)))
+          (when interval-tree
+            (set! hash-tree
                   (interval-traverse-search (block-interval b)
                                             interval-tree
                                             hash-insert
                                             hash-tree)))
-              (bht hash-tree (cdr list-of-blocks)))
-            hash-tree)))))
-  (let ((interval-tree-table (build-interval-tree-table hash-table)))
-    (let ((hash-tree (build-hash-tree list-of-blocks interval-tree-table)))
-      (let ((hash-tree-size (hash-size hash-tree)))
-        (if (< hash-tree-size hash-table-size)
-          (for-each
-            (lambda (x)
-              (let ((k (car x)))
-                (if (not (hash-search k hash-tree))
-                  (hash-remove! hash-table k))))
-            (hash-map->list cons hash-table))))))
+          (bht hash-tree (cdr list-of-blocks))))))
+  (let* ((interval-tree-table (build-interval-tree-table hash-table))
+         (hash-tree (build-hash-tree list-of-blocks interval-tree-table)))
+    (for-each
+      (lambda (k)
+        (hash-remove! hash-table k))
+      (hash-fold (lambda (k v ks)
+                   (if (hash-search k hash-tree)
+                       ks
+                       (cons k ks)))
+                 '()
+                 hash-table)))
   hash-table)
 
 (define (pretty-show hash-table)
   (define (footprint x)
     (display x)
     (newline))
-  (hash-for-each
-    (lambda (k v)
+  (hash-fold
+    (lambda (k v n-blocks)
       (display "Block footprint:")
       (newline)
       (if (> (length v) 2)
@@ -642,15 +622,16 @@
       (newline)
       (display (block-content (car v)))
       (newline)
-      (newline))
+      (newline)
+      (1+ n-blocks))
+    0
     hash-table))
 
 (define (display-blocks file hash-table)
   (define (show-blocks hash-table)
-    (if (not (hash-empty? hash-table))
-      (begin
-        (set! exit-status 1)
-        (pretty-show hash-table))))
+    (unless (zero? (pretty-show hash-table))
+      ;; Note: most similar programs use `1' to indicate `no matches'.
+      (set! exit-status 1)))
   (let ((eated-table (eat-blocks hash-table)))
     (if diff-commit
       (let ((list-of-changed-blocks nil))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  5:11                 ` Daniel Hartwig
@ 2013-02-18  5:39                   ` Hengqing Hu
  2013-02-18  6:29                     ` Daniel Hartwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hengqing Hu @ 2013-02-18  5:39 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

Thanks for spending time to investigate.

One thing I'd like to point out, the intention of use hash-length
in remove-unrelated-blocks is to avoid furthur processing
if all blocks are related.

Of course you can use your left hand for the purpose of your right hand.
It does not mean having one hand is better than two hands.

在 2013-2-18,13:11,Daniel Hartwig <mandyke@gmail.com> 写道:

> On 18 February 2013 11:28, Hengqing Hu <hengqing.hu@gmail.com> wrote:
>> Here it is:
>> 
>> https://github.com/hudayou/fib
>> 
>> Welcome your suggestions!
> 
> Please see the attached patch.
> 
> For remove-unrelated-blocks, building the hash-tree may be suboptimal
> and I suspect could be avoided altogether, but I have not investigated
> the rest of the program enough to determine this.  At the final part
> it is *important* to check every element in the hash-table in case of
> processing bugs, assumptions about size are dangerous to make here,
> and probably save very little processing time on average.
> 
> pretty-show would not output anything when given an empty hash table,
> so no need to avoid it there.
> <0001-remove-hash-length.patch>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  5:39                   ` Hengqing Hu
@ 2013-02-18  6:29                     ` Daniel Hartwig
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Hartwig @ 2013-02-18  6:29 UTC (permalink / raw)
  To: Hengqing Hu; +Cc: guile-user

On 18 February 2013 13:39, Hengqing Hu <hengqing.hu@gmail.com> wrote:
> Thanks for spending time to investigate.

Sure.

>
> One thing I'd like to point out, the intention of use hash-length
> in remove-unrelated-blocks is to avoid furthur processing
> if all blocks are related.

This involves the assumption that the sizes of hash-tree and
hash-table are equal.  It is hard to verify this due to the not
trivial construction of hash-tree.

I have not profiled the code but it is clear that most of the
processing happens in build-interval-tree-table and build-hash-tree.
The final part for removing the unrelated items is nothing compared to
that.  At the small expense of sometimes doing the final loop when it
could have been avoided, you gain an procedure that is more easily
verified.

I suspect that remove-unrelated can be restructued to use two loops at
most, but don't have time further to consider it.  Anyway, enjoy.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Hash table read syntax in guile 2.0
  2013-02-18  2:28       ` Daniel Hartwig
@ 2013-02-18  8:21         ` Ludovic Courtès
  0 siblings, 0 replies; 16+ messages in thread
From: Ludovic Courtès @ 2013-02-18  8:21 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user

Hi!

Daniel Hartwig <mandyke@gmail.com> skribis:

> From 3330f00f54649cdd0914b6ff03c7b7bbc38ffa8d Mon Sep 17 00:00:00 2001
> From: Daniel Hartwig <mandyke@gmail.com>
> Date: Sun, 17 Feb 2013 16:38:31 +0800
> Subject: [PATCH] add hash-count for native tables
>
> * libguile/hashtab.c (scm_hash_count): New function.  Count the number
>   of elements in a hash table.
>
> * doc/ref/api-compound.texi (Hash Tables): Update examples and
>   reference.
>
> * test-suite/tests/hash.test (hash-count): New test.

Looks good to me, please apply.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-02-18  8:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-17  3:55 Hash table read syntax in guile 2.0 Hengqing Hu
2013-02-17 11:59 ` Daniel Hartwig
2013-02-17 13:41   ` Ludovic Courtès
2013-02-18  1:00     ` Daniel Hartwig
2013-02-18  2:28       ` Daniel Hartwig
2013-02-18  8:21         ` Ludovic Courtès
2013-02-17 14:15   ` Hengqing Hu
2013-02-18  2:01     ` Daniel Hartwig
2013-02-18  2:36       ` Hengqing Hu
2013-02-18  2:47         ` Daniel Hartwig
2013-02-18  3:14           ` Hengqing Hu
2013-02-18  3:24             ` Daniel Hartwig
2013-02-18  3:28               ` Hengqing Hu
2013-02-18  5:11                 ` Daniel Hartwig
2013-02-18  5:39                   ` Hengqing Hu
2013-02-18  6:29                     ` Daniel Hartwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).