unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#39634: All keyowrds hash to the same value
@ 2020-02-16 18:20 Rob Browning
  2020-02-20 16:19 ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Rob Browning @ 2020-02-16 18:20 UTC (permalink / raw)
  To: 39634

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]


(hash #:x most-postive-fixnum) hashes to the same value for all keyowrds
in at least 2.2 and 3.0.  Here's one potential fix for 3.0, and I'd be
happy to adjust it for 2.2 if needed.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Implement-hashing-for-keywords-i.e.-hash-x.patch --]
[-- Type: text/x-diff, Size: 1651 bytes --]

From b380102564aad053f22586eb404e99c82635a3b0 Mon Sep 17 00:00:00 2001
From: Rob Browning <rlb@defaultvalue.org>
Date: Sun, 16 Feb 2020 12:12:08 -0600
Subject: [PATCH 1/1] Implement hashing for keywords, i.e. (hash #:x ...)

Add keyword handling to (hash ...).  Previously it would just return the
same value for all keywords.

* libguile/hash.c (scm_raw_ihash): Add scm_tc7_keyword case.

* libguile/keywords.h (SCM_I_KEYWORD_HASH): New macro.
---
 libguile/hash.c     | 3 +++
 libguile/keywords.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/libguile/hash.c b/libguile/hash.c
index d6e93dae0..c51a794c9 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -35,6 +35,7 @@
 #include "chars.h"
 #include "foreign.h"
 #include "gsubr.h"
+#include "keywords.h"
 #include "numbers.h"
 #include "pairs.h"
 #include "ports.h"
@@ -307,6 +308,8 @@ scm_raw_ihash (SCM obj, size_t depth)
       return scm_i_string_hash (obj);
     case scm_tc7_symbol:
       return scm_i_symbol_hash (obj);
+    case scm_tc7_keyword:
+      return SCM_I_KEYWORD_HASH (obj);
     case scm_tc7_pointer:
       return scm_raw_ihashq ((uintptr_t) SCM_POINTER_VALUE (obj));
     case scm_tc7_wvect:
diff --git a/libguile/keywords.h b/libguile/keywords.h
index c8f480869..cb8598d8b 100644
--- a/libguile/keywords.h
+++ b/libguile/keywords.h
@@ -60,6 +60,8 @@ SCM_API void
 scm_c_bind_keyword_arguments (const char *subr, SCM rest,
                               scm_t_keyword_arguments_flags flags, ...);
 
+#define SCM_I_KEYWORD_HASH(x) scm_i_symbol_hash (SCM_CELL_OBJECT_1 (x))
+
 SCM_INTERNAL void scm_init_keywords (void);
 
 #endif  /* SCM_KEYWORDS_H */
-- 
2.24.1


[-- Attachment #3: Type: text/plain, Size: 198 bytes --]


-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-02-16 18:20 bug#39634: All keyowrds hash to the same value Rob Browning
@ 2020-02-20 16:19 ` Ludovic Courtès
  2020-02-25 20:56   ` Andy Wingo
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2020-02-20 16:19 UTC (permalink / raw)
  To: Rob Browning; +Cc: 39634, Andy Wingo

Hi Rob,

Rob Browning <rlb@defaultvalue.org> skribis:

>>From b380102564aad053f22586eb404e99c82635a3b0 Mon Sep 17 00:00:00 2001
> From: Rob Browning <rlb@defaultvalue.org>
> Date: Sun, 16 Feb 2020 12:12:08 -0600
> Subject: [PATCH 1/1] Implement hashing for keywords, i.e. (hash #:x ...)
>
> Add keyword handling to (hash ...).  Previously it would just return the
> same value for all keywords.
>
> * libguile/hash.c (scm_raw_ihash): Add scm_tc7_keyword case.
>
> * libguile/keywords.h (SCM_I_KEYWORD_HASH): New macro.

LGTM, please push!

Andy and I discussed it on IRC and despite the fact that it’s an ABI
change, we thought including the fix in 3.0.1 was probably the better
option.

Of all the scm_tc7_ values listed in ‘scm.h’, the following are not
explicitly listed (so they go to the default case that hashes the first
word):

  variable, hashtable, fluid, stringbuf, dynamic_state, frame,
  atomic_box, values, program, vm_cont, bytevector, weak_set,
  weak_table, array, bitvector, port

So for example all input file ports hash to the same value, all
3-element bytevectors hash to the same value, etc.

We can probably omit stringbuf, dynamic_state, and values, but the rest
should probably be fixed.

WDYT, Andy?

Thanks,
Ludo’.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-02-20 16:19 ` Ludovic Courtès
@ 2020-02-25 20:56   ` Andy Wingo
  2020-02-25 22:13     ` lloda
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Andy Wingo @ 2020-02-25 20:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39634, Rob Browning

On Thu 20 Feb 2020 17:19, Ludovic Courtès <ludo@gnu.org> writes:

> Of all the scm_tc7_ values listed in ‘scm.h’, the following are not
> explicitly listed (so they go to the default case that hashes the first
> word):

Reformatting your list so I can check one by one :)

>   variable,
>   hashtable,
>   fluid,
>   dynamic_state,
>   frame,
>   atomic_box,
>   program,
>   vm_cont,
>   weak_set,
>   weak_table,
>   port

No equal? implementation, so should hashq() instead.

>   bytevector,
>   array,
>   bitvector,

These have equal? implementations, and what's more, a bitvector can
equal? an array... I think we have another bug!

  ;; Project 2d array as 1d array (scm_tc7_array)
  (define x
    (make-shared-array #2b((#t #t #t)) (lambda (i) (list 0 i)) '(0 2)))
  ;; scm_tc7_bitvector
  (define y #*111)

  (equal? x y) ;; => #t
  (equal? (hash x #xffffffff) (hash y #xffffffff)) ;; => #f
            
Similarly for 1-d scm_tc7_array versus regular vectors, bytevectors,
etc.

Fixing this will not be straightforward...  I think basically 1d arrays
need some special hashing logic so that e.g. vectors and 1d arrays hash
to the same thing.

>   stringbuf,
>   values,

These are never exposed to Scheme, and never compared using equal?
AFAIU.  No need for special cases.

Basically I think the tc7 case should default to hashq, and include
special cases for the ones that have equal? implementations or which
have read syntax.

Sound right to you?

Andy





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-02-25 20:56   ` Andy Wingo
@ 2020-02-25 22:13     ` lloda
  2020-02-26 21:02     ` Ludovic Courtès
  2020-03-06 14:42     ` Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: lloda @ 2020-02-25 22:13 UTC (permalink / raw)
  To: Andy Wingo; +Cc: 39634, Ludovic Courtès, Rob Browning




> On 25 Feb 2020, at 21:56, Andy Wingo <wingo@igalia.com> wrote:
> 
> On Thu 20 Feb 2020 17:19, Ludovic Courtès <ludo@gnu.org> writes:
> 
>> Of all the scm_tc7_ values listed in ‘scm.h’, the following are not
>> explicitly listed (so they go to the default case that hashes the first
>> word):
> 
> Reformatting your list so I can check one by one :)
> 
>>  variable,
>>  hashtable,
>>  fluid,
>>  dynamic_state,
>>  frame,
>>  atomic_box,
>>  program,
>>  vm_cont,
>>  weak_set,
>>  weak_table,
>>  port
> 
> No equal? implementation, so should hashq() instead.
> 
>>  bytevector,
>>  array,
>>  bitvector,
> 
> These have equal? implementations, and what's more, a bitvector can
> equal? an array... I think we have another bug!
> 
>  ;; Project 2d array as 1d array (scm_tc7_array)
>  (define x
>    (make-shared-array #2b((#t #t #t)) (lambda (i) (list 0 i)) '(0 2)))
>  ;; scm_tc7_bitvector
>  (define y #*111)
> 
>  (equal? x y) ;; => #t
>  (equal? (hash x #xffffffff) (hash y #xffffffff)) ;; => #f
> 
> Similarly for 1-d scm_tc7_array versus regular vectors, bytevectors,
> etc.
> 
> Fixing this will not be straightforward...  I think basically 1d arrays
> need some special hashing logic so that e.g. vectors and 1d arrays hash
> to the same thing.

I cannot check at the moment but I think that use of make-shared-array is special cased to return a bitvector because the shared array and the root happen to be equivalent. So your x isn't a scm_tc7_array but a scm_tc7_bitvector. The same is true for the other vector types. You can see that if you make a shared array with bounds '(0 1) instead of '(0 2) for example, or non-unit step or non-zero lower bound or anything that cannot be represented as a root vector.

Now it is true that functionally a root vector and a 1d array with the same bounds and the same elements are equivalent even if the array has non-unit stride and so on, but we had that logic before were you could use the root vector functions on arrays and it was an absolute mess. I think there should be a logic to hash n-d arrays that extends to 1-d arrays so there's no need to make special cases. All vectors can be treated as 1-d arrays so that should work fine for those too.

Partially related, I have a series of patches on wip-vector-cleanup to make sure that the various vector implementations don't depend on arrays (as they still do on some cases) but rather the other way around, strictly. I haven't posted about it b/c it changes a few interfaces and I haven't figured out the deprecation route.

regards





> 
>>  stringbuf,
>>  values,
> 
> These are never exposed to Scheme, and never compared using equal?
> AFAIU.  No need for special cases.
> 
> Basically I think the tc7 case should default to hashq, and include
> special cases for the ones that have equal? implementations or which
> have read syntax.
> 
> Sound right to you?
> 
> Andy
> 
> 
> 






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-02-25 20:56   ` Andy Wingo
  2020-02-25 22:13     ` lloda
@ 2020-02-26 21:02     ` Ludovic Courtès
  2020-03-06 14:42     ` Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2020-02-26 21:02 UTC (permalink / raw)
  To: Andy Wingo; +Cc: 39634, Rob Browning

Hi Andy!

Andy Wingo <wingo@igalia.com> skribis:

> On Thu 20 Feb 2020 17:19, Ludovic Courtès <ludo@gnu.org> writes:
>
>> Of all the scm_tc7_ values listed in ‘scm.h’, the following are not
>> explicitly listed (so they go to the default case that hashes the first
>> word):
>
> Reformatting your list so I can check one by one :)
>
>>   variable,
>>   hashtable,
>>   fluid,
>>   dynamic_state,
>>   frame,
>>   atomic_box,
>>   program,
>>   vm_cont,
>>   weak_set,
>>   weak_table,
>>   port
>
> No equal? implementation, so should hashq() instead.
>
>>   bytevector,
>>   array,
>>   bitvector,
>
> These have equal? implementations, and what's more, a bitvector can
> equal? an array... I think we have another bug!
>
>   ;; Project 2d array as 1d array (scm_tc7_array)
>   (define x
>     (make-shared-array #2b((#t #t #t)) (lambda (i) (list 0 i)) '(0 2)))
>   ;; scm_tc7_bitvector
>   (define y #*111)
>
>   (equal? x y) ;; => #t
>   (equal? (hash x #xffffffff) (hash y #xffffffff)) ;; => #f
>             
> Similarly for 1-d scm_tc7_array versus regular vectors, bytevectors,
> etc.

Oops.

> Fixing this will not be straightforward...  I think basically 1d arrays
> need some special hashing logic so that e.g. vectors and 1d arrays hash
> to the same thing.

OK.

>>   stringbuf,
>>   values,
>
> These are never exposed to Scheme, and never compared using equal?
> AFAIU.  No need for special cases.

Agreed.

> Basically I think the tc7 case should default to hashq, and include
> special cases for the ones that have equal? implementations or which
> have read syntax.
>
> Sound right to you?

It does, yes.

Thanks for looking into it,
Ludo’.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-02-25 20:56   ` Andy Wingo
  2020-02-25 22:13     ` lloda
  2020-02-26 21:02     ` Ludovic Courtès
@ 2020-03-06 14:42     ` Ludovic Courtès
  2020-03-06 15:43       ` Andy Wingo
  2 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2020-03-06 14:42 UTC (permalink / raw)
  To: Andy Wingo; +Cc: 39634, Rob Browning

[-- Attachment #1: Type: text/plain, Size: 469 bytes --]

Andy Wingo <wingo@igalia.com> skribis:

>>   variable,
>>   hashtable,
>>   fluid,
>>   dynamic_state,
>>   frame,
>>   atomic_box,
>>   program,
>>   vm_cont,
>>   weak_set,
>>   weak_table,
>>   port
>
> No equal? implementation, so should hashq() instead.

Here’s a patch for these, let me know what you think!

(Of course longer term it’d be nice to have a centralized way to declare
each tc7 with its equal and hash methods.)

Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 1053 bytes --]

diff --git a/libguile/hash.c b/libguile/hash.c
index c51a794c9..9cb8fcedd 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -1,4 +1,4 @@
-/* Copyright 1995-1997,2000-2001,2003-2004,2006,2008-2015,2017-2018
+/* Copyright 1995-1997,2000-2001,2003-2004,2006,2008-2015,2017-2018,2020
      Free Software Foundation, Inc.
 
    This file is part of Guile.
@@ -331,6 +331,22 @@ scm_raw_ihash (SCM obj, size_t depth)
         h ^= scm_raw_ihash (scm_syntax_module (obj), depth);
         return h;
       }
+
+      /* The following tc7s have no 'equal?' implementation.  Thus, just
+         fall back to 'hashq'.  */
+    case scm_tc7_variable:
+    case scm_tc7_hashtable:
+    case scm_tc7_fluid:
+    case scm_tc7_dynamic_state:
+    case scm_tc7_frame:
+    case scm_tc7_atomic_box:
+    case scm_tc7_program:
+    case scm_tc7_vm_cont:
+    case scm_tc7_weak_set:
+    case scm_tc7_weak_table:
+    case scm_tc7_port:
+      return scm_raw_ihashq (SCM_UNPACK (obj));
+
     case scm_tcs_cons_imcar: 
     case scm_tcs_cons_nimcar:
       if (depth)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-03-06 14:42     ` Ludovic Courtès
@ 2020-03-06 15:43       ` Andy Wingo
  2020-03-06 16:19         ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Wingo @ 2020-03-06 15:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39634, Rob Browning

On Fri 06 Mar 2020 15:42, Ludovic Courtès <ludo@gnu.org> writes:

> Andy Wingo <wingo@igalia.com> skribis:
>
>>>   variable,
>>>   hashtable,
>>>   fluid,
>>>   dynamic_state,
>>>   frame,
>>>   atomic_box,
>>>   program,
>>>   vm_cont,
>>>   weak_set,
>>>   weak_table,
>>>   port
>>
>> No equal? implementation, so should hashq() instead.
>
> Here’s a patch for these, let me know what you think!

LGTM!

Andy





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#39634: All keyowrds hash to the same value
  2020-03-06 15:43       ` Andy Wingo
@ 2020-03-06 16:19         ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2020-03-06 16:19 UTC (permalink / raw)
  To: Andy Wingo; +Cc: 39634-done, Rob Browning

Andy Wingo <wingo@igalia.com> skribis:

> On Fri 06 Mar 2020 15:42, Ludovic Courtès <ludo@gnu.org> writes:
>
>> Andy Wingo <wingo@igalia.com> skribis:
>>
>>>>   variable,
>>>>   hashtable,
>>>>   fluid,
>>>>   dynamic_state,
>>>>   frame,
>>>>   atomic_box,
>>>>   program,
>>>>   vm_cont,
>>>>   weak_set,
>>>>   weak_table,
>>>>   port
>>>
>>> No equal? implementation, so should hashq() instead.
>>
>> Here’s a patch for these, let me know what you think!
>
> LGTM!

Pushed as c5d3b45c9f903cc9726b4bd7e3c13db56bafd587, thanks!

Ludo'.





^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-03-06 16:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-16 18:20 bug#39634: All keyowrds hash to the same value Rob Browning
2020-02-20 16:19 ` Ludovic Courtès
2020-02-25 20:56   ` Andy Wingo
2020-02-25 22:13     ` lloda
2020-02-26 21:02     ` Ludovic Courtès
2020-03-06 14:42     ` Ludovic Courtès
2020-03-06 15:43       ` Andy Wingo
2020-03-06 16:19         ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).