unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* What's the specification for sxhash?
@ 2019-06-06  7:55 Philippe Schnoebelen
  2019-06-06 12:52 ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Schnoebelen @ 2019-06-06  7:55 UTC (permalink / raw)
  To: emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 3488 bytes --]

Some of my elisp code broke down when I ran it on a newer emacs because
the values returned by sxhash are different from what they were in the
old version.

I had assumed without thinking that sxhash'es of Lisp objects were fixed
one and for all, a bit like the shasum of some file.

Probably the doc string should be more explicit about what is/is not
guaranteed. Is consistency only guaranteed inside the current running
emacs thread/process?


Looking at the source code in src/fns.c, it seems that the change of
sxhash values was caused by the recent change allowing emacs lisp
integers of arbitrary size. These numbers are hashed differently, and
hashing itself has to truncate large integers that are now a different
data type. (However, I have also found changes in sxhash values between
v26.1 and v26.2, before the move to arbitrary large integers in v27, see
examples below.)

Looking at the source code, one would think that sxhash values are meant
to only depend on the structure of the Lisp object itself, not the
hardware or the emacs version, but of course this view may have to yield
when the C data type for Lisp objects is modified. Is this what the
developers had in mind?


Today the elisp manual reads

— Function: *sxhash-equal*obj

    This function returns a hash code for Lisp object obj. This is an
    integer which reflects the contents of obj and the other Lisp
    objects it points to.

    If two objects obj1 and obj2 are |equal|, then |(sxhash-equal
    |obj1|)| and |(sxhash-equal |obj2|)| are the same integer.

    If the two objects are not |equal|, the values returned by
    |sxhash-equal| are usually different, but not always; once in a rare
    while, by luck, you will encounter two distinct-looking objects that
    give the same result from |sxhash-equal|.

This does not address the consistency issue, and is a bit misleading
about collisions with its "once in a rare while, by luck".  Without
trying, I stumbled upon many collisions in my test. Is sxhash really
intended to avoid collisions?

Many thanks for your time,

--philippe


Here is what I obtained by running emacs -Q --batch -l  sxhash-report.el
(.el file attached below) on 3 different versions of emacs:

in GNU Emacs 26.1 (build 1, x86_64-apple-darwin18.6.0, NS appkit-1671.50
Version 10.14.5 (Build 18F132))
 of 2019-05-28
1 is hash for 1
0 is hash for 0
0 is hash for nil
11484 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
1608904915 is hash for (a b c)
5457926 is hash for ab
5457938 is hash for abcd
5457950 is hash for :misc

in GNU Emacs 26.2 (build 1, x86_64-apple-darwin18.2.0, NS appkit-1671.20
Version 10.14.3 (Build 18D109))
 of 2019-04-13
1 is hash for 1
0 is hash for 0
0 is hash for nil
11616 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
6307170282 is hash for (a b c)
10938216 is hash for ab
10938228 is hash for abcd
10938240 is hash for :misc

in GNU Emacs 27.0.50 (build 1, x86_64-apple-darwin18.6.0, NS
appkit-1671.50 Version 10.14.5 (Build 18F132))
 of 2019-06-06
1 is hash for 1
0 is hash for 0
0 is hash for nil
11784 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
238969789 is hash for (a b c)
35167434275996 is hash for ab
35167434276008 is hash for abcd
35167434276020 is hash for :misc



[-- Attachment #1.2: Type: text/html, Size: 4791 bytes --]

[-- Attachment #2: sxhash-report.el --]
[-- Type: text/plain, Size: 317 bytes --]

;;;; File: sxhash-report.el -*- lexical-binding: t; -*-

(defvar some-lisp-objects
  (list 1 0 nil t 
	(cons 0 0)
	"" "ab" "abcd"
	'(1 2 3 4 5)  '(a b c)
	'ab 'abcd :misc
	))


(princ (format "in %s\n" (emacs-version)))
(dolist (obj some-lisp-objects)
  (princ (format "%S is hash for %S\n" (sxhash-equal obj) obj)))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06  7:55 What's the specification for sxhash? Philippe Schnoebelen
@ 2019-06-06 12:52 ` Stefan Monnier
  2019-06-06 14:45   ` Andreas Schwab
  2019-06-07  6:43   ` Philippe Schnoebelen
  0 siblings, 2 replies; 7+ messages in thread
From: Stefan Monnier @ 2019-06-06 12:52 UTC (permalink / raw)
  To: emacs-devel

> Probably the doc string should be more explicit about what is/is not
> guaranteed. Is consistency only guaranteed inside the current running
> emacs thread/process?

Yes, they're only guaranteed to work within a particular Emacs session.

In practice they're fairly stable across sessions, but they depend for
example on the size of fixnums, so they're different in a 32bit build of
Emacs and a 64bit build of Emacs.  And we don't guarantee that they
don't change from one version to another.


        Stefan




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06 12:52 ` Stefan Monnier
@ 2019-06-06 14:45   ` Andreas Schwab
  2019-06-06 15:04     ` Philippe Schnoebelen
  2019-06-07  6:43   ` Philippe Schnoebelen
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2019-06-06 14:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Jun 06 2019, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> In practice they're fairly stable across sessions, but they depend for
> example on the size of fixnums, so they're different in a 32bit build of
> Emacs and a 64bit build of Emacs.  And we don't guarantee that they
> don't change from one version to another.

For some types the hash uses the address of the object, so it is likely
to differ even between runs of the same binary.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06 14:45   ` Andreas Schwab
@ 2019-06-06 15:04     ` Philippe Schnoebelen
  2019-06-06 15:14       ` Andreas Schwab
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Schnoebelen @ 2019-06-06 15:04 UTC (permalink / raw)
  To: emacs-devel

On 2019/06/06 16:45, Andreas Schwab wrote:
> For some types the hash uses the address of the object, so it is likely
> to differ even between runs of the same binary.

What types?

Just curious here. I've not noticed any change in hash values until I
changed my Emacs version so it must be exotic types that I don't use..

In fact it must be the types of objects that are only equal when they
are eq, otherwise the spec for sxhash-equal would not be respected.

--philippe



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06 15:04     ` Philippe Schnoebelen
@ 2019-06-06 15:14       ` Andreas Schwab
  2019-06-06 15:30         ` Stefan Monnier
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2019-06-06 15:14 UTC (permalink / raw)
  To: Philippe Schnoebelen; +Cc: emacs-devel

On Jun 06 2019, Philippe Schnoebelen <schnoebelen.ph@gmail.com> wrote:

> On 2019/06/06 16:45, Andreas Schwab wrote:
>> For some types the hash uses the address of the object, so it is likely
>> to differ even between runs of the same binary.
>
> What types?

Symbols, for example.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06 15:14       ` Andreas Schwab
@ 2019-06-06 15:30         ` Stefan Monnier
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2019-06-06 15:30 UTC (permalink / raw)
  To: emacs-devel

>>> For some types the hash uses the address of the object, so it is likely
>>> to differ even between runs of the same binary.
>> What types?
> Symbols, for example.

And lots of others.  Basically all the types for which `equal` is the
same as `eq`, such as buffers, windows, processes, frames, markers,
overlays, hash-tables, ...


        Stefan




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: What's the specification for sxhash?
  2019-06-06 12:52 ` Stefan Monnier
  2019-06-06 14:45   ` Andreas Schwab
@ 2019-06-07  6:43   ` Philippe Schnoebelen
  1 sibling, 0 replies; 7+ messages in thread
From: Philippe Schnoebelen @ 2019-06-07  6:43 UTC (permalink / raw)
  To: emacs-devel

On 2019/06/06 14:52, Stefan Monnier wrote:
>> Probably the doc string should be more explicit about what is/is not
>> guaranteed. Is consistency only guaranteed inside the current running
>> emacs thread/process?
> Yes, they're only guaranteed to work within a particular Emacs session.
>
> In practice they're fairly stable across sessions, but they depend for
> example on the size of fixnums, so they're different in a 32bit build of
> Emacs and a 64bit build of Emacs.  And we don't guarantee that they
> don't change from one version to another.

Thanks to Andreas and Stefan for the explanations. I was rather confused
in my use of sxhash as a kind of signature for objects. I now see that
the confusion lasted because I only used it on number trees (conses and
integers).

Any comments about the collision avoidance? Many simple objects have a 0
sxhash-equal value.

--phs




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-06-07  6:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06  7:55 What's the specification for sxhash? Philippe Schnoebelen
2019-06-06 12:52 ` Stefan Monnier
2019-06-06 14:45   ` Andreas Schwab
2019-06-06 15:04     ` Philippe Schnoebelen
2019-06-06 15:14       ` Andreas Schwab
2019-06-06 15:30         ` Stefan Monnier
2019-06-07  6:43   ` Philippe Schnoebelen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).