From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Philippe Schnoebelen Newsgroups: gmane.emacs.devel Subject: What's the specification for sxhash? Date: Thu, 6 Jun 2019 09:55:00 +0200 Message-ID: <20b88c13-e045-5079-f1de-8f16eaf80d30@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------3F007F78E01D37D1420C6365" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="104193"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.7.0 To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Jun 06 09:55:21 2019 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1hYnF1-000QyM-4g for ged-emacs-devel@m.gmane.org; Thu, 06 Jun 2019 09:55:19 +0200 Original-Received: from localhost ([127.0.0.1]:56114 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYnEz-0003G8-Hc for ged-emacs-devel@m.gmane.org; Thu, 06 Jun 2019 03:55:17 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:45770) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYnEn-0003Fs-V7 for emacs-devel@gnu.org; Thu, 06 Jun 2019 03:55:07 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYnEm-00063z-Fa for emacs-devel@gnu.org; Thu, 06 Jun 2019 03:55:05 -0400 Original-Received: from mail-wm1-x32a.google.com ([2a00:1450:4864:20::32a]:40011) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hYnEm-000630-7b for emacs-devel@gnu.org; Thu, 06 Jun 2019 03:55:04 -0400 Original-Received: by mail-wm1-x32a.google.com with SMTP id v19so1317489wmj.5 for ; Thu, 06 Jun 2019 00:55:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:subject:message-id:date:user-agent:mime-version :content-language; bh=EIWvfP3mjqwfgdn4Pr5nkkpBQ9eG11dQCiuGFA9ge/g=; b=qbXZYsKyjcLfnovN10H4WCBJpIk6+hcBQCT7BlGca0NbYv/KJS0qZaqn8PhoZ34QyK pi77pC9TH/QEWwHm1phfYCNVkwwPl+LYgDUwVLZkZGBvDViqWVpHyiT4sYtNEDL9GS4D Q9/9+EF5yMNaByRnxt2b5gdt6oJVufRHuqa3fNZsAW+3mAGvkc9cJW/NTrONoeQpUELb c/OSrQwC3sJfbIA8QZaFtMmdYzY5OWcu+1KjuI3UQ1kmGpNbShCdREvkxlWqO2rM1B9N /ZJFuFOIGMMNN0HK6vdWuyo0cTVNzvKpiK3MvjAFQf4MUS9Ka6PJfpNyPxjn2W9/YreQ XBwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:message-id:date:user-agent :mime-version:content-language; bh=EIWvfP3mjqwfgdn4Pr5nkkpBQ9eG11dQCiuGFA9ge/g=; b=nw730/FzCba/iy0+LSZqQEUK8HGZMv8HTO7jnNYpFP64sQY+r3dlGBVLUm1cqT1Min fbhpRM3cfZ/mKF/vuVjH8aNmnL7KzJ5JH0j9NaNkeVWXHiGBDcCBo0d52jpTvE1s3sgX BsJc2GrFaiEYdslHT55tQm/Ie3cfb7xZd9XyAFniGJc1ivXocGcnFOMiG2ttSlwDLLOe 54lobRk4ZiH95X3/HPnO25Itgzd3gMX+5IeJECoLGw3U5UBAashVX5ERwMxG0lmGOyAb poTDxnJXWAAUmwaPqFPyWBDfbn57M5zBk2qAXWfVI+R4XluhWLVhY/VQb/totzH42mIT xpyQ== X-Gm-Message-State: APjAAAUbpBdfLO6Kjmjmjv3FOzYxDJYDQNTMB4IM/ygcpn/es5C561XT 5VczpGGe576QiiCCE6C7w8z3BzMg X-Google-Smtp-Source: APXvYqygPVkHNStKNims/oSlMbIJ6Je8o2u/Z0BPjdkry1DEKSVhr4kHe0H9hd0e7zdV7XcecmcSTQ== X-Received: by 2002:a7b:c043:: with SMTP id u3mr24284690wmc.56.1559807702248; Thu, 06 Jun 2019 00:55:02 -0700 (PDT) Original-Received: from brol.lsv.fr (brol.lsv.fr. [138.231.81.57]) by smtp.gmail.com with ESMTPSA id w3sm960238wmc.8.2019.06.06.00.55.01 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 06 Jun 2019 00:55:01 -0700 (PDT) Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::32a X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:237266 Archived-At: This is a multi-part message in MIME format. --------------3F007F78E01D37D1420C6365 Content-Type: multipart/alternative; boundary="------------EFAE2BCBF3FC18D245CE6B76" --------------EFAE2BCBF3FC18D245CE6B76 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Some of my elisp code broke down when I ran it on a newer emacs because the values returned by sxhash are different from what they were in the old version. I had assumed without thinking that sxhash'es of Lisp objects were fixed one and for all, a bit like the shasum of some file. Probably the doc string should be more explicit about what is/is not guaranteed. Is consistency only guaranteed inside the current running emacs thread/process? Looking at the source code in src/fns.c, it seems that the change of sxhash values was caused by the recent change allowing emacs lisp integers of arbitrary size. These numbers are hashed differently, and hashing itself has to truncate large integers that are now a different data type. (However, I have also found changes in sxhash values between v26.1 and v26.2, before the move to arbitrary large integers in v27, see examples below.) Looking at the source code, one would think that sxhash values are meant to only depend on the structure of the Lisp object itself, not the hardware or the emacs version, but of course this view may have to yield when the C data type for Lisp objects is modified. Is this what the developers had in mind? Today the elisp manual reads — Function: *sxhash-equal*obj This function returns a hash code for Lisp object obj. This is an integer which reflects the contents of obj and the other Lisp objects it points to. If two objects obj1 and obj2 are |equal|, then |(sxhash-equal |obj1|)| and |(sxhash-equal |obj2|)| are the same integer. If the two objects are not |equal|, the values returned by |sxhash-equal| are usually different, but not always; once in a rare while, by luck, you will encounter two distinct-looking objects that give the same result from |sxhash-equal|. This does not address the consistency issue, and is a bit misleading about collisions with its "once in a rare while, by luck".  Without trying, I stumbled upon many collisions in my test. Is sxhash really intended to avoid collisions? Many thanks for your time, --philippe Here is what I obtained by running emacs -Q --batch -l  sxhash-report.el (.el file attached below) on 3 different versions of emacs: in GNU Emacs 26.1 (build 1, x86_64-apple-darwin18.6.0, NS appkit-1671.50 Version 10.14.5 (Build 18F132))  of 2019-05-28 1 is hash for 1 0 is hash for 0 0 is hash for nil 11484 is hash for t 0 is hash for (0 . 0) 0 is hash for "" 2030 is hash for "ab" 517809 is hash for "abcd" 93076 is hash for (1 2 3 4 5) 1608904915 is hash for (a b c) 5457926 is hash for ab 5457938 is hash for abcd 5457950 is hash for :misc in GNU Emacs 26.2 (build 1, x86_64-apple-darwin18.2.0, NS appkit-1671.20 Version 10.14.3 (Build 18D109))  of 2019-04-13 1 is hash for 1 0 is hash for 0 0 is hash for nil 11616 is hash for t 0 is hash for (0 . 0) 0 is hash for "" 2030 is hash for "ab" 517809 is hash for "abcd" 93076 is hash for (1 2 3 4 5) 6307170282 is hash for (a b c) 10938216 is hash for ab 10938228 is hash for abcd 10938240 is hash for :misc in GNU Emacs 27.0.50 (build 1, x86_64-apple-darwin18.6.0, NS appkit-1671.50 Version 10.14.5 (Build 18F132))  of 2019-06-06 1 is hash for 1 0 is hash for 0 0 is hash for nil 11784 is hash for t 0 is hash for (0 . 0) 0 is hash for "" 2030 is hash for "ab" 517809 is hash for "abcd" 93076 is hash for (1 2 3 4 5) 238969789 is hash for (a b c) 35167434275996 is hash for ab 35167434276008 is hash for abcd 35167434276020 is hash for :misc --------------EFAE2BCBF3FC18D245CE6B76 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit

Some of my elisp code broke down when I ran it on a newer emacs because the values returned by sxhash are different from what they were in the old version.

I had assumed without thinking that sxhash'es of Lisp objects were fixed one and for all, a bit like the shasum of some file.

Probably the doc string should be more explicit about what is/is not guaranteed. Is consistency only guaranteed inside the current running emacs thread/process?


Looking at the source code in src/fns.c, it seems that the change of sxhash values was caused by the recent change allowing emacs lisp integers of arbitrary size. These numbers are hashed differently, and hashing itself has to truncate large integers that are now a different data type. (However, I have also found changes in sxhash values between v26.1 and v26.2, before the move to arbitrary large integers in v27, see examples below.)

Looking at the source code, one would think that sxhash values are meant to only depend on the structure of the Lisp object itself, not the hardware or the emacs version, but of course this view may have to yield when the C data type for Lisp objects is modified. Is this what the developers had in mind?


Today the elisp manual reads

— Function: sxhash-equal obj

This function returns a hash code for Lisp object obj. This is an integer which reflects the contents of obj and the other Lisp objects it points to.

If two objects obj1 and obj2 are equal, then (sxhash-equal obj1) and (sxhash-equal obj2) are the same integer.

If the two objects are not equal, the values returned by sxhash-equal are usually different, but not always; once in a rare while, by luck, you will encounter two distinct-looking objects that give the same result from sxhash-equal.

This does not address the consistency issue, and is a bit misleading about collisions with its "once in a rare while, by luck".  Without trying, I stumbled upon many collisions in my test. Is sxhash really intended to avoid collisions?

Many thanks for your time,

--philippe


Here is what I obtained by running emacs -Q --batch -l  sxhash-report.el (.el file attached below) on 3 different versions of emacs:

in GNU Emacs 26.1 (build 1, x86_64-apple-darwin18.6.0, NS appkit-1671.50 Version 10.14.5 (Build 18F132))
 of 2019-05-28
1 is hash for 1
0 is hash for 0
0 is hash for nil
11484 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
1608904915 is hash for (a b c)
5457926 is hash for ab
5457938 is hash for abcd
5457950 is hash for :misc

in GNU Emacs 26.2 (build 1, x86_64-apple-darwin18.2.0, NS appkit-1671.20 Version 10.14.3 (Build 18D109))
 of 2019-04-13
1 is hash for 1
0 is hash for 0
0 is hash for nil
11616 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
6307170282 is hash for (a b c)
10938216 is hash for ab
10938228 is hash for abcd
10938240 is hash for :misc

in GNU Emacs 27.0.50 (build 1, x86_64-apple-darwin18.6.0, NS appkit-1671.50 Version 10.14.5 (Build 18F132))
 of 2019-06-06
1 is hash for 1
0 is hash for 0
0 is hash for nil
11784 is hash for t
0 is hash for (0 . 0)
0 is hash for ""
2030 is hash for "ab"
517809 is hash for "abcd"
93076 is hash for (1 2 3 4 5)
238969789 is hash for (a b c)
35167434275996 is hash for ab
35167434276008 is hash for abcd
35167434276020 is hash for :misc


--------------EFAE2BCBF3FC18D245CE6B76-- --------------3F007F78E01D37D1420C6365 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="sxhash-report.el" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="sxhash-report.el" Ozs7OyBGaWxlOiBzeGhhc2gtcmVwb3J0LmVsIC0qLSBsZXhpY2FsLWJpbmRpbmc6IHQ7IC0q LQoKKGRlZnZhciBzb21lLWxpc3Atb2JqZWN0cwogIChsaXN0IDEgMCBuaWwgdCAKCShjb25z IDAgMCkKCSIiICJhYiIgImFiY2QiCgknKDEgMiAzIDQgNSkgICcoYSBiIGMpCgknYWIgJ2Fi Y2QgOm1pc2MKCSkpCgoKKHByaW5jIChmb3JtYXQgImluICVzXG4iIChlbWFjcy12ZXJzaW9u KSkpCihkb2xpc3QgKG9iaiBzb21lLWxpc3Atb2JqZWN0cykKICAocHJpbmMgKGZvcm1hdCAi JVMgaXMgaGFzaCBmb3IgJVNcbiIgKHN4aGFzaC1lcXVhbCBvYmopIG9iaikpKQo= --------------3F007F78E01D37D1420C6365--