all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Christoph Buck <dev@icepic.de>
To: Zack Weinberg <zack@owlfolio.org>,
	Efraim Flashner <efraim@flashner.co.il>
Cc: help-guix@gnu.org
Subject: Re: ABI mismatch on boot on arm32 system
Date: Wed, 06 Nov 2024 11:25:29 +0100	[thread overview]
Message-ID: <87jzdg4q1y.fsf@icepic.de> (raw)
In-Reply-To: <87plnh67w5.fsf@icepic.de> (Christoph Buck's message of "Wed, 30 Oct 2024 14:24:26 +0100")

Hi Guix!

So i looked into the guile source code and, as expected, the `scm_hash`
function (see hash.c in guile) uses `unsigned long` wich is 8 bytes on
x64 and 4 bytes on arm32/i868. If `string-hash` is called with the size
parameter `n`, the hash value is limited to size by calculating the
modulo `n` of the hash value, see scm_ihash in hash.c:440, namely

> (unsigned long) scm_raw_ihash (obj, 10) % n

(The `10` can be ignored as far as i can tell). Since the hash values
are different on different platforms the modulo is different as well.

However, if one steps through the call stack of `string-hash` you can
see that the actual hash value is calculated by the
`JENKINS_LOOKUP3_HASHWORD2` macro, which contains are rather
interesting comment and a possible workaround for the abi problem,
namely

--8<---------------cut here---------------start------------->8---
/* Scheme can access symbol-hash, which exposes this value.  For    \
   cross-compilation reasons, we ensure that the high 32 bits of    \
   the hash on a 64-bit system are equal to the hash on a 32-bit    \
   system.  The low 32 bits just add more entropy.  */              \
if (sizeof (ret) == 8)                                              \
    ret = (((unsigned long) c) << 32) | b;                          \
else                                                                \
    ret = c;                                                        \
--8<---------------cut here---------------end--------------->8---

in hash.c:82.

Meaning, if executed on a x64 platform, the higher 32bit of the
resulting 64bit hash result are equal to the hash value on a 32bit
platform. A simple test case in c++ looks like this:

--8<---------------cut here---------------start------------->8---
int main(int args, char** argv)
{
    scm_init_guile();
    auto strToHash = scm_from_locale_string ("((device) (mount-point))");
    auto maxULong = scm_from_ulong(ULONG_MAX);
    auto hashResult = scm_hash(strToHash,maxULong);
    auto hashResultUL = scm_to_ulong(hashResult);
    std::cout << "Max ULONG_MAX: " << ULONG_MAX <<std::endl;
    std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;

    if(sizeof(hashResultUL) == 8)
    {
        std::cout << "Corrected for 32bit: " << (hashResultUL >> 32) << std::endl;
    }
}
--8<---------------cut here---------------end--------------->8---

which results on x64 in

> Max ULONG_MAX: 18446744073709551615
> Original hashResult ulong: 10454028974864831
> Corrected for 32bit: 2434018

and on arm32 to

> Max ULONG_MAX: 4294967295
> Original hashResult ulong: 2434018

This suggest the following workaround. Always limit the hash size to
32bit even if executed on a 64bit platform (or to be more specific a
platform where ulong is 8bytes big). Do this by right shift the hash
value 32bits and don't rely on the size parameter of the `string-hash`
function.

In code it could look something like this

--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
    ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
    ;; 'string-hash' because that's a better hash function that 'hash' on a
    ;; list of symbols.
    (let ((hash
           (syntax-case field-specs ()
             (((field get properties ...) ...)
              (let ((hash-value (string-hash (object->string
                                              (syntax->datum #'((field properties ...) ...))))))
                (if (= (native-word-size) 8)
                    (ash hash-value -32)
                    hash-value)))))
          (fd (syntax-case field-specs ()
                (((field get properties ...) ...)
                 (object->string
                  (syntax->datum #'((field properties ...) ...)))))))
      
      (format #t "Compute-abi-cookie: ~a~%" hash)
      hash))
--8<---------------cut here---------------end--------------->8---

where `native-word-size` is define by 

--8<---------------cut here---------------start------------->8---
(define (native-word-size)
  ((@ (system foreign) sizeof) '*))
--8<---------------cut here---------------end--------------->8---

(taken from `cross-compilation.test`). There might be a cleaner way to
formulate this, but you get the point.

This seems to work for all combinations on my machine. I tested
x64 -> arm, x64 -> i868, i868 -> x64...

I can only think of two drawbacks.

1) Lost entropy on 64 bit machines
2) Abi break because on new compilation the hash values on 64bit
   platforms will change.

1) is imho irrelevant, because it is not cryptophically important. For
2) i am not sure how important this is.

Any thoughts on this?

Might this be something worth fixing and sending a patch in?

Best regard

Christoph

-- 
Best regards

Christoph


  reply	other threads:[~2024-11-06 11:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
2024-10-16 20:05 ` Richard Sent
2024-10-20 15:15   ` Christoph Buck
2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
2024-10-20 15:23 ` Christoph Buck
2024-10-20 15:39   ` Zack Weinberg
2024-10-20 17:24     ` Christoph Buck
2024-10-21  9:55       ` Christoph Buck
2024-10-29 17:11         ` Christoph Buck
2024-10-30  7:09           ` Efraim Flashner
2024-10-30 13:24             ` Christoph Buck
2024-11-06 10:25               ` Christoph Buck [this message]
2024-11-11  7:47                 ` Christoph Buck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jzdg4q1y.fsf@icepic.de \
    --to=dev@icepic.de \
    --cc=efraim@flashner.co.il \
    --cc=help-guix@gnu.org \
    --cc=zack@owlfolio.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.