all messages for Guix-related lists mirrored at yhetil.org
* ABI mismatch on boot on arm32 system
@ 2024-10-16 10:11 Christoph Buck
  2024-10-16 20:05 ` Richard Sent
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Christoph Buck @ 2024-10-16 10:11 UTC (permalink / raw)
  To: help-guix

Hi!

Currently I am trying to create a Guix image that will boot on an
embedded imx6 arm32 board. Following the Guix manual, I was able to
create such an image. This involved adding a custom U-Boot version and a
kernel with a custom definition file. When flashed onto an SD card,
U-Boot runs and the kernel boots. However, early in boot (presumably while
executing initrd.cpio.gz), a `record-abi-mismatch-error` is thrown and
a Guix recovery REPL is opened:

> Use 'gnu.repl' for an initrd REPL.

> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'.

> Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
> GNU Guile 3.0.9
> Copyright (C) 1995-2023 Free Software Foundation, Inc.

> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.

Unfortunately I have absolutely no clue what the problem could be. Could
it be that the image was compiled with a different Guile version than the
one executed on the image? Could this explain the ABI mismatch in the
`file-system` record?

Googling for the error, I found the following post on this mailing list:

> https://lists.gnu.org/archive/html/help-guix/2023-02/msg00147.html

Seems like Maxim Cournoyer had the same problem with a board with the
same SoC (imx6). Unfortunately there was no follow-up. (I mailed him in
private in case he came up with a solution. If so, I will document it here,
so that the next unlucky soul running into this error can find the solution.)

I cross-compile the image on x64 with:

> guix build -f custom-board.scm --target=arm-linux-gnueabihf -v3 -c2 -M2 -K --no-grafts

where `custom-board.scm` is my image definition (I can share it if
helpful). The `--no-grafts` option is needed due to

> https://issues.guix.gnu.org/66866

I would be very grateful for tips on how to debug this issue further.
It feels like I am very close to a bootable image.

-- 
Best regards

Christoph


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ABI mismatch on boot on arm32 system
  2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
@ 2024-10-16 20:05 ` Richard Sent
  2024-10-20 15:15   ` Christoph Buck
  2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
  2024-10-20 15:23 ` Christoph Buck
  2 siblings, 1 reply; 16+ messages in thread
From: Richard Sent @ 2024-10-16 20:05 UTC (permalink / raw)
  To: Christoph Buck; +Cc: help-guix

Hi Christoph,

> Currently I am trying to create a Guix image that will boot on an
> embedded imx6 arm32 board. Following the Guix manual, I was able to
> create such an image. This involved adding a custom U-Boot version and a
> kernel with a custom definition file. When flashed onto an SD card,
> U-Boot runs and the kernel boots. However, early in boot (presumably while
> executing initrd.cpio.gz), a `record-abi-mismatch-error` is thrown and
> a Guix recovery REPL is opened:
>
>> Use 'gnu.repl' for an initrd REPL.
>
>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'.

Your issue sounds very similar to the one described in
https://issues.guix.gnu.org/61173#4.

The TL;DR (although I encourage you to read it!) is that you need the
CONFIG_BINFMT_MISC Linux kernel compilation option set, but when you use
a linux-libre-*-generic kernel that option is NOT set. If you're using
the qemu-binfmt-service, you'll fail to boot and the error you posted
will occur before you're dropped into a REPL.

Unfortunately services do not currently have any mechanism to require or
check kernel config options.

Not knowing your operating-system declaration I can't tell for sure if
that is what's going on, but I suspect what I described or something
similar is the case.
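
A quick way to check whether the option is set, as a minimal sketch. The
helper name `has_kconfig` is made up for illustration, and it assumes the
plain-text `.config` of the kernel build is at hand (the path in the
usage comment is a placeholder):

```shell
#!/bin/sh
# has_kconfig FILE OPTION
# Succeeds if OPTION is built in (=y) or built as a module (=m) in FILE.
# FILE is a plain-text kernel configuration, e.g. the .config of the
# kernel build tree.
has_kconfig() {
    grep -Eq "^$2=(y|m)\$" "$1"
}

# Usage (path is an assumption, adjust to your build):
# has_kconfig /path/to/linux/.config CONFIG_BINFMT_MISC && echo "enabled"
```

A line such as `# CONFIG_BINFMT_MISC is not set` does not match, so the
helper fails for a disabled option as well as for a missing one.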

If you confirm this is in fact the problem, feel free to leave a comment
on the issue!

Best of luck.

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ABI mismatch on boot on arm32 system
  2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
  2024-10-16 20:05 ` Richard Sent
@ 2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
  2024-10-20 15:23 ` Christoph Buck
  2 siblings, 0 replies; 16+ messages in thread
From: Denis 'GNUtoo' Carikli @ 2024-10-18 20:58 UTC (permalink / raw)
  To: Christoph Buck; +Cc: help-guix

[-- Attachment #1: Type: text/plain, Size: 1489 bytes --]

On Wed, 16 Oct 2024 12:11:30 +0200
Christoph Buck <dev@icepic.de> wrote:

> Hi!
Hi,

> Currently I am trying to create a Guix image that will boot on an
> embedded imx6 arm32 board. Following the Guix manual, I was able to
> create such an image. This involved adding a custom U-Boot version and
> a kernel with a custom definition file. When flashed onto an SD card,
> U-Boot runs and the kernel boots. However, early in boot (presumably while
> executing initrd.cpio.gz), a `record-abi-mismatch-error` is thrown
> and a Guix recovery REPL is opened:
> 
> > Use 'gnu.repl' for an initrd REPL.
> 
> > ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> > Throw to key `record-abi-mismatch-error' with args `(abi-check "~a:
> > record ABI mismatch; recompilation needed" (#<record-type
> > <file-system>>) ())'.
There is also the option to try to bisect the issues (there might be
more than one).

We now have a u-boot-qemu-arm package so you could for instance start
with arm64 (with u-boot-qemu-arm64 and a system definition that you
create or reuse+modify) and manage to boot a system with qemu.

Then once you managed to boot an arm64 system, you could try to
reproduce it for 32bit arm with an older guix revision (and possibly a
recent u-boot-qemu-arm that doesn't change) and then start bisecting.

I tried to do that a long time ago but I don't have fast computers and
so at some point I gave up and I never found the time to get back to it.

Denis.



* Re: ABI mismatch on boot on arm32 system
  2024-10-16 20:05 ` Richard Sent
@ 2024-10-20 15:15   ` Christoph Buck
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Buck @ 2024-10-20 15:15 UTC (permalink / raw)
  To: Richard Sent; +Cc: help-guix


Richard Sent <richard@freakingpenguin.com> writes:

> Hi Christoph,

Hi Richard!
>
> The TL;DR (although I encourage you to read it!) is that you need the
> CONFIG_BINFMT_MISC Linux kernel compilation option set, but when you use
> a linux-libre-*-generic kernel that option is NOT set. If you're using
> the qemu-binfmt-service, you'll fail to boot and the error you posted
> will occur before you're dropped into a REPL.

I use a custom kernel, modified straight from kernel.org, and indeed I didn't
enable the `CONFIG_BINFMT_MISC` setting. Unfortunately, enabling
this option **does not** solve my problem.


-- 
Best regards

Christoph



* Re: ABI mismatch on boot on arm32 system
  2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
  2024-10-16 20:05 ` Richard Sent
  2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
@ 2024-10-20 15:23 ` Christoph Buck
  2024-10-20 15:39   ` Zack Weinberg
  2 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-10-20 15:23 UTC (permalink / raw)
  To: help-guix

Hi!

I played around a little bit more and I can now indeed boot
successfully. Instead of using cross-compilation (CLI option
`--target=arm-linux-gnueabihf`) I created a build using QEMU emulation
(CLI option `--system=armhf-linux`). This takes ages to build, but the
resulting image is bootable without the ABI error. Unfortunately this is not
a real fix, because it is too slow to be a practical workaround (at least
for me).

I dug a little deeper and this is what I have found out so far. In case I
am running off in a totally wrong direction, someone with more of a clue than
me should please stop me ;)

I think something goes wrong during cross-compilation of the Guile
modules in the `module-import-compiled` package. The ABI error is thrown
early in boot, in the `initrd.cpio.gz` ramdisk. I extracted and
decompressed the ramdisks from both builds (cross-compilation and QEMU),
which contain the `module-import-compiled` Guile modules. I would expect
the *.go files from the `module-import-compiled` package of both
ramdisks to be binary identical, but they have different md5sums. Let's take
for example `file-systems.go`, which causes the ABI error.

--8<---------------cut here---------------start------------->8---
local@host:crosscompilation-initrd/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system$
md5sum file-systems.go
7839e9c7a0c7c6c8d9ea45566ab9f61e  file-systems.go
--8<---------------cut here---------------end--------------->8---
vs
--8<---------------cut here---------------start------------->8---

local@host:qemu-initrd/gnu/store/hvgj80xqf70mvx460pnvwmi87kqqn2bj-module-import-compiled/gnu/system$
md5sum file-systems.go
a43a7e939ae9d0cc1ce30726cb51d6d4  file-systems.go
--8<---------------cut here---------------end--------------->8---

Additionally, it looks like different symbols are exported depending on whether
cross-compilation or QEMU was used. This is at least what `readelf -s file-systems.go`
reports. I naively thought these files should be identical.

Additionally, I saw these strange warnings in the build log during
cross-compilation:

--8<---------------cut here---------------start------------->8---
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/build/file-systems.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system/uuid.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system/file-systems.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
--8<---------------cut here---------------end--------------->8---

This also looks suspicious. These stem from the `check_elf_header`
function in Guile. Guile warns that the class type in the ELF header is
32-bit when the file is loaded in a cross-compilation context on an x64 system.
So far I couldn't figure out whether I can ignore these warnings or whether
they might cause a problem.
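
For anyone who wants to check the class of a .go file without readelf,
here is a minimal sketch. It relies only on the ELF layout (the EI_CLASS
byte sits at offset 4 of the file, 1 for 32-bit and 2 for 64-bit); the
helper name `elf_class` is made up, and it does not verify the ELF magic
bytes:

```shell
#!/bin/sh
# elf_class FILE
# Prints ELF32 or ELF64 according to the EI_CLASS byte (offset 4 of the
# ELF identification bytes); prints "unknown" for any other value.
# The magic bytes at offsets 0-3 are not checked.
elf_class() {
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo ELF32 ;;
        2) echo ELF64 ;;
        *) echo unknown ;;
    esac
}

# Usage:
# elf_class file-systems.go
```

A file built for arm32 should report ELF32, which is exactly what a
64-bit Guile on the build host would refuse to load as native code.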

-- 
Best regards

Christoph




* Re: ABI mismatch on boot on arm32 system
  2024-10-20 15:23 ` Christoph Buck
@ 2024-10-20 15:39   ` Zack Weinberg
  2024-10-20 17:24     ` Christoph Buck
  0 siblings, 1 reply; 16+ messages in thread
From: Zack Weinberg @ 2024-10-20 15:39 UTC (permalink / raw)
  To: help-guix

On Sun, Oct 20, 2024, at 11:23 AM, Christoph Buck wrote:
> I think something goes wrong during crosscompilation of the guile
> modules in package `module-import-compiled`. The abi error is thrown
> early on boot in the `initrd.cpio.gz` ramdisk. I extracted and
> decompressed the ramdisk from both builds (crosscompilation and qemu)
> which contain the `module-import-compiled` guile modules. I would expect
> that the *.go files from the `module-import-compiled` package of both
> ramdisks are binary identical but they have different md5sums. Lets take
> for example `file-systems.go`, which cause the abi error.
[...]

Can you show us the *complete and unedited* output of `readelf -hlSd file-systems.go` from both the working and the broken ramdisk, please?

zw



* Re: ABI mismatch on boot on arm32 system
  2024-10-20 15:39   ` Zack Weinberg
@ 2024-10-20 17:24     ` Christoph Buck
  2024-10-21  9:55       ` Christoph Buck
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-10-20 17:24 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: help-guix

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

Hi Zack!

> Can you show us the *complete and unedited* output of `readelf -hlSd
> file-systems.go` from both the working and the broken ramdisk, please?
>

Sure. See the attachments of this mail.

But I just saw that I made a mistake and compared the module
`qemu/gnu/build/file-systems.go` to
`cross/gnu/system/file-systems.go`.

The md5sums of `qemu/gnu/system/file-systems.go` and
`cross/gnu/system/file-systems.go` still differ, but the exported
symbols are the same (see attachment). The only difference I can now see
is in the `Start of section headers` (426376 for the cross build vs 426352
for the QEMU build) and different addresses in the subsequent output of
`readelf`. Are these expected to be deterministic/equal?

Sorry for the confusion.

> zw

Greetings Christoph 

-- 
Best regards

Christoph


[-- Attachment #2: qemu.readelf --]
[-- Type: application/octet-stream, Size: 3734 bytes --]

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 ff 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            <unknown: ff>
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           None
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          426352 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         20
  Section header string table index: 17

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .guile.procprops  PROGBITS        00000000 06c790 000020 00      0   0  8
  [ 2] .rodata           PROGBITS        0003fe38 03fe38 0074ac 00   A  0   0  8
  [ 3] .data             PROGBITS        00050000 050000 018170 00  WA  0   0  8
  [ 4] .rtl-text         PROGBITS        00000098 000098 03fda0 00   A  0   0  8
  [ 5] .dynamic          DYNAMIC         000472e8 0472e8 000030 00   A  0   0  8
  [ 6] .strtab           STRTAB          00000000 070b98 000666 00      0   0  8
  [ 7] .symtab           SYMTAB          00000000 06f118 001a80 10      6   0  8
  [ 8] .guile.ariti[...] STRTAB          00000000 071200 00085a 00      0   0  8
  [ 9] .guile.arities    PROGBITS        00000000 068490 004253 00      8   0  8
  [10] .guile.docst[...] STRTAB          00000000 071a60 0008ac 00      0   0  8
  [11] .guile.docstrs    PROGBITS        00000000 06c6e8 0000a8 00     10   0  8
  [12] .debug_info       PROGBITS        00000000 06c7b0 000b6f 00      0   0  8
  [13] .debug_abbrev     PROGBITS        00000000 06d320 000041 00      0   0  8
  [14] .debug_str        PROGBITS        00000000 06d368 000672 00      0   0  8
  [15] .debug_loc        PROGBITS        00000000 06d9e0 000000 00      0   0  8
  [16] .debug_line       PROGBITS        00000000 06d9e0 001733 00      0   0  8
  [17] .shstrtab         STRTAB          00000000 072310 0000d3 00      0   0  8
  [18]                   PROGBITS        00000000 000000 000094 00   A  0   0  8
  [19]                   NULL            00000000 068170 000000 00      0   0  0
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x47318 0x47318 R   0x10000
  LOAD           0x050000 0x00050000 0x00050000 0x18170 0x18170 RW  0x10000
  DYNAMIC        0x0472e8 0x000472e8 0x000472e8 0x00030 0x00030 R   0x8

 Section to Segment mapping:
  Segment Sections...
   00     .rodata .rtl-text .dynamic  
   01     .data 
   02     .dynamic 

Dynamic section at offset 0x472e8 contains 6 entries:
  Tag        Type                         Name/Value
 0x37146003 (<unknown>: 37146003)        0x3000006
 0x37146002 (<unknown>: 37146002)        0x98
 0x37146000 (<unknown>: 37146000)        0x50000
 0x37146001 (<unknown>: 37146001)        0x18170
 0x0000000c (INIT)                       0x1b918
 0x00000000 (NULL)                       0x0

[-- Attachment #3: cross.readelf --]
[-- Type: application/octet-stream, Size: 3735 bytes --]

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 ff 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            <unknown: ff>
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           None
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          426376 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         20
  Section header string table index: 17

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .guile.procprops  PROGBITS        00000000 06c7a8 000020 00      0   0  8
  [ 2] .rodata           PROGBITS        0003fe60 03fe60 008b64 00   A  0   0  8
  [ 3] .data             PROGBITS        00050000 050000 018188 00  WA  0   0  8
  [ 4] .rtl-text         PROGBITS        00000098 000098 03fdc4 00   A  0   0  8
  [ 5] .dynamic          DYNAMIC         000489c8 0489c8 000030 00   A  0   0  8
  [ 6] .strtab           STRTAB          00000000 070ba8 000666 00      0   0  8
  [ 7] .symtab           SYMTAB          00000000 06f128 001a80 10      6   0  8
  [ 8] .guile.ariti[...] STRTAB          00000000 071210 000b62 00      0   0  8
  [ 9] .guile.arities    PROGBITS        00000000 0684a8 004255 00      8   0  8
  [10] .guile.docst[...] STRTAB          00000000 071d78 0008ac 00      0   0  8
  [11] .guile.docstrs    PROGBITS        00000000 06c700 0000a8 00     10   0  8
  [12] .debug_info       PROGBITS        00000000 06c7c8 000b67 00      0   0  8
  [13] .debug_abbrev     PROGBITS        00000000 06d330 000041 00      0   0  8
  [14] .debug_str        PROGBITS        00000000 06d378 000672 00      0   0  8
  [15] .debug_loc        PROGBITS        00000000 06d9f0 000000 00      0   0  8
  [16] .debug_line       PROGBITS        00000000 06d9f0 001733 00      0   0  8
  [17] .shstrtab         STRTAB          00000000 072628 0000d3 00      0   0  8
  [18]                   PROGBITS        00000000 000000 000094 00   A  0   0  8
  [19]                   NULL            00000000 068188 000000 00      0   0  0
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x489f8 0x489f8 R   0x10000
  LOAD           0x050000 0x00050000 0x00050000 0x18188 0x18188 RW  0x10000
  DYNAMIC        0x0489c8 0x000489c8 0x000489c8 0x00030 0x00030 R   0x8

 Section to Segment mapping:
  Segment Sections...
   00     .rodata .rtl-text .dynamic  
   01     .data 
   02     .dynamic 

Dynamic section at offset 0x489c8 contains 6 entries:
  Tag        Type                         Name/Value
 0x37146003 (<unknown>: 37146003)        0x3000006
 0x37146002 (<unknown>: 37146002)        0x98
 0x37146000 (<unknown>: 37146000)        0x50000
 0x37146001 (<unknown>: 37146001)        0x18188
 0x0000000c (INIT)                       0x1b918
 0x00000000 (NULL)                       0x0


* Re: ABI mismatch on boot on arm32 system
  2024-10-20 17:24     ` Christoph Buck
@ 2024-10-21  9:55       ` Christoph Buck
  2024-10-29 17:11         ` Christoph Buck
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-10-21  9:55 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: help-guix

Hi!

I can now reproduce this error locally. Consider the following file:

--8<---------------cut here---------------start------------->8---
(define-module (abi-error)
  #:use-module (gnu system file-systems)
  #:export (bla test))

(define bla (file-system
              (device (file-system-label "my-root"))
              (mount-point "/")
              (type "ext4")))
--8<---------------cut here---------------end--------------->8---

If cross-compiled on x64 to arm32 using the following (this is what
`compiled-modules` in gexp.scm does, at least as far as I can tell)


--8<---------------cut here---------------start------------->8---
(use-modules (system base compile))
(use-modules (system base target))

(with-target "arm-linux-gnueabihf"
  (lambda () (compile-file "abi-error.scm" #:output-file "abi-error.go")))
--8<---------------cut here---------------end--------------->8---

loading the module in an emulated arm32 Guile REPL fails with an ABI
error:

--8<---------------cut here---------------start------------->8---
icepic@G16-Buck:~/guix$ guix shell --container --system=armhf-linux guix guile file bash which coreutils
icepic@G16-Buck ~/guix [env]$ guix repl -L .
GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use (abi-error)
While executing meta-command:
Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'
--8<---------------cut here---------------end--------------->8---

But if compiled directly in QEMU on arm32, it works without an ABI error:

--8<---------------cut here---------------start------------->8---
icepic@G16-Buck:~/guix$ guix shell --container --system=armhf-linux  guix guile file bash which coreutils
icepic@G16-Buck ~/guix [env]$ guix repl compile.scm
icepic@G16-Buck ~/guix [env]$ guix repl -L .
GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use (abi-error)
scheme@(guix-user)> bla
$1 = #<<file-system> device: #<file-system-label "my-root"> mount-point: "/" type: "ext4" flags: () options: #f mount?: #t mount-may-fail?: #f needed-for-boot?: #f check?: #t skip-check-if-clean?: #t repair: preen create-mount-point?: #f dependencies: () shepherd-requirements: () location: ((filename . "abi-error.scm") (line . 4) (column . 12))>
--8<---------------cut here---------------end--------------->8---

where `compile.scm` is simply

--8<---------------cut here---------------start------------->8---
(use-modules (system base compile))
(use-modules (system base target))

(compile-file "abi-error.scm" #:output-file "abi-error.go")
--8<---------------cut here---------------end--------------->8---

This is not what one would expect, is it?

Interestingly, it works if `aarch64-linux-gnu` is used instead of
`arm-linux-gnueabihf`.

So... it looks like there is a bug in Guile's cross-compilation support for
arm32?


-- 
Best regards

Christoph



* Re: ABI mismatch on boot on arm32 system
  2024-10-21  9:55       ` Christoph Buck
@ 2024-10-29 17:11         ` Christoph Buck
  2024-10-30  7:09           ` Efraim Flashner
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-10-29 17:11 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: help-guix

Hi!

In case anybody is reading along here: I dug deeper and found
something rather interesting :P

From my understanding after reading through records.scm from Guix (and
please note that I'm a total Scheme newbie), the ABI check works by
calculating a string hash over the record field names and storing the
hash as a hidden field in the record. At run time this string hash is
computed again and compared to the compiled-in hash. If they don't
match, the ABI is broken because a field was added or removed.
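
The mechanism as described above can be mimicked in a few lines of
shell. This is only an analogy under stated assumptions: `cksum` stands
in for Guile's `string-hash`, and the names `abi_cookie` and `check_abi`
are made up for illustration, they are not part of Guix:

```shell
#!/bin/sh
# abi_cookie FIELDS
# Stand-in for the ABI cookie: a checksum (plus length) over the textual
# field list. cksum is used for illustration only; Guix uses string-hash.
abi_cookie() {
    printf '%s' "$1" | cksum
}

# "Compile time": the cookie is computed once and stored with the code.
stored=$(abi_cookie "((device) (mount-point))")

# "Run time": recompute the cookie from the field list actually in use
# and compare it with the stored one.
check_abi() {
    if [ "$(abi_cookie "$1")" = "$stored" ]; then
        echo "ok"
    else
        echo "record ABI mismatch; recompilation needed"
    fi
}

check_abi "((device) (mount-point))"          # same field list
check_abi "((device) (mount-point) (type))"   # a field was added
```

The check only works if the hash function gives the same answer in both
places, which is the assumption that matters for the rest of this mail.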

The hash is calculated in the `compute-abi-cookie` procedure in
records.scm.

I extended the procedure with the following debug output:

--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
  ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
  ;; 'string-hash' because that's a better hash function that 'hash' on a
  ;; list of symbols.
  (let ((hash
         (syntax-case field-specs ()
           (((field get properties ...) ...)
            (string-hash (object->string
                          (syntax->datum #'((field properties ...) ...)))
                         ;; (bla)
                         (cond-expand
                          (guile-3 (target-most-positive-fixnum))
                          (else most-positive-fixnum))
                         ))))
        (fd (syntax-case field-specs ()
              (((field get properties ...) ...)
               (object->string
                (syntax->datum #'((field properties ...) ...)))))))
    
    (format #t "Compute-abi-cookie: ~a~%" hash)
    (format #t "field-specs: ~a~%" field-specs)
    (format #t "fd: ~a~%" fd)
    (format #t "hashsize ~a~%: " (cond-expand
                                  (guile-3 (target-most-positive-fixnum))
                                  (else most-positive-fixnum)))
    hash))
--8<---------------cut here---------------end--------------->8---

Now, if I compile a simple test record

--8<---------------cut here---------------start------------->8---

(define-record-type* <test-system> test-system
  make-test-system
  test-system?
  (device test-system-device) 
  (mount-point test-system-mount-point))

(define test-abi (test-system
                  (device "my-root")
                  (mount-point "/")))
                  
--8<---------------cut here---------------end--------------->8---

on x64 using Guile cross-compilation (in a `guix shell --container guix
guile` environment) with the call

--8<---------------cut here---------------start------------->8---
(with-target "arm-linux-gnueabihf" (lambda () (compile-file "test-abi.scm")))
--8<---------------cut here---------------end--------------->8---

the following outputs are generated:

--8<---------------cut here---------------start------------->8---
Compute-abi-cookie: 212719825
field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
fd: ((device) (mount-point))
hashsize 536870911
--8<---------------cut here---------------end--------------->8---

The ABI cookie is computed by calculating the string hash over
"((device) (mount-point))" while bounding the hash by
536870911. One can check this manually by calling

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
$1 = 212719825
--8<---------------cut here---------------end--------------->8---

in the repl. 
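
For reference, 536870911 is 2^29 - 1. Assuming Guile reserves two tag
bits on small integers (plus the sign bit), that is the largest fixnum
on a 32-bit target, while a 64-bit host would give 2^61 - 1. A small
sketch of the arithmetic (the helper name `max_fixnum` is made up, and
the shell is assumed to do 64-bit arithmetic):

```shell
#!/bin/sh
# max_fixnum WORD_BITS
# Largest fixnum for a given word size, assuming two tag bits and one
# sign bit are taken out of the machine word.
max_fixnum() {
    echo $(( (1 << ($1 - 3)) - 1 ))
}

max_fixnum 32   # bound for an arm32 target
max_fixnum 64   # bound for an x64 host
```

So the bound printed as "hashsize" above matches the 32-bit target, not
the 64-bit build host.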

Now, if I do the same in a QEMU arm32 environment (using `guix shell
--container guix guile --system=armhf-linux`), a different hash is
printed, even though the hash is calculated over the same string, see:

--8<---------------cut here---------------start------------->8---
Compute-abi-cookie: 2434018
field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
fd: ((device) (mount-point))
hashsize 536870911
--8<---------------cut here---------------end--------------->8---

You can verify this in the repl as well:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
$1 = 2434018
--8<---------------cut here---------------end--------------->8---

My first intuition after reading the source of `compute-abi-cookie` was
that `target-most-positive-fixnum` might return a wrong value when
called in a cross-compilation context. But as you can see, this is not
the case. Instead, `string-hash` calculates a different hash even though
the input values are the same.

Now, I am not even sure whether one can expect hash functions running on
different architectures to produce the same hash for the same input. If
not, then the implementation in Guix's records.scm would be buggy. If
one does expect `string-hash` to return the same hash for the same input
regardless of the architecture, then this would hint at a bug in Guile's
`string-hash` function on arm32.

Any input and thoughts regarding this issue would be appreciated.

Greetings

Christoph 



-- 
Best regards

Christoph


* Re: ABI mismatch on boot on arm32 system
  2024-10-29 17:11         ` Christoph Buck
@ 2024-10-30  7:09           ` Efraim Flashner
  2024-10-30 13:24             ` Christoph Buck
  0 siblings, 1 reply; 16+ messages in thread
From: Efraim Flashner @ 2024-10-30  7:09 UTC (permalink / raw)
  To: Christoph Buck; +Cc: Zack Weinberg, help-guix

On Tue, Oct 29, 2024 at 06:11:27PM +0100, Christoph Buck wrote:
> Hi!
> 
> In case anybody is reading along here: I dug deeper and found
> something rather interesting :P

Thank you! From where I'm sitting it's much easier (for me) to suggest
things than to try to set up your environment.

> From my understanding after reading through Guix's records.scm (and
> please note that I'm a total Scheme newbie), the ABI check works by
> calculating a string hash over the record field names and storing the
> hash as a hidden field in the record. At run time this string hash is
> computed again and compared to the compiled-in hash. If they don't
> match, the ABI is broken because a field was added or removed.
>
> The hash is calculated by the `compute-abi-cookie` procedure in
> records.scm.
>
> I extended the procedure with the following debug output:
> 
> --8<---------------cut here---------------start------------->8---
> (define (compute-abi-cookie field-specs)
>   ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
>   ;; 'string-hash' because that's a better hash function than 'hash' on a
>   ;; list of symbols.
>   (let ((hash
>          (syntax-case field-specs ()
>            (((field get properties ...) ...)
>             (string-hash (object->string
>                           (syntax->datum #'((field properties ...) ...)))
>                          (cond-expand
>                           (guile-3 (target-most-positive-fixnum))
>                           (else most-positive-fixnum))))))
>         ;; Debug only: the string that is fed to 'string-hash'.
>         (fd (syntax-case field-specs ()
>               (((field get properties ...) ...)
>                (object->string
>                 (syntax->datum #'((field properties ...) ...)))))))
>     ;; Debug output.
>     (format #t "Compute-abi-cookie: ~a~%" hash)
>     (format #t "field-specs: ~a~%" field-specs)
>     (format #t "fd: ~a~%" fd)
>     (format #t "hashsize ~a~%" (cond-expand
>                                 (guile-3 (target-most-positive-fixnum))
>                                 (else most-positive-fixnum)))
>     hash))
> --8<---------------cut here---------------end--------------->8---
> 
> Now, if I compile a simple test record
> 
> --8<---------------cut here---------------start------------->8---
> 
> (define-record-type* <test-system> test-system
>   make-test-system
>   test-system?
>   (device test-system-device) 
>   (mount-point test-system-mount-point))
> 
> (define test-abi (test-system
>                   (device "my-root")
>                   (mount-point "/")))
>                   
> --8<---------------cut here---------------end--------------->8---
> 
> on x64 by cross-compiling with Guile (in a `guix shell --container guix
> guile` environment) using the call
> 
> --8<---------------cut here---------------start------------->8---
> (with-target "arm-linux-gnueabihf" (lambda () (compile-file "test-abi.scm")))
> --8<---------------cut here---------------end--------------->8---
> 
> the following output is generated:
> 
> --8<---------------cut here---------------start------------->8---
> Compute-abi-cookie: 212719825
> field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
> fd: ((device) (mount-point))
> hashsize 536870911
> --8<---------------cut here---------------end--------------->8---
> 
> The ABI cookie is computed by taking the string hash of
> "((device) (mount-point))" with the hash size limited to 536870911
> (2^29 - 1). One can check this manually by calling
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
> $1 = 212719825
> --8<---------------cut here---------------end--------------->8---
> 
> in the REPL.
>
> Now, if I do the same in a QEMU arm32 environment (using `guix shell
> --container guix guile --system=armhf-linux`), a different hash is
> printed, even though it is calculated over the same string:
> 
> --8<---------------cut here---------------start------------->8---
> Compute-abi-cookie: 2434018
> field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
> fd: ((device) (mount-point))
> hashsize 536870911
> --8<---------------cut here---------------end--------------->8---
> 
> You can verify this in the REPL as well:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
> $1 = 2434018
> --8<---------------cut here---------------end--------------->8---
> 
> My first intuition after reading the source of `compute-abi-cookie` was
> that `target-most-positive-fixnum` might return a wrong value when
> called in a cross-compilation context. But as you can see, this is not
> the case. Instead, `string-hash` calculates a different hash even though
> the input values are the same.
>
> Now, I am not even sure whether one can expect hash functions running on
> different architectures to produce the same hash for the same input. If
> not, then the implementation in Guix's records.scm would be buggy. If
> one does expect `string-hash` to return the same hash for the same input
> regardless of the architecture, then this would hint at a bug in Guile's
> `string-hash` function on arm32.
>
> Any input and thoughts regarding this issue would be appreciated.
> 

Can you run it again, but with i686 -> armhf, and x86_64 -> i686?

My curiosity includes i686 -> x86_64, but I suspect it won't tell us
anything we won't learn from the previous tests.

-- 
Efraim Flashner   <efraim@flashner.co.il>   רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

* Re: ABI mismatch on boot on arm32 system
  2024-10-30  7:09           ` Efraim Flashner
@ 2024-10-30 13:24             ` Christoph Buck
  2024-11-06 10:25               ` Christoph Buck
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-10-30 13:24 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: help-guix



Efraim Flashner <efraim@flashner.co.il> writes:

> Can you run it again, but with i686 -> armhf, and x86_64 -> i686?
>

Hi Efraim!

Sure. No problem. Here we go:

cross-compiled x86_64 -> i686:   hash = 212719825;  native i686 (QEMU): hash = 2434018
cross-compiled i686 -> arm (*):  hash = 2434018;    native arm (QEMU):  hash = 2434018

(*) The i686 host ran in QEMU as well and cross-compiled from there
    using `with-target`, because I don't have native i686 hardware.

> My curiosity includes i686 -> x86_64, but I suspect it won't tell us
> anything we won't learn from the previous tests.

Unfortunately, this combination crashes:

cross-compiled i686 -> x86_64 (*):

--8<---------------cut here---------------start------------->8---
icepic@G16-Buck ~/guix/raspberry/touchscreen/abi-error/test/test-abi [env]$ ./compile.sh
Backtrace:
In ice-9/boot-9.scm:
  2595:24 19 (call-with-deferred-observers _)
  3424:24 18 (_)
   222:17 17 (map1 (((test-systems))))
  3327:17 16 (resolve-interface (test-systems) #:select _ #:hide _ # ?)
In ice-9/threads.scm:
    390:8 15 (_ _)
In ice-9/boot-9.scm:
  3253:13 14 (_)
In ice-9/threads.scm:
    390:8 13 (_ _)
In ice-9/boot-9.scm:
  3544:20 12 (_)
   2836:4 11 (save-module-excursion _)
  3564:26 10 (_)
In unknown file:
           9 (primitive-load-path "test-systems" #<procedure ad7a0 a?>)
In ice-9/eval.scm:
   721:20  8 (primitive-eval (define-record-type* <test-system> # # ?))
In ice-9/psyntax.scm:
  1229:36  7 (expand-top-sequence (#<syntax:test-systems.scm:10:0 ?>) ?)
  1121:20  6 (parse _ (("placeholder" placeholder)) ((top) #(# # ?)) ?)
  1342:32  5 (syntax-type (#<syntax define-record-type*> #<synta?> ?) ?)
  1562:32  4 (expand-macro #<procedure bcc50 at ice-9/eval.scm:333:?> ?)
In ice-9/eval.scm:
   293:34  3 (_ #(#(#(#(#(#(#(#(#<directory ?> ?) ?) ?) ?) ?) ?) ?) ?))
   298:34  2 (_ #(#(#<directory (abi-records) 178c80>) ((#<s?> ?) ?)))
In unknown file:
           1 (string-hash "((device) (mount-point))" # #<undefined> #)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Value out of range 1 to< 4294967295: 2305843009213693951
icepic@G16-Buck ~/guix/raspberry/touchscreen/abi-error/test/test-abi [env]$
--8<---------------cut here---------------end--------------->8---

Without looking into this any further, I would hypothesize that when
cross-compiling to x64 on i686, `target-most-positive-fixnum` returns a
number that does not fit into 32 bits, which `string-hash` cannot handle
when executed on i686.
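
The two numbers in the error message fit this hypothesis. As a rough
sketch (plain C++, not Guile code; the helper names are made up for
illustration), assuming that a fixnum gives up two tag bits of a machine
word and keeps one bit for the sign, the largest fixnum of a 64-bit
target is 2^61 - 1, which is far larger than a 32-bit `unsigned long`
can hold:

```cpp
#include <cstdint>

// Hypothetical helper: largest fixnum for a target with `word_bits`-bit
// words, assuming two tag bits and one sign bit are taken from the word.
constexpr std::uint64_t most_positive_fixnum(unsigned word_bits) {
    return (std::uint64_t{1} << (word_bits - 3)) - 1;
}

// Hypothetical helper: largest value of an unsigned integer that is
// `word_bits` bits wide (only 32 and 64 are meaningful here).
constexpr std::uint64_t ulong_max(unsigned word_bits) {
    return word_bits == 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << word_bits) - 1;
}

// The values below are the ones that appear in this thread.
static_assert(most_positive_fixnum(32) == 536870911ULL,
              "hash size used for the arm32 target");
static_assert(most_positive_fixnum(64) == 2305843009213693951ULL,
              "value rejected in the backtrace");
static_assert(ulong_max(32) == 4294967295ULL,
              "upper bound named in the error message");
static_assert(most_positive_fixnum(64) > ulong_max(32),
              "64-bit fixnum limit does not fit a 32-bit unsigned long");
```

If the file compiles, all four checks hold.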

To recap: to me it looks as if `string-hash` is not implemented in a
platform-independent way but uses a platform-specific data type size
somewhere in its implementation. As long as the word size (64-bit or
32-bit) is the same during cross-compilation and execution, it works
(cross-compiling on x64 and executing on arm64 is fine, as is
cross-compiling on i686 and executing on arm32, see above). As soon as
the word size differs, the resulting hashes differ or the execution
crashes.

I can debug into Guile's `string-hash` function to find out where the
difference comes from, but I first need to figure out how to set up gdb
correctly.

To me, a sensible fix for this issue (and I think it is a bug ;) ) would
be to use a platform-independent hashing algorithm implemented in plain
Guile, without relying on native C functions. If hashes calculated
during compilation are stored in the compiled module, they must be the
same regardless of the architecture. Cross-compilation will not work
reliably once this assumption is broken, because compilation and
execution might happen on different platforms.
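
To illustrate what "platform independent" means here, the sketch below
uses 32-bit FNV-1a, a well-known hash that is defined purely in terms of
fixed-width 32-bit arithmetic. This is neither what Guile nor what Guix
uses; it is swapped in only as an example, and the function name is made
up. Because every intermediate value is a `std::uint32_t`, the result
does not depend on the size of `unsigned long` on the machine that runs
it:

```cpp
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated string.  Written recursively so
// that it can be evaluated at compile time.
// 2166136261 is the FNV offset basis, 16777619 the FNV prime.
constexpr std::uint32_t fnv1a32(const char* s,
                                std::uint32_t h = 2166136261u) {
    return *s == '\0'
        ? h
        : fnv1a32(s + 1,
                  static_cast<std::uint32_t>(
                      (h ^ static_cast<std::uint8_t>(*s)) * 16777619u));
}

// Published FNV-1a test vectors; they hold on 32-bit and 64-bit hosts.
static_assert(fnv1a32("") == 0x811c9dc5u, "hash of the empty string");
static_assert(fnv1a32("a") == 0xe40c292cu, "hash of \"a\"");
```

A cookie computed this way, for example over the string
"((device) (mount-point))", would be identical wherever it is computed.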

-- 
Best regards

Christoph


* Re: ABI mismatch on boot on arm32 system
  2024-10-30 13:24             ` Christoph Buck
@ 2024-11-06 10:25               ` Christoph Buck
  2024-11-11  7:47                 ` Christoph Buck
  2024-12-04 11:50                 ` Maxim Cournoyer
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Buck @ 2024-11-06 10:25 UTC (permalink / raw)
  To: Zack Weinberg, Efraim Flashner; +Cc: help-guix

Hi Guix!

So I looked into the Guile source code and, as expected, the `scm_hash`
function (see hash.c in Guile) uses `unsigned long`, which is 8 bytes on
x64 and 4 bytes on arm32/i686. If `string-hash` is called with the size
parameter `n`, the hash value is limited by taking it modulo `n`, see
scm_ihash in hash.c:440, namely

> (unsigned long) scm_raw_ihash (obj, 10) % n

(The `10` can be ignored as far as I can tell.) Since the raw hash values
differ between platforms, the result of the modulo differs as well.
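
A minimal sketch of this arithmetic (plain C++, not Guile's code;
`reduce` is a made-up name). It takes the two raw hash values printed by
the C++ test program further down in this message and reduces both
modulo 536870911, which reproduces the two different ABI cookies seen
earlier in this thread:

```cpp
#include <cstdint>

// Hypothetical stand-in for the `% n` step quoted from scm_ihash.
constexpr std::uint64_t reduce(std::uint64_t raw_hash, std::uint64_t n) {
    return raw_hash % n;
}

constexpr std::uint64_t kRawX64   = 10454028974864831ULL;  // raw hash on x64
constexpr std::uint64_t kRawArm32 = 2434018ULL;            // raw hash on arm32
constexpr std::uint64_t kSize     = 536870911ULL;          // size given to string-hash

static_assert(reduce(kRawX64, kSize) == 212719825ULL,
              "cookie computed when cross-compiling on x64");
static_assert(reduce(kRawArm32, kSize) == 2434018ULL,
              "cookie computed natively on arm32");
```

So the size parameter is the same on both sides, but it is applied to
two different raw values.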

However, if one steps through the call stack of `string-hash`, one can
see that the actual hash value is calculated by the
`JENKINS_LOOKUP3_HASHWORD2` macro, which contains a rather
interesting comment and a possible workaround for the ABI problem,
namely

--8<---------------cut here---------------start------------->8---
/* Scheme can access symbol-hash, which exposes this value.  For    \
   cross-compilation reasons, we ensure that the high 32 bits of    \
   the hash on a 64-bit system are equal to the hash on a 32-bit    \
   system.  The low 32 bits just add more entropy.  */              \
if (sizeof (ret) == 8)                                              \
    ret = (((unsigned long) c) << 32) | b;                          \
else                                                                \
    ret = c;                                                        \
--8<---------------cut here---------------end--------------->8---

in hash.c:82.

This means that on an x64 platform, the upper 32 bits of the resulting
64-bit hash are equal to the hash value on a 32-bit platform. A simple
test case in C++ looks like this:

--8<---------------cut here---------------start------------->8---
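#include <climits>    // ULONG_MAX
#include <iostream>
#include <libguile.h>

int main(int argc, char** argv)
{
    scm_init_guile();
    auto strToHash = scm_from_locale_string("((device) (mount-point))");
    auto maxULong = scm_from_ulong(ULONG_MAX);
    // Use the largest possible size so that the modulo leaves the raw
    // hash value practically untouched.
    auto hashResult = scm_hash(strToHash, maxULong);
    auto hashResultUL = scm_to_ulong(hashResult);
    std::cout << "Max ULONG_MAX: " << ULONG_MAX << std::endl;
    std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;

    // On hosts with an 8-byte unsigned long, the upper 32 bits should
    // equal the hash computed on a 32-bit host.
    if (sizeof(hashResultUL) == 8)
    {
        std::cout << "Corrected for 32bit: " << (hashResultUL >> 32) << std::endl;
    }
}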
--8<---------------cut here---------------end--------------->8---

which prints the following on x64:

> Max ULONG_MAX: 18446744073709551615
> Original hashResult ulong: 10454028974864831
> Corrected for 32bit: 2434018

and on arm32:

> Max ULONG_MAX: 4294967295
> Original hashResult ulong: 2434018
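
The relation between the two outputs can also be checked by hand. The
following sketch (made-up helper names, fixed-width types instead of
`unsigned long`) splits the 64-bit value into its two halves and packs
them again the way the quoted macro does when `ret` is 8 bytes wide:

```cpp
#include <cstdint>

// Pack two 32-bit words like the quoted macro: c goes into the high
// half, b into the low half.
constexpr std::uint64_t pack64(std::uint32_t c, std::uint32_t b) {
    return (static_cast<std::uint64_t>(c) << 32) | b;
}

constexpr std::uint32_t high32(std::uint64_t h) {
    return static_cast<std::uint32_t>(h >> 32);
}

constexpr std::uint32_t low32(std::uint64_t h) {
    return static_cast<std::uint32_t>(h & 0xFFFFFFFFu);
}

// 10454028974864831 is the x64 output, 2434018 the arm32 output.
static_assert(high32(10454028974864831ULL) == 2434018u,
              "high half equals the 32-bit hash");
static_assert(low32(10454028974864831ULL) == 1266989503u,
              "low half only adds entropy");
static_assert(pack64(2434018u, 1266989503u) == 10454028974864831ULL,
              "packing both halves gives back the x64 value");
```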

This suggests the following workaround: always limit the hash to
32 bits, even when executed on a 64-bit platform (or, more specifically,
a platform where `unsigned long` is 8 bytes wide). Do this by shifting
the hash value right by 32 bits instead of relying on the size parameter
of the `string-hash` function.

In code it could look something like this:

--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
  ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
  ;; 'string-hash' because that's a better hash function than 'hash' on a
  ;; list of symbols.
  (let ((hash
         (syntax-case field-specs ()
           (((field get properties ...) ...)
            (let ((hash-value (string-hash
                               (object->string
                                (syntax->datum #'((field properties ...) ...))))))
              ;; On hosts with 8-byte words, keep only the high 32 bits;
              ;; they equal the hash computed on a 32-bit host.
              (if (= (native-word-size) 8)
                  (ash hash-value -32)
                  hash-value))))))
    ;; Debug output.
    (format #t "Compute-abi-cookie: ~a~%" hash)
    hash))
--8<---------------cut here---------------end--------------->8---

where `native-word-size` is defined by

--8<---------------cut here---------------start------------->8---
(define (native-word-size)
  ((@ (system foreign) sizeof) '*))
--8<---------------cut here---------------end--------------->8---

(taken from `cross-compilation.test`). There might be a cleaner way to
formulate this, but you get the point.
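
For comparison, here is the same normalisation sketched in C++. The
function name and the explicit word-size parameter are assumptions made
for illustration, so that both cases can be checked on a single machine;
the inputs are the raw hash values from the test program above:

```cpp
#include <cstdint>

// Hypothetical model of the workaround.  `host_word_bytes` plays the
// role of (native-word-size): 8 on x64, 4 on arm32/i686.
constexpr std::uint64_t abi_cookie(std::uint64_t raw_hash,
                                   unsigned host_word_bytes) {
    return host_word_bytes == 8 ? (raw_hash >> 32) : raw_hash;
}

static_assert(abi_cookie(10454028974864831ULL, 8) == 2434018ULL,
              "cookie computed on a 64-bit host");
static_assert(abi_cookie(2434018ULL, 4) == 2434018ULL,
              "cookie computed on a 32-bit host");
```

With the shift applied, both hosts arrive at the same cookie.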

This seems to work for all combinations on my machine. I tested
x64 -> arm, x64 -> i686, i686 -> x64...

I can only think of two drawbacks:

1) Lost entropy on 64-bit machines
2) An ABI break, because the hash values on 64-bit platforms will
   change on recompilation.

1) is IMHO irrelevant, because the hash is not cryptographically
important. For 2), I am not sure how important this is.

Any thoughts on this?

Would this be worth fixing and sending a patch for?

Best regards

Christoph

-- 
Best regards

Christoph


* Re: ABI mismatch on boot on arm32 system
  2024-11-06 10:25               ` Christoph Buck
@ 2024-11-11  7:47                 ` Christoph Buck
  2024-12-04 11:50                 ` Maxim Cournoyer
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Buck @ 2024-11-11  7:47 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Efraim Flashner, help-guix

Hi!

I submitted a patch which fixes the issue. See

> https://issues.guix.gnu.org/74296

Feedback is appreciated!


Christoph

-- 
Best regards

Christoph


* Re: ABI mismatch on boot on arm32 system
  2024-11-06 10:25               ` Christoph Buck
  2024-11-11  7:47                 ` Christoph Buck
@ 2024-12-04 11:50                 ` Maxim Cournoyer
  2024-12-07 13:44                   ` Christoph Buck
  1 sibling, 1 reply; 16+ messages in thread
From: Maxim Cournoyer @ 2024-12-04 11:50 UTC (permalink / raw)
  To: Christoph Buck; +Cc: Zack Weinberg, Efraim Flashner, help-guix

Hi Christoph,

Sorry for my delayed reply.

Christoph Buck <dev@icepic.de> writes:

> Hi Guix!
>
> So I looked into the Guile source code and, as expected, the `scm_hash`
> function (see hash.c in Guile) uses `unsigned long`, which is 8 bytes on
> x64 and 4 bytes on arm32/i686. If `string-hash` is called with the size
> parameter `n`, the hash value is limited by taking it modulo `n`, see
> scm_ihash in hash.c:440, namely
>
>> (unsigned long) scm_raw_ihash (obj, 10) % n
>
> (The `10` can be ignored as far as I can tell.) Since the raw hash values
> differ between platforms, the result of the modulo differs as well.
>
> However, if one steps through the call stack of `string-hash`, one can
> see that the actual hash value is calculated by the
> `JENKINS_LOOKUP3_HASHWORD2` macro, which contains a rather
> interesting comment and a possible workaround for the ABI problem,
> namely
>
> /* Scheme can access symbol-hash, which exposes this value.  For    \
>    cross-compilation reasons, we ensure that the high 32 bits of    \
>    the hash on a 64-bit system are equal to the hash on a 32-bit    \
>    system.  The low 32 bits just add more entropy.  */              \
> if (sizeof (ret) == 8)                                              \
>     ret = (((unsigned long) c) << 32) | b;                          \
> else                                                                \
>     ret = c;                                                        \
>
>
> in hash.c:82.
>
> This means that on an x64 platform, the upper 32 bits of the resulting
> 64-bit hash are equal to the hash value on a 32-bit platform. A simple
> test case in C++ looks like this:
>
> int main(int args, char** argv)
> {
>     scm_init_guile();
>     auto strToHash = scm_from_locale_string ("((device) (mount-point))");
>     auto maxULong = scm_from_ulong(ULONG_MAX);
>     auto hashResult = scm_hash(strToHash,maxULong);
>     auto hashResultUL = scm_to_ulong(hashResult);
>     std::cout << "Max ULONG_MAX: " << ULONG_MAX <<std::endl;
>     std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;
>
>     if(sizeof(hashResultUL) == 8)
>     {
>         std::cout << "Corrected for 32bit: " << (hashResultUL >> 32) << std::endl;
>     }
> }
>
>
> which prints the following on x64:
>
>> Max ULONG_MAX: 18446744073709551615
>> Original hashResult ulong: 10454028974864831
>> Corrected for 32bit: 2434018
>
> and on arm32:
>
>> Max ULONG_MAX: 4294967295
>> Original hashResult ulong: 2434018
>
> This suggests the following workaround: always limit the hash to
> 32 bits, even when executed on a 64-bit platform (or, more specifically,
> a platform where `unsigned long` is 8 bytes wide). Do this by shifting
> the hash value right by 32 bits instead of relying on the size parameter
> of the `string-hash` function.
>
> In code it could look something like this:
>
> (define (compute-abi-cookie field-specs)
>     ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
>     ;; 'string-hash' because that's a better hash function that 'hash' on a
>     ;; list of symbols.
>     (let ((hash
>            (syntax-case field-specs ()
>              (((field get properties ...) ...)
>               (let ((hash-value (string-hash (object->string
>                                               (syntax->datum #'((field properties ...) ...))))))
>                 (if (= (native-word-size) 8)
>                     (ash hash-value -32)
>                     hash-value)))))
>           (fd (syntax-case field-specs ()
>                 (((field get properties ...) ...)
>                  (object->string
>                   (syntax->datum #'((field properties ...) ...)))))))
>       
>       (format #t "Compute-abi-cookie: ~a~%" hash)
>       hash))
>
>
> where `native-word-size` is defined by
>
> (define (native-word-size)
>   ((@ (system foreign) sizeof) '*))

This is a thorough investigation, and the above fix/workaround that can
be applied to Guix is the cherry on the cake!  Thank you for producing
it.

> (taken from `cross-compilation.test`). There might be a cleaner way to
> formulate this, but you get the point.
>
> This seems to work for all combinations on my machine. I tested
> x64 -> arm, x64 -> i686, i686 -> x64...
>
> I can only think of two drawbacks:
>
> 1) Lost entropy on 64-bit machines
> 2) An ABI break, because the hash values on 64-bit platforms will
>    change on recompilation.
>
> 1) is IMHO irrelevant, because the hash is not cryptographically
> important. For 2), I am not sure how important this is.
>
> Any thoughts on this?

I don't think 1 or 2 are too important; about 2 for example, we break
the ABI every time we add or remove a new record field.

> Would this be worth fixing and sending a patch for?

Totally, if you haven't done so already.

-- 
Thanks,
Maxim


* Re: ABI mismatch on boot on arm32 system
  2024-12-04 11:50                 ` Maxim Cournoyer
@ 2024-12-07 13:44                   ` Christoph Buck
  2024-12-09  0:44                     ` Maxim Cournoyer
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Buck @ 2024-12-07 13:44 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Zack Weinberg, Efraim Flashner, help-guix

Hi Maxim!

> Sorry for my delayed reply.

No problem.

> Totally, if you haven't done so already.

I already did. See https://issues.guix.gnu.org/74296

And thanks to Ludovic the patch is already merged.


-- 
Best regards

Christoph


* Re: ABI mismatch on boot on arm32 system
  2024-12-07 13:44                   ` Christoph Buck
@ 2024-12-09  0:44                     ` Maxim Cournoyer
  0 siblings, 0 replies; 16+ messages in thread
From: Maxim Cournoyer @ 2024-12-09  0:44 UTC (permalink / raw)
  To: Christoph Buck; +Cc: Zack Weinberg, Efraim Flashner, help-guix

Hi Christoph,

Christoph Buck <dev@icepic.de> writes:

> Hi Maxim!
>
>> Sorry for my delayed reply.
>
> No problem.
>
>> Totally, if you haven't done so already.
>
> I already did. See https://issues.guix.gnu.org/74296
>
> And thanks to Ludovic the patch is already merged.

Excellent :-).  I'm glad the result of your excellent investigation and
fix made its way to the code base.

Cheers!

-- 
Maxim


Thread overview (16+ messages):
2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
2024-10-16 20:05 ` Richard Sent
2024-10-20 15:15   ` Christoph Buck
2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
2024-10-20 15:23 ` Christoph Buck
2024-10-20 15:39   ` Zack Weinberg
2024-10-20 17:24     ` Christoph Buck
2024-10-21  9:55       ` Christoph Buck
2024-10-29 17:11         ` Christoph Buck
2024-10-30  7:09           ` Efraim Flashner
2024-10-30 13:24             ` Christoph Buck
2024-11-06 10:25               ` Christoph Buck
2024-11-11  7:47                 ` Christoph Buck
2024-12-04 11:50                 ` Maxim Cournoyer
2024-12-07 13:44                   ` Christoph Buck
2024-12-09  0:44                     ` Maxim Cournoyer
