* ABI mismatch on boot on arm32 system
@ 2024-10-16 10:11 Christoph Buck
2024-10-16 20:05 ` Richard Sent
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Christoph Buck @ 2024-10-16 10:11 UTC (permalink / raw)
To: help-guix
Hi!
Currently I am trying to create a Guix image which will boot on an
embedded imx6 arm32 board. Following the Guix manual, I was able to
create such an image. This involved adding a custom U-Boot version and a
kernel with a custom definition file. When flashed onto an SD card, U-Boot
runs and the kernel boots. However, early during boot (presumably while
executing initrd.cpio.gz), a `record-abi-mismatch-error` is thrown and
a Guix recovery REPL is opened:
> Use 'gnu.repl' for an initrd REPL.
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'.
> Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
> GNU Guile 3.0.9
> Copyright (C) 1995-2023 Free Software Foundation, Inc.
> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.
Unfortunately I have absolutely no clue what the problem could be. Could
it be that the image was compiled with a different Guile version than
the one executed on the image? Could this explain the ABI mismatch in the
`file-system` record?
Googling for the error, I found the following post on this mailing list:
> https://lists.gnu.org/archive/html/help-guix/2023-02/msg00147.html
It seems Maxim Cournoyer had the same problem on a board with the
same SoC (imx6). Unfortunately there was no follow-up. (I mailed him privately in
case he came up with a solution. If so, I will document it here, so that
the next unlucky soul running into this error can find the solution.)
I cross-compile the image on x64 with
> guix build -f custom-board.scm --target=arm-linux-gnueabihf -v3 -c2 -M2 -K --no-grafts
where `custom-board.scm` is my image definition (I can share it if
helpful). The option `--no-grafts` is needed due to
> https://issues.guix.gnu.org/66866
I would be very grateful for tips on how to debug this issue
further. It feels like I am very close to a bootable image.
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
@ 2024-10-16 20:05 ` Richard Sent
2024-10-20 15:15 ` Christoph Buck
2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
2024-10-20 15:23 ` Christoph Buck
2 siblings, 1 reply; 13+ messages in thread
From: Richard Sent @ 2024-10-16 20:05 UTC (permalink / raw)
To: Christoph Buck; +Cc: help-guix
Hi Christoph,
> Currently i am trying to create an guix image which will boot on
> embedded imx6 arm32 board. Following the guix manual, i was able to
> create such an image. This involved adding a custom uboot version and a
> kernel with custom definition file. If flashed on an sdcard, the uboot
> runs and the kernel boots. However, early on boot (presumably on
> executing initrd.cpio.gz), an `record-abi-mismatch-error` is thrown and
> a guix recovery repl is opened
>
>> Use 'gnu.repl' for an initrd REPL.
>
>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'.
Your issue sounds very similar to the one described in
https://issues.guix.gnu.org/61173#4.
The TL;DR (although I encourage you to read it!) is that you need the
CONFIG_BINFMT_MISC Linux kernel compilation option set, but when you use
a linux-libre-*-generic kernel that option is NOT set. If you're using
the qemu-binfmt-service, you'll fail to boot and the error you posted
will occur before you're dropped into a REPL.
Unfortunately services do not currently have any mechanism to require or
check kernel config options.
Not knowing your operating-system declaration I can't tell for sure if
that is what's going on, but I suspect what I described or something
similar is the case.
If you confirm this is in fact the problem, feel free to leave a comment
on the issue!
Best of luck.
--
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.
* Re: ABI mismatch on boot on arm32 system
2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
2024-10-16 20:05 ` Richard Sent
@ 2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
2024-10-20 15:23 ` Christoph Buck
2 siblings, 0 replies; 13+ messages in thread
From: Denis 'GNUtoo' Carikli @ 2024-10-18 20:58 UTC (permalink / raw)
To: Christoph Buck; +Cc: help-guix
On Wed, 16 Oct 2024 12:11:30 +0200
Christoph Buck <dev@icepic.de> wrote:
> Hi!
Hi,
> Currently i am trying to create an guix image which will boot on
> embedded imx6 arm32 board. Following the guix manual, i was able to
> create such an image. This involved adding a custom uboot version and
> a kernel with custom definition file. If flashed on an sdcard, the
> uboot runs and the kernel boots. However, early on boot (presumably on
> executing initrd.cpio.gz), an `record-abi-mismatch-error` is thrown
> and a guix recovery repl is opened
>
> > Use 'gnu.repl' for an initrd REPL.
>
> > ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> > Throw to key `record-abi-mismatch-error' with args `(abi-check "~a:
> > record ABI mismatch; recompilation needed" (#<record-type
> > <file-system>>) ())'.
There is also the option of trying to bisect the issue (there might be
more than one).
We now have a u-boot-qemu-arm package, so you could for instance start
with arm64 (with u-boot-qemu-arm64 and a system definition that you
create or reuse and modify) and get a system to boot with QEMU.
Then, once you have managed to boot an arm64 system, you could try to
reproduce it for 32-bit arm with an older Guix revision (and possibly a
recent u-boot-qemu-arm that doesn't change) and then start bisecting.
I tried to do that a long time ago, but I don't have fast computers, so
at some point I gave up and never found the time to get back to it.
Denis.
* Re: ABI mismatch on boot on arm32 system
2024-10-16 20:05 ` Richard Sent
@ 2024-10-20 15:15 ` Christoph Buck
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Buck @ 2024-10-20 15:15 UTC (permalink / raw)
To: Richard Sent; +Cc: help-guix
Richard Sent <richard@freakingpenguin.com> writes:
> Hi Christoph,
Hi Richard!
>
> The TL;DR (although I encourage you to read it!) is that you need the
> CONFIG_BINFMT_MISC Linux kernel compilation option set, but when you use
> a linux-libre-*-generic kernel that option is NOT set. If you're using
> the qemu-binfmt-service, you'll fail to boot and the error you posted
> will occur before you're dropped into a REPL.
I use a custom kernel, modified straight from kernel.org, and indeed I didn't
enable the `CONFIG_BINFMT_MISC` setting. However, enabling this option
unfortunately **does not** solve my problem.
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-16 10:11 ABI mismatch on boot on arm32 system Christoph Buck
2024-10-16 20:05 ` Richard Sent
2024-10-18 20:58 ` Denis 'GNUtoo' Carikli
@ 2024-10-20 15:23 ` Christoph Buck
2024-10-20 15:39 ` Zack Weinberg
2 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-10-20 15:23 UTC (permalink / raw)
To: help-guix
Hi!
I played around a little more and can now indeed boot
successfully. Instead of using cross-compilation (CLI option
`--target=arm-linux-gnueabihf`), I created a build using QEMU emulation
(CLI option `--system=armhf-linux`). This takes ages to build, but the
resulting image is bootable without the ABI error. Unfortunately this is not
a real fix, because it is too slow to be a practical workaround (at least
for me).
I dug a little deeper, and this is what I have found out so far. In case I
am running off in a totally wrong direction, someone with more of a clue than
me should please stop me ;)
I think something goes wrong during cross-compilation of the Guile
modules in the package `module-import-compiled`. The ABI error is thrown
early during boot in the `initrd.cpio.gz` ramdisk. I extracted and
decompressed the ramdisk from both builds (cross-compilation and QEMU);
both contain the `module-import-compiled` Guile modules. I would expect
the *.go files from the `module-import-compiled` package of both
ramdisks to be binary identical, but they have different md5sums. Take,
for example, `file-systems.go`, which causes the ABI error.
--8<---------------cut here---------------start------------->8---
local@host:crosscompilation-initrd/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system$
md5sum file-systems.go
7839e9c7a0c7c6c8d9ea45566ab9f61e file-systems.go
--8<---------------cut here---------------end--------------->8---
vs
--8<---------------cut here---------------start------------->8---
local@host:qemu-initrd/gnu/store/hvgj80xqf70mvx460pnvwmi87kqqn2bj-module-import-compiled/gnu/system$
md5sum file-systems.go
a43a7e939ae9d0cc1ce30726cb51d6d4 file-systems.go
--8<---------------cut here---------------end--------------->8---
Additionally, it looks like different symbols are exported depending on whether
cross-compilation or QEMU was used. At least this is what `readelf -s file-systems.go`
reports. I naively thought these files should be identical.
Additionally, I saw these strange warnings in the build log during
cross-compilation:
--8<---------------cut here---------------start------------->8---
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/build/file-systems.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system/uuid.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
;;; WARNING: loading compiled file
/gnu/store/5ffy1h3fgikzdhfz4nkchxnibbri4ain-module-import-compiled/gnu/system/file-systems.go
failed:
;;; In procedure load-thunk-from-memory: ELF file does not have native word size
--8<---------------cut here---------------end--------------->8---
This also looks suspicious. These warnings stem from the `check_elf_header`
function in Guile. Guile warns that the class type in the ELF header is
32-bit when run in a cross-compilation context on an x64 system. But
so far I couldn't figure out whether I can ignore these warnings or whether
they might cause a problem.
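One way to see which word size a `.go` file was compiled for, without `readelf`, is to read the ELF identification bytes directly. The following is only a sketch: `elf-class` is a hypothetical helper (not part of Guile or Guix), and it assumes the standard ELF layout in which byte 4 (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit objects.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

;; Hypothetical helper: report the ELF class of FILE as a symbol.
(define (elf-class file)
  (call-with-input-file file
    (lambda (port)
      (let ((ident (get-bytevector-n port 5)))
        (cond
         ;; Too short to carry the ELF identification bytes.
         ((or (eof-object? ident) (< (bytevector-length ident) 5))
          'not-elf)
         ;; Magic number: 0x7f 'E' 'L' 'F'.
         ((not (and (= (bytevector-u8-ref ident 0) #x7f)
                    (= (bytevector-u8-ref ident 1) #x45)
                    (= (bytevector-u8-ref ident 2) #x4c)
                    (= (bytevector-u8-ref ident 3) #x46)))
          'not-elf)
         ;; EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
         ((= (bytevector-u8-ref ident 4) 1) 'elf32)
         ((= (bytevector-u8-ref ident 4) 2) 'elf64)
         (else 'unknown))))
    #:binary #t))
```

A module compiled for `arm-linux-gnueabihf` would be expected to report `elf32`. That is the right class for the target; the warning above seems to appear only because the 64-bit host Guile tries to load such a file.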
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-20 15:23 ` Christoph Buck
@ 2024-10-20 15:39 ` Zack Weinberg
2024-10-20 17:24 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Zack Weinberg @ 2024-10-20 15:39 UTC (permalink / raw)
To: help-guix
On Sun, Oct 20, 2024, at 11:23 AM, Christoph Buck wrote:
> I think something goes wrong during crosscompilation of the guile
> modules in package `module-import-compiled`. The abi error is thrown
> early on boot in the `initrd.cpio.gz` ramdisk. I extracted and
> decompressed the ramdisk from both builds (crosscompilation and qemu)
> which contain the `module-import-compiled` guile modules. I would expect
> that the *.go files from the `module-import-compiled` package of both
> ramdisks are binary identical but they have different md5sums. Lets take
> for example `file-systems.go`, which cause the abi error.
[...]
Can you show us the *complete and unedited* output of `readelf -hlSd file-systems.go` from both the working and the broken ramdisk, please?
zw
* Re: ABI mismatch on boot on arm32 system
2024-10-20 15:39 ` Zack Weinberg
@ 2024-10-20 17:24 ` Christoph Buck
2024-10-21 9:55 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-10-20 17:24 UTC (permalink / raw)
To: Zack Weinberg; +Cc: help-guix
Hi Zack!
> Can you show us the *complete and unedited* output of `readelf -hlSd
> file-systems.go` from both the working and the broken ramdisk, please?
>
Sure. See the attachments of this mail.
But I just noticed that I made a mistake and compared the module
`qemu/gnu/build/file-systems.go` to
`cross/gnu/system/file-systems.go`.
The md5sums of `qemu/gnu/system/file-systems.go` and
`cross/gnu/system/file-systems.go` still differ, but the exported
symbols are the same (see attachments). The only difference I can now see
is in the `Start of section headers` (426352 for qemu vs 426376 for cross)
and the different addresses in the subsequent output of `readelf`. Are
these expected to be deterministic/equal?
Sorry for the confusion.
> zw
Greetings Christoph
--
Best regards
Christoph
[-- Attachment #2: qemu.readelf --]
[-- Type: application/octet-stream, Size: 3734 bytes --]
ELF Header:
Magic: 7f 45 4c 46 01 01 01 ff 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: <unknown: ff>
ABI Version: 0
Type: DYN (Shared object file)
Machine: None
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 426352 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 20
Section header string table index: 17
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .guile.procprops  PROGBITS        00000000 06c790 000020 00      0   0  8
  [ 2] .rodata           PROGBITS        0003fe38 03fe38 0074ac 00   A  0   0  8
  [ 3] .data             PROGBITS        00050000 050000 018170 00  WA  0   0  8
  [ 4] .rtl-text         PROGBITS        00000098 000098 03fda0 00   A  0   0  8
  [ 5] .dynamic          DYNAMIC         000472e8 0472e8 000030 00   A  0   0  8
  [ 6] .strtab           STRTAB          00000000 070b98 000666 00      0   0  8
  [ 7] .symtab           SYMTAB          00000000 06f118 001a80 10      6   0  8
  [ 8] .guile.ariti[...] STRTAB          00000000 071200 00085a 00      0   0  8
  [ 9] .guile.arities    PROGBITS        00000000 068490 004253 00      8   0  8
  [10] .guile.docst[...] STRTAB          00000000 071a60 0008ac 00      0   0  8
  [11] .guile.docstrs    PROGBITS        00000000 06c6e8 0000a8 00     10   0  8
  [12] .debug_info       PROGBITS        00000000 06c7b0 000b6f 00      0   0  8
  [13] .debug_abbrev     PROGBITS        00000000 06d320 000041 00      0   0  8
  [14] .debug_str        PROGBITS        00000000 06d368 000672 00      0   0  8
  [15] .debug_loc        PROGBITS        00000000 06d9e0 000000 00      0   0  8
  [16] .debug_line       PROGBITS        00000000 06d9e0 001733 00      0   0  8
  [17] .shstrtab         STRTAB          00000000 072310 0000d3 00      0   0  8
  [18]                   PROGBITS        00000000 000000 000094 00   A  0   0  8
  [19]                   NULL            00000000 068170 000000 00      0   0  0
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x47318 0x47318 R 0x10000
LOAD 0x050000 0x00050000 0x00050000 0x18170 0x18170 RW 0x10000
DYNAMIC 0x0472e8 0x000472e8 0x000472e8 0x00030 0x00030 R 0x8
Section to Segment mapping:
Segment Sections...
00 .rodata .rtl-text .dynamic
01 .data
02 .dynamic
Dynamic section at offset 0x472e8 contains 6 entries:
Tag Type Name/Value
0x37146003 (<unknown>: 37146003) 0x3000006
0x37146002 (<unknown>: 37146002) 0x98
0x37146000 (<unknown>: 37146000) 0x50000
0x37146001 (<unknown>: 37146001) 0x18170
0x0000000c (INIT) 0x1b918
0x00000000 (NULL) 0x0
[-- Attachment #3: cross.readelf --]
[-- Type: application/octet-stream, Size: 3735 bytes --]
ELF Header:
Magic: 7f 45 4c 46 01 01 01 ff 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: <unknown: ff>
ABI Version: 0
Type: DYN (Shared object file)
Machine: None
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 426376 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 20
Section header string table index: 17
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .guile.procprops  PROGBITS        00000000 06c7a8 000020 00      0   0  8
  [ 2] .rodata           PROGBITS        0003fe60 03fe60 008b64 00   A  0   0  8
  [ 3] .data             PROGBITS        00050000 050000 018188 00  WA  0   0  8
  [ 4] .rtl-text         PROGBITS        00000098 000098 03fdc4 00   A  0   0  8
  [ 5] .dynamic          DYNAMIC         000489c8 0489c8 000030 00   A  0   0  8
  [ 6] .strtab           STRTAB          00000000 070ba8 000666 00      0   0  8
  [ 7] .symtab           SYMTAB          00000000 06f128 001a80 10      6   0  8
  [ 8] .guile.ariti[...] STRTAB          00000000 071210 000b62 00      0   0  8
  [ 9] .guile.arities    PROGBITS        00000000 0684a8 004255 00      8   0  8
  [10] .guile.docst[...] STRTAB          00000000 071d78 0008ac 00      0   0  8
  [11] .guile.docstrs    PROGBITS        00000000 06c700 0000a8 00     10   0  8
  [12] .debug_info       PROGBITS        00000000 06c7c8 000b67 00      0   0  8
  [13] .debug_abbrev     PROGBITS        00000000 06d330 000041 00      0   0  8
  [14] .debug_str        PROGBITS        00000000 06d378 000672 00      0   0  8
  [15] .debug_loc        PROGBITS        00000000 06d9f0 000000 00      0   0  8
  [16] .debug_line       PROGBITS        00000000 06d9f0 001733 00      0   0  8
  [17] .shstrtab         STRTAB          00000000 072628 0000d3 00      0   0  8
  [18]                   PROGBITS        00000000 000000 000094 00   A  0   0  8
  [19]                   NULL            00000000 068188 000000 00      0   0  0
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x489f8 0x489f8 R 0x10000
LOAD 0x050000 0x00050000 0x00050000 0x18188 0x18188 RW 0x10000
DYNAMIC 0x0489c8 0x000489c8 0x000489c8 0x00030 0x00030 R 0x8
Section to Segment mapping:
Segment Sections...
00 .rodata .rtl-text .dynamic
01 .data
02 .dynamic
Dynamic section at offset 0x489c8 contains 6 entries:
Tag Type Name/Value
0x37146003 (<unknown>: 37146003) 0x3000006
0x37146002 (<unknown>: 37146002) 0x98
0x37146000 (<unknown>: 37146000) 0x50000
0x37146001 (<unknown>: 37146001) 0x18188
0x0000000c (INIT) 0x1b918
0x00000000 (NULL) 0x0
* Re: ABI mismatch on boot on arm32 system
2024-10-20 17:24 ` Christoph Buck
@ 2024-10-21 9:55 ` Christoph Buck
2024-10-29 17:11 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-10-21 9:55 UTC (permalink / raw)
To: Zack Weinberg; +Cc: help-guix
Hi!
I can now reproduce this error locally. Consider the following file:
--8<---------------cut here---------------start------------->8---
(define-module (abi-error)
  #:use-module (gnu system file-systems)
  #:export (bla test))

(define bla (file-system
              (device (file-system-label "my-root"))
              (mount-point "/")
              (type "ext4")))
--8<---------------cut here---------------end--------------->8---
When this is cross-compiled on x64 to arm32 using the following (this is what
`compiled-modules` in gexp.scm does, at least as far as I can tell)
--8<---------------cut here---------------start------------->8---
(use-modules (system base compile))
(use-modules (system base target))
(with-target "arm-linux-gnueabihf"
(lambda () (compile-file "abi-error.scm" #:output-file "abi-error.go")))
--8<---------------cut here---------------end--------------->8---
loading the module in an emulated arm32 guile repl fails with an abi
error:
--8<---------------cut here---------------start------------->8---
icepic@G16-Buck:~/guix$ guix shell --container --system=armhf-linux guix guile file bash which coreutils
icepic@G16-Buck ~/guix [env]$ guix repl -L .
GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guix-user)> ,use (abi-error)
While executing meta-command:
Throw to key `record-abi-mismatch-error' with args `(abi-check "~a: record ABI mismatch; recompilation needed" (#<record-type <file-system>>) ())'
--8<---------------cut here---------------end--------------->8---
But if compiled directly in QEMU on arm32, it works without an ABI error,
see
--8<---------------cut here---------------start------------->8---
icepic@G16-Buck:~/guix$ guix shell --container --system=armhf-linux guix guile file bash which coreutils
icepic@G16-Buck ~/guix [env]$ guix repl compile.scm
icepic@G16-Buck ~/guix [env]$ guix repl -L .
GNU Guile 3.0.9
Copyright (C) 1995-2023 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guix-user)> ,use (abi-error)
scheme@(guix-user)> bla
$1 = #<<file-system> device: #<file-system-label "my-root"> mount-point: "/" type: "ext4" flags: () options: #f mount?: #t mount-may-fail?: #f needed-for-boot?: #f check?: #t skip-check-if-clean?: #t repair: preen create-mount-point?: #f dependencies: () shepherd-requirements: () location: ((filename . "abi-error.scm") (line . 4) (column . 12))>
--8<---------------cut here---------------end--------------->8---
where `compile.scm` is simply
--8<---------------cut here---------------start------------->8---
(use-modules (system base compile))
(use-modules (system base target))
(compile-file "abi-error.scm" #:output-file "abi-error.go")
--8<---------------cut here---------------end--------------->8---
This is not what one would expect, is it?
Interestingly, it works if `aarch64-linux-gnu` is used instead of
`arm-linux-gnueabihf`.
So... it looks like there is a bug in Guile's cross-compilation support for
arm32?
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-21 9:55 ` Christoph Buck
@ 2024-10-29 17:11 ` Christoph Buck
2024-10-30 7:09 ` Efraim Flashner
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-10-29 17:11 UTC (permalink / raw)
To: Zack Weinberg; +Cc: help-guix
Hi!
In case anybody is reading along here: I dug deeper and found
something rather interesting :P
From my understanding after reading through records.scm from Guix (and
please note that I'm a total Scheme newbie), the ABI check works by
calculating a string hash over the record field names and storing the
hash as a hidden field in the record. At run time this string hash is
computed again and compared to the compiled hash. If they don't
match, the ABI is broken because a field was added or removed.
The hash is calculated in the `compute-abi-cookie` procedure in
records.scm.
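The idea as described above can be sketched in a few lines of plain Guile. This is not the actual `define-record-type*` implementation: `fields->cookie`, `compiled-cookie` and `abi-ok?` are made-up names, and the host's `most-positive-fixnum` is used as the hash bound purely for simplicity.

```scheme
;; Hypothetical helper: hash the printed field list, in the same
;; spirit as the ABI cookie.
(define (fields->cookie fields)
  (string-hash (object->string fields) most-positive-fixnum))

;; Cookie recorded when the record type was "compiled".
(define compiled-cookie
  (fields->cookie '((device) (mount-point))))

;; At run time, recompute the cookie and compare.
(define (abi-ok? fields)
  (= compiled-cookie (fields->cookie fields)))

;; Unchanged layout: the cookies match.
(display (abi-ok? '((device) (mount-point))))
(newline)

;; A field was added: the cookies should differ (barring a collision).
(display (abi-ok? '((device) (mount-point) (type))))
(newline)
```

The check only works if both sides compute the same hash for the same field list, which is exactly the property that matters when the two computations happen on different machines.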
I extended the procedure with the following debug outputs
--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
  ;; Compute an "ABI cookie" for the given FIELD-SPECS. We use
  ;; 'string-hash' because that's a better hash function that 'hash' on a
  ;; list of symbols.
  (let ((hash
         (syntax-case field-specs ()
           (((field get properties ...) ...)
            (string-hash (object->string
                          (syntax->datum #'((field properties ...) ...)))
                         ;; (bla)
                         (cond-expand
                           (guile-3 (target-most-positive-fixnum))
                           (else most-positive-fixnum))
                         ))))
        (fd (syntax-case field-specs ()
              (((field get properties ...) ...)
               (object->string
                (syntax->datum #'((field properties ...) ...)))))))
    (format #t "Compute-abi-cookie: ~a~%" hash)
    (format #t "field-specs: ~a~%" field-specs)
    (format #t "fd: ~a~%" fd)
    (format #t "hashsize ~a~%: " (cond-expand
                                   (guile-3 (target-most-positive-fixnum))
                                   (else most-positive-fixnum)))
    hash))
--8<---------------cut here---------------end--------------->8---
Now, if I compile a simple test record
--8<---------------cut here---------------start------------->8---
(define-record-type* <test-system> test-system
  make-test-system
  test-system?
  (device test-system-device)
  (mount-point test-system-mount-point))

(define test-abi (test-system
                   (device "my-root")
                   (mount-point "/")))
--8<---------------cut here---------------end--------------->8---
on x64 using Guile cross-compilation (in a `guix shell --container guix
guile` environment) with the call
--8<---------------cut here---------------start------------->8---
(with-target "arm-linux-gnueabihf" (lambda () (compile-file "test-abi.scm")))
--8<---------------cut here---------------end--------------->8---
the following outputs are generated:
--8<---------------cut here---------------start------------->8---
Compute-abi-cookie: 212719825
field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
fd: ((device) (mount-point))
hashsize 536870911
--8<---------------cut here---------------end--------------->8---
The abi cookie is computed by calculating the string hash over
"((device) (mount-point))" while limiting the size of the hash by
536870911. One can manually check this by calling
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
$1 = 212719825
--8<---------------cut here---------------end--------------->8---
in the repl.
Now, if I do the same in a QEMU arm32 environment (using `guix shell
--container guix guile --system=armhf-linux`), a different hash is
printed, even though the hash is calculated over the same string, see:
--8<---------------cut here---------------start------------->8---
Compute-abi-cookie: 2434018
field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
fd: ((device) (mount-point))
hashsize 536870911
--8<---------------cut here---------------end--------------->8---
You can verify this in the repl as well:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
$1 = 2434018
--8<---------------cut here---------------end--------------->8---
My first intuition after seeing the source of `compute-abi-cookie` was
that `target-most-positive-fixnum` might return a wrong value
when called in a cross-compilation context. But as you can see, this is not
the case. Instead, `string-hash` calculates a different hash
even though the input values are the same.
Now, I am not even sure whether one can expect hash functions running on
different architectures to produce the same hash for the same
input. If not, then the implementation in Guix's records.scm would be
buggy. If one expects that `string-hash` must return the same value for
the same input regardless of the architecture, then this would hint at
a bug in Guile's `string-hash` function on arm32.
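For comparison, a hash that is defined purely in terms of bytes and fixed-width arithmetic gives the same result on every architecture. The sketch below uses 32-bit FNV-1a over the UTF-8 encoding of the string. This is an assumption chosen for illustration only: it is not what Guix or Guile use, and `fnv1a-32` and `portable-cookie` are hypothetical names.

```scheme
(use-modules (rnrs bytevectors))

;; 32-bit FNV-1a over the UTF-8 bytes of STR.  Every intermediate value
;; is masked to 32 bits, so the result does not depend on the word size
;; of the machine running the code.
(define (fnv1a-32 str)
  (let ((bytes (string->utf8 str)))
    (let loop ((i 0)
               (hash #x811c9dc5))            ; FNV offset basis
      (if (= i (bytevector-length bytes))
          hash
          (loop (+ i 1)
                (logand (* (logxor hash (bytevector-u8-ref bytes i))
                           #x01000193)       ; FNV prime
                        #xffffffff))))))

;; Hypothetical cookie: reduce the hash so that it fits below BOUND,
;; e.g. the target's most positive fixnum.
(define (portable-cookie str bound)
  (modulo (fnv1a-32 str) bound))

(display (portable-cookie "((device) (mount-point))" 536870911))
(newline)
```

With a definition like this, a cookie computed on the x64 build machine and one computed on the arm32 target would agree by construction, independent of how the built-in `string-hash` behaves.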
Any inputs and thoughts regarding this issue would be appreciated.
Greetings
Christoph
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-29 17:11 ` Christoph Buck
@ 2024-10-30 7:09 ` Efraim Flashner
2024-10-30 13:24 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Efraim Flashner @ 2024-10-30 7:09 UTC (permalink / raw)
To: Christoph Buck; +Cc: Zack Weinberg, help-guix
On Tue, Oct 29, 2024 at 06:11:27PM +0100, Christoph Buck wrote:
> Hi!
>
> In case anybody is reading along here. I digged deeper and found
> something rather interessting :P
Thank you! From where I'm sitting it's much easier (for me) to suggest
things than to try and setup your environment.
> From my understanding by reading through the records.scm from guix (and
> please note that im a total scheme newbee), the abi check works by
> calculation a string-hash over the record field names and storing the
> hash as hidden field in the record. During runtime this string-hash is
> computed again and compared to the compiled hash. If they don't
> match, the abi is broken because a field was added or removed.
>
> The hash is calculated in the `compute-abi-cookie` procedure in the
> records.scm.
>
> I extended the procedure with the following debug outputs
>
> --8<---------------cut here---------------start------------->8---
> (define (compute-abi-cookie field-specs)
> ;; Compute an "ABI cookie" for the given FIELD-SPECS. We use
> ;; 'string-hash' because that's a better hash function that 'hash' on a
> ;; list of symbols.
> (let ((hash
> (syntax-case field-specs ()
> (((field get properties ...) ...)
> (string-hash (object->string
> (syntax->datum #'((field properties ...) ...)))
> ;; (bla)
> (cond-expand
> (guile-3 (target-most-positive-fixnum))
> (else most-positive-fixnum))
> ))))
> (fd (syntax-case field-specs ()
> (((field get properties ...) ...)
> (object->string
> (syntax->datum #'((field properties ...) ...)))))))
>
> (format #t "Compute-abi-cookie: ~a~%" hash)
> (format #t "field-specs: ~a~%" field-specs)
> (format #t "fd: ~a~%" fd)
> (format #t "hashsize ~a~%: " (cond-expand
> (guile-3 (target-most-positive-fixnum))
> (else most-positive-fixnum)))
> hash))
> --8<---------------cut here---------------end--------------->8---
>
> Now, if i compile a simple test record
>
> --8<---------------cut here---------------start------------->8---
>
> (define-record-type* <test-system> test-system
>   make-test-system
>   test-system?
>   (device      test-system-device)
>   (mount-point test-system-mount-point))
>
> (define test-abi (test-system
>                   (device "my-root")
>                   (mount-point "/")))
>
> --8<---------------cut here---------------end--------------->8---
>
> on x64, cross-compiling with Guile (in a `guix shell --container guix
> guile` environment) via the call
>
> --8<---------------cut here---------------start------------->8---
> (with-target "arm-linux-gnueabihf" (lambda () (compile-file "test-abi.scm")))
> --8<---------------cut here---------------end--------------->8---
>
> the following outputs are generated:
>
> --8<---------------cut here---------------start------------->8---
> Compute-abi-cookie: 212719825
> field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
> fd: ((device) (mount-point))
> hashsize 536870911
> --8<---------------cut here---------------end--------------->8---
>
> The ABI cookie is computed by calculating the string hash over
> "((device) (mount-point))" while limiting the hash to values below
> 536870911. One can check this manually by calling
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
> $1 = 212719825
> --8<---------------cut here---------------end--------------->8---
>
> in the repl.
>
> Now, if i do the same in a qemu arm32 environment (using `guix shell
> --container guix guile --system=armhf-linux`), a different hash is
> printed, even though the hash is calculated over the same string, see:
>
> --8<---------------cut here---------------start------------->8---
> Compute-abi-cookie: 2434018
> field-specs: ((#<syntax device> #<syntax test-system-device>) (#<syntax mount-point> #<syntax test-system-mount-point>))
> fd: ((device) (mount-point))
> hashsize 536870911
> --8<---------------cut here---------------end--------------->8---
>
> You can verify this in the repl as well:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-hash "((device) (mount-point))" 536870911)
> $1 = 2434018
> --8<---------------cut here---------------end--------------->8---
>
> My first intuition after seeing the source of `compute-abi-cookie` was
> that maybe `target-most-positive-fixnum` returns a wrong value
> when called in a cross-compile context. But as you can see, this is not
> the case. Instead, `string-hash` calculates a different hash
> even though the input values are the same.
>
> Now, I am not even sure whether one can expect hash functions running on
> different architectures to produce the same hash for the same
> input. If not, then the implementation in Guix's records.scm would be
> buggy. If one expects the hash of `string-hash` for the same input
> to be the same regardless of the architecture, then this would hint at
> a bug in the `string-hash` function in Guile for arm32.
>
> Any input and thoughts regarding this issue would be appreciated.
>
Can you run it again, but with i686 -> armhf, and x86_64 -> i686?
My curiosity includes i686 -> x86_64, but I suspect it won't tell us
anything we won't learn from the previous tests.
--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: ABI mismatch on boot on arm32 system
2024-10-30 7:09 ` Efraim Flashner
@ 2024-10-30 13:24 ` Christoph Buck
2024-11-06 10:25 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-10-30 13:24 UTC (permalink / raw)
To: Zack Weinberg; +Cc: help-guix
Efraim Flashner <efraim@flashner.co.il> writes:
> Can you run it again, but with i686 -> armhf, and x86_64 -> i686?
>
Hi Efraim!
Sure, no problem. Here we go:
cross-compiled x86_64 -> i686: hash 212719825 vs. qemu i686: hash 2434018
cross-compiled i686 -> arm*: hash 2434018 vs. qemu arm: hash 2434018
* For this combination the i686 host runs in qemu as well; the cross
compilation is then done with `with-target`, because I don't have native
i686 hardware.
> My curiosity includes i686 -> x86_64, but I suspect it won't tell us
> anything we won't learn from the previous tests.
Unfortunately this combination crashes:
cross-compiled i686 -> x86_64*:
--8<---------------cut here---------------start------------->8---
icepic@G16-Buck ~/guix/raspberry/touchscreen/abi-error/test/test-abi [env]$ ./compile.sh
Backtrace:
In ice-9/boot-9.scm:
2595:24 19 (call-with-deferred-observers _)
3424:24 18 (_)
222:17 17 (map1 (((test-systems))))
3327:17 16 (resolve-interface (test-systems) #:select _ #:hide _ # ?)
In ice-9/threads.scm:
390:8 15 (_ _)
In ice-9/boot-9.scm:
3253:13 14 (_)
In ice-9/threads.scm:
390:8 13 (_ _)
In ice-9/boot-9.scm:
3544:20 12 (_)
2836:4 11 (save-module-excursion _)
3564:26 10 (_)
In unknown file:
9 (primitive-load-path "test-systems" #<procedure ad7a0 a?>)
In ice-9/eval.scm:
721:20 8 (primitive-eval (define-record-type* <test-system> # # ?))
In ice-9/psyntax.scm:
1229:36 7 (expand-top-sequence (#<syntax:test-systems.scm:10:0 ?>) ?)
1121:20 6 (parse _ (("placeholder" placeholder)) ((top) #(# # ?)) ?)
1342:32 5 (syntax-type (#<syntax define-record-type*> #<synta?> ?) ?)
1562:32 4 (expand-macro #<procedure bcc50 at ice-9/eval.scm:333:?> ?)
In ice-9/eval.scm:
293:34 3 (_ #(#(#(#(#(#(#(#(#<directory ?> ?) ?) ?) ?) ?) ?) ?) ?))
298:34 2 (_ #(#(#<directory (abi-records) 178c80>) ((#<s?> ?) ?)))
In unknown file:
1 (string-hash "((device) (mount-point))" # #<undefined> #)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Value out of range 1 to< 4294967295: 2305843009213693951
icepic@G16-Buck ~/guix/raspberry/touchscreen/abi-error/test/test-abi [env]$
--8<---------------cut here---------------end--------------->8---
Without looking into this further, I would hypothesize that during cross
compilation to x64 on i686, `target-most-positive-fixnum` returns a
number wider than 32 bits, which `string-hash` doesn't handle correctly
when executed on i686.
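To make this hypothesis concrete, here is a minimal Python sketch (Python
is used for illustration only; none of this is Guile or Guix code). It
assumes that Guile fixnums use two tag bits, so that the most positive
fixnum for a word size of w bits is 2^(w-3) - 1, and that the size
argument of `string-hash` has to fit into the host's `unsigned long`.
Both helper names are hypothetical.

```python
def most_positive_fixnum(word_bits: int) -> int:
    """Assumed Guile fixnum limit: a (word_bits - 2)-bit signed integer."""
    return 2 ** (word_bits - 3) - 1


def fits_host_ulong(n: int, host_word_bits: int) -> bool:
    """True if n can be represented as an `unsigned long` on the host."""
    return 0 <= n <= 2 ** host_word_bits - 1


# The two hash sizes that show up in this thread.
assert most_positive_fixnum(32) == 536870911
assert most_positive_fixnum(64) == 2305843009213693951

# 64-bit target limit on a 32-bit host: out of range, as in the backtrace.
print(fits_host_ulong(most_positive_fixnum(64), 32))  # False
# 32-bit target limit on a 64-bit host: representable, so no crash.
print(fits_host_ulong(most_positive_fixnum(32), 64))  # True
```

Under these assumptions the value 2305843009213693951 from the error
message is exactly the 64-bit fixnum limit, and it exceeds the upper
bound 4294967295 that a 32-bit host reports.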
To recap: to me it looks as if `string-hash` is not implemented
in a platform-independent way but uses a platform-specific data type size
somewhere in its implementation. As long as the word size (64-bit vs.
32-bit) is the same during cross compilation and execution, it works
(cross compilation on x64 and execution on arm64 is OK, as is cross
compilation on i686 and execution on arm32, see above). As soon as the
word size differs, the resulting hashes differ or the execution crashes.
I can debug into Guile's `string-hash` function to find out where the
difference comes from. But I first need to figure out how to set up gdb
correctly.
But to me it sounds like a sensible bugfix for this issue (and I
think it is a bug/issue ;) ) would be to use a platform-independent
hashing algorithm implemented in plain Guile without relying on native C
functions. If hashes calculated during compilation are stored in the
compiled module, one must be able to assume that the hashes are always
the same, independent of the architecture. Cross compilation will not work
reliably as soon as this assumption is broken, because compilation and
execution might happen on different platforms.
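As an illustration of what a platform-independent hash could look like,
here is a minimal sketch of 32-bit FNV-1a in Python. FNV-1a is swapped in
purely as an example: it is not what Guile's `string-hash` computes, and
nothing in this thread says that Guix uses or should use it. The function
name is hypothetical.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of text (example algorithm only)."""
    h = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        h ^= byte
        # Explicit masking keeps the result independent of the host word size.
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


cookie = fnv1a_32("((device) (mount-point))")
print(cookie)
```

Because every step is masked to 32 bits, the result does not depend on
the width of a native integer type, which is the property the ABI cookie
needs when compilation and execution happen on different platforms.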
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-10-30 13:24 ` Christoph Buck
@ 2024-11-06 10:25 ` Christoph Buck
2024-11-11 7:47 ` Christoph Buck
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Buck @ 2024-11-06 10:25 UTC (permalink / raw)
To: Zack Weinberg, Efraim Flashner; +Cc: help-guix
Hi Guix!
So I looked into the Guile source code and, as expected, the `scm_hash`
function (see hash.c in Guile) uses `unsigned long`, which is 8 bytes on
x64 and 4 bytes on arm32/i686. If `string-hash` is called with the size
parameter `n`, the hash value is limited to that size by taking the hash
value modulo `n`, see scm_ihash in hash.c:440, namely
> (unsigned long) scm_raw_ihash (obj, 10) % n
(The `10` can be ignored as far as I can tell.) Since the hash values
are different on different platforms, the modulo is different as well.
However, if one steps through the call stack of `string-hash`, one can
see that the actual hash value is calculated by the
`JENKINS_LOOKUP3_HASHWORD2` macro, which contains a rather
interesting comment and a possible workaround for the ABI problem,
namely
--8<---------------cut here---------------start------------->8---
/* Scheme can access symbol-hash, which exposes this value. For \
cross-compilation reasons, we ensure that the high 32 bits of \
the hash on a 64-bit system are equal to the hash on a 32-bit \
system. The low 32 bits just add more entropy. */ \
if (sizeof (ret) == 8) \
ret = (((unsigned long) c) << 32) | b; \
else \
ret = c; \
--8<---------------cut here---------------end--------------->8---
in hash.c:82.
That is, when executed on an x64 platform, the upper 32 bits of the
resulting 64-bit hash are equal to the hash value on a 32-bit
platform. A simple test case in C++ looks like this:
--8<---------------cut here---------------start------------->8---
#include <climits>
#include <iostream>
#include <libguile.h>

int main(int argc, char** argv)
{
  scm_init_guile();
  // The string that compute-abi-cookie hashes for the test record.
  auto strToHash = scm_from_locale_string("((device) (mount-point))");
  auto maxULong = scm_from_ulong(ULONG_MAX);
  auto hashResult = scm_hash(strToHash, maxULong);
  auto hashResultUL = scm_to_ulong(hashResult);
  std::cout << "Max ULONG_MAX: " << ULONG_MAX << std::endl;
  std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;
  if (sizeof(hashResultUL) == 8)
    {
      // On 64-bit platforms the high 32 bits equal the 32-bit hash.
      std::cout << "Corrected for 32bit: " << (hashResultUL >> 32) << std::endl;
    }
}
--8<---------------cut here---------------end--------------->8---
which results on x64 in
> Max ULONG_MAX: 18446744073709551615
> Original hashResult ulong: 10454028974864831
> Corrected for 32bit: 2434018
and on arm32 in
> Max ULONG_MAX: 4294967295
> Original hashResult ulong: 2434018
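These numbers can be cross-checked with plain integer arithmetic. The
following Python sketch (illustration only; the helper names are
hypothetical) mimics the final step of the macro and the `% n` reduction
from `scm_ihash`, taking the two raw hash values printed above as given:

```python
RAW_HASH_64 = 10454028974864831   # "Original hashResult ulong" on x64
RAW_HASH_32 = 2434018             # "Original hashResult ulong" on arm32
HASH_SIZE = 536870911             # target-most-positive-fixnum for arm32


def pack_64(c: int, b: int) -> int:
    """Mimic `ret = (c << 32) | b` for two 32-bit words."""
    return (c << 32) | b


def high_32(raw: int) -> int:
    """Upper 32 bits of a 64-bit hash value."""
    return raw >> 32


low_32 = RAW_HASH_64 & 0xFFFFFFFF

# The 64-bit value is the 32-bit hash in the high word plus extra low bits.
assert pack_64(RAW_HASH_32, low_32) == RAW_HASH_64
assert high_32(RAW_HASH_64) == RAW_HASH_32

# Reducing modulo the same size reproduces the two mismatching cookies.
print(RAW_HASH_64 % HASH_SIZE)  # 212719825 (cross-compiled on x64)
print(RAW_HASH_32 % HASH_SIZE)  # 2434018 (computed on arm32)
```

So the mismatch comes from the modulo being applied to the full 64-bit
value on x64; once the low 32 bits are shifted away, both platforms
agree.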
This suggests the following workaround: always limit the hash to
32 bits, even when executed on a 64-bit platform (or, to be more specific,
a platform where `unsigned long` is 8 bytes wide). Do this by
right-shifting the hash value by 32 bits instead of relying on the size
parameter of the `string-hash` function.
In code it could look something like this
--8<---------------cut here---------------start------------->8---
(define (compute-abi-cookie field-specs)
  ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
  ;; 'string-hash' because that's a better hash function than 'hash' on a
  ;; list of symbols.
  (let ((hash
         (syntax-case field-specs ()
           (((field get properties ...) ...)
            (let ((hash-value (string-hash
                               (object->string
                                (syntax->datum #'((field properties ...) ...))))))
              ;; On 64-bit hosts keep only the high 32 bits, which equal
              ;; the hash computed on a 32-bit system.
              (if (= (native-word-size) 8)
                  (ash hash-value -32)
                  hash-value)))))
        (fd (syntax-case field-specs ()
              (((field get properties ...) ...)
               (object->string
                (syntax->datum #'((field properties ...) ...)))))))
    (format #t "Compute-abi-cookie: ~a~%" hash)
    hash))
--8<---------------cut here---------------end--------------->8---
where `native-word-size` is defined by
--8<---------------cut here---------------start------------->8---
(define (native-word-size)
  ((@ (system foreign) sizeof) '*))
--8<---------------cut here---------------end--------------->8---
(taken from `cross-compilation.test`). There might be a cleaner way to
formulate this, but you get the point.
This seems to work for all combinations on my machine. I tested
x64 -> arm, x64 -> i686, i686 -> x64...
I can only think of two drawbacks:
1) Lost entropy on 64-bit machines.
2) An ABI break, because the hash values on 64-bit platforms will change
on the next compilation.
1) is IMHO irrelevant, because the hash is not cryptographically
important. For 2) I am not sure how important this is.
Any thoughts on this?
Might this be something worth fixing and sending a patch in?
Best regard
Christoph
--
Best regards
Christoph
* Re: ABI mismatch on boot on arm32 system
2024-11-06 10:25 ` Christoph Buck
@ 2024-11-11 7:47 ` Christoph Buck
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Buck @ 2024-11-11 7:47 UTC (permalink / raw)
To: Zack Weinberg; +Cc: Efraim Flashner, help-guix
Hi!
I submitted a patch which fixes the issue. See
> https://issues.guix.gnu.org/74296
Feedback is appreciated!
Christoph
--
Best regards
Christoph