* bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!)
@ 2024-04-19 10:54 Tony Garnock-Jones
2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 10:54 UTC (permalink / raw)
To: 70474
Hello all,
I'm seeing some very strange behaviour from `atomic-box-swap!` (but not
`atomic-box-compare-and-swap!`) on Guile 3.0.9 from Homebrew on OSX
Sonoma using an M3 Pro cpu. The issue does not seem to manifest on
x86_64. Could it be some interaction between Guile and M3 CPUs?
Or am I just doing something very silly that shouldn't work at all and
just happens to look like it works on x86_64?
Here's the program that fails. It will run for a few hundred million
rounds and then yield "q null in get". Note that using CAS seems to
work, but plain old swap doesn't.
;;--
;; Eventually this fails with "q null in get" if `atomic-box-swap!` is
;; used where marked (*) below. It takes usually between hundreds of
;; millions and a few billion increments to fail.
;;
;; It does NOT fail if the line marked (*) is commented out and the line
;; below it mentioning `atomic-box-compare-and-swap!` is uncommented and
;; used instead.
;;
;; The failure happens on OSX Sonoma 14.4.1 on a MacBook Pro running an
;; M3 Pro CPU using Guile version 3.0.9 from Homebrew as of 2024-04-17.
;;
;; It does NOT happen on AMD x86_64 Debian linux with Guile 3.0.9 from
;; Debian packaging.
(use-modules (ice-9 atomic))
(define r (make-atomic-box '(0)))
(let loop ()
  (let ((v (let ((q (atomic-box-ref r)))
             (when (null? q) (error "q null in get"))
             (unless (eq? (atomic-box-compare-and-swap! r q (cdr q)) q)
               (error "CAS failed in get"))
             (car q))))
    (when (zero? (remainder v 10000000)) (write v) (newline))
    (unless (null?
             (atomic-box-swap! r (list (+ v 1))) ;; (*)
             ;; (atomic-box-compare-and-swap! r '() (list (+ v 1)))
             )
      (error "swap failed in put"))
    (loop)))
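
To check a C compiler's own atomics rather than Guile's JIT output, the
loop above can be modelled with C11 atomics. This is only a sketch under
stated assumptions: the names `box` and `run_rounds` are hypothetical, the
empty list is modelled as 0, and the one-element list (v) is modelled as
the token v + 1. It exercises the same get-by-CAS / put-by-swap protocol,
but not the JIT-generated code path.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the Scheme box: 0 stands for '(), and the
   nonzero token v + 1 stands for the one-element list (v). */
static _Atomic uint64_t box = 1;   /* models (make-atomic-box '(0)) */

/* Runs the get/put loop for the given number of rounds.
   Returns 0 on success, or a code naming the check that failed. */
static int run_rounds(uint64_t rounds)
{
    for (uint64_t i = 0; i < rounds; i++) {
        uint64_t q = atomic_load(&box);
        if (q == 0)
            return 1;                          /* "q null in get" */

        uint64_t expected = q;
        if (!atomic_compare_exchange_strong(&box, &expected, 0))
            return 2;                          /* "CAS failed in get" */

        uint64_t v = q - 1;                    /* (car q) */

        /* put: store the token for (list (+ v 1)), expect '() back */
        if (atomic_exchange(&box, v + 2) != 0)
            return 3;                          /* "swap failed in put" */
    }
    return 0;
}
```

Calling `run_rounds` with a large round count from a small `main` and
checking for a zero return gives a baseline that does not involve the JIT.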
* bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
@ 2024-04-19 12:14 ` Tony Garnock-Jones
2024-04-19 13:19 ` bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Christopher Baines
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 12:14 UTC (permalink / raw)
To: 70474
A small bit of extra information: it's not just one machine; the problem
also manifests on an M1 running OSX 14.1.1. In addition, it happens with
newer Guile versions including `guile-next` from `aconchillo`'s Homebrew
tap and including a version I built from Guile git main just now.
Regards,
Tony
* bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!)
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
@ 2024-04-19 13:19 ` Christopher Baines
2024-04-19 20:46 ` bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad Tony Garnock-Jones
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Christopher Baines @ 2024-04-19 13:19 UTC (permalink / raw)
To: Tony Garnock-Jones; +Cc: 70474
[-- Attachment #1: Type: text/plain, Size: 849 bytes --]
Tony Garnock-Jones <tonyg@leastfixedpoint.com> writes:
> I'm seeing some very strange behaviour from `atomic-box-swap!` (but
> not `atomic-box-compare-and-swap!`) on Guile 3.0.9 from Homebrew on
> OSX Sonoma using an M3 Pro cpu. The issue does not seem to manifest on
> x86_64. Could it be some interaction between Guile and M3 CPUs?
>
> Or am I just doing something very silly that shouldn't work at all and
> just happens to look like it works on x86_64?
>
> Here's the program that fails. It will run for a few hundred million
> rounds and then yield "q null in get". Note that using CAS seems to
> work, but plain old swap doesn't.
There are known issue(s) with Guile JIT and atomics on ARM
(e.g. [1]). If the problem doesn't appear when disabling JIT, then
you're probably seeing the same issue.
1: https://github.com/wingo/fibers/issues/83
* bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
2024-04-19 13:19 ` bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Christopher Baines
@ 2024-04-19 20:46 ` Tony Garnock-Jones
2024-04-19 20:48 ` bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions Tony Garnock-Jones
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 20:46 UTC (permalink / raw)
To: 70474
I'm not sure why, exactly, but I needed this to get builds to work on
OSX Sonoma at all.
---
libguile/scmsigs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libguile/scmsigs.c b/libguile/scmsigs.c
index 7fd3fd8f1..be96dbd5c 100644
--- a/libguile/scmsigs.c
+++ b/libguile/scmsigs.c
@@ -302,7 +302,7 @@ scm_i_signals_post_fork ()
}
#if SCM_USE_PTHREAD_THREADS
- once = SCM_I_PTHREAD_ONCE_INIT;
+ once = (scm_i_pthread_once_t) SCM_I_PTHREAD_ONCE_INIT;
#endif
if (active)
scm_i_ensure_signal_delivery_thread ();
--
2.44.0
* bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
` (2 preceding siblings ...)
2024-04-19 20:46 ` bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad Tony Garnock-Jones
@ 2024-04-19 20:48 ` Tony Garnock-Jones
2024-04-22 7:52 ` bug#70474: Just adding DMB doesn't help Tony Garnock-Jones
2024-04-22 8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 20:48 UTC (permalink / raw)
To: 70474
This appears to make the problem go away. I'm new at working with
`lightening` so I'm not confident I've covered everything that needs
covering, particularly wrt the implementation of cas_atomic. But perhaps
this can be a foundation for someone who knows more than I do to work from.
---
libguile/lightening/lightening/aarch64-cpu.c | 41 ++++++--------------
1 file changed, 11 insertions(+), 30 deletions(-)
diff --git a/libguile/lightening/lightening/aarch64-cpu.c
b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..30766652f 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -223,8 +223,8 @@ oxxrs(jit_state_t *_jit, int32_t Op,
#define A64_UMULH 0x9bc07c00
#define A64_LDAR 0xc8dffc00
#define A64_STLR 0xc89ffc00
-#define A64_LDAXR 0xc85ffc00
-#define A64_STLXR 0xc800fc00
+#define A64_SWPAL 0xf8e08000
+#define A64_CASAL 0xc8e0fc00
#define A64_STRBI 0x39000000
#define A64_LDRBI 0x39400000
#define A64_LDRSBI 0x39800000
@@ -664,15 +664,15 @@ STLR(jit_state_t *_jit, int32_t Rt, int32_t Rn)
}
static void
-LDAXR(jit_state_t *_jit, int32_t Rt, int32_t Rn)
+SWPAL(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rs)
{
- return o_xx(_jit, A64_LDAXR, Rt, Rn);
+ return oxxx(_jit, A64_SWPAL, Rt, Rn, Rs);
}
static void
-STLXR(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm)
+CASAL(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rs)
{
- return oxxx(_jit, A64_STLXR, Rt, Rn, Rm);
+ return oxxx(_jit, A64_CASAL, Rt, Rn, Rs);
}
static void
@@ -2532,36 +2532,17 @@ str_atomic(jit_state_t *_jit, int32_t loc,
int32_t val)
static void
swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
{
- void *retry = jit_address(_jit);
- int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
- int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit)) :
val;
- movr(_jit, val_or_tmp, val);
- LDAXR(_jit, dst, loc);
- STLXR(_jit, val_or_tmp, loc, result);
- jit_patch_there(_jit, bnei(_jit, result, 0), retry);
- if (dst == val) unget_temp_gpr(_jit);
- unget_temp_gpr(_jit);
+ SWPAL(_jit, dst, loc, val);
}
static void
cas_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t expected,
int32_t desired)
{
- int32_t dst_or_tmp;
- if (dst == loc || dst == expected || dst == expected)
- dst_or_tmp = jit_gpr_regno(get_temp_gpr(_jit));
- else
- dst_or_tmp = dst;
- void *retry = jit_address(_jit);
- LDAXR(_jit, dst_or_tmp, loc);
- jit_reloc_t bad = bner(_jit, dst_or_tmp, expected);
- int result = jit_gpr_regno(get_temp_gpr(_jit));
- STLXR(_jit, desired, loc, result);
- jit_patch_there(_jit, bnei(_jit, result, 0), retry);
- unget_temp_gpr(_jit);
- jit_patch_here(_jit, bad);
- movr(_jit, dst, dst_or_tmp);
- unget_temp_gpr(_jit);
+ if (dst != expected) {
+ movr(_jit, dst, expected);
+ }
+ CASAL(_jit, desired, loc, dst);
}
static void
--
2.44.0
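
For reference, the semantics that `swap_atomic` and `cas_atomic` are meant
to provide can be written with C11 atomics. This is a minimal sketch, not
the lightening implementation: `swap_word` and `cas_word` are hypothetical
names. SWPAL and CASAL belong to the Large System Extensions introduced in
Armv8.1-A, and a compiler targeting AArch64 with those extensions enabled
may lower calls like these to those instructions; without them it falls
back to a load-exclusive/store-exclusive loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically store val into *loc and return the previous contents. */
static uint64_t swap_word(_Atomic uint64_t *loc, uint64_t val)
{
    return atomic_exchange(loc, val);
}

/* If *loc equals expected, atomically store desired into *loc.
   In both cases return the value that was observed in *loc. */
static uint64_t cas_word(_Atomic uint64_t *loc, uint64_t expected,
                         uint64_t desired)
{
    /* On failure, atomic_compare_exchange_strong writes the observed
       value into expected; on success expected already equals it. */
    atomic_compare_exchange_strong(loc, &expected, desired);
    return expected;
}
```

The return convention of `cas_word` mirrors what the Scheme test relies on:
the caller compares the returned value with the expected one to learn
whether the exchange took place.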
* bug#70474: Just adding DMB doesn't help
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
` (3 preceding siblings ...)
2024-04-19 20:48 ` bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions Tony Garnock-Jones
@ 2024-04-22 7:52 ` Tony Garnock-Jones
2024-04-22 8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22 7:52 UTC (permalink / raw)
To: 70474
[-- Attachment #1: Type: text/plain, Size: 472 bytes --]
As an alternative to changing the JIT to produce SWPAL/CASAL, because I
wasn't sure if *all* aarch64 targets support these, I tried adding DMB
ISH or DMB SY to the end of the generated code sequences. Surprisingly,
this did not fix the issue! So there's perhaps something fishy about the
LDAXR-STLXR sequences themselves?
So for now I'll stick with SWPAL/CASAL on my own machine, since this
does seem to work well enough to let both my own code and fibers run.
Tony
[-- Attachment #2: add-dmb-does-not-help.patch --]
[-- Type: text/plain, Size: 1921 bytes --]
diff --git a/libguile/lightening/lightening/aarch64-cpu.c b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..bff583e33 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -225,6 +225,7 @@ oxxrs(jit_state_t *_jit, int32_t Op,
#define A64_STLR 0xc89ffc00
#define A64_LDAXR 0xc85ffc00
#define A64_STLXR 0xc800fc00
+#define A64_DMB 0xd50330bf
#define A64_STRBI 0x39000000
#define A64_LDRBI 0x39400000
#define A64_LDRSBI 0x39800000
@@ -675,6 +676,31 @@ STLXR(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm)
return oxxx(_jit, A64_STLXR, Rt, Rn, Rm);
}
+static void
+DMB(jit_state_t *_jit, int32_t CRm)
+{
+ uint32_t inst = A64_DMB;
+ inst = write_unsigned_bitfield(inst, CRm, 4, 8);
+ emit_u32_with_pool(_jit, inst);
+}
+
+static void
+DMB_ISH(jit_state_t *_jit)
+{
+ DMB(_jit, 11);
+ // ^ 11 = ISH, "Inner Shareable". This is what Java apparently uses
+ // See
+ // - https://gist.github.com/RaasAhsan/8e3554a41e07068536425ca0de46c9e8
+ // - https://mail.openjdk.org/pipermail/hotspot-dev/2021-March/049694.html
+ // - https://bugs.openjdk.org/browse/JDK-8262519
+}
+
+static void
+DMB_SY(jit_state_t *_jit)
+{
+ DMB(_jit, 15);
+}
+
static void
LDRSB(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm)
{
@@ -2541,6 +2567,7 @@ swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
jit_patch_there(_jit, bnei(_jit, result, 0), retry);
if (dst == val) unget_temp_gpr(_jit);
unget_temp_gpr(_jit);
+ DMB_SY(_jit);
}
static void
@@ -2562,6 +2589,7 @@ cas_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t expected,
jit_patch_here(_jit, bad);
movr(_jit, dst, dst_or_tmp);
unget_temp_gpr(_jit);
+ DMB_SY(_jit);
}
static void
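
The DMB encoding used in the patch above can be checked in isolation. The
sketch below assumes only what the patch shows: a base opcode of
0xd50330bf and a 4-bit CRm field starting at bit 8. The helper
`set_bitfield` is a hypothetical stand-in for lightening's
`write_unsigned_bitfield`, and `encode_dmb` is likewise an illustrative
name.

```c
#include <stdint.h>

#define A64_DMB 0xd50330bfu   /* DMB with the CRm field cleared */

/* Hypothetical helper: replace the `width`-bit field at bit `shift`. */
static uint32_t set_bitfield(uint32_t inst, uint32_t val,
                             unsigned width, unsigned shift)
{
    uint32_t mask = ((1u << width) - 1u) << shift;
    return (inst & ~mask) | ((val << shift) & mask);
}

/* Encode DMB with the given barrier option in CRm (bits 8..11). */
static uint32_t encode_dmb(uint32_t crm)
{
    return set_bitfield(A64_DMB, crm, 4, 8);
}
```

With this helper, `encode_dmb(11)` gives 0xd5033bbf (DMB ISH) and
`encode_dmb(15)` gives 0xd5033fbf (DMB SY).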
* bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction.
2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
` (4 preceding siblings ...)
2024-04-22 7:52 ` bug#70474: Just adding DMB doesn't help Tony Garnock-Jones
@ 2024-04-22 8:18 ` Tony Garnock-Jones
2024-04-22 11:23 ` Tony Garnock-Jones
5 siblings, 1 reply; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22 8:18 UTC (permalink / raw)
To: 70474
Oh man. This little patch all by itself makes the problem behaviour go
away. No switching to SWPAL/CASAL, just tightening the spinloop. (And no
changes at all to the CAS code, so nothing to do with the fibers bug I
guess.)
With the patch, the spinloop goes LDAXR-STLXR-CBNZ (which is what GCC
emits when SWPAL isn't available) instead of potentially
MOV-LDAXR-STLXR-CBNZ (which is not what GCC emits).
Could the machine really be so sensitive to the target of the CBNZ?
---
libguile/lightening/lightening/aarch64-cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libguile/lightening/lightening/aarch64-cpu.c
b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..4df712a0e 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -2532,10 +2532,10 @@ str_atomic(jit_state_t *_jit, int32_t loc,
int32_t val)
static void
swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
{
- void *retry = jit_address(_jit);
int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit))
: val;
movr(_jit, val_or_tmp, val);
+ void *retry = jit_address(_jit);
LDAXR(_jit, dst, loc);
STLXR(_jit, val_or_tmp, loc, result);
jit_patch_there(_jit, bnei(_jit, result, 0), retry);
--
2.44.0
* bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction.
2024-04-22 8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
@ 2024-04-22 11:23 ` Tony Garnock-Jones
0 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22 11:23 UTC (permalink / raw)
To: 70474
Andy Wingo in IRC pointed out that the reason the patch appears to work
is that the `movr` isn't idempotent! By the time it comes round again,
`val` has already been overwritten by LDAXR in the case that `dst == val`.
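
The effect can be reproduced in a small register-level model. The sketch
below is an illustration under assumptions, not lightening code: the names
`emulate_swap`, `R_DST_VAL`, `R_TMP` and `R_RESULT` are hypothetical, it
models only the `dst == val` case, and it forces the first store-exclusive
to fail (as can happen on real hardware, for example when the exclusive
monitor is cleared between the load and the store).

```c
#include <stdbool.h>
#include <stdint.h>

/* Register file for the model: dst and val share one register. */
enum { R_DST_VAL = 0, R_TMP = 1, R_RESULT = 2, NREGS = 3 };

/* Emulates MOV / LDAXR / STLXR / CBNZ for the dst == val case, with
   the first store-exclusive forced to fail. If retry_includes_mov is
   true, the retry target is the MOV (the original code); otherwise it
   is the LDAXR (the patched code). Stores the swap's result register
   in *returned and returns the final memory contents. */
static uint64_t emulate_swap(bool retry_includes_mov, uint64_t old_mem,
                             uint64_t new_val, uint64_t *returned)
{
    uint64_t reg[NREGS] = {0};
    uint64_t mem = old_mem;
    bool fail_next_store = true;
    bool first_pass = true;

    reg[R_DST_VAL] = new_val;              /* val arrives in dst's register */

    for (;;) {
        if (first_pass || retry_includes_mov)
            reg[R_TMP] = reg[R_DST_VAL];   /* movr val_or_tmp, val */
        first_pass = false;

        reg[R_DST_VAL] = mem;              /* LDAXR dst, [loc]: clobbers val */

        if (fail_next_store) {             /* STLXR fails, memory unchanged */
            fail_next_store = false;
            reg[R_RESULT] = 1;
        } else {                           /* STLXR val_or_tmp, [loc] */
            mem = reg[R_TMP];
            reg[R_RESULT] = 0;
        }

        if (reg[R_RESULT] == 0)            /* CBNZ result, retry */
            break;
    }

    *returned = reg[R_DST_VAL];
    return mem;
}
```

With the retry target at the MOV, a failed first store leaves the old
memory contents in place and the new value is lost, which matches the
"q null in get" symptom: the swap returns the empty list as expected, but
the box still holds the empty list afterwards. With the retry target at
the LDAXR, the new value is stored.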
On 22/04/2024 10:18, Tony Garnock-Jones wrote:
> Oh man. This little patch all by itself makes the problem behaviour go
> away. No switching to SWPAL/CASAL, just tightening the spinloop. (And no
> changes at all to the CAS code, so nothing to do with the fibers bug I
> guess.)
>
> With the patch, the spinloop goes LDAXR-STLXR-CBNZ (which is what GCC
> does when SWPAL isn't there) instead of potentially MOV-LDAXR-STLXR-CBNZ
> (which isn't).
>
> Could the machine really be so sensitive to the target of the CBNZ?
>
> ---
> libguile/lightening/lightening/aarch64-cpu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libguile/lightening/lightening/aarch64-cpu.c
> b/libguile/lightening/lightening/aarch64-cpu.c
> index 13aa351e9..4df712a0e 100644
> --- a/libguile/lightening/lightening/aarch64-cpu.c
> +++ b/libguile/lightening/lightening/aarch64-cpu.c
> @@ -2532,10 +2532,10 @@ str_atomic(jit_state_t *_jit, int32_t loc,
> int32_t val)
> static void
> swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
> {
> - void *retry = jit_address(_jit);
> int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
> int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit))
> : val;
> movr(_jit, val_or_tmp, val);
> + void *retry = jit_address(_jit);
> LDAXR(_jit, dst, loc);
> STLXR(_jit, val_or_tmp, loc, result);
> jit_patch_there(_jit, bnei(_jit, result, 0), retry);