unofficial mirror of bug-guile@gnu.org 
* bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!)
@ 2024-04-19 10:54 Tony Garnock-Jones
  2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 10:54 UTC (permalink / raw)
  To: 70474

Hello all,

I'm seeing some very strange behaviour from `atomic-box-swap!` (but not 
`atomic-box-compare-and-swap!`) on Guile 3.0.9 from Homebrew on OSX 
Sonoma using an M3 Pro cpu. The issue does not seem to manifest on 
x86_64. Could it be some interaction between Guile and M3 CPUs?

Or am I just doing something very silly that shouldn't work at all and 
just happens to look like it works on x86_64?

Here's the program that fails. It will run for a few hundred million 
rounds and then yield "q null in get". Note that using CAS seems to 
work, but plain old swap doesn't.

;;--

;; Eventually this fails with "q null in get" if `atomic-box-swap!` is
;; used where marked (*) below. It takes usually between hundreds of
;; millions and a few billion increments to fail.
;;
;; It does NOT fail if the line marked (*) is commented out and the line
;; below it mentioning `atomic-box-compare-and-swap!` is uncommented and
;; used instead.
;;
;; The failure happens on OSX Sonoma 14.4.1 on a MacBook Pro running an
;; M3 Pro CPU using Guile version 3.0.9 from Homebrew as of 2024-04-17.
;;
;; It does NOT happen on AMD x86_64 Debian linux with Guile 3.0.9 from
;; Debian packaging.

(use-modules (ice-9 atomic))

(define r (make-atomic-box '(0)))

(let loop ()
   (let ((v (let ((q (atomic-box-ref r)))
              (when (null? q) (error "q null in get"))
              (unless (eq? (atomic-box-compare-and-swap! r q (cdr q)) q)
                (error "CAS failed in get"))
              (car q))))

     (when (zero? (remainder v 10000000)) (write v) (newline))

     (unless (null?
              (atomic-box-swap! r (list (+ v 1))) ;; (*)
              ;; (atomic-box-compare-and-swap! r '() (list (+ v 1)))
              )
       (error "swap failed in put"))

     (loop)))






* bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
@ 2024-04-19 12:14 ` Tony Garnock-Jones
  2024-04-19 13:19 ` bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Christopher Baines
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 12:14 UTC (permalink / raw)
  To: 70474

A small bit of extra information: it's not just one machine; the problem 
also manifests on an M1 running OSX 14.1.1. In addition, it happens with 
newer Guile versions including `guile-next` from `aconchillo`'s Homebrew 
tap and including a version I built from Guile git main just now.

Regards,
   Tony






* bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!)
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
  2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
@ 2024-04-19 13:19 ` Christopher Baines
  2024-04-19 20:46 ` bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad Tony Garnock-Jones
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Christopher Baines @ 2024-04-19 13:19 UTC (permalink / raw)
  To: Tony Garnock-Jones; +Cc: 70474

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

Tony Garnock-Jones <tonyg@leastfixedpoint.com> writes:

> I'm seeing some very strange behaviour from `atomic-box-swap!` (but
> not `atomic-box-compare-and-swap!`) on Guile 3.0.9 from Homebrew on
> OSX Sonoma using an M3 Pro cpu. The issue does not seem to manifest on
> x86_64. Could it be some interaction between Guile and M3 CPUs?
>
> Or am I just doing something very silly that shouldn't work at all and
> just happens to look like it works on x86_64?
>
> Here's the program that fails. It will run for a few hundred million
> rounds and then yield "q null in get". Note that using CAS seems to
> work, but plain old swap doesn't.

There are known issue(s) with Guile JIT and atomics on ARM
(e.g. [1]). If the problem doesn't appear when disabling JIT, then
you're probably seeing the same issue.

1: https://github.com/wingo/fibers/issues/83

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
  2024-04-19 12:14 ` bug#70474: Also manifests on an M1 running 14.1.1 and with newer Guile versions Tony Garnock-Jones
  2024-04-19 13:19 ` bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Christopher Baines
@ 2024-04-19 20:46 ` Tony Garnock-Jones
  2024-04-19 20:48 ` bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions Tony Garnock-Jones
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 20:46 UTC (permalink / raw)
  To: 70474

I'm not sure why, exactly, but I needed this to get builds to work on 
OSX Sonoma at all.

---
  libguile/scmsigs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libguile/scmsigs.c b/libguile/scmsigs.c
index 7fd3fd8f1..be96dbd5c 100644
--- a/libguile/scmsigs.c
+++ b/libguile/scmsigs.c
@@ -302,7 +302,7 @@ scm_i_signals_post_fork ()
      }
   #if SCM_USE_PTHREAD_THREADS
-  once = SCM_I_PTHREAD_ONCE_INIT;
+  once = (scm_i_pthread_once_t) SCM_I_PTHREAD_ONCE_INIT;
  #endif
    if (active)
      scm_i_ensure_signal_delivery_thread ();
-- 
2.44.0







* bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
                   ` (2 preceding siblings ...)
  2024-04-19 20:46 ` bug#70474: [PATCH 1/2] Including the cast makes Apple clang 15.0.0 happy; without it, clang is sad Tony Garnock-Jones
@ 2024-04-19 20:48 ` Tony Garnock-Jones
  2024-04-22  7:52 ` bug#70474: Just adding DMB doesn't help Tony Garnock-Jones
  2024-04-22  8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
  5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-19 20:48 UTC (permalink / raw)
  To: 70474

This appears to make the problem go away. I'm new at working with 
`lightening` so I'm not confident I've covered everything that needs 
covering, particularly wrt the implementation of cas_atomic. But perhaps 
this can be a foundation for someone who knows more than I do to work from.

---
  libguile/lightening/lightening/aarch64-cpu.c | 41 ++++++--------------
  1 file changed, 11 insertions(+), 30 deletions(-)

diff --git a/libguile/lightening/lightening/aarch64-cpu.c b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..30766652f 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -223,8 +223,8 @@ oxxrs(jit_state_t *_jit, int32_t Op,
  #define A64_UMULH                     0x9bc07c00
  #define A64_LDAR                      0xc8dffc00
  #define A64_STLR                      0xc89ffc00
-#define A64_LDAXR                     0xc85ffc00
-#define A64_STLXR                     0xc800fc00
+#define A64_SWPAL                     0xf8e08000
+#define A64_CASAL                     0xc8e0fc00
  #define A64_STRBI                     0x39000000
  #define A64_LDRBI                     0x39400000
  #define A64_LDRSBI                    0x39800000
@@ -664,15 +664,15 @@ STLR(jit_state_t *_jit, int32_t Rt, int32_t Rn)
  }
   static void
-LDAXR(jit_state_t *_jit, int32_t Rt, int32_t Rn)
+SWPAL(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rs)
  {
-  return o_xx(_jit, A64_LDAXR, Rt, Rn);
+  return oxxx(_jit, A64_SWPAL, Rt, Rn, Rs);
  }
   static void
-STLXR(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm)
+CASAL(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rs)
  {
-  return oxxx(_jit, A64_STLXR, Rt, Rn, Rm);
+  return oxxx(_jit, A64_CASAL, Rt, Rn, Rs);
  }
   static void
@@ -2532,36 +2532,17 @@ str_atomic(jit_state_t *_jit, int32_t loc, int32_t val)
  static void
  swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
  {
-  void *retry = jit_address(_jit);
-  int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
-  int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit)) : val;
-  movr(_jit, val_or_tmp, val);
-  LDAXR(_jit, dst, loc);
-  STLXR(_jit, val_or_tmp, loc, result);
-  jit_patch_there(_jit, bnei(_jit, result, 0), retry);
-  if (dst == val) unget_temp_gpr(_jit);
-  unget_temp_gpr(_jit);
+  SWPAL(_jit, dst, loc, val);
  }
   static void
  cas_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t expected,
             int32_t desired)
  {
-  int32_t dst_or_tmp;
-  if (dst == loc || dst == expected || dst == expected)
-    dst_or_tmp = jit_gpr_regno(get_temp_gpr(_jit));
-  else
-    dst_or_tmp = dst;
-  void *retry = jit_address(_jit);
-  LDAXR(_jit, dst_or_tmp, loc);
-  jit_reloc_t bad = bner(_jit, dst_or_tmp, expected);
-  int result = jit_gpr_regno(get_temp_gpr(_jit));
-  STLXR(_jit, desired, loc, result);
-  jit_patch_there(_jit, bnei(_jit, result, 0), retry);
-  unget_temp_gpr(_jit);
-  jit_patch_here(_jit, bad);
-  movr(_jit, dst, dst_or_tmp);
-  unget_temp_gpr(_jit);
+  if (dst != expected) {
+    movr(_jit, dst, expected);
+  }
+  CASAL(_jit, desired, loc, dst);
  }
   static void
-- 
2.44.0







* bug#70474: Just adding DMB doesn't help
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
                   ` (3 preceding siblings ...)
  2024-04-19 20:48 ` bug#70474: [PATCH 2/2] Replace aarch64 CAS and atomic-swap generated JIT code with CASAL and SWPAL instructions Tony Garnock-Jones
@ 2024-04-22  7:52 ` Tony Garnock-Jones
  2024-04-22  8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
  5 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22  7:52 UTC (permalink / raw)
  To: 70474

[-- Attachment #1: Type: text/plain, Size: 472 bytes --]

As an alternative to changing the JIT to produce SWPAL/CASAL, because I 
wasn't sure if *all* aarch64 targets support these, I tried adding DMB 
ISH or DMB SY to the end of the generated code sequences. Surprisingly, 
this did not fix the issue! So there's perhaps something fishy about the 
LDAXR-STLXR sequences themselves?

So for now I'll stick on my own machine with SWPAL/CASAL, since this 
does seem to work well enough to let both my own code and fibers run.

Tony

[-- Attachment #2: add-dmb-does-not-help.patch --]
[-- Type: text/plain, Size: 1921 bytes --]

diff --git a/libguile/lightening/lightening/aarch64-cpu.c b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..bff583e33 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -225,6 +225,7 @@ oxxrs(jit_state_t *_jit, int32_t Op,
 #define A64_STLR                      0xc89ffc00
 #define A64_LDAXR                     0xc85ffc00
 #define A64_STLXR                     0xc800fc00
+#define A64_DMB                       0xd50330bf
 #define A64_STRBI                     0x39000000
 #define A64_LDRBI                     0x39400000
 #define A64_LDRSBI                    0x39800000
@@ -675,6 +676,31 @@ STLXR(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm)
   return oxxx(_jit, A64_STLXR, Rt, Rn, Rm);
 }
 
+static void
+DMB(jit_state_t *_jit, int32_t CRm)
+{
+  uint32_t inst = A64_DMB;
+  inst = write_unsigned_bitfield(inst, CRm, 4, 8);
+  emit_u32_with_pool(_jit, inst);
+}
+
+static void
+DMB_ISH(jit_state_t *_jit)
+{
+  DMB(_jit, 11);
+  // ^ 11 = ISH, "Inner Shareable". This is what Java apparently uses
+  // See
+  //  - https://gist.github.com/RaasAhsan/8e3554a41e07068536425ca0de46c9e8
+  //  - https://mail.openjdk.org/pipermail/hotspot-dev/2021-March/049694.html
+  //  - https://bugs.openjdk.org/browse/JDK-8262519
+}
+
+static void
+DMB_SY(jit_state_t *_jit)
+{
+  DMB(_jit, 15);
+}
+
 static void
 LDRSB(jit_state_t *_jit, int32_t Rt, int32_t Rn, int32_t Rm) 
 {
@@ -2541,6 +2567,7 @@ swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
   jit_patch_there(_jit, bnei(_jit, result, 0), retry);
   if (dst == val) unget_temp_gpr(_jit);
   unget_temp_gpr(_jit);
+  DMB_SY(_jit);
 }
 
 static void
@@ -2562,6 +2589,7 @@ cas_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t expected,
   jit_patch_here(_jit, bad);
   movr(_jit, dst, dst_or_tmp);
   unget_temp_gpr(_jit);
+  DMB_SY(_jit);
 }
 
 static void


* bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction.
  2024-04-19 10:54 bug#70474: Possible bug with `atomic-box-swap!` on OSX/M3 (?!?!) Tony Garnock-Jones
                   ` (4 preceding siblings ...)
  2024-04-22  7:52 ` bug#70474: Just adding DMB doesn't help Tony Garnock-Jones
@ 2024-04-22  8:18 ` Tony Garnock-Jones
  2024-04-22 11:23   ` Tony Garnock-Jones
  5 siblings, 1 reply; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22  8:18 UTC (permalink / raw)
  To: 70474

Oh man. This little patch all by itself makes the problem behaviour go 
away. No switching to SWPAL/CASAL, just tightening the spinloop. (And no 
changes at all to the CAS code, so nothing to do with the fibers bug I 
guess.)

With the patch, the spinloop goes LDAXR-STLXR-CBNZ (which is what GCC 
does when SWPAL isn't there) instead of potentially MOV-LDAXR-STLXR-CBNZ 
(which isn't).

Could the machine really be so sensitive to the target of the CBNZ?

---
  libguile/lightening/lightening/aarch64-cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libguile/lightening/lightening/aarch64-cpu.c b/libguile/lightening/lightening/aarch64-cpu.c
index 13aa351e9..4df712a0e 100644
--- a/libguile/lightening/lightening/aarch64-cpu.c
+++ b/libguile/lightening/lightening/aarch64-cpu.c
@@ -2532,10 +2532,10 @@ str_atomic(jit_state_t *_jit, int32_t loc, int32_t val)
  static void
  swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
  {
-  void *retry = jit_address(_jit);
    int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
    int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit)) : val;
    movr(_jit, val_or_tmp, val);
+  void *retry = jit_address(_jit);
    LDAXR(_jit, dst, loc);
    STLXR(_jit, val_or_tmp, loc, result);
    jit_patch_there(_jit, bnei(_jit, result, 0), retry);
-- 
2.44.0







* bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction.
  2024-04-22  8:18 ` bug#70474: [PATCH] Move the spin loop target to the LDAXR instruction Tony Garnock-Jones
@ 2024-04-22 11:23   ` Tony Garnock-Jones
  0 siblings, 0 replies; 8+ messages in thread
From: Tony Garnock-Jones @ 2024-04-22 11:23 UTC (permalink / raw)
  To: 70474

Andy Wingo in IRC pointed out that the reason the patch appears to work 
is that the `movr` isn't idempotent! By the time it comes round again, 
`val` has already been overwritten by LDAXR in the case that `dst == val`.

On 22/04/2024 10:18, Tony Garnock-Jones wrote:
> Oh man. This little patch all by itself makes the problem behaviour go 
> away. No switching to SWPAL/CASAL, just tightening the spinloop. (And no 
> changes at all to the CAS code, so nothing to do with the fibers bug I 
> guess.)
> 
> With the patch, the spinloop goes LDAXR-STLXR-CBNZ (which is what GCC 
> does when SWPAL isn't there) instead of potentially MOV-LDAXR-STLXR-CBNZ 
> (which isn't).
> 
> Could the machine really be so sensitive to the target of the CBNZ?
> 
> ---
>   libguile/lightening/lightening/aarch64-cpu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libguile/lightening/lightening/aarch64-cpu.c 
> b/libguile/lightening/lightening/aarch64-cpu.c
> index 13aa351e9..4df712a0e 100644
> --- a/libguile/lightening/lightening/aarch64-cpu.c
> +++ b/libguile/lightening/lightening/aarch64-cpu.c
> @@ -2532,10 +2532,10 @@ str_atomic(jit_state_t *_jit, int32_t loc, 
> int32_t val)
>   static void
>   swap_atomic(jit_state_t *_jit, int32_t dst, int32_t loc, int32_t val)
>   {
> -  void *retry = jit_address(_jit);
>     int32_t result = jit_gpr_regno(get_temp_gpr(_jit));
>     int32_t val_or_tmp = dst == val ? jit_gpr_regno(get_temp_gpr(_jit)) 
> : val;
>     movr(_jit, val_or_tmp, val);
> +  void *retry = jit_address(_jit);
>     LDAXR(_jit, dst, loc);
>     STLXR(_jit, val_or_tmp, loc, result);
>     jit_patch_there(_jit, bnei(_jit, result, 0), retry);





