unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
@ 2018-09-20 17:13 Михаил Бахтерев
  2018-09-21 21:26 ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Михаил Бахтерев @ 2018-09-20 17:13 UTC (permalink / raw)
  To: 32786

[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]

Greetings.

I attach the code sample, which expected behaviour is:

  1. to periodically restart thread, which executes «sleep 1s» with
  system* call;

  2. and then to check if it should repeat the execution.
  
The flag, which controls that re-execution is managed by another thread
through atomic-box primitive.

The code does not run as expected: the sleep executing sleep starts
once, and change seems to remain invisible to main thread.

The code works as expected, when:

  1. (sleep 1) is used instead (system* "sleep" "1s");
  2. running the code on x86_64 cpus.

The use of affinity mask

  taskset 0x01 guile test.scm

does not help.

- MB. Respectfully

P.S. Additional information:

$ guile --version  
guile (GNU Guile) 2.2.4
Copyright (C) 2018 Free Software Foundation, Inc.

License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ sh config.guess
armv7l-unknown-linux-gnueabihf

$ pacman -Qi guile 
Name            : guile
Version         : 2.2.4-1
Description     : Portable, embeddable Scheme implementation written in C
Architecture    : armv7h
URL             : https://www.gnu.org/software/guile/
Licenses        : GPL
Groups          : None
Provides        : None
Depends On      : gmp  libltdl  ncurses  texinfo  libunistring  gc  libffi
Optional Deps   : None
Required By     : make
Optional For    : gnutls
Conflicts With  : None
Replaces        : None
Installed Size  : 40.83 MiB
Packager        : Arch Linux ARM Build System <builder+xu9@archlinuxarm.org>
Build Date      : Tue 03 Jul 2018 04:47:53 PM +05
Install Date    : Mon 09 Jul 2018 01:02:48 PM +05
Install Reason  : Installed as a dependency for another package
Install Script  : No
Validated By    : Signature

[-- Attachment #2: test.scm --]
[-- Type: application/vnd.lotus-screencam, Size: 1431 bytes --]

[-- Attachment #3: cpuinfo --]
[-- Type: text/plain, Size: 1058 bytes --]

processor	: 0
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

processor	: 1
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

processor	: 2
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

processor	: 3
model name	: ARMv7 Processor rev 10 (v7l)
BogoMIPS	: 7.54
Features	: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: Freescale i.MX6 Quad/DualLite (Device Tree)
Revision	: 0000
Serial		: 0000000000000000

^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-20 17:13 bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Михаил Бахтерев
@ 2018-09-21 21:26 ` Mark H Weaver
  2018-09-22  4:39   ` Михаил Бахтерев
  0 siblings, 1 reply; 8+ messages in thread
From: Mark H Weaver @ 2018-09-21 21:26 UTC (permalink / raw)
  To: Михаил Бахтерев
  Cc: 32786

Hi,

Михаил Бахтерев <mob@k.imm.uran.ru> writes:

> I attach the code sample, which expected behaviour is:
>
>   1. to periodically restart thread, which executes «sleep 1s» with
>   system* call;
>
>   2. and then to check if it should repeat the execution.
>   
> The flag, which controls that re-execution is managed by another thread
> through atomic-box primitive.
>
> The code does not run as expected: the sleep executing sleep starts
> once, and change seems to remain invisible to main thread.

Can you describe in more detail the behavior you are seeing?  What
messages do you see on stderr from your 'dump' calls when it gets stuck?
What state is the atomic box left in?

I see a couple of problems with the code below:

* Two threads write concurrently to the same port, namely
  (current-error-port).  You should use thread synchronization to ensure
  that operations to a given port are serialized.

* There's a race condition when the atomic box is in the #:accepted
  state.  If the main thread finds the box in that state, it changes the
  state to #:need-to-sleep, which will apparently cause the other thread
  to sleep again.

I'm not sure if these problems could explain what you're seeing.

     Regards,
       Mark


> The code works as expected, when:
>
>   1. (sleep 1) is used instead (system* "sleep" "1s");
>   2. running the code on x86_64 cpus.
>
> The use of affinity mask
>
>   taskset 0x01 guile test.scm
>
> does not help.
>
> - MB. Respectfully
>
> P.S. Additional information:
>
> $ guile --version  
> guile (GNU Guile) 2.2.4
> Copyright (C) 2018 Free Software Foundation, Inc.
>
> License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> $ sh config.guess
> armv7l-unknown-linux-gnueabihf
>
> $ pacman -Qi guile 
> Name            : guile
> Version         : 2.2.4-1
> Description     : Portable, embeddable Scheme implementation written in C
> Architecture    : armv7h
> URL             : https://www.gnu.org/software/guile/
> Licenses        : GPL
> Groups          : None
> Provides        : None
> Depends On      : gmp  libltdl  ncurses  texinfo  libunistring  gc  libffi
> Optional Deps   : None
> Required By     : make
> Optional For    : gnutls
> Conflicts With  : None
> Replaces        : None
> Installed Size  : 40.83 MiB
> Packager        : Arch Linux ARM Build System <builder+xu9@archlinuxarm.org>
> Build Date      : Tue 03 Jul 2018 04:47:53 PM +05
> Install Date    : Mon 09 Jul 2018 01:02:48 PM +05
> Install Reason  : Installed as a dependency for another package
> Install Script  : No
> Validated By    : Signature
>
> (use-modules (ice-9 atomic)
>              (ice-9 threads))
>
> (define dump
>   (let ((p (current-error-port)))
>     (lambda (fmt . args) (apply format p fmt args))))
>
> (define state (make-atomic-box #:nothing-to-do))
>
> (define (expect e v)
>   (when (not (eq? e v))
>     (error "Protocol violation; (expecting . got):" (cons e v)))) 
>
> (define (sleep-loop)
>   (expect #:need-to-sleep (atomic-box-ref state))
>
>   (dump "R: Going to (sleep 1)~%")
>   (atomic-box-set! state #:accepted)
>   
>   ; SOMETHING WRONG IS HERE
>   (system* "sleep" "1s")
>   ; (sleep 1)
>
>   (let ((v (atomic-box-compare-and-swap! state #:accepted #:nothing-to-do)))
>     (when (not (eq? #:accepted v))
>       (dump "S: Repeating sleep~%")
>       (sleep-loop)))
>
>   (dump "R: sleep-loop finished~%"))
>
> (define (main-loop)
>   (define (run-thread)
>     (dump "M: new sleep thread~%")
>     (atomic-box-set! state #:need-to-sleep)
>     (call-with-new-thread sleep-loop)) 
>
>   (dump "M: (sleep 3)~%")
>   (sleep 3)
>
>   (dump "M: protocol exchange~%")
>   (let ((st (atomic-box-ref state)))
>     (case st 
>       ((#:nothing-to-do) (run-thread))
>       ((#:accepted) (let ((v (atomic-box-compare-and-swap! state #:accepted #:need-to-sleep)))
>                       (when (not (eq? #:accepted v))
>                         (expect #:accepted v)
>                         (run-thread))))
>       (else (expect #:need-to-sleep st))))
>
>   (main-loop))
>
> (main-loop)





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-21 21:26 ` Mark H Weaver
@ 2018-09-22  4:39   ` Михаил Бахтерев
  2018-09-22  4:50     ` Михаил Бахтерев
  2018-09-27  5:49     ` Mark H Weaver
  0 siblings, 2 replies; 8+ messages in thread
From: Михаил Бахтерев @ 2018-09-22  4:39 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 32786

On Fri, Sep 21, 2018 at 05:26:27PM -0400, Mark H Weaver wrote:
> Hi,
> 
> Can you describe in more detail the behavior you are seeing?  What
> messages do you see on stderr from your 'dump' calls when it gets stuck?
> What state is the atomic box left in?
> 
> I see a couple of problems with the code below:
> 
> * Two threads write concurrently to the same port, namely
>   (current-error-port).  You should use thread synchronization to ensure
>   that operations to a given port are serialized.
> 
> * There's a race condition when the atomic box is in the #:accepted
>   state.  If the main thread finds the box in that state, it changes the
>   state to #:need-to-sleep, which will apparently cause the other thread
>   to sleep again.
> 
> I'm not sure if these problems could explain what you're seeing.
> 
>      Regards,
>        Mark
> 

The code is supposed to model the following behaviour.

The main thread tracks some events (in the example: the end of (sleep
3)). If one or multiple events occurred some long-lasting external
command should be executed. Long-lasting means that multiple events may
occur during execution.

To perform that the main thread should:

  1. either to run new work thread (sleep-loop procedure), which will
  call (system* ...);

  2. or to communicate the fact of events occurrences to the work
  thread, which is supposed to detect this and just repeat the command.

If multiple events have occurred during command execution the command
should be re-executed only once.

The event occurrence is communicated through simple protocol with three
states: #:nothing-to-do < #:accepted < #:need-to-sleep. The main thread
pulls state up, and work thread pulls it down. When the main thread
detects event, and the state is #:accepted, it means, that work thread
accepted the previous job, and executing it now, so the main thread
pulls the state up to need-to-sleep, and the work thread re-executes
command (sleeps again in the test case), when the previous external
command run has finished. This is intended behaviour.

I expect the program to print some garbage, showing that work thread is
restarted periodically. Something like that (it is result of using
(sleep 1) instead of (system* "sleep" "1s")):

$ guile test.scm
M: (sleep 3)
M: protocol exchange
M: new sleep thread
M: (sleep 3)

R: Going to (sleep 1)
R: sleep-loop finished
M: protocol exchange
M: new sleep thread
M: (sleep 3)
: (sleep 3)R: Going to (sleep 1)

R: sleep-loop finished
M: protocol exchange
M: new sleep thread
M: (sleep 3)

R: Going to (sleep 1)
R: sleep-loop finished
M: protocol exchange
M: new sleep thread
M: (sleep 3)

: (sleep 3)R: Going to (sleep 1)
R: sleep-loop finished

But what i get with (system* "sleep" "1s") is this:

$ guile test.scm
M: (sleep 3)
M: protocol exchange
M: new sleep thread
M: (sleep 3)
R: Going to (sleep 1)
R: sleep-loop finished
M: protocol exchange
M: (sleep 3)
M: protocol exchange
M: (sleep 3)
M: protocol exchange
M: (sleep 3)

So, the state of the atomic-box is stuck in #:need-to-sleep somehow.

- MB. Respectfully





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-22  4:39   ` Михаил Бахтерев
@ 2018-09-22  4:50     ` Михаил Бахтерев
  2018-09-27  5:49     ` Mark H Weaver
  1 sibling, 0 replies; 8+ messages in thread
From: Михаил Бахтерев @ 2018-09-22  4:50 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 32786

By the way, if i change (system* "sleep" "1s") to the (system* "true"),
nothing changes. The log of execution is the same:

$ guile test.scm
M: (sleep 3)
M: protocol exchange
M: new sleep thread
M: (sleep 3)
R: Going to (sleep 1)
R: sleep-loop finished
M: protocol exchange
M: (sleep 3)
M: protocol exchange
M: (sleep 3)

But 3 seconds are more than enough to finish the work thread and pull
the state down to the #:nothing-to-do, which should cause the main
thread re-run work thread.

- MB. Respectfully

On Sat, Sep 22, 2018 at 09:39:38AM +0500, Михаил Бахтерев wrote:
> On Fri, Sep 21, 2018 at 05:26:27PM -0400, Mark H Weaver wrote:
> > Hi,
> > 
> > Can you describe in more detail the behavior you are seeing?  What
> > messages do you see on stderr from your 'dump' calls when it gets stuck?
> > What state is the atomic box left in?
> > 
> > I see a couple of problems with the code below:
> > 
> > * Two threads write concurrently to the same port, namely
> >   (current-error-port).  You should use thread synchronization to ensure
> >   that operations to a given port are serialized.
> > 
> > * There's a race condition when the atomic box is in the #:accepted
> >   state.  If the main thread finds the box in that state, it changes the
> >   state to #:need-to-sleep, which will apparently cause the other thread
> >   to sleep again.
> > 
> > I'm not sure if these problems could explain what you're seeing.
> > 
> >      Regards,
> >        Mark
> > 
> 
> The code is supposed to model the following behaviour.
> 
> The main thread tracks some events (in the example: the end of (sleep
> 3)). If one or multiple events occurred some long-lasting external
> command should be executed. Long-lasting means that multiple events may
> occur during execution.
> 
> To perform that the main thread should:
> 
>   1. either to run new work thread (sleep-loop procedure), which will
>   call (system* ...);
> 
>   2. or to communicate the fact of events occurrences to the work
>   thread, which is supposed to detect this and just repeat the command.
> 
> If multiple events have occurred during command execution the command
> should be re-executed only once.
> 
> The event occurrence is communicated through simple protocol with three
> states: #:nothing-to-do < #:accepted < #:need-to-sleep. The main thread
> pulls state up, and work thread pulls it down. When the main thread
> detects event, and the state is #:accepted, it means, that work thread
> accepted the previous job, and executing it now, so the main thread
> pulls the state up to need-to-sleep, and the work thread re-executes
> command (sleeps again in the test case), when the previous external
> command run has finished. This is intended behaviour.
> 
> I expect the program to print some garbage, showing that work thread is
> restarted periodically. Something like that (it is result of using
> (sleep 1) instead of (system* "sleep" "1s")):
> 
> $ guile test.scm
> M: (sleep 3)
> M: protocol exchange
> M: new sleep thread
> M: (sleep 3)
> 
> R: Going to (sleep 1)
> R: sleep-loop finished
> M: protocol exchange
> M: new sleep thread
> M: (sleep 3)
> : (sleep 3)R: Going to (sleep 1)
> 
> R: sleep-loop finished
> M: protocol exchange
> M: new sleep thread
> M: (sleep 3)
> 
> R: Going to (sleep 1)
> R: sleep-loop finished
> M: protocol exchange
> M: new sleep thread
> M: (sleep 3)
> 
> : (sleep 3)R: Going to (sleep 1)
> R: sleep-loop finished
> 
> But what i get with (system* "sleep" "1s") is this:
> 
> $ guile test.scm
> M: (sleep 3)
> M: protocol exchange
> M: new sleep thread
> M: (sleep 3)
> R: Going to (sleep 1)
> R: sleep-loop finished
> M: protocol exchange
> M: (sleep 3)
> M: protocol exchange
> M: (sleep 3)
> M: protocol exchange
> M: (sleep 3)
> 
> So, the state of the atomic-box is stuck in #:need-to-sleep somehow.
> 
> - MB. Respectfully





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-22  4:39   ` Михаил Бахтерев
  2018-09-22  4:50     ` Михаил Бахтерев
@ 2018-09-27  5:49     ` Mark H Weaver
  2018-09-27  6:18       ` Михаил Бахтерев
  2018-10-05 19:43       ` Михаил Бахтерев
  1 sibling, 2 replies; 8+ messages in thread
From: Mark H Weaver @ 2018-09-27  5:49 UTC (permalink / raw)
  To: Михаил Бахтерев
  Cc: 32786

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

Hi,

Thanks for the additional details.  I was able to reproduce the bug, and
I believe I now see the problem.

'atomic-box-compare-and-swap!' is implemented using
'atomic_compare_exchange_weak' (if available), but neglects to handle
the case where 'atomic_compare_exchange_weak' may spuriously fail.  In
that case, the box is left unchanged, although the observed value was
equal to the expected value.

What's happening here is that the 'atomic-box-compare-and-swap!' in
'sleep-loop' fails spuriously, leaving the box in state #:accepted
although it returns #:accepted as its result.  When the main loop
discovers this, it changes the state to #:need-to-sleep, although the
thread has already ended.

To confirm this hypothesis, I added a print statement to the main loop
showing the state of the box that it observed during the protocol
exchange, and indeed it sees #:accepted the first time it checks, and
#:need-to-sleep in all later iterations.

I've attached a proposed patch that I hope will fix this problem.  If
you'd be willing to test it, I'd be grateful, but otherwise I'll try to
test it in the next week or so.  My access to armv7l boxes is somewhat
limited.

Thanks for this report.

      Mark



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!' --]
[-- Type: text/x-patch, Size: 3395 bytes --]

From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@netris.org>
Date: Thu, 27 Sep 2018 01:00:11 -0400
Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'.

Fixes <https://bugs.gnu.org/32786>.

'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if
it's available on the host platform.  'atomic_compare_exchange_weak' may
fail spuriously, leaving the atomic object unchanged even when it
contained the expected value.  'atomic-box-compare-and-swap!' uses
'scm_atomic_compare_and_swap_scm' without checking its return value, and
therefore may return the expected value while leaving the box unchanged.
This contradicts the documentation, which explicitly states that "you
can know if the swap worked by checking if the return value is 'eq?' to
EXPECTED.".

* libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If
'scm_atomic_compare_and_swap_scm' returns false and the observed value
is equal to 'expected', then try again.
* libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto.
---
 libguile/atomic.c    | 13 +++++++++----
 libguile/vm-engine.c | 13 ++++++++-----
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/libguile/atomic.c b/libguile/atomic.c
index 950874030..504643912 100644
--- a/libguile/atomic.c
+++ b/libguile/atomic.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2016 Free Software Foundation, Inc.
+/* Copyright (C) 2016, 2018 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x,
             "if the return value is @code{eq?} to @var{expected}.")
 #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x
 {
+  SCM result = expected;
+
   SCM_VALIDATE_ATOMIC_BOX (1, box);
-  scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
-                                   &expected, desired);
-  return expected;
+  while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
+                                           &result, desired)
+         && scm_is_eq (result, expected))
+    {
+    }
+  return result;
 }
 #undef FUNC_NAME
 
diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
index 19ff3e498..9650e33eb 100644
--- a/libguile/vm-engine.c
+++ b/libguile/vm-engine.c
@@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
     {
       scm_t_uint16 dst, box;
       scm_t_uint32 expected, desired;
-      SCM scm_box, scm_expected;
+      SCM scm_box, scm_expected, scm_result;
       UNPACK_12_12 (op, dst, box);
       UNPACK_24 (ip[1], expected);
       UNPACK_24 (ip[2], desired);
       scm_box = SP_REF (box);
       VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!");
-      scm_expected = SP_REF (expected);
-      scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
-                                       &scm_expected, SP_REF (desired));
-      SP_SET (dst, scm_expected);
+      scm_result = scm_expected = SP_REF (expected);
+      while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
+                                               &scm_result, SP_REF (desired))
+             && scm_is_eq (scm_result, scm_expected))
+        {
+        }
+      SP_SET (dst, scm_result);
       NEXT (3);
     }
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-27  5:49     ` Mark H Weaver
@ 2018-09-27  6:18       ` Михаил Бахтерев
  2018-10-05 19:43       ` Михаил Бахтерев
  1 sibling, 0 replies; 8+ messages in thread
From: Михаил Бахтерев @ 2018-09-27  6:18 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 32786

Thanks for clarification! I'll be able to test the patch in a couple of
days.

- MB. Respectfully

On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote:
> Hi,
> 
> Thanks for the additional details.  I was able to reproduce the bug, and
> I believe I now see the problem.
> 
> 'atomic-box-compare-and-swap!' is implemented using
> 'atomic_compare_exchange_weak' (if available), but neglects to handle
> the case where 'atomic_compare_exchange_weak' may spuriously fail.  In
> that case, the box is left unchanged, although the observed value was
> equal to the expected value.
> 
> What's happening here is that the 'atomic-box-compare-and-swap!' in
> 'sleep-loop' fails spuriously, leaving the box in state #:accepted
> although it returns #:accepted as its result.  When the main loop
> discovers this, it changes the state to #:need-to-sleep, although the
> thread has already ended.
> 
> To confirm this hypothesis, I added a print statement to the main loop
> showing the state of the box that it observed during the protocol
> exchange, and indeed it sees #:accepted the first time it checks, and
> #:need-to-sleep in all later iterations.
> 
> I've attached a proposed patch that I hope will fix this problem.  If
> you'd be willing to test it, I'd be grateful, but otherwise I'll try to
> test it in the next week or so.  My access to armv7l boxes is somewhat
> limited.
> 
> Thanks for this report.
> 
>       Mark
> 
> 






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-09-27  5:49     ` Mark H Weaver
  2018-09-27  6:18       ` Михаил Бахтерев
@ 2018-10-05 19:43       ` Михаил Бахтерев
  2018-10-05 23:22         ` Mark H Weaver
  1 sibling, 1 reply; 8+ messages in thread
From: Михаил Бахтерев @ 2018-10-05 19:43 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 32786

Hi. I am sorry for delay.

I've tested the patch (i've applied it to the v2.2.4 source code). The
test sample works now as expected with both test «payloads»: (sleep
N-seconds) and (system* "any" "command" "here") calls.

Thanks for fixing it.

- MB. Respectfully

On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote:
> Hi,
> 
> Thanks for the additional details.  I was able to reproduce the bug, and
> I believe I now see the problem.
> 
> 'atomic-box-compare-and-swap!' is implemented using
> 'atomic_compare_exchange_weak' (if available), but neglects to handle
> the case where 'atomic_compare_exchange_weak' may spuriously fail.  In
> that case, the box is left unchanged, although the observed value was
> equal to the expected value.
> 
> What's happening here is that the 'atomic-box-compare-and-swap!' in
> 'sleep-loop' fails spuriously, leaving the box in state #:accepted
> although it returns #:accepted as its result.  When the main loop
> discovers this, it changes the state to #:need-to-sleep, although the
> thread has already ended.
> 
> To confirm this hypothesis, I added a print statement to the main loop
> showing the state of the box that it observed during the protocol
> exchange, and indeed it sees #:accepted the first time it checks, and
> #:need-to-sleep in all later iterations.
> 
> I've attached a proposed patch that I hope will fix this problem.  If
> you'd be willing to test it, I'd be grateful, but otherwise I'll try to
> test it in the next week or so.  My access to armv7l boxes is somewhat
> limited.
> 
> Thanks for this report.
> 
>       Mark
> 
> 

> From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@netris.org>
> Date: Thu, 27 Sep 2018 01:00:11 -0400
> Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'.
> 
> Fixes <https://bugs.gnu.org/32786>.
> 
> 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if
> it's available on the host platform.  'atomic_compare_exchange_weak' may
> fail spuriously, leaving the atomic object unchanged even when it
> contained the expected value.  'atomic-box-compare-and-swap!' uses
> 'scm_atomic_compare_and_swap_scm' without checking its return value, and
> therefore may return the expected value while leaving the box unchanged.
> This contradicts the documentation, which explicitly states that "you
> can know if the swap worked by checking if the return value is 'eq?' to
> EXPECTED.".
> 
> * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If
> 'scm_atomic_compare_and_swap_scm' returns false and the observed value
> is equal to 'expected', then try again.
> * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto.
> ---
>  libguile/atomic.c    | 13 +++++++++----
>  libguile/vm-engine.c | 13 ++++++++-----
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/libguile/atomic.c b/libguile/atomic.c
> index 950874030..504643912 100644
> --- a/libguile/atomic.c
> +++ b/libguile/atomic.c
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2016 Free Software Foundation, Inc.
> +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public License
> @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x,
>              "if the return value is @code{eq?} to @var{expected}.")
>  #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x
>  {
> +  SCM result = expected;
> +
>    SCM_VALIDATE_ATOMIC_BOX (1, box);
> -  scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
> -                                   &expected, desired);
> -  return expected;
> +  while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
> +                                           &result, desired)
> +         && scm_is_eq (result, expected))
> +    {
> +    }
> +  return result;
>  }
>  #undef FUNC_NAME
>  
> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
> index 19ff3e498..9650e33eb 100644
> --- a/libguile/vm-engine.c
> +++ b/libguile/vm-engine.c
> @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
>      {
>        scm_t_uint16 dst, box;
>        scm_t_uint32 expected, desired;
> -      SCM scm_box, scm_expected;
> +      SCM scm_box, scm_expected, scm_result;
>        UNPACK_12_12 (op, dst, box);
>        UNPACK_24 (ip[1], expected);
>        UNPACK_24 (ip[2], desired);
>        scm_box = SP_REF (box);
>        VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!");
> -      scm_expected = SP_REF (expected);
> -      scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
> -                                       &scm_expected, SP_REF (desired));
> -      SP_SET (dst, scm_expected);
> +      scm_result = scm_expected = SP_REF (expected);
> +      while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
> +                                               &scm_result, SP_REF (desired))
> +             && scm_is_eq (scm_result, scm_expected))
> +        {
> +        }
> +      SP_SET (dst, scm_result);
>        NEXT (3);
>      }
>  
> -- 
> 2.19.0
> 






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l
  2018-10-05 19:43       ` Михаил Бахтерев
@ 2018-10-05 23:22         ` Mark H Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Mark H Weaver @ 2018-10-05 23:22 UTC (permalink / raw)
  To: Михаил Бахтерев
  Cc: 32786-done

Hi,

Михаил Бахтерев <mob@k.imm.uran.ru> writes:

> I've tested the patch (i've applied it to the v2.2.4 source code). The
> test sample works now as expected with both test «payloads»: (sleep
> N-seconds) and (system* "any" "command" "here") calls.

Thanks for letting us know the results of your testing, this is very
helpful.  I just pushed a similar fix (same code, improved comments) to
the stable-2.2 branch, commit 2d09e0513fc11e2305077ba3653f6e4c2f266ddb,
which will be in the upcoming guile-2.2.5 release.

    Thanks,
      Mark


>
> Thanks for fixing it.
>
> - MB. Respectfully
>
> On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote:
>> Hi,
>> 
>> Thanks for the additional details.  I was able to reproduce the bug, and
>> I believe I now see the problem.
>> 
>> 'atomic-box-compare-and-swap!' is implemented using
>> 'atomic_compare_exchange_weak' (if available), but neglects to handle
>> the case where 'atomic_compare_exchange_weak' may spuriously fail.  In
>> that case, the box is left unchanged, although the observed value was
>> equal to the expected value.
>> 
>> What's happening here is that the 'atomic-box-compare-and-swap!' in
>> 'sleep-loop' fails spuriously, leaving the box in state #:accepted
>> although it returns #:accepted as its result.  When the main loop
>> discovers this, it changes the state to #:need-to-sleep, although the
>> thread has already ended.
>> 
>> To confirm this hypothesis, I added a print statement to the main loop
>> showing the state of the box that it observed during the protocol
>> exchange, and indeed it sees #:accepted the first time it checks, and
>> #:need-to-sleep in all later iterations.
>> 
>> I've attached a proposed patch that I hope will fix this problem.  If
>> you'd be willing to test it, I'd be grateful, but otherwise I'll try to
>> test it in the next week or so.  My access to armv7l boxes is somewhat
>> limited.
>> 
>> Thanks for this report.
>> 
>>       Mark
>> 
>> 
>
>> From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001
>> From: Mark H Weaver <mhw@netris.org>
>> Date: Thu, 27 Sep 2018 01:00:11 -0400
>> Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'.
>> 
>> Fixes <https://bugs.gnu.org/32786>.
>> 
>> 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if
>> it's available on the host platform.  'atomic_compare_exchange_weak' may
>> fail spuriously, leaving the atomic object unchanged even when it
>> contained the expected value.  'atomic-box-compare-and-swap!' uses
>> 'scm_atomic_compare_and_swap_scm' without checking its return value, and
>> therefore may return the expected value while leaving the box unchanged.
>> This contradicts the documentation, which explicitly states that "you
>> can know if the swap worked by checking if the return value is 'eq?' to
>> EXPECTED.".
>> 
>> * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If
>> 'scm_atomic_compare_and_swap_scm' returns false and the observed value
>> is equal to 'expected', then try again.
>> * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto.
>> ---
>>  libguile/atomic.c    | 13 +++++++++----
>>  libguile/vm-engine.c | 13 ++++++++-----
>>  2 files changed, 17 insertions(+), 9 deletions(-)
>> 
>> diff --git a/libguile/atomic.c b/libguile/atomic.c
>> index 950874030..504643912 100644
>> --- a/libguile/atomic.c
>> +++ b/libguile/atomic.c
>> @@ -1,4 +1,4 @@
>> -/* Copyright (C) 2016 Free Software Foundation, Inc.
>> +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc.
>>   *
>>   * This library is free software; you can redistribute it and/or
>>   * modify it under the terms of the GNU Lesser General Public License
>> @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x,
>>              "if the return value is @code{eq?} to @var{expected}.")
>>  #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x
>>  {
>> +  SCM result = expected;
>> +
>>    SCM_VALIDATE_ATOMIC_BOX (1, box);
>> -  scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
>> -                                   &expected, desired);
>> -  return expected;
>> +  while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
>> +                                           &result, desired)
>> +         && scm_is_eq (result, expected))
>> +    {
>> +    }
>> +  return result;
>>  }
>>  #undef FUNC_NAME
>>  
>> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
>> index 19ff3e498..9650e33eb 100644
>> --- a/libguile/vm-engine.c
>> +++ b/libguile/vm-engine.c
>> @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
>>      {
>>        scm_t_uint16 dst, box;
>>        scm_t_uint32 expected, desired;
>> -      SCM scm_box, scm_expected;
>> +      SCM scm_box, scm_expected, scm_result;
>>        UNPACK_12_12 (op, dst, box);
>>        UNPACK_24 (ip[1], expected);
>>        UNPACK_24 (ip[2], desired);
>>        scm_box = SP_REF (box);
>>        VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!");
>> -      scm_expected = SP_REF (expected);
>> -      scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
>> -                                       &scm_expected, SP_REF (desired));
>> -      SP_SET (dst, scm_expected);
>> +      scm_result = scm_expected = SP_REF (expected);
>> +      while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
>> +                                               &scm_result, SP_REF (desired))
>> +             && scm_is_eq (scm_result, scm_expected))
>> +        {
>> +        }
>> +      SP_SET (dst, scm_result);
>>        NEXT (3);
>>      }
>>  
>> -- 
>> 2.19.0
>> 





^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-10-05 23:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-20 17:13 bug#32786: (system* ...) call somehow interferes with atomic-box on armv7l Михаил Бахтерев
2018-09-21 21:26 ` Mark H Weaver
2018-09-22  4:39   ` Михаил Бахтерев
2018-09-22  4:50     ` Михаил Бахтерев
2018-09-27  5:49     ` Mark H Weaver
2018-09-27  6:18       ` Михаил Бахтерев
2018-10-05 19:43       ` Михаил Бахтерев
2018-10-05 23:22         ` Mark H Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).