From: Linas Vepstas <linasvepstas@gmail.com>
Subject: guile-2.9.2 and threading
Date: Sun, 2 Jun 2019 18:25:27 -0500
To: Guile Development, Guile User

I'm trying to understand how scm_jit_enter_mcode leads to scm_timed_lock_mutex ... I want to know who is attempting to lock, and why ... and how to work around this...

Background: par-for-each works poorly for my app, and I want to understand why. Specifically, how par-for-each and n-par-for-each work in guile-2.9.2. I've got my app, which is a guile/c++ mixture. I've got a list of 137078 items and a proc I want to apply. Single-threaded, it takes 530 seconds, so about 137078/530 = 260 items/second. Baseline. So.
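For scale (this arithmetic is mine, not one of the measurements): with perfect scaling, six threads would finish in roughly 530/6 ≈ 88 seconds, i.e. about 1550 items/second. That's the number to compare against.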
(n-par-for-each 6 proc list)

and I see this: 230% cpu use in top (i.e. two+ active threads). Why only two? It's always about the same: n-par-for-each 8, n-par-for-each 12, whatever ... gdb shows that all but two threads are stuck here:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f343d27bb95 in scm_pthread_cond_wait (cond=<optimized out>,
    mutex=<optimized out>) at ../../libguile/threads.c:1615
#2  0x00007f343d27bd8b in block_self (queue=0x55ef64391cd0,
    mutex=mutex@entry=0x55ef63d21fc0, waittime=waittime@entry=0x0)
    at ../../libguile/threads.c:316
#3  0x00007f343d27bedf in lock_mutex (current_thread=0x55f0a3d42c60,
    waittime=0x0, m=0x55ef63d21fc0, kind=SCM_MUTEX_STANDARD)
    at ../../libguile/threads.c:1031
#4  scm_timed_lock_mutex (mutex=0x55ef64391cc0, timeout=<optimized out>)
    at ../../libguile/threads.c:1092
#5  0x00007f343d56582a in ?? ()
#6  0x00007f34104bdf90 in ?? ()
#7  0x00007f343d4f00a0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#8  0x000055ef6347b3a8 in ?? ()
#9  0x00007f343d22bf61 in scm_jit_enter_mcode (thread=0x55f0a3d42840,
    mcode=0x55f0a3d42840 "\240)ԣ\360U") at ../../libguile/jit.c:4819
#10 0x00007f343d28089c in vm_debug_engine (thread=0x55f0a3d42840)
    at ../../libguile/vm-engine.c:370
#11 0x00007f343d28707a in scm_call_n (proc=proc@entry=0x55eff19c9f00,
    argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
```

The other two threads are happily doing things in my app.

I'm using (ice-9 threads). I see this:

```
(define (n-par-for-each n proc . arglists)
  (let ((m (make-mutex))
        (threads '()))
    (do ((i 0 (+ 1 i)))
        ((= i n)
         (for-each join-thread threads))
      (set! threads
            (cons (begin-thread
                    (let loop ()
                      (lock-mutex m)
                      (if (null? (car arglists))
                          (unlock-mutex m)
                          (let ((args (map car arglists)))
                            (set! arglists (map cdr arglists))
                            (unlock-mutex m)
                            (apply proc args)
                            (loop)))))
                  threads)))))
```
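Note that the worker loop takes the shared mutex m once per list element, and also rebuilds arglists with map on every iteration. Just to illustrate the difference (a sketch of mine, not anything in (ice-9 threads); the name and chunk size are made up), a variant that claims a whole chunk of items per lock acquisition would look like:

```
(use-modules (ice-9 threads))  ; begin-thread, join-thread

;; Sketch: like n-par-for-each over a single list, but each thread grabs
;; up to CHUNK items while holding the mutex, then runs PROC on them
;; outside the lock.  Assumes chunk >= 1.
(define (n-par-for-each/chunked n chunk proc lst)
  (let ((m (make-mutex))
        (threads '()))
    (do ((i 0 (+ 1 i)))
        ((= i n) (for-each join-thread threads))
      (set! threads
            (cons (begin-thread
                    (let loop ()
                      (lock-mutex m)
                      (if (null? lst)
                          (unlock-mutex m)
                          ;; Claim up to CHUNK items under the lock ...
                          (let grab ((k chunk) (batch '()))
                            (if (or (zero? k) (null? lst))
                                (begin
                                  (unlock-mutex m)
                                  ;; ... then do the real work unlocked.
                                  (for-each proc (reverse batch))
                                  (loop))
                                (let ((item (car lst)))
                                  (set! lst (cdr lst))
                                  (grab (- k 1) (cons item batch))))))))
                  threads)))))
```

Called as, say, (n-par-for-each/chunked 6 1000 proc list), the mutex gets taken on the order of 137078/1000 times instead of 137078 times.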

Oh, I says to myself: bad bad mutex. Let me write a lock-less loop: it chops the list into n pieces, each in its own thread. (a bit sloppy, but adequate):

```
(define (my-for-each n proc args)
  (define len (length args))
  (define quo (euclidean-quotient len n))
  (define rem (euclidean-remainder len n))
  (define threads '())
  (do ((i 0 (+ 1 i)))
      ((= i n) (for-each join-thread threads))
    (set! threads
          (cons
           (begin-thread
            (for-each proc (take (drop args (* i quo)) quo)))
           threads)))
  (for-each proc (drop args (* n quo))))
```
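For completeness: take and drop here come from SRFI-1, so this assumes a setup along these lines (not shown above):

```
(use-modules (ice-9 threads)   ; begin-thread, join-thread
             (srfi srfi-1))    ; take, drop
```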

Let me go hog-wild: (my-for-each 12 proc list) (I have a CPU with that many cores.) So... what happens? A little better ... not much. This time, gdb shows that there are four threads in my app. Two are stuck here:

```
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f343ca69bb5 in __GI___pthread_mutex_lock (
    mutex=mutex@entry=0x7f343d4f0f40 <bytes_until_gc_lock>)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f343d213e20 in scm_gc_register_allocation (size=size@entry=16)
    at ../../libguile/gc.c:591
```

All the others are stuck in the first stack trace. Average speedup over one thread is 1.5x (345 seconds vs 525 seconds) and it burns about 250% CPU (according to top) to achieve this.
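Since frame #2 points at allocation accounting (the bytes_until_gc_lock mutex), one way to put a number on the GC pressure is to diff (gc-stats) around a run. A sketch (it assumes the usual gc-times and heap-total-allocated entries in the (gc-stats) alist, which may vary by version):

```
(define (with-gc-report thunk)
  ;; Run THUNK and print how the GC counters moved across it.
  (let ((before (gc-stats)))
    (thunk)
    (let ((after (gc-stats)))
      (format #t "gc-times: ~a -> ~a\n"
              (assq-ref before 'gc-times)
              (assq-ref after 'gc-times))
      (format #t "heap-total-allocated: ~a -> ~a\n"
              (assq-ref before 'heap-total-allocated)
              (assq-ref after 'heap-total-allocated)))))

;; e.g. (with-gc-report (lambda () (my-for-each 12 proc big-list)))
```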

Why aren't there 12 threads in my app? my-for-each is lockless, so where is the scm_timed_lock_mutex ... scm_jit_enter_mcode coming from?

Using par-for-each results in times that are identical to single-thread times. Identical - no speedup, no slow-down. Toy sniff tests show that par-for-each really does run in parallel, so no clue why it's not any faster.

FWIW, guile-2.2 behaves much worse. Running two threads, things got 1.5x faster. Running four threads, things ran at half the speed of single-threaded. Running six threads ran maybe 10x or 20x slower than single-threaded. Clearly, a really bad live-lock situation. I want to get out from under this. I'm trying to do big data, and this is a bottleneck.

--linas

--
cassette tapes - analog TV - film cameras - you