From: Linas Vepstas <linasvepstas@gmail.com>
Subject: guile-2.9.2 and threading
Date: Sun, 2 Jun 2019 18:25:27 -0500
To: Guile Development, Guile User

I'm trying to understand how scm_jit_enter_mcode leads to scm_timed_lock_mutex ... I want to know who is attempting to lock, and why ... and how to work around this...

Background: par-for-each works poorly for my app, and I want to understand why. Specifically, how par-for-each and n-par-for-each work in guile-2.9.2. I've got my app, which is a guile/c++ mixture. I've got a list of 137078 items and a proc I want to apply. Single-threaded, it takes 530 seconds, so about 137078/530 = 260 items/second. Baseline. So.
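For scale (this arithmetic is mine, not one of the measurements): with perfect scaling, six threads would finish in roughly 530/6 ≈ 88 seconds, i.e. about 1550 items/second. That's the number to compare against.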
(n-par-for-each 6 proc list)

and I see this: 230% cpu use in top (i.e. two+ active threads). Why only two? It's always about the same: n-par-for-each 8, n-par-for-each 12, whatever ... gdb shows that all but two threads are stuck here:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f343d27bb95 in scm_pthread_cond_wait (cond=<optimized out>,
    mutex=<optimized out>) at ../../libguile/threads.c:1615
#2  0x00007f343d27bd8b in block_self (queue=0x55ef64391cd0,
    mutex=mutex@entry=0x55ef63d21fc0, waittime=waittime@entry=0x0)
    at ../../libguile/threads.c:316
#3  0x00007f343d27bedf in lock_mutex (current_thread=0x55f0a3d42c60,
    waittime=0x0, m=0x55ef63d21fc0, kind=SCM_MUTEX_STANDARD)
    at ../../libguile/threads.c:1031
#4  scm_timed_lock_mutex (mutex=0x55ef64391cc0, timeout=<optimized out>)
    at ../../libguile/threads.c:1092
#5  0x00007f343d56582a in ?? ()
#6  0x00007f34104bdf90 in ?? ()
#7  0x00007f343d4f00a0 in jump_table_ () from /usr/local/lib/libguile-3.0.so.0
#8  0x000055ef6347b3a8 in ?? ()
#9  0x00007f343d22bf61 in scm_jit_enter_mcode (thread=0x55f0a3d42840,
    mcode=0x55f0a3d42840 "\240)ԣ\360U") at ../../libguile/jit.c:4819
#10 0x00007f343d28089c in vm_debug_engine (thread=0x55f0a3d42840)
    at ../../libguile/vm-engine.c:370
#11 0x00007f343d28707a in scm_call_n (proc=proc@entry=0x55eff19c9f00,
    argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1605
```

The other two threads are happily doing things in my app.

I'm using (ice-9 threads). I see this:

```
(define (n-par-for-each n proc . arglists)
  (let ((m (make-mutex))
        (threads '()))
    (do ((i 0 (+ 1 i)))
        ((= i n)
         (for-each join-thread threads))
      (set! threads
            (cons (begin-thread
                    (let loop ()
                      (lock-mutex m)
                      (if (null? (car arglists))
                          (unlock-mutex m)
                          (let ((args (map car arglists)))
                            (set! arglists (map cdr arglists))
                            (unlock-mutex m)
                            (apply proc args)
                            (loop)))))
                  threads)))))
```
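Note that the worker loop takes the shared mutex m once per list element, and also rebuilds arglists with map on every iteration. Just to illustrate the difference (a sketch of mine, not anything in (ice-9 threads); the name and chunk size are made up), a variant that claims a whole chunk of items per lock acquisition would look like:

```
(use-modules (ice-9 threads))  ; begin-thread, join-thread

;; Sketch: like n-par-for-each over a single list, but each thread grabs
;; up to CHUNK items while holding the mutex, then runs PROC on them
;; outside the lock.  Assumes chunk >= 1.
(define (n-par-for-each/chunked n chunk proc lst)
  (let ((m (make-mutex))
        (threads '()))
    (do ((i 0 (+ 1 i)))
        ((= i n) (for-each join-thread threads))
      (set! threads
            (cons (begin-thread
                    (let loop ()
                      (lock-mutex m)
                      (if (null? lst)
                          (unlock-mutex m)
                          ;; Claim up to CHUNK items under the lock ...
                          (let grab ((k chunk) (batch '()))
                            (if (or (zero? k) (null? lst))
                                (begin
                                  (unlock-mutex m)
                                  ;; ... then do the real work unlocked.
                                  (for-each proc (reverse batch))
                                  (loop))
                                (let ((item (car lst)))
                                  (set! lst (cdr lst))
                                  (grab (- k 1) (cons item batch))))))))
                  threads)))))
```

Called as, say, (n-par-for-each/chunked 6 1000 proc list), the mutex gets taken on the order of 137078/1000 times instead of 137078 times.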

Oh, I says to myself: bad bad mutex. Let me write a lock-less loop: it chops the list into n pieces, each in its own thread. (a bit sloppy, but adequate):

```
(define (my-for-each n proc args)
  (define len (length args))
  (define quo (euclidean-quotient len n))
  (define rem (euclidean-remainder len n))
  (define threads '())
  (do ((i 0 (+ 1 i)))
      ((= i n) (for-each join-thread threads))
    (set! threads
          (cons
           (begin-thread
            (for-each proc (take (drop args (* i quo)) quo)))
           threads)))
  (for-each proc (drop args (* n quo))))
```
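For completeness: take and drop here come from SRFI-1, so this assumes a setup along these lines (not shown above):

```
(use-modules (ice-9 threads)   ; begin-thread, join-thread
             (srfi srfi-1))    ; take, drop
```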

Let me go hog-wild: (my-for-each 12 proc list) (I have a CPU with that many cores.) So... what happens? A little better ... not much. This time, gdb shows that there are four threads in my app. Two are stuck here:

```
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f343ca69bb5 in __GI___pthread_mutex_lock (
    mutex=mutex@entry=0x7f343d4f0f40 <bytes_until_gc_lock>)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f343d213e20 in scm_gc_register_allocation (size=size@entry=16)
    at ../../libguile/gc.c:591
```

All the others are stuck in the first stack trace. Average speedup over one thread is 1.5x (345 seconds vs 525 seconds) and it burns about 250% CPU (according to top) to achieve this.
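Since frame #2 points at allocation accounting (the bytes_until_gc_lock mutex), one way to put a number on the GC pressure is to diff (gc-stats) around a run. A sketch (it assumes the usual gc-times and heap-total-allocated entries in the (gc-stats) alist, which may vary by version):

```
(define (with-gc-report thunk)
  ;; Run THUNK and print how the GC counters moved across it.
  (let ((before (gc-stats)))
    (thunk)
    (let ((after (gc-stats)))
      (format #t "gc-times: ~a -> ~a\n"
              (assq-ref before 'gc-times)
              (assq-ref after 'gc-times))
      (format #t "heap-total-allocated: ~a -> ~a\n"
              (assq-ref before 'heap-total-allocated)
              (assq-ref after 'heap-total-allocated)))))

;; e.g. (with-gc-report (lambda () (my-for-each 12 proc big-list)))
```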

Why aren't there 12 threads in my app? my-for-each is lockless, so where is the scm_timed_lock_mutex ... scm_jit_enter_mcode coming from?

Using par-for-each results in times that are identical to single-thread times. Identical - no speedup, no slow-down. Toy sniff tests show that par-for-each really does run in parallel, so no clue why it's not any faster.

FWIW, guile-2.2 behaves much worse. Running two threads, things got 1.5x faster. Running four threads, things ran at half the speed of single-threaded. Running six threads ran maybe 10x or 20x slower than single-threaded. Clearly, a really bad live-lock situation. I want to get out from under this. I'm trying to do big data, and this is a bottleneck.

--linas

--
cassette tapes - analog TV - film cameras - you