* guile performance - Ackermann function: way slower than emacs, slower still if compiled
@ 2009-08-04  9:28 Ken Raeburn
  2009-08-04 13:52 ` Marijn Schouten (hkBst)
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Ken Raeburn @ 2009-08-04 9:28 UTC (permalink / raw)
To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 5656 bytes --]

I implemented Ackermann's function A(m,n), a recursive function with scary performance characteristics, based on the definition given at wikipedia involving lots of +1 and -1 operations... A(0,n) takes constant time to compute assuming bounded integer sizes, A(1,n) takes linear time, A(2,n) is quadratic, A(3,n) is exponential... A(4,n) takes a really long time if n=1, and for anything higher you can just give up now.  (A(4,2) is in the neighborhood of 2**65536, and A(4,3) is about 2 raised to that power.  And it gets there by adding or subtracting 1 in each invocation.)  Optimizations are possible, like A(1,n)=n+2, A(2,n)=2n+3, A(3,n)=2**(n+3)-3, but I didn't encode those optimizations.

I wrote both elisp and scheme versions; I changed the recursion to an explicitly-managed stack of values, to avoid some of the dynamic-vs-lexical argument binding issues, turn the native stack usage into large numbers of cell allocations and discards, and avoid exceeding the maximum Lisp evaluation depth Emacs permits.  Not surprisingly, it's not too fast, and Emacs spends a lot of time in garbage collection passes even with the byte-compiled version.
My time-test function here counts seconds of real time (on an 8-core Mac without a lot of other stuff going on, so CPU usage is probably very close):

ELISP> (time-test '(ackermann 3 1))
1.6e-05
ELISP> (time-test '(ackermann 3 6))
0.036018
ELISP> (time-test '(ackermann 3 7))
0.16574899999999998
ELISP> (time-test '(ackermann 3 8))
0.6457170000000001
ELISP> (time-test '(ackermann 3 9))
2.482042
ELISP> (time-test '(ackermann 4 0))
1.4e-05
ELISP> (time-test '(ackermann 4 1))
642.746514
ELISP>

With the installed guile-1.8.7 binary and the interpreted Scheme version of the code, performance was worse:

% time guile -l ack.scm -c '(ackermann 3 1)'
0.009u 0.003s 0:00.01 0.0%	0+0k 0+0io 0pf+0w
% time guile -l ack.scm -c '(ackermann 3 6)'
0.409u 0.078s 0:00.52 90.3%	0+0k 0+0io 0pf+0w
% time guile -l ack.scm -c '(ackermann 3 9)'
25.300u 4.540s 0:29.85 99.9%	0+0k 0+0io 0pf+0w
% time guile -l ack.scm -c '(ackermann 4 1)'
6016.150u 1122.109s 1:58:59.32 99.9%	0+0k 8+4io 149pf+0w
%

Yes, that's just shy of two hours for the last one, where Emacs took less than 11 minutes.

(Did I make some silly mistake in translating, that causes the Scheme version to be far less efficient?)

It's not a terribly fair comparison, for various reasons, but moving on...

I built and installed 1.9.1, so I could test with more recent code:

% time guile -l ack.scm -c '(ackermann 3 1)'
0.026u 0.006s 0:00.03 66.6%	0+0k 0+0io 0pf+0w
% time guile -l ack.scm -c '(ackermann 3 6)'
0.761u 0.141s 0:00.90 100.0%	0+0k 0+0io 0pf+0w
% time guile -l ack.scm -c '(ackermann 3 9)'
47.242u 8.644s 0:55.89 99.9%	0+0k 0+0io 0pf+0w
%

Since this behaved worse than 1.8.6, I skipped the 4,1 test.
I tested the compiled version:

% time guile -c '(begin (load-compiled "ack.go") (ackermann 3 1))'
0.012u 0.004s 0:00.01 100.0%	0+0k 0+0io 0pf+0w
% time guile -c '(begin (load-compiled "ack.go") (ackermann 3 6))'
0.772u 0.333s 0:01.10 100.0%	0+0k 0+0io 0pf+0w
% time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))'
48.728u 22.016s 1:10.75 99.9%	0+0k 0+0io 0pf+0w
%

Not much better than loading the .scm file, and only better for small values of n... in fact, with m=3 and n>=6 the performance seems worse in the precompiled case:

% time guile -l ack.scm -c '(ackermann 3 10)'
181.997u 34.928s 3:36.94 99.9%	0+0k 0+0io 0pf+0w
% time guile -c '(begin (load-compiled "ack.go") (ackermann 3 10))'
196.678u 85.618s 4:42.34 99.9%	0+0k 0+0io 0pf+0w
%

Poking at it a couple of times with Apple's process sampling tool, it does seem to be spending a bit of time in garbage collection -- but scm_leave_guile (called from scm_async_click via scm_pthread_mutex_lock) and getrusage (called from scm_c_get_internal_run_time via mytime) also seem to show up on the stack a significant fraction of the time, and each of those appears to be a comparable if not bigger time sink.

According to the execution timing with the compiled code, GC actually doesn't seem to be a huge fraction of the run time, maybe 8-9%:

scheme@(guile-user)> ,t (ackermann 3 8)
2045
clock utime stime cutime cstime gctime
18.43 12.96  5.47   0.00   0.00   1.66
scheme@(guile-user)> ,t (ackermann 3 9)
4093
clock utime stime cutime cstime gctime
73.47 51.61 21.85   0.00   0.00   5.95
scheme@(guile-user)>

The latter is the case that took under three seconds in Emacs, remember.
Garbage collection is much more of a factor in the non-compiled version:

scheme@(guile-user)> (load "ack.scm")
scheme@(guile-user)> ,t (ackermann 3 8)
2045
clock utime stime cutime cstime gctime
21.97 19.74  2.21   0.00   0.00   7.76
scheme@(guile-user)> ,t (ackermann 3 9)
4093
clock utime stime cutime cstime gctime
86.26 77.37  8.88   0.00   0.00  30.20
scheme@(guile-user)>

It also looks like execution time measurements may add more to the non-compiled execution time, as with the measurements the compiled code was still faster at 3,9.

In retrospect, this is much heavier on the interpretation or execution of some of the simple operations than I intended, and Emacs benefits a lot from opcodes like "add1", and I'm more interested specifically in exercising allocation and GC at the moment, so it's not that useful a test for me after all.  But I thought I'd send this along anyways in case someone was interested.  In particular the fact that the performance of the precompiled version gets worse as n grows might be something to investigate, unless the cause is already known....

Ken

[-- Attachment #2: ack.scm --]
[-- Type: application/octet-stream, Size: 889 bytes --]

(define (ackermann m n)
  (let ((stack (list m)))
    (while (not (null? stack))
      ;; Compute A(top-of-stack, n)
      (set! m (car stack))
      (set! stack (cdr stack))
      (while (not (= m 0))
        (cond ((and (= n 0) (> m 0))
               ;;; (ackermann (- m 1) 1)
               (set! m (- m 1))
               (set! n 1))
              (#t
               ;;; (ackermann (- m 1) (ackermann m (- n 1)))
               ;; Push outer parameter onto the stack and
               ;; compute the inner function.
               (set! stack (cons (- m 1) stack))
               (set! n (- n 1))
               )))
      ;; Now m=0 so result is n+1.
      (set! n (+ n 1))
      ;; Done computing A() this round.
      ))
  n)

(define (time-diff t1 t2)
  (+ (* 1.0e-6 (- (cdr t1) (cdr t2)))
     (- (car t1) (car t2))))

(define (ack-time-test m n)
  (let ((start (gettimeofday)))
    (ackermann m n)
    (display (time-diff (gettimeofday) start))
    (newline)))

[-- Attachment #3: ack.el --]
[-- Type: application/octet-stream, Size: 867 bytes --]

(defun ackermann (m n)
  (let ((stack (list m)))
    (while stack
      ;; Compute A(top-of-stack, n)
      (setq m (car stack)
            stack (cdr stack))
      (while (not (= m 0))
        (cond ((and (= n 0) (> m 0))
               ;;; (ackermann (- m 1) 1)
               (setq m (- m 1)
                     n 1))
              (t
               ;;; (ackermann (- m 1) (ackermann m (- n 1)))
               ;; Push outer parameter onto the stack and
               ;; compute the inner function.
               (setq stack (cons (- m 1) stack)
                     n (- n 1))
               )))
      ;; Now m=0 so result is n+1.
      (setq n (+ n 1))
      ;; Done computing A() this round.
      ))
  n)

(defun time-diff (t1 t2)
  (+ (* 65536.0 (- (nth 0 t1) (nth 0 t2)))
     (* 1.0 (- (nth 1 t1) (nth 1 t2)))
     (* 1.0e-6 (- (nth 2 t1) (nth 2 t2)))))

(defun time-test (exp)
  (let ((start (current-time)))
    (eval exp)
    (time-diff (current-time) start)))

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled 2009-08-04 9:28 guile performance - Ackermann function: way slower than emacs, slower still if compiled Ken Raeburn @ 2009-08-04 13:52 ` Marijn Schouten (hkBst) 2009-08-04 15:56 ` Ken Raeburn 2009-08-05 9:06 ` Andy Wingo ` (2 subsequent siblings) 3 siblings, 1 reply; 16+ messages in thread From: Marijn Schouten (hkBst) @ 2009-08-04 13:52 UTC (permalink / raw) To: Ken Raeburn; +Cc: guile-devel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ken Raeburn wrote: > I implemented Ackermann's function A(m,n), a recursive function with > scary performance characteristics, based on the definition given at > wikipedia involving lots of +1 and -1 operations... A(0,n) takes > constant time to compute assuming bounded integer sizes, A(1,n) takes > linear time, A(2,n) is quadratic, A(3,n) is exponential... A(4,n) takes > a really long time if n=1, and for anything higher you can just give up > now. (A(4,2) is in the neighborhood of 2**65536, and A(4,3) is about 2 > raised to that power. And it gets there by adding or subtracting 1 in > each invocation.) Optimizations are possible, like A(1,n)=n+2, > A(2,n)=2n+3, A(3,n)=2**(n+3)-3, but I didn't encode those optimizations. > > I wrote both elisp and scheme versions; I changed the recursion to an > explicitly-managed stack of values, to avoid some of the > dynamic-vs-lexical argument binding issues, turn the native stack usage > into large numbers of cell allocations and discards, and avoid exceeding > the maximum Lisp evaluation depth Emacs permits. Not surprisingly, it's > not too fast, and Emacs spends a lot of time in garbage collection > passes even with the byte-compiled version. 
My time-test function here > counts seconds of real time (on an 8-core Mac without a lot of other > stuff going on, so CPU usage is probably very close): > > ELISP> (time-test '(ackermann 3 1)) > 1.6e-05 > ELISP> (time-test '(ackermann 3 6)) > 0.036018 > ELISP> (time-test '(ackermann 3 7)) > 0.16574899999999998 > ELISP> (time-test '(ackermann 3 8)) > 0.6457170000000001 > ELISP> (time-test '(ackermann 3 9)) > 2.482042 > ELISP> (time-test '(ackermann 4 0)) > 1.4e-05 > ELISP> (time-test '(ackermann 4 1)) > 642.746514 > ELISP> > > With the installed guile-1.8.7 binary and the interpreted Scheme version > of the code, performance was worse: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.009u 0.003s 0:00.01 0.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.409u 0.078s 0:00.52 90.3% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 25.300u 4.540s 0:29.85 99.9% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 4 1)' > 6016.150u 1122.109s 1:58:59.32 99.9% 0+0k 8+4io 149pf+0w > % > > Yes, that's just shy of two hours for the last one, where Emacs took > less than 11 minutes. > > (Did I make some silly mistake in translating, that causes the Scheme > version to be far less efficient?) > > It's not a terribly fair comparison, for various reasons, but moving on... > > I built and installed 1.9.1, so I could test with more recent code: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.026u 0.006s 0:00.03 66.6% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.761u 0.141s 0:00.90 100.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 47.242u 8.644s 0:55.89 99.9% 0+0k 0+0io 0pf+0w > % > > Since this behaved worse than 1.8.6, I skipped the 4,1 test. 
I tested > the compiled version: > > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 1))' > 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 6))' > 0.772u 0.333s 0:01.10 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))' > 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w > % > > Not much better than loading the .scm file, and only better for small > values of n... in fact, with m=3 and n>=6 the performance seems worse in > the precompiled case: > > % time guile -l ack.scm -c '(ackermann 3 10)' > 181.997u 34.928s 3:36.94 99.9% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 10))' > 196.678u 85.618s 4:42.34 99.9% 0+0k 0+0io 0pf+0w > % > > Poking at it a couple of times with Apple's process sampling tool, it > does seem to be spending a bit of time in garbage collection -- but > scm_leave_guile (called from scm_async_click via scm_pthread_mutex_lock) > and getrusage (called from scm_c_get_internal_run_time via mytime) also > seem to show up on the stack a significant fraction of the time, and > each of those appears to be a comparable if not bigger time sink. > > According to the execution timing with the compiled code, GC actually > doesn't seem to be a huge fraction of the run time, maybe 8-9%: > > scheme@(guile-user)> ,t (ackermann 3 8) > 2045 > clock utime stime cutime cstime gctime > 18.43 12.96 5.47 0.00 0.00 1.66 > scheme@(guile-user)> ,t (ackermann 3 9) > 4093 > clock utime stime cutime cstime gctime > 73.47 51.61 21.85 0.00 0.00 5.95 > scheme@(guile-user)> > > The latter is the case that took under three seconds in Emacs, remember. 
> Garbage collection is much more of a factor in the non-compiled version:
>
> scheme@(guile-user)> (load "ack.scm")
> scheme@(guile-user)> ,t (ackermann 3 8)
> 2045
> clock utime stime cutime cstime gctime
> 21.97 19.74  2.21   0.00   0.00   7.76
> scheme@(guile-user)> ,t (ackermann 3 9)
> 4093
> clock utime stime cutime cstime gctime
> 86.26 77.37  8.88   0.00   0.00  30.20
> scheme@(guile-user)>
>
> It also looks like execution time measurements may add more to the
> non-compiled execution time, as with the measurements the compiled code
> was still faster at 3,9.
>
> In retrospect, this is much heavier on the interpretation or execution
> of some of the simple operations than I intended, and Emacs benefits a
> lot from opcodes like "add1", and I'm more interested specifically in
> exercising allocation and GC at the moment, so it's not that useful a
> test for me after all.  But I thought I'd send this along anyways in
> case someone was interested.  In particular the fact that the
> performance of the precompiled version gets worse as n grows might be
> something to investigate, unless the cause is already known....

A more idiomatic way to implement it that also works in other Schemes is:

; $ larceny -- acker.scm -e "(time (pretty-print (A 3 9)))" -e "(quit)"
; $ mzscheme -f acker.scm -e "(time (display (A 3 9))(newline))"
; $ time guile -l acker.scm -c "(display (A 3 9))(newline)"
; $ gsi acker.scm -e "(time (pp (A 3 9)))"

(define (A m n)
  (let loop ((m+ (list m)) (n n))
    (if (null? m+)
        n
        (let loop1 ((m (car m+)) (m* (cdr m+)) (n n))
          (if (= m 0)
              (loop m* (+ n 1))
              (if (= n 0)
                  (loop1 (- m 1) m* 1)
                  (loop1 m (cons (- m 1) m*) (- n 1)) ))))))

On my machine larceny computes (A 3 9) in 83 ms and (A 4 1) in less than 22 seconds.  In guile-1.8.6 this version is also twice as fast as your version for (A 3 9); they run 9s and 18s respectively.  I haven't tried the new guile compiler yet.

Marijn

- --
If you cannot read my mind, then listen to what I say.
Marijn Schouten (hkBst), Gentoo Lisp project, Gentoo ML
<http://www.gentoo.org/proj/en/lisp/>, #gentoo-{lisp,ml} on FreeNode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp4ONoACgkQp/VmCx0OL2wCmwCgh4a3N6mifoRKuQXG/eaylTqb
yzQAn0Kb6YRQy6CXyAEcs8t+yBZsHWLa
=6wJR
-----END PGP SIGNATURE-----
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-04 13:52 ` Marijn Schouten (hkBst)
@ 2009-08-04 15:56   ` Ken Raeburn
  0 siblings, 0 replies; 16+ messages in thread
From: Ken Raeburn @ 2009-08-04 15:56 UTC (permalink / raw)
To: Marijn Schouten; +Cc: guile-devel

On Aug 4, 2009, at 09:52, Marijn Schouten (hkBst) wrote:
> A more idiomatic way to implement it that also works in other
> Schemes is:

I know, shame on me for using "set!" in my code. :-)

I was aiming for code very similar between Lisp and Scheme versions, so that (insert some hand-waving here) it might be more reasonable to compare Emacs and Guile performance; named loops wouldn't work for Lisp, tail-recursive functions don't get optimized into loops in Emacs Lisp, etc.  So while loops with explicit setting of new values seemed like a good compromise.  (Then again, like I said, I was aiming for something to exercise the GC system in particular, and failed there anyways.)  It would've been better still if I'd pulled up the elisp branch and compared that against Emacs for processing the same Lisp code....

> On my machine larceny computes (A 3 9) in 83 ms and (A 4 1) in less
> than 22 seconds.  In guile-1.8.6 this version is also twice as fast
> as your version for (A 3 9); they run 9s and 18s respectively.
> I haven't tried the new guile compiler yet.

Thanks!  The compiler geek in me wants to suggest that the compiler should be able to transform away most of the differences and the performance should be nearly the same. :-)  (The compiler geek in me also knows better than to expect it.)

For completeness, the timings I get for A(3,9) with your version, on the same machine I used before:

1.8.7:                8.4s
1.9.1, not compiled: 13.1s
1.9.1, compiled:      2.8s

I also tried A(4,1) in the compiled version, and it took 13m20s.  So the 1.9.1-compiled version of your code, from two data points, appears to be only a bit slower than the byte-compiled Emacs version.
Ken
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-04  9:28 guile performance - Ackermann function: way slower than emacs, slower still if compiled Ken Raeburn
  2009-08-04 13:52 ` Marijn Schouten (hkBst)
@ 2009-08-05  9:06 ` Andy Wingo
  2009-08-05 14:06   ` Ken Raeburn
  2009-08-08 21:52   ` guile performance - Ackermann function: way slower than emacs, slower still if compiled Ludovic Courtès
  2009-08-05 10:42 ` Andy Wingo
  2009-08-05 11:15 ` Andy Wingo
  3 siblings, 2 replies; 16+ messages in thread
From: Andy Wingo @ 2009-08-05 9:06 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

On Tue 04 Aug 2009 11:28, Ken Raeburn <raeburn@raeburn.org> writes:

> I implemented Ackermann's function A(m,n), a recursive function with
> scary performance characteristics, based on the definition given at
> wikipedia involving lots of +1 and -1 operations...

Guile's VM does not yet have opcodes for +1 and -1.  It probably should, though.  Still, this is a fairly artificial test.

> I wrote both elisp and scheme versions; I changed the recursion to an
> explicitly-managed stack of values, to avoid some of the dynamic-vs-
> lexical argument binding issues, turn the native stack usage into large
> numbers of cell allocations and discards, and avoid exceeding the
> maximum Lisp evaluation depth Emacs permits.

OK now we know we are entering the realm of the artificial benchmark :-)

> Not surprisingly, it's
> not too fast, and Emacs spends a lot of time in garbage collection
> passes even with the byte-compiled version.
My time-test function here > counts seconds of real time (on an 8-core Mac without a lot of other > stuff going on, so CPU usage is probably very close): > > ELISP> (time-test '(ackermann 3 1)) > 1.6e-05 > ELISP> (time-test '(ackermann 3 6)) > 0.036018 > ELISP> (time-test '(ackermann 3 7)) > 0.16574899999999998 > ELISP> (time-test '(ackermann 3 8)) > 0.6457170000000001 > ELISP> (time-test '(ackermann 3 9)) > 2.482042 > ELISP> (time-test '(ackermann 4 0)) > 1.4e-05 > ELISP> (time-test '(ackermann 4 1)) > 642.746514 OK > With the installed guile-1.8.7 binary and the interpreted Scheme version > of the code, performance was worse: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.009u 0.003s 0:00.01 0.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.409u 0.078s 0:00.52 90.3% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 25.300u 4.540s 0:29.85 99.9% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 4 1)' > 6016.150u 1122.109s 1:58:59.32 99.9% 0+0k 8+4io 149pf+0w So, you're also timing Guile startup here, which takes about 20 ms. > Yes, that's just shy of two hours for the last one, where Emacs took > less than 11 minutes. Of course, that's comparing compiled code to interpreted code, but that is a pretty bad result, yes. > I built and installed 1.9.1, so I could test with more recent code: > > % time guile -l ack.scm -c '(ackermann 3 1)' > 0.026u 0.006s 0:00.03 66.6% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 6)' > 0.761u 0.141s 0:00.90 100.0% 0+0k 0+0io 0pf+0w > % time guile -l ack.scm -c '(ackermann 3 9)' > 47.242u 8.644s 0:55.89 99.9% 0+0k 0+0io 0pf+0w > % > > Since this behaved worse than 1.8.6, I skipped the 4,1 test. Yes, Guile 1.9's interpreter will be slower than 1.8's. 
> I tested > the compiled version: > > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 1))' > 0.012u 0.004s 0:00.01 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 6))' > 0.772u 0.333s 0:01.10 100.0% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))' > 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w > % > > Not much better than loading the .scm file, and only better for small > values of n... in fact, with m=3 and n>=6 the performance seems worse > in the precompiled case: > > % time guile -l ack.scm -c '(ackermann 3 10)' > 181.997u 34.928s 3:36.94 99.9% 0+0k 0+0io 0pf+0w > % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 10))' > 196.678u 85.618s 4:42.34 99.9% 0+0k 0+0io 0pf+0w > % > > Poking at it a couple of times with Apple's process sampling tool, it > does seem to be spending a bit of time in garbage collection -- but > scm_leave_guile (called from scm_async_click via > scm_pthread_mutex_lock) and getrusage (called from > scm_c_get_internal_run_time via mytime) also seem to show up on the > stack a significant fraction of the time, and each of those appears to > be a comparable if not bigger time sink. This could be simply that Mac OS has different performance characteristics than Linux. If locking uncontended mutexen or getting the resource usage of a process are expensive, that doesn't sound very good... > According to the execution timing with the compiled code, GC actually > doesn't seem to be a huge fraction of the run time, maybe 8-9%: > > scheme@(guile-user)> ,t (ackermann 3 8) > 2045 > clock utime stime cutime cstime gctime > 18.43 12.96 5.47 0.00 0.00 1.66 > scheme@(guile-user)> ,t (ackermann 3 9) > 4093 > clock utime stime cutime cstime gctime > 73.47 51.61 21.85 0.00 0.00 5.95 > scheme@(guile-user)> > > The latter is the case that took under three seconds in Emacs, > remember. Perhaps the deal is that 1+ is not being inlined. 
It seems to be calling out to the 1+ procedure, which is a primitive, actually making it a bit slower. I'll see what I can do. > In retrospect, this is much heavier on the interpretation or execution > of some of the simple operations than I intended, and Emacs benefits a > lot from opcodes like "add1", and I'm more interested specifically in > exercising allocation and GC at the moment, so it's not that useful a > test for me after all. But I thought I'd send this along anyways in > case someone was interested. In particular the fact that the > performance of the precompiled version gets worse as n grows might be > something to investigate, unless the cause is already known.... Thank you for the report, I'll investigate. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled 2009-08-05 9:06 ` Andy Wingo @ 2009-08-05 14:06 ` Ken Raeburn 2009-08-06 7:31 ` Ken Raeburn 2009-08-06 16:30 ` entering and leaving guile mode, and GC stack protection (was Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled) Ken Raeburn 2009-08-08 21:52 ` guile performance - Ackermann function: way slower than emacs, slower still if compiled Ludovic Courtès 1 sibling, 2 replies; 16+ messages in thread From: Ken Raeburn @ 2009-08-05 14:06 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel On Aug 5, 2009, at 05:06, Andy Wingo wrote: > Guile's VM does not yet have opcodes for +1 and and -1. It probably > should, though. Still, this is a fairly artificial test. Yeah, and it turned out to be useless for what I had wanted to test at the start, but I was still curious to see how it would work out. Based on your analysis of the "while" issue, it turned out to be interesting. :-) > So, you're also timing Guile startup here, which takes about 20 ms. Yes; except for the smallest cases, startup seems to be in the noise. >> Poking at it a couple of times with Apple's process sampling tool, it >> does seem to be spending a bit of time in garbage collection -- but >> scm_leave_guile (called from scm_async_click via >> scm_pthread_mutex_lock) and getrusage (called from >> scm_c_get_internal_run_time via mytime) also seem to show up on the >> stack a significant fraction of the time, and each of those appears >> to >> be a comparable if not bigger time sink. > > This could be simply that Mac OS has different performance > characteristics than Linux. If locking uncontended mutexen or getting > the resource usage of a process are expensive, that doesn't sound very > good... From my experience with profiling code on another project, mutexes definitely appear more expensive on Mac OS X vs Linux/glibc when there's contention. 
I just ran a quick test on a couple of x86 machines, and it does appear that the Mac OS X version of pthread_mutex_lock+pthread_mutex_unlock takes half again as many cycles as the GNU/Linux version, without contention.  (A test with 10000000 iterations of getrusage(RUSAGE_SELF) shows a more drastic difference between OSes -- in terms of cycle counts, more than 3.5x as expensive on Mac OS X as on GNU/Linux.)  But it's not pthread_mutex_lock that's showing up most often, it's sigaltstack and sigprocmask, called by scm_leave_guile.

Some thoughts come to mind, but I don't have much time to work with them right now, maybe later:

(1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so that we don't block the thread while in guile mode.  But we could use pthread_mutex_trylock first, and avoid the costs scm_leave_guile seems to incur on the Mac.  If we can't acquire the lock, it should return immediately, and then we can do the expensive, blocking version.  A quick, hack version of this changed my run time for A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and sigprocmask are still in the picture, because they're called from scm_catch_with_pre_unwind_handler.  I'll work up a nicer patch later.

(2) We're locking a global mutex to check if there's an async item pending for the current thread.  First, why not a per-thread mutex so we don't have to potentially stop *all* threads just to see if the current one has async work to do?  But second, it looks to me like by far the common case will be checking the async queue when it's empty -- not changing it, and not checking it when it's non-empty.  If that's the case, can we use some other mechanism, perhaps atomic queue manipulation operations (like Apple's OSAtomic{Enqueue,Dequeue}) or atomic compare-and-swap, to speed up the common case?  It introduces some portability problems, though, and if pthread_mutex_trylock helps enough maybe it doesn't matter.
(3) My four-year-old comments on scm_enter/leave_guile, recorded in threads.c around line 300, still stand....  Those functions really ought to go away.  At least they're confined to one file, now.  Some of it looks a little messy, but I can probably get rid of some of the uses....

(4) Yeah, yeah, I should run gprof already. :-)

Ken
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-05 14:06 ` Ken Raeburn
@ 2009-08-06  7:31   ` Ken Raeburn
  2009-08-26 22:39     ` Neil Jerram
  0 siblings, 1 reply; 16+ messages in thread
From: Ken Raeburn @ 2009-08-06 7:31 UTC (permalink / raw)
To: guile-devel

On Aug 5, 2009, at 10:06, Ken Raeburn wrote:
> (1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so
> that we don't block the thread while in guile mode.  But we could
> use pthread_mutex_trylock first, and avoid the costs scm_leave_guile
> seems to incur on the Mac.  If we can't acquire the lock, it should
> return immediately, and then we can do the expensive, blocking
> version.  A quick, hack version of this changed my run time for
> A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and
> sigprocmask are still in the picture, because they're called from
> scm_catch_with_pre_unwind_handler.  I'll work up a nicer patch later.

Ah, we already had scm_i_pthread_mutex_trylock lying around; that made things easy.  A second timing test with A(3,9) and this version of the patch (based on 1.9.1) shows the same improvement.

diff --git a/libguile/threads.c b/libguile/threads.c
index 9589336..8458a60 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1826,10 +1826,15 @@ scm_std_select (int nfds,
 int
 scm_pthread_mutex_lock (scm_i_pthread_mutex_t *mutex)
 {
-  scm_t_guile_ticket t = scm_leave_guile ();
-  int res = scm_i_pthread_mutex_lock (mutex);
-  scm_enter_guile (t);
-  return res;
+  if (scm_i_pthread_mutex_trylock (mutex) == 0)
+    return 0;
+  else
+    {
+      scm_t_guile_ticket t = scm_leave_guile ();
+      int res = scm_i_pthread_mutex_lock (mutex);
+      scm_enter_guile (t);
+      return res;
+    }
 }
 
 static void
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-06  7:31 ` Ken Raeburn
@ 2009-08-26 22:39   ` Neil Jerram
  0 siblings, 0 replies; 16+ messages in thread
From: Neil Jerram @ 2009-08-26 22:39 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Ken Raeburn <raeburn@raeburn.org> writes:

> Ah, we already had scm_i_pthread_mutex_trylock lying around; that made
> things easy.
> A second timing test with A(3,9) and this version of the patch (based
> on 1.9.1) shows the same improvement.
>
> diff --git a/libguile/threads.c b/libguile/threads.c

Thanks, I've committed this now.  (Although note that this one will also evaporate when BDW-GC is merged.)

Neil
* entering and leaving guile mode, and GC stack protection (was Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled) 2009-08-05 14:06 ` Ken Raeburn 2009-08-06 7:31 ` Ken Raeburn @ 2009-08-06 16:30 ` Ken Raeburn 2009-08-12 22:06 ` entering and leaving guile mode, and GC stack protection Andy Wingo 1 sibling, 1 reply; 16+ messages in thread From: Ken Raeburn @ 2009-08-06 16:30 UTC (permalink / raw) To: guile-devel [-- Attachment #1: Type: text/plain, Size: 2287 bytes --] On Aug 5, 2009, at 10:06, I wrote: > (3) My four-year-old comments on scm_enter/leave_guile, recorded in > threads.c around line 300, still stand.... Those functions really > ought to go away. At least they're confined to one file, now. Some > of it looks a little messy, but I can probably get rid of some of > the uses.... I've made a bit of progress on this. The basic issue is that the way scm_enter/leave_guile are implemented, they make some bad assumptions about how the stack will be laid out and where SCM values might get stored. The right way to tackle it, I believe, is to only provide functions like scm_without_guile that make callbacks, and get complete control over the transitions both into and out of guile mode so they can better manage the stack layout. It's not just scm_leave_guile() that needs to go away and be replaced by a different interface, but suspend() and anything that calls it without pairing it up nicely with resume(). Functions pairing them up need updating too, but not their interfaces; the code in between suspend and resume just needs to be pulled out into a callback function. So my changes in this first cut are basically: * New function while_suspended takes callback and replaces suspend...resume sequence; used only in scm_i_thread_sleep_for_gc now but scm_without_guile is an obvious additional place, when we can get rid of scm_leave_guile. * New helper function scm_without_guile_int, like scm_without_guile but return type is int. 
Static for now; would it be useful to export? * Replace scm_i_thread_put_to_sleep and _wake_up with scm_i_with_threads_sleeping which takes a callback. Change resize_all_states, scm_i_gc, scm_i_string_writable_chars accordingly. * Use scm_without_guile(_int) in scm_std_select, scm_pthread_mutex_lock (assumes my earlier change!), scm_pthread_cond_wait, scm_pthread_cond_timedwait, scm_i_with_threads_sleeping. There are still unpaired uses of scm_leave_guile and scm_enter_guile that I'm going to have to look at more closely before I can figure out what to do about them. And the problematic suspend() function itself is still there, but once its usage is confined to while_suspended, it'll be easier to clean up. Comments? Ken [-- Attachment #2: guile-patch.diff --] [-- Type: application/octet-stream, Size: 14137 bytes --] diff --git a/libguile/fluids.c b/libguile/fluids.c index bcd04c4..e532e39 100644 --- a/libguile/fluids.c +++ b/libguile/fluids.c @@ -87,6 +87,27 @@ static scm_t_bits tc16_dynamic_state; static SCM all_dynamic_states = SCM_EOL; static SCM all_fluids = SCM_EOL; +static void +update_all_states (void *p) +{ + SCM new_vectors = *(SCM *)p; + SCM state; + + for (state = all_dynamic_states; !scm_is_null (state); + state = DYNAMIC_STATE_NEXT (state)) + { + SCM old_fluids = DYNAMIC_STATE_FLUIDS (state); + SCM new_fluids = SCM_CAR (new_vectors); + size_t i, old_len = SCM_SIMPLE_VECTOR_LENGTH (old_fluids); + + for (i = 0; i < old_len; i++) + SCM_SIMPLE_VECTOR_SET (new_fluids, i, + SCM_SIMPLE_VECTOR_REF (old_fluids, i)); + SET_DYNAMIC_STATE_FLUIDS (state, new_fluids); + new_vectors = SCM_CDR (new_vectors); + } +} + /* Make sure that all states have the right size. This must be called while fluid_admin_mutex is held. 
*/ @@ -112,21 +133,7 @@ resize_all_states () SCM_BOOL_F), new_vectors); - scm_i_thread_put_to_sleep (); - for (state = all_dynamic_states; !scm_is_null (state); - state = DYNAMIC_STATE_NEXT (state)) - { - SCM old_fluids = DYNAMIC_STATE_FLUIDS (state); - SCM new_fluids = SCM_CAR (new_vectors); - size_t i, old_len = SCM_SIMPLE_VECTOR_LENGTH (old_fluids); - - for (i = 0; i < old_len; i++) - SCM_SIMPLE_VECTOR_SET (new_fluids, i, - SCM_SIMPLE_VECTOR_REF (old_fluids, i)); - SET_DYNAMIC_STATE_FLUIDS (state, new_fluids); - new_vectors = SCM_CDR (new_vectors); - } - scm_i_thread_wake_up (); + scm_i_with_threads_sleeping (update_all_states, &new_vectors); } /* This is called during GC, that is, while being single threaded. diff --git a/libguile/gc.c b/libguile/gc.c index b7a3bf0..818aec6 100644 --- a/libguile/gc.c +++ b/libguile/gc.c @@ -561,18 +561,20 @@ scm_check_deprecated_memory_return () long int scm_i_last_marked_cell_count; -/* Must be called while holding scm_i_sweep_mutex. +/* Must be called while holding scm_i_sweep_mutex and with all threads + sleeping. This function is fairly long, but it touches various global variables. To not obscure the side effects on global variables, this function has not been split up. */ -void -scm_i_gc (const char *what) +static void +scm_i_gc_1 (void *p) { unsigned long t_before_gc = 0; - - scm_i_thread_put_to_sleep (); +#ifdef DEBUGINFO + const char *what = *(const char *)p; +#endif scm_c_hook_run (&scm_before_gc_c_hook, 0); @@ -678,8 +680,6 @@ scm_i_gc (const char *what) the time taken. 
*/ scm_gc_time_taken += (scm_c_get_internal_run_time () - t_before_gc); - - scm_i_thread_wake_up (); /* For debugging purposes, you could do scm_i_sweep_all_segments ("debug"), but then the remains of the @@ -687,6 +687,11 @@ scm_i_gc (const char *what) */ } +void +scm_i_gc (const char *what) +{ + scm_i_with_threads_sleeping (scm_i_gc_1, &what); +} \f /* {GC Protection Helper Functions} diff --git a/libguile/strings.c b/libguile/strings.c index 4e21f3e..ab8cd14 100644 --- a/libguile/strings.c +++ b/libguile/strings.c @@ -339,6 +339,21 @@ scm_i_string_chars (SCM str) return STRINGBUF_CHARS (buf) + start; } +struct replace_data { + SCM str, new_buf; + size_t start; +}; + +static void +replace_stringbuf_data (void *p) +{ + struct replace_data *r = p; + + SET_STRING_STRINGBUF (r->str, r->new_buf); + r->start = STRING_START (r->str); + SET_STRING_START (r->str, 0); +} + char * scm_i_string_writable_chars (SCM orig_str) { @@ -357,19 +372,17 @@ scm_i_string_writable_chars (SCM orig_str) size_t len = STRING_LENGTH (str); SCM new_buf; + struct replace_data r; scm_i_pthread_mutex_unlock (&stringbuf_write_mutex); new_buf = make_stringbuf (len); memcpy (STRINGBUF_CHARS (new_buf), STRINGBUF_CHARS (buf) + STRING_START (str), len); - - scm_i_thread_put_to_sleep (); - SET_STRING_STRINGBUF (str, new_buf); - start -= STRING_START (str); - SET_STRING_START (str, 0); - scm_i_thread_wake_up (); - + r.str = str; + r.new_buf = new_buf; + scm_i_with_threads_sleeping (replace_stringbuf_data, &r); + start -= r.start; buf = new_buf; scm_i_pthread_mutex_lock (&stringbuf_write_mutex); diff --git a/libguile/threads.c b/libguile/threads.c index 8458a60..a2a98f3 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -428,6 +428,25 @@ suspend (void) setjmp (t->regs); return t; } + + +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1) +/* For paranoia's sake, make sure the stack slot we take the address + of is allocated after (deeper on the stack) than anything from the + parent, by 
not permitting inline expansion and thus possible + rearranging of things within the caller's stack frame. */ +static void +while_suspended (void (*func)(void *, scm_i_thread *), void *data) + __attribute__((noinline)); +#endif + +static void +while_suspended (void (*func)(void *, scm_i_thread *), void *data) +{ + scm_i_thread *t = suspend (); + func (data, t); + resume (t); +} static scm_t_guile_ticket scm_leave_guile () @@ -870,6 +889,20 @@ scm_without_guile (void *(*func)(void *), void *data) return res; } +/* Internal helper function -- like scm_without_guile, but for various + cases in this file where we actually want to return an int, not + a pointer. */ +static int +scm_without_guile_int (int (*func)(void *), void *data) +{ + int res; + scm_t_guile_ticket t; + t = scm_leave_guile (); + res = func (data); + scm_enter_guile (t); + return res; +} + /*** Thread creation */ typedef struct { @@ -1769,6 +1802,28 @@ scm_threads_mark_stacks (void) /*** Select */ +struct select_callback_data { + SELECT_TYPE *readfds, *writefds, *exceptfds; + struct timeval *timeout; + scm_i_thread *t; + int nfds, wakeup_fd, eno; +}; + +static int +select_callback (void *p) +{ + struct select_callback_data *cb = p; + int res; + + FD_SET (cb->wakeup_fd, cb->readfds); + if (cb->wakeup_fd >= cb->nfds) + cb->nfds = cb->wakeup_fd+1; + res = select (cb->nfds, cb->readfds, cb->writefds, cb->exceptfds, cb->timeout); + cb->t->sleep_fd = -1; + cb->eno = errno; + return res; +} + int scm_std_select (int nfds, SELECT_TYPE *readfds, @@ -1777,45 +1832,41 @@ scm_std_select (int nfds, struct timeval *timeout) { fd_set my_readfds; - int res, eno, wakeup_fd; + int res; scm_i_thread *t = SCM_I_CURRENT_THREAD; - scm_t_guile_ticket ticket; + struct select_callback_data cb = { + readfds, writefds, exceptfds, timeout, t, nfds, + }; if (readfds == NULL) { FD_ZERO (&my_readfds); - readfds = &my_readfds; + cb.readfds = &my_readfds; } while (scm_i_setup_sleep (t, SCM_BOOL_F, NULL, t->sleep_pipe[1])) SCM_TICK; - 
wakeup_fd = t->sleep_pipe[0]; - ticket = scm_leave_guile (); - FD_SET (wakeup_fd, readfds); - if (wakeup_fd >= nfds) - nfds = wakeup_fd+1; - res = select (nfds, readfds, writefds, exceptfds, timeout); - t->sleep_fd = -1; - eno = errno; - scm_enter_guile (ticket); + cb.wakeup_fd = t->sleep_pipe[0]; + + res = scm_without_guile_int (select_callback, &cb); scm_i_reset_sleep (t); - if (res > 0 && FD_ISSET (wakeup_fd, readfds)) + if (res > 0 && FD_ISSET (cb.wakeup_fd, cb.readfds)) { char dummy; - full_read (wakeup_fd, &dummy, 1); + full_read (cb.wakeup_fd, &dummy, 1); - FD_CLR (wakeup_fd, readfds); + FD_CLR (cb.wakeup_fd, cb.readfds); res -= 1; if (res == 0) { - eno = EINTR; + cb.eno = EINTR; res = -1; } } - errno = eno; + errno = cb.eno; return res; } @@ -1823,18 +1874,18 @@ scm_std_select (int nfds, #if SCM_USE_PTHREAD_THREADS +static int +mutex_lock_callback (void *p) +{ + return scm_i_pthread_mutex_lock (p); +} + int scm_pthread_mutex_lock (scm_i_pthread_mutex_t *mutex) { if (scm_i_pthread_mutex_trylock (mutex) == 0) return 0; - else - { - scm_t_guile_ticket t = scm_leave_guile (); - int res = scm_i_pthread_mutex_lock (mutex); - scm_enter_guile (t); - return res; - } + return scm_without_guile_int (mutex_lock_callback, mutex); } static void @@ -1850,14 +1901,45 @@ scm_dynwind_pthread_mutex_lock (scm_i_pthread_mutex_t *mutex) scm_dynwind_unwind_handler (do_unlock, mutex, SCM_F_WIND_EXPLICITLY); } +struct cond_wait_callback_data { + scm_i_pthread_cond_t *cond; + scm_i_pthread_mutex_t *mutex; +}; + +static int +cond_wait_callback (void *p) +{ + struct cond_wait_callback_data *cb = p; + int res; + scm_i_thread *t = SCM_I_CURRENT_THREAD; + t->held_mutex = cb->mutex; + res = scm_i_pthread_cond_wait (cb->cond, cb->mutex); + t->held_mutex = NULL; + return res; +} + int scm_pthread_cond_wait (scm_i_pthread_cond_t *cond, scm_i_pthread_mutex_t *mutex) { - scm_t_guile_ticket t = scm_leave_guile (); - ((scm_i_thread *)t)->held_mutex = mutex; - int res = scm_i_pthread_cond_wait 
(cond, mutex); - ((scm_i_thread *)t)->held_mutex = NULL; - scm_enter_guile (t); + struct cond_wait_callback_data cb = { cond, mutex }; + return scm_without_guile_int (cond_wait_callback, &cb); +} + +struct cond_timedwait_callback_data { + scm_i_pthread_cond_t *cond; + scm_i_pthread_mutex_t *mutex; + const scm_t_timespec *wt; +}; + +static int +cond_timedwait_callback (void *p) +{ + struct cond_timedwait_callback_data *cb = p; + int res; + scm_i_thread *t = SCM_I_CURRENT_THREAD; + t->held_mutex = cb->mutex; + res = scm_i_pthread_cond_timedwait (cb->cond, cb->mutex, cb->wt); + t->held_mutex = NULL; return res; } @@ -1866,12 +1948,8 @@ scm_pthread_cond_timedwait (scm_i_pthread_cond_t *cond, scm_i_pthread_mutex_t *mutex, const scm_t_timespec *wt) { - scm_t_guile_ticket t = scm_leave_guile (); - ((scm_i_thread *)t)->held_mutex = mutex; - int res = scm_i_pthread_cond_timedwait (cond, mutex, wt); - ((scm_i_thread *)t)->held_mutex = NULL; - scm_enter_guile (t); - return res; + struct cond_timedwait_callback_data cb = { cond, mutex, wt }; + return scm_without_guile_int (cond_timedwait_callback, &cb); } #endif @@ -1970,25 +2048,6 @@ int scm_i_thread_go_to_sleep; static int threads_initialized_p = 0; void -scm_i_thread_put_to_sleep () -{ - if (threads_initialized_p) - { - scm_i_thread *t; - - scm_leave_guile (); - scm_i_pthread_mutex_lock (&thread_admin_mutex); - - /* Signal all threads to go to sleep - */ - scm_i_thread_go_to_sleep = 1; - for (t = all_threads; t; t = t->next_thread) - scm_i_pthread_mutex_lock (&t->heap_mutex); - scm_i_thread_go_to_sleep = 0; - } -} - -void scm_i_thread_invalidate_freelists () { /* thread_admin_mutex is already locked. 
*/ @@ -1999,34 +2058,68 @@ scm_i_thread_invalidate_freelists () t->clear_freelists_p = 1; } +struct sleeping_callback { + void (*func) (void *); + void *data; +}; + +static void * +sleeping_callback (void *p) +{ + struct sleeping_callback *cb = p; + scm_i_thread *t; + + scm_i_pthread_mutex_lock (&thread_admin_mutex); + + /* Signal all threads to go to sleep + */ + scm_i_thread_go_to_sleep = 1; + for (t = all_threads; t; t = t->next_thread) + scm_i_pthread_mutex_lock (&t->heap_mutex); + scm_i_thread_go_to_sleep = 0; + + cb->func (cb->data); + + scm_i_pthread_cond_broadcast (&wake_up_cond); + for (t = all_threads; t; t = t->next_thread) + scm_i_pthread_mutex_unlock (&t->heap_mutex); + scm_i_pthread_mutex_unlock (&thread_admin_mutex); + + return NULL; +} + void -scm_i_thread_wake_up () +scm_i_with_threads_sleeping (void (*func)(void *), void *data) { if (threads_initialized_p) { - scm_i_thread *t; - - scm_i_pthread_cond_broadcast (&wake_up_cond); - for (t = all_threads; t; t = t->next_thread) - scm_i_pthread_mutex_unlock (&t->heap_mutex); - scm_i_pthread_mutex_unlock (&thread_admin_mutex); - scm_enter_guile ((scm_t_guile_ticket) SCM_I_CURRENT_THREAD); + struct sleeping_callback cb = { func, data }; + scm_without_guile (sleeping_callback, &cb); + } + else + { + func (data); + /* Can func ever cause initialization to happen if it hadn't + been done before? */ + assert (threads_initialized_p == 0); } } -void -scm_i_thread_sleep_for_gc () +static void +sleep_for_gc_helper (void *p, scm_i_thread *t) { - scm_i_thread *t = suspend (); - /* Don't put t->heap_mutex in t->held_mutex here, because if the thread is cancelled during the cond wait, the thread's cleanup function (scm_leave_guile_cleanup) will handle unlocking the heap_mutex, so we don't need to do that again in on_thread_exit. 
*/ scm_i_pthread_cond_wait (&wake_up_cond, &t->heap_mutex); +} - resume (t); +void +scm_i_thread_sleep_for_gc () +{ + while_suspended (sleep_for_gc_helper, NULL); } /* This mutex is used by SCM_CRITICAL_SECTION_START/END. diff --git a/libguile/threads.h b/libguile/threads.h index 32b0ea6..407acb7 100644 --- a/libguile/threads.h +++ b/libguile/threads.h @@ -153,8 +153,8 @@ SCM_INTERNAL void *scm_i_with_guile_and_parent (void *(*func)(void *), extern int scm_i_thread_go_to_sleep; -SCM_INTERNAL void scm_i_thread_put_to_sleep (void); -SCM_INTERNAL void scm_i_thread_wake_up (void); +SCM_INTERNAL void scm_i_with_threads_sleeping (void (*func)(void *), + void *data); SCM_INTERNAL void scm_i_thread_invalidate_freelists (void); void scm_i_thread_sleep_for_gc (void); [-- Attachment #3: Type: text/plain, Size: 1 bytes --] ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: entering and leaving guile mode, and GC stack protection
  2009-08-06 16:30 ` entering and leaving guile mode, and GC stack protection (was Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled) Ken Raeburn
@ 2009-08-12 22:06 ` Andy Wingo
  2009-08-14  7:58 ` Ludovic Courtès
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Wingo @ 2009-08-12 22:06 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

On Thu 06 Aug 2009 18:30, Ken Raeburn <raeburn@raeburn.org> writes:

> On Aug 5, 2009, at 10:06, I wrote:
>> (3) My four-year-old comments on scm_enter/leave_guile, recorded in
>> threads.c around line 300, still stand....  Those functions really
>> ought to go away.  At least they're confined to one file, now.  Some
>> of it looks a little messy, but I can probably get rid of some of the
>> uses....
>
> I've made a bit of progress on this.

The patches look good to me; my only wonder is what relation they have
to the BDW-GC branch Ludovic was working on.  If BDW will land before
2.0, then perhaps all this mess can go away (wishful thinking);
otherwise we should apply it now (after the release).  Ludovic? :)

Cheers,

Andy
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: entering and leaving guile mode, and GC stack protection
  2009-08-12 22:06 ` entering and leaving guile mode, and GC stack protection Andy Wingo
@ 2009-08-14  7:58 ` Ludovic Courtès
  0 siblings, 0 replies; 16+ messages in thread
From: Ludovic Courtès @ 2009-08-14 7:58 UTC (permalink / raw)
To: guile-devel

Hello,

Andy Wingo <wingo@pobox.com> writes:

> On Thu 06 Aug 2009 18:30, Ken Raeburn <raeburn@raeburn.org> writes:
>
>> On Aug 5, 2009, at 10:06, I wrote:
>>> (3) My four-year-old comments on scm_enter/leave_guile, recorded in
>>> threads.c around line 300, still stand....  Those functions really
>>> ought to go away.  At least they're confined to one file, now.  Some
>>> of it looks a little messy, but I can probably get rid of some of the
>>> uses....
>>
>> I've made a bit of progress on this.
>
> The patches look good to me; my only wonder is what relation they have
> to the BDW-GC branch Ludovic was working on.  If BDW will land before
> 2.0, then perhaps all this mess can go away (wishful thinking);
> otherwise we should apply it now (after the release).  Ludovic? :)

Exactly.  I've been meaning to reply to this thread because of this.

These functions don't do much in the BDW-GC branch:
http://git.savannah.gnu.org/cgit/guile.git/tree/libguile/threads.c?h=boehm-demers-weiser-gc#n379

Likewise, `scm_without_guile ()' does a `GC_do_blocking ()', but in many
cases `scm_without_guile ()' is not needed because BDW-GC doesn't rely
on cooperation from all threads to work.

And yes, I do hope to have it part of 2.0, but I haven't taken the time
to update it lately.

Thanks,
Ludo'.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-05  9:06 ` Andy Wingo
  2009-08-05 14:06 ` Ken Raeburn
@ 2009-08-08 21:52 ` Ludovic Courtès
  2009-08-12 22:28 ` Andy Wingo
  1 sibling, 1 reply; 16+ messages in thread
From: Ludovic Courtès @ 2009-08-08 21:52 UTC (permalink / raw)
To: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> writes:

> Perhaps the deal is that 1+ is not being inlined. It seems to be calling
> out to the 1+ procedure, which is a primitive, actually making it a
> bit slower. I'll see what I can do.

Currently, invoking subrs conses, which is Bad(TM).  Code that keeps
invoking `1+' and `1-' must suffer from this.

Thanks,
Ludo'.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-08 21:52 ` guile performance - Ackermann function: way slower than emacs, slower still if compiled Ludovic Courtès
@ 2009-08-12 22:28 ` Andy Wingo
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Wingo @ 2009-08-12 22:28 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel

Heya,

On Sat 08 Aug 2009 23:52, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>> Perhaps the deal is that 1+ is not being inlined. It seems to be calling
>> out to the 1+ procedure, which is a primitive, actually making it a
>> bit slower. I'll see what I can do.
>
> Currently, invoking subrs conses, which is Bad(TM).

Indeed.

> Code that keeps invoking `1+' and `1-' must suffer from this.

Not any more!  There are add1 and sub1 instructions now.

A
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-04  9:28 guile performance - Ackermann function: way slower than emacs, slower still if compiled Ken Raeburn
  2009-08-04 13:52 ` Marijn Schouten (hkBst)
  2009-08-05  9:06 ` Andy Wingo
@ 2009-08-05 10:42 ` Andy Wingo
  2009-08-06 15:59 ` Andy Wingo
  2009-08-05 11:15 ` Andy Wingo
  3 siblings, 1 reply; 16+ messages in thread
From: Andy Wingo @ 2009-08-05 10:42 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Hi,

On Tue 04 Aug 2009 11:28, Ken Raeburn <raeburn@raeburn.org> writes:

> I implemented Ackermann's function A(m,n), a recursive function with
> scary performance characteristics, based on the definition given at
> wikipedia involving lots of +1 and -1 operations...

I have now added 1+ and 1- ops to Guile's VM.  They don't seem to make
a terrible difference for your case however -- I get similar timings
both with Marijn's code and with yours.

One issue is that Guile's `while' macro provides continue and break
constructs, which involve consing up those closures when you enter a
while block.  The compiler should be able to note when those vars are
not referenced, but it does not do so now.  That will have to wait for
our inliner.

> I wrote both elisp and scheme versions; I changed the recursion to an
> explicitly-managed stack of values, to avoid some of the dynamic-vs-
> lexical argument binding issues, turn the native stack usage into large
> numbers of cell allocations and discards, and avoid exceeding the
> maximum Lisp evaluation depth Emacs permits.  Not surprisingly, it's
> not too fast, and Emacs spends a lot of time in garbage collection
> passes even with the byte-compiled version.

Indeed, moving to an explicitly-managed stack tests GC much more than a
recursive solution.
The simple stack formulation of Ackermann's function is faster of
course:

  (define (A m n)
    (if (= m 0)
        (+ n 1)
        (if (= n 0)
            (A (- m 1) 1)
            (A (- m 1) (A m (- n 1))))))

(A 3 9) completes in 1.45 seconds for me.

Curiously, the letrec case takes more time:

  (define (A m n)
    (let A ((m m) (n n))
      (if (= m 0)
          (+ n 1)
          (if (= n 0)
              (A (- m 1) 1)
              (A (- m 1) (A m (- n 1)))))))

It should be faster, because we don't have to deref the toplevel
variable for A, but the compiler needs to be a little smarter.  This
version takes about 1.6 seconds for me.

For comparison, your elisp version computes A(3,9) in 2.9 seconds on
this box.  Apples and oranges, of course, due to the different
implementation strategies.

> ELISP> (time-test '(ackermann 3 9))
> 2.482042

I suppose this is the comparison, then.

> ELISP> (time-test '(ackermann 4 1))
> 642.746514

And the stack-based version overflows for this one.  We need to
implement overflow handlers for the VM stack.

> % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))'
> 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w

Indeed, I get similar times (44 seconds; presumably some of the
compilation fixes actually had an effect).  I'm running this under
valgrind now, I'll see what pops up.

Very interesting test, thanks.

Cheers,

Andy
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-05 10:42 ` Andy Wingo
@ 2009-08-06 15:59 ` Andy Wingo
  2009-08-07 17:27 ` Andy Wingo
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Wingo @ 2009-08-06 15:59 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Hi,

On Wed 05 Aug 2009 12:42, Andy Wingo <wingo@pobox.com> writes:

> The simple stack formulation of Ackermann's function is faster of
> course:
>
>   (define (A m n)
>     (if (= m 0)
>         (+ n 1)
>         (if (= n 0)
>             (A (- m 1) 1)
>             (A (- m 1) (A m (- n 1))))))
>
> (A 3 9) completes in 1.45 seconds for me.
>
> Curiously, the letrec case takes more time:
>
>   (define (A m n)
>     (let A ((m m) (n n))
>       (if (= m 0)
>           (+ n 1)
>           (if (= n 0)
>               (A (- m 1) 1)
>               (A (- m 1) (A m (- n 1)))))))
>
> It should be faster, because we don't have to deref the toplevel
> variable for A, but the compiler needs to be a little smarter. This
> version takes about 1.6 seconds for me.

I have fixed this issue.  The letrec version now takes slightly less
time than the version that recurses through the toplevel.

>> % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))'
>> 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w
>
> Indeed, I get similar times (44 seconds; presumably some of the
> compilation fixes actually had an effect).

This one now takes very marginally less time (about a second) -- not
much of an improvement, but I wanted to take care of it before tackling
catch / dynamic-wind.

Thanks,

Andy
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-06 15:59 ` Andy Wingo
@ 2009-08-07 17:27 ` Andy Wingo
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Wingo @ 2009-08-07 17:27 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

On Thu 06 Aug 2009 17:59, Andy Wingo <wingo@pobox.com> writes:

> On Wed 05 Aug 2009 12:42, Andy Wingo <wingo@pobox.com> writes:
>
>> The simple stack formulation of Ackermann's function is faster of
>> course:
>>
>>   (define (A m n)
>>     (if (= m 0)
>>         (+ n 1)
>>         (if (= n 0)
>>             (A (- m 1) 1)
>>             (A (- m 1) (A m (- n 1))))))
>>
>> (A 3 9) completes in 1.45 seconds for me.
>>
>> Curiously, the letrec case takes more time:
>>
>>   (define (A m n)
>>     (let A ((m m) (n n))
>>       (if (= m 0)
>>           (+ n 1)
>>           (if (= n 0)
>>               (A (- m 1) 1)
>>               (A (- m 1) (A m (- n 1)))))))
>>
>> It should be faster, because we don't have to deref the toplevel
>> variable for A, but the compiler needs to be a little smarter. This
>> version takes about 1.6 seconds for me.
>
> I have fixed this issue. The letrec version now takes slightly less time
> than the version that recurses through the toplevel.

So, I couldn't stop, being so close -- I went ahead and implemented
mutual tail-recursion as Steele did in the Lambda: The Ultimate GOTO
paper.  It doesn't have any effect in this case, but in Daniel's
tight-loop code it should have an effect.  Here was that code:

  (define (test to)
    (let ((i 0))
      (while (< i to)
        (set! i (1+ i)))))

Daniel said this was the result:

> Tight loop, (test 1000000):
> Scheme: 0.72s

That's with 1 million iterations.  But I just tried it with 10 million
iterations, with current Guile from this afternoon, and got the same
timing.  Daniel can you test again?  This would be a pleasant
surprise :)

It doesn't help the ackermann case, though.  That will come later.
Unfortunately the letrec case is actually taking more time than it used
to before this patch; alack.  It doesn't look like I'll get to
dynamic-wind until next thursday or so.

Until then, happy hacking, and keep up the great bug reports!

A
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled
  2009-08-04  9:28 guile performance - Ackermann function: way slower than emacs, slower still if compiled Ken Raeburn
  ` (2 preceding siblings ...)
  2009-08-05 10:42 ` Andy Wingo
@ 2009-08-05 11:15 ` Andy Wingo
  3 siblings, 0 replies; 16+ messages in thread
From: Andy Wingo @ 2009-08-05 11:15 UTC (permalink / raw)
To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

Replying a lot :)

On Tue 04 Aug 2009 11:28, Ken Raeburn <raeburn@raeburn.org> writes:

> % time guile -c '(begin (load-compiled "ack.go") (ackermann 3 9))'
> 48.728u 22.016s 1:10.75 99.9% 0+0k 0+0io 0pf+0w
> %
>
> Not much better than loading the .scm file, and only better for small
> values of n... in fact, with m=3 and n>=6 the performance seems worse
> in the precompiled case:

I ran this under kcachegrind and it seems that the penalty is leaving
the VM for dispatching to `catch'.  This is related to Daniel's issues
leaving the VM for dynamic-wind.  So it seems that's what I should work
on.  There are a number of other little issues, but this is the big
one.

Andy
--
http://wingolog.org/

^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2009-08-26 22:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2009-08-04  9:28 guile performance - Ackermann function: way slower than emacs, slower still if compiled Ken Raeburn
2009-08-04 13:52 ` Marijn Schouten (hkBst)
2009-08-04 15:56   ` Ken Raeburn
2009-08-05  9:06 ` Andy Wingo
2009-08-05 14:06   ` Ken Raeburn
2009-08-06  7:31     ` Ken Raeburn
2009-08-26 22:39       ` Neil Jerram
2009-08-06 16:30     ` entering and leaving guile mode, and GC stack protection (was Re: guile performance - Ackermann function: way slower than emacs, slower still if compiled) Ken Raeburn
2009-08-12 22:06       ` entering and leaving guile mode, and GC stack protection Andy Wingo
2009-08-14  7:58         ` Ludovic Courtès
2009-08-08 21:52   ` guile performance - Ackermann function: way slower than emacs, slower still if compiled Ludovic Courtès
2009-08-12 22:28     ` Andy Wingo
2009-08-05 10:42 ` Andy Wingo
2009-08-06 15:59   ` Andy Wingo
2009-08-07 17:27     ` Andy Wingo
2009-08-05 11:15 ` Andy Wingo