From: Daniel Llorens <daniel.llorens@bluewin.ch>
To: Noah Lavine <noah.b.lavine@gmail.com>
Cc: Andy Wingo <wingo@pobox.com>, guile-devel <guile-devel@gnu.org>
Subject: array operations
Date: Sat, 9 Mar 2013 00:42:43 +0100 [thread overview]
Message-ID: <3543F651-4AF5-4F55-AECF-68A66DCBD0CF@bluewin.ch> (raw)
In-Reply-To: <CA+U71=MHmuPJpfO3qbsd4bLxxH=OZBFWsDSHObChd8v7zWXVUw@mail.gmail.com>
On Mar 4, 2013, at 03:27, Noah Lavine wrote:
> (I'm snipping the rest of your message because it needs more thought than I can give it right now.)
That message contained a number of errors, the consequence of talking before walking, I suppose.
I have added a simple reduction operator to the library, enough to define an inner product operation:
(define* (dot + * A B #:key type)
"dot + * A B
Inner product between the last axis of A and the first of B."
(let ((type (or type (array-type* A))))
(ply/type type
(w/rank (Jv (cut folda/type type (lambda (c a b) (+ c (* a b))) 0 <> <>) '_ '_)
1 '_)
A B)))
There are three loops in here:
---the outer loop (ply) over the axes of A, save the last (whence the arg 1 to w/rank)
---the middle loop (folda) over the last of A and the first of B (the axes being reduced)
---the inner loop (ply) over the axes of B, save the first —--this is because folda applies its op (lambda (c a b) ...) as an array op.
If A & B are both rank 2, this is like 'ikj' in §1.1.11 of Golub & Van Loan.
For example:
(define a (i. 300 100))
(define b (i. 100 200))
(define af (array-copy 'f64 a))
(define bf (array-copy 'f64 b))
(define as (array-copy 's64 a))
(define bs (array-copy 's64 b))
> ,time (dot + * a b)
$19 = #
;; 4.836000s real time, 4.870000s run time. 0.350000s spent in GC.
> ,time (dot + * af bf)
$20 = #
;; 7.166000s real time, 7.210000s run time. 0.690000s spent in GC.
> ,time (dot + * as bs)
$21 = #
;; 10.218000s real time, 10.280000s run time. 0.240000s spent in GC.
So it's horribly slow, and the differences between types are very large. Still, this shows that it's spending a significant fraction of the time in arithmetic, or at least not in loop overhead (I think?)
The profiler output for the type #t version is
> ,profile (dot + * a b)
% cumulative self
time seconds seconds name
71.43 5.07 5.07 #<procedure 11d767ab0 at util/reduce.scm:87:47 (c a b)>
7.14 1.01 0.51 map
7.14 0.51 0.51 =
7.14 0.51 0.51 %after-gc-thunk
7.14 0.51 0.51 #<procedure 11c52ff90 at util/ploy.scm:165:16 i>
0.00 7.10 0.00 #<procedure 1163083a0 at ice-9/top-repl.scm:66:5 ()>
0.00 7.10 0.00 array-index-map!
0.00 7.10 0.00 call-with-prompt
0.00 7.10 0.00 folda/type
0.00 7.10 0.00 apply-smob/1
0.00 7.10 0.00 catch
0.00 7.10 0.00 start-repl
0.00 7.10 0.00 run-repl
0.00 7.10 0.00 ply/type
0.00 7.10 0.00 statprof
0.00 7.10 0.00 #<procedure 1163085a0 at ice-9/top-repl.scm:31:6 (thunk)>
0.00 7.10 0.00 #<procedure 1189d1d20 at util/ploy.scm:121:5 i>
0.00 7.10 0.00 #<procedure 1189cf210 at statprof.scm:655:4 ()>
0.00 5.07 0.00 array-map!
0.00 1.52 0.00 nested-op-frames
0.00 0.51 0.00 nested-match-frame
0.00 0.51 0.00 array-map/frame!
0.00 0.51 0.00 length=?
0.00 0.51 0.00 make-shared-array
---
Sample count: 14
Total time: 7.1 seconds (0.22 seconds in GC)
The top entry is (lambda (c a b) ...) in the definition of dot above. The map comes from the definition of folda/type (the middle loop).
(define (folda/type type op_ z . a)
(if (null? a)
z
(let ((op (if (verb? op_) op_ (make-verb op_ #f #f))))
(let ((end (tally (car a))))
(let loop ((i 0) (c z))
(if (< i end)
; @todo factor out frame matching from repeated application of ply
(loop (+ 1 i) (apply ply/type type op c (map (cut from_ <> i) a)))
c))))))
^
here -----------------------------------------------
There's no map in the inner loop since the op has rank 0, so it resolves to array-map!.
------------
As a reference point,
> ,time (blas-dgemm a b 0. 'no 'no )
$28 = #
;; 0.009000s real time, 0.000000s run time. 0.000000s spent in GC.
> ,time (blas-dgemm af bf 0. 'no 'no )
$29 = #
;; 0.002000s real time, 0.000000s run time. 0.000000s spent in GC.
is a few 1000 times faster. It takes more than 3 times longer to convert from scm to f64 than to multiply... lots of room for improvement.
------------
1) The variable rank/arity code is by necessity full of apply / map / rest args. However, in practice, it will be used most of the time with rank 1 / 2 / maybe 3. It's not acceptable to have this apply-map business going on in an inner loop, but I don't want to write rank-specific or arity-specific variants by hand. I could try to use syntax-rules' ... for the fixed rank/arity functions and select with case-lambda, but that still requires that I write two versions of nearly every function in the library. Is there a way to avoid this? It occurred to me that the partial evaluator could help, but I'm not sure how. Or maybe somebody has a meta-macro that takes the version using ... and converts it to the variable rank/arity version or viceversa.
2) Iterating over an array is a pain. I had this issue before when mapping over subarrays, so I wrote an array-map/frame!. If I write an array-for-each/frame
(let ((c z))
(apply array-for-each/frame f (lambda a (set! c (apply ply/type type op c a))) a)
c)
at least this would remove the map-from in the scalar case. It's something... still, carrying c outside it's a bit ugly.
3) is it normal that the arithmetic is so slow? the code above generates tons of temporaries and has a ton of overhead and it still manages to spend most of its time in the + * op.
Regards,
Daniel
next prev parent reply other threads:[~2013-03-08 23:42 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <mailman.153.1351958430.10005.guile-devel@gnu.org>
2012-11-03 16:52 ` propose deprecation of generalized-vector-* Daniel Llorens
2012-11-03 21:10 ` Ludovic Courtès
2013-01-21 16:11 ` Andy Wingo
2013-01-22 14:31 ` Daniel Llorens
2013-01-22 18:31 ` Daniel Llorens
2013-01-22 20:52 ` Andy Wingo
2013-01-22 23:27 ` Daniel Llorens
2013-01-23 9:20 ` Andy Wingo
2013-01-23 14:55 ` Ludovic Courtès
2013-01-23 9:06 ` Andy Wingo
2013-01-23 12:20 ` Daniel Llorens
2013-02-18 15:55 ` Andy Wingo
2013-02-18 16:05 ` Noah Lavine
2013-02-18 16:25 ` Mike Gran
2013-02-18 16:29 ` Noah Lavine
2013-02-18 17:11 ` David Pirotte
2013-02-18 17:17 ` Mike Gran
2013-02-18 23:57 ` Daniel Hartwig
2013-02-18 23:12 ` Problems with 'number->string' (was Re: propose deprecation of generalized-vector-*) Mark H Weaver
2013-02-21 1:13 ` propose deprecation of generalized-vector-* Daniel Llorens
2013-02-22 0:22 ` Noah Lavine
2013-02-28 19:10 ` Daniel Llorens
2013-03-01 2:42 ` Noah Lavine
2013-03-01 3:46 ` Noah Lavine
2013-03-01 9:01 ` Daniel Llorens
2013-03-01 9:44 ` Andy Wingo
2013-03-04 2:27 ` Noah Lavine
2013-03-08 23:42 ` Daniel Llorens [this message]
2013-02-18 15:40 ` Andy Wingo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3543F651-4AF5-4F55-AECF-68A66DCBD0CF@bluewin.ch \
--to=daniel.llorens@bluewin.ch \
--cc=guile-devel@gnu.org \
--cc=noah.b.lavine@gmail.com \
--cc=wingo@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).