From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Daniel Llorens Newsgroups: gmane.lisp.guile.devel Subject: array operations Date: Sat, 9 Mar 2013 00:42:43 +0100 Message-ID: <3543F651-4AF5-4F55-AECF-68A66DCBD0CF@bluewin.ch> References: <0F432FA1-CFF8-4A22-A477-5291A1B9925D@bluewin.ch> <87ip9mgzp4.fsf@gnu.org> <878v7m5xdh.fsf@pobox.com> <2E5FFE0D-9001-409C-BCD4-9EE3BF9883F0@bluewin.ch> <87mww0nu8l.fsf@pobox.com> <2D31D517-08F8-4D07-84DB-098E335AE0AD@bluewin.ch> <874nh9boqe.fsf@pobox.com> <96617E9F-D83C-48EE-B84D-7CD45C4181C2@bluewin.ch> <441E015F-F545-48DF-AF96-E1FEA64F64A3@bluewin.ch> <14C63C7B-DEA6-4C0C-AB18-17695BC5FAD3@bluewin.ch> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Trace: ger.gmane.org 1362786183 16668 80.91.229.3 (8 Mar 2013 23:43:03 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 8 Mar 2013 23:43:03 +0000 (UTC) Cc: Andy Wingo , guile-devel To: Noah Lavine Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Sat Mar 09 00:43:25 2013 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1UE6we-0000PO-4Z for guile-devel@m.gmane.org; Sat, 09 Mar 2013 00:43:24 +0100 Original-Received: from localhost ([::1]:54327 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UE6wI-0002Na-8I for guile-devel@m.gmane.org; Fri, 08 Mar 2013 18:43:02 -0500 Original-Received: from eggs.gnu.org ([208.118.235.92]:34502) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UE6w9-0002NU-Tx for guile-devel@gnu.org; Fri, 08 Mar 2013 18:42:59 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UE6w4-0001nR-GE for guile-devel@gnu.org; Fri, 08 Mar 2013 18:42:53 -0500 Original-Received: from zhhdzmsp-smta16.bluewin.ch ([195.186.227.132]:47707) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UE6w4-0001nB-67 for guile-devel@gnu.org; Fri, 08 Mar 2013 18:42:48 -0500 Original-Received: from [195.186.99.130] ([195.186.99.130:61267] helo=zhbdzmsp-smta11.bluewin.ch) by zhhdzmsp-smta16.bluewin.ch (envelope-from ) (ecelerity 2.2.3.47 r(39824M)) with ESMTP id 6C/61-25143-4777A315; Fri, 08 Mar 2013 23:42:45 +0000 Original-Received: from [10.0.1.10] (62.203.196.29) by zhbdzmsp-smta11.bluewin.ch (8.5.142) (authenticated as dll@bluewin.ch) id 510085AB03E33291; Fri, 8 Mar 2013 23:42:44 +0000 In-Reply-To: X-Mailer: Apple Mail (2.1085) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 195.186.227.132 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:15888 Archived-At: On Mar 4, 2013, at 03:27, Noah Lavine wrote: > (I'm snipping the rest of your message because it needs more thought = than I can give it right now.) That message contained a number of errors, the consequence of talking = before walking, I suppose. I have added a simple reduction operator to the library, enough to = define an inner product operation: (define* (dot + * A B #:key type) "dot + * A B Inner product between the last axis of A and the first of B." (let ((type (or type (array-type* A)))) (ply/type type (w/rank (Jv (cut folda/type type (lambda (c a b) (+ c (* a = b))) 0 <> <>) '_ '_) 1 '_) A B))) There are three loops in here: ---the outer loop (ply) over the axes of A, save the last (whence the = arg 1 to w/rank) ---the middle loop (folda) over the last of A and the first of B (the = axes being reduced) ---the inner loop (ply) over the axes of B, save the first =97--this is = because folda applies its op (lambda (c a b) ...) as an array op. If A & B are both rank 2, this is like 'ikj' in =A71.1.11 of Golub & Van = Loan. For example: (define a (i. 300 100)) (define b (i. 100 200)) (define af (array-copy 'f64 a)) (define bf (array-copy 'f64 b)) (define as (array-copy 's64 a)) (define bs (array-copy 's64 b)) > ,time (dot + * a b) $19 =3D # ;; 4.836000s real time, 4.870000s run time. 0.350000s spent in GC. > ,time (dot + * af bf) $20 =3D # ;; 7.166000s real time, 7.210000s run time. 0.690000s spent in GC. > ,time (dot + * as bs) $21 =3D # ;; 10.218000s real time, 10.280000s run time. 0.240000s spent in GC. So it's horribly slow, and the differences between types are very large. = Still, this shows that it's spending a significant fraction of the time = in arithmetic, or at least not in loop overhead (I think?) The profiler output for the type #t version is > ,profile (dot + * a b) % cumulative self =20 time seconds seconds name 71.43 5.07 5.07 # 7.14 1.01 0.51 map 7.14 0.51 0.51 =3D 7.14 0.51 0.51 %after-gc-thunk 7.14 0.51 0.51 # 0.00 7.10 0.00 # 0.00 7.10 0.00 array-index-map! 0.00 7.10 0.00 call-with-prompt 0.00 7.10 0.00 folda/type 0.00 7.10 0.00 apply-smob/1 0.00 7.10 0.00 catch 0.00 7.10 0.00 start-repl 0.00 7.10 0.00 run-repl 0.00 7.10 0.00 ply/type 0.00 7.10 0.00 statprof 0.00 7.10 0.00 # 0.00 7.10 0.00 # 0.00 7.10 0.00 # 0.00 5.07 0.00 array-map! 0.00 1.52 0.00 nested-op-frames 0.00 0.51 0.00 nested-match-frame 0.00 0.51 0.00 array-map/frame! 0.00 0.51 0.00 length=3D? 0.00 0.51 0.00 make-shared-array --- Sample count: 14 Total time: 7.1 seconds (0.22 seconds in GC) The top entry is (lambda (c a b) ...) in the definition of dot above. = The map comes from the definition of folda/type (the middle loop). (define (folda/type type op_ z . a) (if (null? a) z (let ((op (if (verb? op_) op_ (make-verb op_ #f #f)))) (let ((end (tally (car a)))) (let loop ((i 0) (c z)) (if (< i end) ; @todo factor out frame matching from repeated application of ply (loop (+ 1 i) (apply ply/type type op c (map (cut from_ <> = i) a))) c)))))) =20 ^ here ----------------------------------------------- There's no map in the inner loop since the op has rank 0, so it resolves = to array-map!. ------------ As a reference point, > ,time (blas-dgemm a b 0. 'no 'no ) $28 =3D # ;; 0.009000s real time, 0.000000s run time. 0.000000s spent in GC. > ,time (blas-dgemm af bf 0. 'no 'no ) $29 =3D # ;; 0.002000s real time, 0.000000s run time. 0.000000s spent in GC. is a few 1000 times faster. It takes more than 3 times longer to convert = from scm to f64 than to multiply... lots of room for improvement. ------------ 1) The variable rank/arity code is by necessity full of apply / map / = rest args. However, in practice, it will be used most of the time with = rank 1 / 2 / maybe 3. It's not acceptable to have this apply-map = business going on in an inner loop, but I don't want to write = rank-specific or arity-specific variants by hand. I could try to use = syntax-rules' ... for the fixed rank/arity functions and select with = case-lambda, but that still requires that I write two versions of nearly = every function in the library. Is there a way to avoid this? It occurred = to me that the partial evaluator could help, but I'm not sure how. Or = maybe somebody has a meta-macro that takes the version using ... and = converts it to the variable rank/arity version or viceversa. 2) Iterating over an array is a pain. I had this issue before when = mapping over subarrays, so I wrote an array-map/frame!. If I write an = array-for-each/frame (let ((c z)) (apply array-for-each/frame f (lambda a (set! c (apply ply/type type = op c a))) a) c) at least this would remove the map-from in the scalar case. It's = something... still, carrying c outside it's a bit ugly. 3) is it normal that the arithmetic is so slow? the code above generates = tons of temporaries and has a ton of overhead and it still manages to = spend most of its time in the + * op. Regards, Daniel