array operations - Daniel Llorens

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

From: Daniel Llorens <daniel.llorens@bluewin.ch>
To: Noah Lavine <noah.b.lavine@gmail.com>
Cc: Andy Wingo <wingo@pobox.com>, guile-devel <guile-devel@gnu.org>
Subject: array operations
Date: Sat, 9 Mar 2013 00:42:43 +0100	[thread overview]
Message-ID: <3543F651-4AF5-4F55-AECF-68A66DCBD0CF@bluewin.ch> (raw)
In-Reply-To: <CA+U71=MHmuPJpfO3qbsd4bLxxH=OZBFWsDSHObChd8v7zWXVUw@mail.gmail.com>

On Mar 4, 2013, at 03:27, Noah Lavine wrote:

> (I'm snipping the rest of your message because it needs more thought than I can give it right now.)

That message contained a number of errors, the consequence of talking before walking, I suppose.

I have added a simple reduction operator to the library, enough to define an inner product operation:

(define* (dot + * A B #:key type)
  "dot + * A B
   Inner product between the last axis of A and the first of B."
  (let ((type (or type (array-type* A))))
    (ply/type type
              (w/rank (Jv (cut folda/type type (lambda (c a b) (+ c (* a b))) 0 <> <>) '_ '_)
                      1 '_)
              A B)))

There are three loops in here:

---the outer loop (ply) over the axes of A, save the last (whence the arg 1 to w/rank)
---the middle loop (folda) over the last of A and the first of B (the axes being reduced)
---the inner loop (ply) over the axes of B, save the first —--this is because folda applies its op (lambda (c a b) ...) as an array op.

If A & B are both rank 2, this is like 'ikj' in §1.1.11 of Golub & Van Loan.

For example:

(define a (i. 300 100))
(define b (i. 100 200))
(define af (array-copy 'f64 a))
(define bf (array-copy 'f64 b))
(define as (array-copy 's64 a))
(define bs (array-copy 's64 b))

> ,time (dot + * a b)
$19 = #
;; 4.836000s real time, 4.870000s run time.  0.350000s spent in GC.
> ,time (dot + * af bf)
$20 = #
;; 7.166000s real time, 7.210000s run time.  0.690000s spent in GC.
> ,time (dot + * as bs)
$21 = #
;; 10.218000s real time, 10.280000s run time.  0.240000s spent in GC.

So it's horribly slow, and the differences between types are very large. Still, this shows that it's spending a significant fraction of the time in  arithmetic, or at least not in loop overhead (I think?)

The profiler output for the type #t version is

> ,profile (dot + * a b)
%     cumulative   self             
time   seconds     seconds      name
 71.43      5.07      5.07  #<procedure 11d767ab0 at util/reduce.scm:87:47 (c a b)>
  7.14      1.01      0.51  map
  7.14      0.51      0.51  =
  7.14      0.51      0.51  %after-gc-thunk
  7.14      0.51      0.51  #<procedure 11c52ff90 at util/ploy.scm:165:16 i>
  0.00      7.10      0.00  #<procedure 1163083a0 at ice-9/top-repl.scm:66:5 ()>
  0.00      7.10      0.00  array-index-map!
  0.00      7.10      0.00  call-with-prompt
  0.00      7.10      0.00  folda/type
  0.00      7.10      0.00  apply-smob/1
  0.00      7.10      0.00  catch
  0.00      7.10      0.00  start-repl
  0.00      7.10      0.00  run-repl
  0.00      7.10      0.00  ply/type
  0.00      7.10      0.00  statprof
  0.00      7.10      0.00  #<procedure 1163085a0 at ice-9/top-repl.scm:31:6 (thunk)>
  0.00      7.10      0.00  #<procedure 1189d1d20 at util/ploy.scm:121:5 i>
  0.00      7.10      0.00  #<procedure 1189cf210 at statprof.scm:655:4 ()>
  0.00      5.07      0.00  array-map!
  0.00      1.52      0.00  nested-op-frames
  0.00      0.51      0.00  nested-match-frame
  0.00      0.51      0.00  array-map/frame!
  0.00      0.51      0.00  length=?
  0.00      0.51      0.00  make-shared-array
---
Sample count: 14
Total time: 7.1 seconds (0.22 seconds in GC)

The top entry is (lambda (c a b) ...) in the definition of dot above. The map comes from the definition of folda/type (the middle loop).

(define (folda/type type op_ z . a)
  (if (null? a)
    z
    (let ((op (if (verb? op_) op_ (make-verb op_ #f #f))))
      (let ((end (tally (car a))))
        (let loop ((i 0) (c z))
          (if (< i end)
; @todo factor out frame matching from repeated application of ply
            (loop (+ 1 i) (apply ply/type type op c (map (cut from_ <> i) a)))
            c))))))                                   
                                                      ^
   here -----------------------------------------------

There's no map in the inner loop since the op has rank 0, so it resolves to array-map!.

------------

As a reference point,

> ,time (blas-dgemm a b 0. 'no 'no )
$28 = #
;; 0.009000s real time, 0.000000s run time.  0.000000s spent in GC.
> ,time (blas-dgemm af bf 0. 'no 'no )
$29 = #
;; 0.002000s real time, 0.000000s run time.  0.000000s spent in GC.

is a few 1000 times faster. It takes more than 3 times longer to convert from scm to f64 than to multiply... lots of room for improvement.

------------

1) The variable rank/arity code is by necessity full of apply / map / rest args. However, in practice, it will be used most of the time with rank 1 / 2 / maybe 3. It's not acceptable to have  this apply-map business going on in an inner loop, but I don't want to write rank-specific or arity-specific variants by hand. I could try to use syntax-rules' ... for the fixed rank/arity functions and select with case-lambda, but that still requires that I write two versions of nearly every function in the library. Is there a way to avoid this? It occurred to me that the partial evaluator could help, but I'm not sure how. Or maybe somebody has a meta-macro that takes the version using ... and converts it to the variable rank/arity version or viceversa.

2) Iterating over an array is a pain. I had this issue before when mapping over subarrays, so I wrote an array-map/frame!. If I write an array-for-each/frame

(let ((c z))
  (apply array-for-each/frame f (lambda a (set! c (apply ply/type type op c a))) a)
  c)

at least this would remove the map-from in the scalar case. It's something... still, carrying c outside it's a bit ugly.

3) is it normal that the arithmetic is so slow? the code above generates tons of temporaries and has a ton of overhead and it still manages to spend most of its time in the + * op.

Regards,

	Daniel

next prev parent reply	other threads:[~2013-03-08 23:42 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <mailman.153.1351958430.10005.guile-devel@gnu.org>
2012-11-03 16:52 ` propose deprecation of generalized-vector-* Daniel Llorens
2012-11-03 21:10   ` Ludovic Courtès
2013-01-21 16:11     ` Andy Wingo
2013-01-22 14:31       ` Daniel Llorens
2013-01-22 18:31         ` Daniel Llorens
2013-01-22 20:52           ` Andy Wingo
2013-01-22 23:27             ` Daniel Llorens
2013-01-23  9:20               ` Andy Wingo
2013-01-23 14:55             ` Ludovic Courtès
2013-01-23  9:06         ` Andy Wingo
2013-01-23 12:20           ` Daniel Llorens
2013-02-18 15:55             ` Andy Wingo
2013-02-18 16:05               ` Noah Lavine
2013-02-18 16:25                 ` Mike Gran
2013-02-18 16:29                   ` Noah Lavine
2013-02-18 17:11                     ` David Pirotte
2013-02-18 17:17                       ` Mike Gran
2013-02-18 23:57                   ` Daniel Hartwig
2013-02-18 23:12               ` Problems with 'number->string' (was Re: propose deprecation of generalized-vector-*) Mark H Weaver
2013-02-21  1:13               ` propose deprecation of generalized-vector-* Daniel Llorens
2013-02-22  0:22                 ` Noah Lavine
2013-02-28 19:10                   ` Daniel Llorens
2013-03-01  2:42                     ` Noah Lavine
2013-03-01  3:46                       ` Noah Lavine
2013-03-01  9:01                       ` Daniel Llorens
2013-03-01  9:44                         ` Andy Wingo
2013-03-04  2:27                         ` Noah Lavine
2013-03-08 23:42                           ` Daniel Llorens [this message]
2013-02-18 15:40           ` Andy Wingo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3543F651-4AF5-4F55-AECF-68A66DCBD0CF@bluewin.ch \
    --to=daniel.llorens@bluewin.ch \
    --cc=guile-devel@gnu.org \
    --cc=noah.b.lavine@gmail.com \
    --cc=wingo@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).