unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Daniel Llorens <daniel.llorens@bluewin.ch>
To: guile-devel <guile-devel@gnu.org>
Subject: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
Date: Mon, 1 Apr 2013 19:15:31 +0200	[thread overview]
Message-ID: <F525A557-2F6F-4204-9CB7-CBB96420D7C3@bluewin.ch> (raw)
In-Reply-To: <mailman.1257718.1364809945.854.guile-devel@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 4365 bytes --]


> Message: 5
> Date: Mon, 1 Apr 2013 15:40:48 +0800
> From: Daniel Hartwig <mandyke@gmail.com>
> To: guile-devel@gnu.org
> Subject: Re: Extremly slow for format & string-join

> On 1 April 2013 14:59, Daniel Llorens <daniel.llorens@bluewin.ch> wrote:

>> How can it be slower to allocate the result at once?
> 
> Shrug.  I do not know much of array internals.  You probably have much
> more experience there than I.

Not much with the implementation, no :-/

> Except for the curious profile output, I suspect the overhead is due
> to such factors as repeated application of MAPFUNC and consequent
> arithmetic to access the shared arrays contents

mapfunc is used only to compute the strides and bounds, it isn't kept beyond make-shared-array.

But I hadn't thought that the profile was wrong. Indeed, the slow part is not make-typed-array but array-copy!.

scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.718000s real time, 1.710000s run time.  0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.673000s real time, 1.670000s run time.  0.000000s spent in GC.
scheme@(guile-user)> 

Since I have no intuition for these numbers, I thought maybe it's really this slow, or a cache problem, who knows:

scheme@(guile-user)> (import (rnrs bytevectors))
scheme@(guile-user)> (define x (make-bytevector 40000000))
scheme@(guile-user)> ,time (define y (bytevector-copy x))
;; 0.018000s real time, 0.020000s run time.  0.000000s spent in GC.

In NumPy (using doubles):

In [11]: %time a = np.zeros([1000000, 10])
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s

In [12]: %time b = a+1
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s

So it's really bad. I had a look at libguile/array-map.c. There are three parts in there:

[1] scm_ramapc(). This is a general array traversal function. It does linearization, so the (array-copy! a b) call above should resolve to a single call to racp().

[2] array-copy!, array-fill!, array-map!, array-for-each, etc. These all use scm_ramapc().

[3] a bunch of specializations scm_ra_sum, scm_ra_difference, and so on. 

First, I think that all of [3] should be gone, it's dead code. This is the first patch.

Second, array-map!, array-for-each cons on each application of proc. The quick & dirty solution is to add 1-arg, 2-args, etc. cases to ramap(), rafe(). array-index-map! does its own traversal and can't be linearized, so that can't be fixed as easily. There are weirdo cases. For example array-equal? calls array_compare that recurses on itself down to the last rank. This means that there's a function call on each and every array element.

I don't know whether fixing these problems is worthwhile, or the whole thing should be rewritten, maybe with a different approach. Either go to Scheme where we have macros and can inline the inner loops, or use a code generator to generate fixed rank cases, etc.

Third, none of the above are causing the slowness of array-copy!. I noticed that there's a double indirection in racp(). The second patch removes it. Actually this double indirection goes on all over array-map.c and I don't understand why it's needed...

It's only a bit faster than before, though:

scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.187000s real time, 1.190000s run time.  0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.107000s real time, 1.110000s run time.  0.000000s spent in GC.
scheme@(guile-user)> 

There's the overhead of impl->, etc. I'm thinking that one can do a direct memory copy when the array types are the same, or even call memcpy() when the strides allow. I think these should be relatively common cases.

Regards,

	Daniel


[-- Attachment #2: 0001-Remove-dead-code-in-array-map.c.patch --]
[-- Type: application/octet-stream, Size: 8036 bytes --]

From 3ab911215c98d6c3a018755439dda59cd5c0ed40 Mon Sep 17 00:00:00 2001
From: Daniel Llorens <daniel.llorens@bluewin.ch>
Date: Mon, 1 Apr 2013 17:16:15 +0200
Subject: [PATCH 1/2] Remove dead code in array-map.c

* array-map.h, array-map.c: remove scm_ra_eqp, ra_compare, scm_ra_lessp,
  scm_ra_leqp, scm_ra_grp, scm_ra_greqp, scm_ra_sum, scm_ra_difference,
  scm_ra_product, scm_ra_divide, scm_array_identity.
---
 libguile/array-map.c | 244 ---------------------------------------------------
 libguile/array-map.h |  10 ---
 2 files changed, 254 deletions(-)

diff --git a/libguile/array-map.c b/libguile/array-map.c
index d4da152..8c60f34 100644
--- a/libguile/array-map.c
+++ b/libguile/array-map.c
@@ -385,252 +385,8 @@ SCM_DEFINE (scm_array_copy_x, "array-copy!", 2, 0, 0,
 }
 #undef FUNC_NAME
 
-/* Functions callable by ARRAY-MAP! */
-
-
-int
-scm_ra_eqp (SCM ra0, SCM ras)
-{
-  SCM ra1 = SCM_CAR (ras), ra2 = SCM_CAR (SCM_CDR (ras));
-  scm_t_array_handle ra0_handle;
-  scm_t_array_dim *ra0_dims;
-  size_t n;
-  ssize_t inc0;
-  size_t i0 = 0;
-  unsigned long i1 = SCM_I_ARRAY_BASE (ra1), i2 = SCM_I_ARRAY_BASE (ra2);
-  long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-  long inc2 = SCM_I_ARRAY_DIMS (ra1)->inc;
-  ra1 = SCM_I_ARRAY_V (ra1);
-  ra2 = SCM_I_ARRAY_V (ra2);
-
-  scm_array_get_handle (ra0, &ra0_handle);
-  ra0_dims = scm_array_handle_dims (&ra0_handle);
-  n = ra0_dims[0].ubnd - ra0_dims[0].lbnd + 1;
-  inc0 = ra0_dims[0].inc;
-
-  {
-    for (; n-- > 0; i0 += inc0, i1 += inc1, i2 += inc2)
-      if (scm_is_true (scm_array_handle_ref (&ra0_handle, i0)))
-	if (!scm_is_eq (GVREF (ra1, i1), GVREF (ra2, i2)))
-	  scm_array_handle_set (&ra0_handle, i0, SCM_BOOL_F);
-  }
-
-  scm_array_handle_release (&ra0_handle);
-  return 1;
-}
-
-/* opt 0 means <, nonzero means >= */
 
 static int
-ra_compare (SCM ra0, SCM ra1, SCM ra2, int opt)
-{
-  scm_t_array_handle ra0_handle;
-  scm_t_array_dim *ra0_dims;
-  size_t n;
-  ssize_t inc0;
-  size_t i0 = 0;
-  unsigned long i1 = SCM_I_ARRAY_BASE (ra1), i2 = SCM_I_ARRAY_BASE (ra2);
-  long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-  long inc2 = SCM_I_ARRAY_DIMS (ra1)->inc;
-  ra1 = SCM_I_ARRAY_V (ra1);
-  ra2 = SCM_I_ARRAY_V (ra2);
-
-  scm_array_get_handle (ra0, &ra0_handle);
-  ra0_dims = scm_array_handle_dims (&ra0_handle);
-  n = ra0_dims[0].ubnd - ra0_dims[0].lbnd + 1;
-  inc0 = ra0_dims[0].inc;
-
-  {
-    for (; n-- > 0; i0 += inc0, i1 += inc1, i2 += inc2)
-      if (scm_is_true (scm_array_handle_ref (&ra0_handle, i0)))
-	if (opt ?
-	    scm_is_true (scm_less_p (GVREF (ra1, i1), GVREF (ra2, i2))) :
-	    scm_is_false (scm_less_p (GVREF (ra1, i1), GVREF (ra2, i2))))
-	  scm_array_handle_set (&ra0_handle, i0, SCM_BOOL_F);
-  }
-
-  scm_array_handle_release (&ra0_handle);
-  return 1;
-}
-
-
-
-int
-scm_ra_lessp (SCM ra0, SCM ras)
-{
-  return ra_compare (ra0, SCM_CAR (ras), SCM_CAR (SCM_CDR (ras)), 0);
-}
-
-
-int
-scm_ra_leqp (SCM ra0, SCM ras)
-{
-  return ra_compare (ra0, SCM_CAR (SCM_CDR (ras)), SCM_CAR (ras), 1);
-}
-
-
-int
-scm_ra_grp (SCM ra0, SCM ras)
-{
-  return ra_compare (ra0, SCM_CAR (SCM_CDR (ras)), SCM_CAR (ras), 0);
-}
-
-
-int
-scm_ra_greqp (SCM ra0, SCM ras)
-{
-  return ra_compare (ra0, SCM_CAR (ras), SCM_CAR (SCM_CDR (ras)), 1);
-}
-
-
-int
-scm_ra_sum (SCM ra0, SCM ras)
-{
-  long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
-  unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
-  long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
-  ra0 = SCM_I_ARRAY_V (ra0);
-  if (!scm_is_null(ras))
-    {
-      SCM ra1 = SCM_CAR (ras);
-      unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
-      long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-      ra1 = SCM_I_ARRAY_V (ra1);
-      switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0, i1 += inc1)
-	      GVSET (ra0, i0, scm_sum (GVREF(ra0, i0), GVREF(ra1, i1)));
-	    break;
-	  }
-	}
-    }
-  return 1;
-}
-
-
-
-int
-scm_ra_difference (SCM ra0, SCM ras)
-{
-  long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
-  unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
-  long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
-  ra0 = SCM_I_ARRAY_V (ra0);
-  if (scm_is_null (ras))
-    {
-      switch (SCM_TYP7 (ra0))
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0)
-	      GVSET (ra0, i0, scm_difference (GVREF(ra0, i0), SCM_UNDEFINED));
-	    break;
-	  }
-	}
-    }
-  else
-    {
-      SCM ra1 = SCM_CAR (ras);
-      unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
-      long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-      ra1 = SCM_I_ARRAY_V (ra1);
-      switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0, i1 += inc1)
-	      GVSET (ra0, i0, scm_difference (GVREF (ra0, i0),
-					      GVREF (ra1, i1)));
-	    break;
-	  }
-	}
-    }
-  return 1;
-}
-
-
-
-int
-scm_ra_product (SCM ra0, SCM ras)
-{
-  long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
-  unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
-  long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
-  ra0 = SCM_I_ARRAY_V (ra0);
-  if (!scm_is_null (ras))
-    {
-      SCM ra1 = SCM_CAR (ras);
-      unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
-      long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-      ra1 = SCM_I_ARRAY_V (ra1);
-      switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0, i1 += inc1)
-	      GVSET (ra0, i0, scm_product (GVREF (ra0, i0),
-					   GVREF (ra1, i1)));
-	  }
-	}
-    }
-  return 1;
-}
-
-
-int
-scm_ra_divide (SCM ra0, SCM ras)
-{
-  long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
-  unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
-  long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
-  ra0 = SCM_I_ARRAY_V (ra0);
-  if (scm_is_null (ras))
-    {
-      switch (SCM_TYP7 (ra0))
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0)
-	      GVSET (ra0, i0, scm_divide (GVREF (ra0, i0), SCM_UNDEFINED));
-	    break;
-	  }
-	}
-    }
-  else
-    {
-      SCM ra1 = SCM_CAR (ras);
-      unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
-      long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
-      ra1 = SCM_I_ARRAY_V (ra1);
-      switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
-	{
-	default:
-	  {
-	    for (; n-- > 0; i0 += inc0, i1 += inc1)
-	      {
-		SCM res =  scm_divide (GVREF (ra0, i0),
-				       GVREF (ra1, i1));
-		GVSET (ra0, i0, res);
-	      }
-	    break;
-	  }
-	}
-    }
-  return 1;
-}
-
-
-int
-scm_array_identity (SCM dst, SCM src)
-{
-  return racp (SCM_CAR (src), scm_cons (dst, SCM_EOL));
-}
-
-
-
-static int 
 ramap (SCM ra0, SCM proc, SCM ras)
 {
   long i = SCM_I_ARRAY_DIMS (ra0)->lbnd;
diff --git a/libguile/array-map.h b/libguile/array-map.h
index 43d2a92..a50fcc5 100644
--- a/libguile/array-map.h
+++ b/libguile/array-map.h
@@ -34,16 +34,6 @@ SCM_API int scm_ramapc (void *cproc, SCM data, SCM ra0, SCM lra,
 SCM_API int scm_array_fill_int (SCM ra, SCM fill, SCM ignore);
 SCM_API SCM scm_array_fill_x (SCM ra, SCM fill);
 SCM_API SCM scm_array_copy_x (SCM src, SCM dst);
-SCM_API int scm_ra_eqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_lessp (SCM ra0, SCM ras);
-SCM_API int scm_ra_leqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_grp (SCM ra0, SCM ras);
-SCM_API int scm_ra_greqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_sum (SCM ra0, SCM ras);
-SCM_API int scm_ra_difference (SCM ra0, SCM ras);
-SCM_API int scm_ra_product (SCM ra0, SCM ras);
-SCM_API int scm_ra_divide (SCM ra0, SCM ras);
-SCM_API int scm_array_identity (SCM src, SCM dst);
 SCM_API SCM scm_array_map_x (SCM ra0, SCM proc, SCM lra);
 SCM_API SCM scm_array_for_each (SCM proc, SCM ra0, SCM lra);
 SCM_API SCM scm_array_index_map_x (SCM ra, SCM proc);
-- 
1.8.2


[-- Attachment #3: 0002-Remove-double-indirection-in-element-access-in-array.patch --]
[-- Type: application/octet-stream, Size: 1851 bytes --]

From a38b0a98ed6093ee9ebe4ac60b4b6f9efbdcfdd5 Mon Sep 17 00:00:00 2001
From: Daniel Llorens <daniel.llorens@bluewin.ch>
Date: Mon, 1 Apr 2013 18:43:58 +0200
Subject: [PATCH 2/2] Remove double indirection in element access in
 array-copy!

* libguile/array-map.c: (racp): factor scm_generalized_vector_ref,
  scm_generalized_vector_set_x out of the rank-1 loop.
---
 libguile/array-map.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/libguile/array-map.c b/libguile/array-map.c
index 8c60f34..767e97d 100644
--- a/libguile/array-map.c
+++ b/libguile/array-map.c
@@ -350,21 +350,24 @@ scm_array_fill_int (SCM ra, SCM fill, SCM ignore SCM_UNUSED)
 #undef FUNC_NAME
 
 
-
-static int 
+static int
 racp (SCM src, SCM dst)
 {
   long n = (SCM_I_ARRAY_DIMS (src)->ubnd - SCM_I_ARRAY_DIMS (src)->lbnd + 1);
-  long inc_d, inc_s = SCM_I_ARRAY_DIMS (src)->inc;
-  unsigned long i_d, i_s = SCM_I_ARRAY_BASE (src);
-  dst = SCM_CAR (dst);
-  inc_d = SCM_I_ARRAY_DIMS (dst)->inc;
-  i_d = SCM_I_ARRAY_BASE (dst);
-  src = SCM_I_ARRAY_V (src);
-  dst = SCM_I_ARRAY_V (dst);
+  scm_t_array_handle h_s, h_d;
+  size_t i_s, i_d;
+  ssize_t inc_s, inc_d;
+  dst = SCM_CAR(dst);
+  scm_generalized_vector_get_handle (SCM_I_ARRAY_V (src), &h_s);
+  scm_generalized_vector_get_handle (SCM_I_ARRAY_V (dst), &h_d);
+  i_s = h_s.base + h_s.dims[0].lbnd + SCM_I_ARRAY_BASE (src)*h_s.dims[0].inc;
+  i_d = h_d.base + h_d.dims[0].lbnd + SCM_I_ARRAY_BASE (dst)*h_d.dims[0].inc;
+  inc_s = SCM_I_ARRAY_DIMS (src)->inc * h_s.dims[0].inc;
+  inc_d = SCM_I_ARRAY_DIMS (dst)->inc * h_d.dims[0].inc;
 
   for (; n-- > 0; i_s += inc_s, i_d += inc_d)
-    GVSET (dst, i_d, GVREF (src, i_s));
+    h_d.impl->vset (&h_d, i_d, h_s.impl->vref (&h_s, i_s));
+
   return 1;
 }
 
-- 
1.8.2


[-- Attachment #4: Type: text/plain, Size: 2 bytes --]




       reply	other threads:[~2013-04-01 17:15 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <mailman.1257718.1364809945.854.guile-devel@gnu.org>
2013-04-01 17:15 ` Daniel Llorens [this message]
2013-04-02 10:19   ` array-copy! is slow & array-map.c (was: Extremly slow for format & string-join) Daniel Llorens
2013-04-02 14:06     ` Daniel Llorens
2013-04-03 12:50       ` array-copy! is slow & array-map.c Ludovic Courtès
2013-04-03 14:50         ` Daniel Llorens
2013-04-03 17:03           ` Ludovic Courtès
2013-04-03 17:06           ` Ludovic Courtès
2013-04-03 17:59             ` Daniel Llorens
2013-04-03 17:07           ` Ludovic Courtès
2013-04-03 19:36           ` Ludovic Courtès
     [not found]             ` <ECA152EF-A180-45EF-9E8F-D40DD28A2779@jast.ch>
     [not found]               ` <87mwtfmlap.fsf@gnu.org>
2013-04-03 21:04                 ` Daniel Llorens
2013-04-05 17:20                   ` Ludovic Courtès
2013-04-05 17:29                     ` Daniel Llorens
2013-04-05 20:32                       ` Ludovic Courtès
2013-04-05 20:36                   ` Ludovic Courtès
2013-04-06 22:59                     ` Daniel Llorens
2013-04-06 23:01                       ` Fwd: " Daniel Llorens
2013-04-07  9:18                       ` Ludovic Courtès
2013-04-03 19:42           ` Ludovic Courtès
2013-04-02 14:55     ` array-copy! is slow & array-map.c (was: Extremly slow for format & string-join) Daniel Llorens
2013-04-02 14:57       ` Daniel Llorens
2013-04-02 15:14       ` Daniel Llorens
2013-04-03 12:05   ` array-copy! is slow & array-map.c Ludovic Courtès
2013-04-03 12:23   ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F525A557-2F6F-4204-9CB7-CBB96420D7C3@bluewin.ch \
    --to=daniel.llorens@bluewin.ch \
    --cc=guile-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).