From: Daniel Llorens <daniel.llorens@bluewin.ch>
To: guile-devel <guile-devel@gnu.org>
Subject: array-copy! is slow & array-map.c (was: Extremly slow for format & string-join)
Date: Mon, 1 Apr 2013 19:15:31 +0200 [thread overview]
Message-ID: <F525A557-2F6F-4204-9CB7-CBB96420D7C3@bluewin.ch> (raw)
In-Reply-To: <mailman.1257718.1364809945.854.guile-devel@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 4365 bytes --]
> Message: 5
> Date: Mon, 1 Apr 2013 15:40:48 +0800
> From: Daniel Hartwig <mandyke@gmail.com>
> To: guile-devel@gnu.org
> Subject: Re: Extremly slow for format & string-join
> On 1 April 2013 14:59, Daniel Llorens <daniel.llorens@bluewin.ch> wrote:
>> How can it be slower to allocate the result at once?
>
> Shrug. I do not know much of array internals. You probably have much
> more experience there than I.
Not much with the implementation, no :-/
> Except for the curious profile output, I suspect the overhead is due
> to such factors as repeated application of MAPFUNC and consequent
> arithmetic to access the shared arrays contents
mapfunc is used only to compute the strides and bounds, it isn't kept beyond make-shared-array.
But I hadn't thought that the profile was wrong. Indeed, the slow part is not make-typed-array but array-copy!.
scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.718000s real time, 1.710000s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.673000s real time, 1.670000s run time. 0.000000s spent in GC.
scheme@(guile-user)>
Since I have no intuition for these numbers, I thought maybe it's really this slow, or a cache problem, who knows:
scheme@(guile-user)> (import (rnrs bytevectors))
scheme@(guile-user)> (define x (make-bytevector 40000000))
scheme@(guile-user)> ,time (define y (bytevector-copy x))
;; 0.018000s real time, 0.020000s run time. 0.000000s spent in GC.
In NumPy (using doubles):
In [11]: %time a = np.zeros([1000000, 10])
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s
In [12]: %time b = a+1
CPU times: user 0.04 s, sys: 0.05 s, total: 0.09 s
Wall time: 0.09 s
So it's really bad. I had a look at libguile/array-map.c. There are three parts in there:
[1] scm_ramapc(). This is a general array traversal function. It does linearization, so the (array-copy! a b) call above should resolve to a single call to racp().
[2] array-copy!, array-fill!, array-map!, array-for-each, etc. These all use scm_ramapc().
[3] a bunch of specializations scm_ra_sum, scm_ra_difference, and so on.
First, I think that all of [3] should be gone, it's dead code. This is the first patch.
Second, array-map!, array-for-each cons on each application of proc. The quick & dirty solution is to add 1-arg, 2-args, etc. cases to ramap(), rafe(). array-index-map! does its own traversal and can't be linearized, so that can't be fixed as easily. There are weirdo cases. For example array-equal? calls array_compare that recurses on itself down to the last rank. This means that there's a function call on each and every array element.
I don't know whether fixing these problems is worthwhile, or the whole thing should be rewritten, maybe with a different approach. Either go to Scheme where we have macros and can inline the inner loops, or use a code generator to generate fixed rank cases, etc.
Third, none of the above are causing the slowness of array-copy!. I noticed that there's a double indirection in racp(). The second patch removes it. Actually this double indirection goes on all over array-map.c and I don't understand why it's needed...
It's only a bit faster than before, though:
scheme@(guile-user)> (define s "1234567890")
scheme@(guile-user)> (define t (make-shared-array s (lambda (i j) (list j)) 1000000 10))
scheme@(guile-user)> (define a (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> (define b (make-typed-array 'a *unspecified* 1000000 10))
scheme@(guile-user)> ,time (array-copy! t a)
;; 1.187000s real time, 1.190000s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,time (array-copy! a b)
;; 1.107000s real time, 1.110000s run time. 0.000000s spent in GC.
scheme@(guile-user)>
There's the overhead of impl->, etc. I'm thinking that one can do a direct memory copy when the array types are the same, or even call memcpy() when the strides allow. I think these should be relatively common cases.
Regards,
Daniel
[-- Attachment #2: 0001-Remove-dead-code-in-array-map.c.patch --]
[-- Type: application/octet-stream, Size: 8036 bytes --]
From 3ab911215c98d6c3a018755439dda59cd5c0ed40 Mon Sep 17 00:00:00 2001
From: Daniel Llorens <daniel.llorens@bluewin.ch>
Date: Mon, 1 Apr 2013 17:16:15 +0200
Subject: [PATCH 1/2] Remove dead code in array-map.c
* array-map.h, array-map.c: remove scm_ra_eqp, ra_compare, scm_ra_lessp,
scm_ra_leqp, scm_ra_grp, scm_ra_greqp, scm_ra_sum, scm_ra_difference,
scm_ra_product, scm_ra_divide, scm_array_identity.
---
libguile/array-map.c | 244 ---------------------------------------------------
libguile/array-map.h | 10 ---
2 files changed, 254 deletions(-)
diff --git a/libguile/array-map.c b/libguile/array-map.c
index d4da152..8c60f34 100644
--- a/libguile/array-map.c
+++ b/libguile/array-map.c
@@ -385,252 +385,8 @@ SCM_DEFINE (scm_array_copy_x, "array-copy!", 2, 0, 0,
}
#undef FUNC_NAME
-/* Functions callable by ARRAY-MAP! */
-
-
-int
-scm_ra_eqp (SCM ra0, SCM ras)
-{
- SCM ra1 = SCM_CAR (ras), ra2 = SCM_CAR (SCM_CDR (ras));
- scm_t_array_handle ra0_handle;
- scm_t_array_dim *ra0_dims;
- size_t n;
- ssize_t inc0;
- size_t i0 = 0;
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1), i2 = SCM_I_ARRAY_BASE (ra2);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- long inc2 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- ra2 = SCM_I_ARRAY_V (ra2);
-
- scm_array_get_handle (ra0, &ra0_handle);
- ra0_dims = scm_array_handle_dims (&ra0_handle);
- n = ra0_dims[0].ubnd - ra0_dims[0].lbnd + 1;
- inc0 = ra0_dims[0].inc;
-
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1, i2 += inc2)
- if (scm_is_true (scm_array_handle_ref (&ra0_handle, i0)))
- if (!scm_is_eq (GVREF (ra1, i1), GVREF (ra2, i2)))
- scm_array_handle_set (&ra0_handle, i0, SCM_BOOL_F);
- }
-
- scm_array_handle_release (&ra0_handle);
- return 1;
-}
-
-/* opt 0 means <, nonzero means >= */
static int
-ra_compare (SCM ra0, SCM ra1, SCM ra2, int opt)
-{
- scm_t_array_handle ra0_handle;
- scm_t_array_dim *ra0_dims;
- size_t n;
- ssize_t inc0;
- size_t i0 = 0;
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1), i2 = SCM_I_ARRAY_BASE (ra2);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- long inc2 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- ra2 = SCM_I_ARRAY_V (ra2);
-
- scm_array_get_handle (ra0, &ra0_handle);
- ra0_dims = scm_array_handle_dims (&ra0_handle);
- n = ra0_dims[0].ubnd - ra0_dims[0].lbnd + 1;
- inc0 = ra0_dims[0].inc;
-
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1, i2 += inc2)
- if (scm_is_true (scm_array_handle_ref (&ra0_handle, i0)))
- if (opt ?
- scm_is_true (scm_less_p (GVREF (ra1, i1), GVREF (ra2, i2))) :
- scm_is_false (scm_less_p (GVREF (ra1, i1), GVREF (ra2, i2))))
- scm_array_handle_set (&ra0_handle, i0, SCM_BOOL_F);
- }
-
- scm_array_handle_release (&ra0_handle);
- return 1;
-}
-
-
-
-int
-scm_ra_lessp (SCM ra0, SCM ras)
-{
- return ra_compare (ra0, SCM_CAR (ras), SCM_CAR (SCM_CDR (ras)), 0);
-}
-
-
-int
-scm_ra_leqp (SCM ra0, SCM ras)
-{
- return ra_compare (ra0, SCM_CAR (SCM_CDR (ras)), SCM_CAR (ras), 1);
-}
-
-
-int
-scm_ra_grp (SCM ra0, SCM ras)
-{
- return ra_compare (ra0, SCM_CAR (SCM_CDR (ras)), SCM_CAR (ras), 0);
-}
-
-
-int
-scm_ra_greqp (SCM ra0, SCM ras)
-{
- return ra_compare (ra0, SCM_CAR (ras), SCM_CAR (SCM_CDR (ras)), 1);
-}
-
-
-int
-scm_ra_sum (SCM ra0, SCM ras)
-{
- long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
- unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
- long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
- ra0 = SCM_I_ARRAY_V (ra0);
- if (!scm_is_null(ras))
- {
- SCM ra1 = SCM_CAR (ras);
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
- {
- default:
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1)
- GVSET (ra0, i0, scm_sum (GVREF(ra0, i0), GVREF(ra1, i1)));
- break;
- }
- }
- }
- return 1;
-}
-
-
-
-int
-scm_ra_difference (SCM ra0, SCM ras)
-{
- long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
- unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
- long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
- ra0 = SCM_I_ARRAY_V (ra0);
- if (scm_is_null (ras))
- {
- switch (SCM_TYP7 (ra0))
- {
- default:
- {
- for (; n-- > 0; i0 += inc0)
- GVSET (ra0, i0, scm_difference (GVREF(ra0, i0), SCM_UNDEFINED));
- break;
- }
- }
- }
- else
- {
- SCM ra1 = SCM_CAR (ras);
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
- {
- default:
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1)
- GVSET (ra0, i0, scm_difference (GVREF (ra0, i0),
- GVREF (ra1, i1)));
- break;
- }
- }
- }
- return 1;
-}
-
-
-
-int
-scm_ra_product (SCM ra0, SCM ras)
-{
- long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
- unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
- long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
- ra0 = SCM_I_ARRAY_V (ra0);
- if (!scm_is_null (ras))
- {
- SCM ra1 = SCM_CAR (ras);
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
- {
- default:
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1)
- GVSET (ra0, i0, scm_product (GVREF (ra0, i0),
- GVREF (ra1, i1)));
- }
- }
- }
- return 1;
-}
-
-
-int
-scm_ra_divide (SCM ra0, SCM ras)
-{
- long n = SCM_I_ARRAY_DIMS (ra0)->ubnd - SCM_I_ARRAY_DIMS (ra0)->lbnd + 1;
- unsigned long i0 = SCM_I_ARRAY_BASE (ra0);
- long inc0 = SCM_I_ARRAY_DIMS (ra0)->inc;
- ra0 = SCM_I_ARRAY_V (ra0);
- if (scm_is_null (ras))
- {
- switch (SCM_TYP7 (ra0))
- {
- default:
- {
- for (; n-- > 0; i0 += inc0)
- GVSET (ra0, i0, scm_divide (GVREF (ra0, i0), SCM_UNDEFINED));
- break;
- }
- }
- }
- else
- {
- SCM ra1 = SCM_CAR (ras);
- unsigned long i1 = SCM_I_ARRAY_BASE (ra1);
- long inc1 = SCM_I_ARRAY_DIMS (ra1)->inc;
- ra1 = SCM_I_ARRAY_V (ra1);
- switch (SCM_TYP7 (ra0) == SCM_TYP7 (ra1) ? SCM_TYP7 (ra0) : 0)
- {
- default:
- {
- for (; n-- > 0; i0 += inc0, i1 += inc1)
- {
- SCM res = scm_divide (GVREF (ra0, i0),
- GVREF (ra1, i1));
- GVSET (ra0, i0, res);
- }
- break;
- }
- }
- }
- return 1;
-}
-
-
-int
-scm_array_identity (SCM dst, SCM src)
-{
- return racp (SCM_CAR (src), scm_cons (dst, SCM_EOL));
-}
-
-
-
-static int
ramap (SCM ra0, SCM proc, SCM ras)
{
long i = SCM_I_ARRAY_DIMS (ra0)->lbnd;
diff --git a/libguile/array-map.h b/libguile/array-map.h
index 43d2a92..a50fcc5 100644
--- a/libguile/array-map.h
+++ b/libguile/array-map.h
@@ -34,16 +34,6 @@ SCM_API int scm_ramapc (void *cproc, SCM data, SCM ra0, SCM lra,
SCM_API int scm_array_fill_int (SCM ra, SCM fill, SCM ignore);
SCM_API SCM scm_array_fill_x (SCM ra, SCM fill);
SCM_API SCM scm_array_copy_x (SCM src, SCM dst);
-SCM_API int scm_ra_eqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_lessp (SCM ra0, SCM ras);
-SCM_API int scm_ra_leqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_grp (SCM ra0, SCM ras);
-SCM_API int scm_ra_greqp (SCM ra0, SCM ras);
-SCM_API int scm_ra_sum (SCM ra0, SCM ras);
-SCM_API int scm_ra_difference (SCM ra0, SCM ras);
-SCM_API int scm_ra_product (SCM ra0, SCM ras);
-SCM_API int scm_ra_divide (SCM ra0, SCM ras);
-SCM_API int scm_array_identity (SCM src, SCM dst);
SCM_API SCM scm_array_map_x (SCM ra0, SCM proc, SCM lra);
SCM_API SCM scm_array_for_each (SCM proc, SCM ra0, SCM lra);
SCM_API SCM scm_array_index_map_x (SCM ra, SCM proc);
--
1.8.2
[-- Attachment #3: 0002-Remove-double-indirection-in-element-access-in-array.patch --]
[-- Type: application/octet-stream, Size: 1851 bytes --]
From a38b0a98ed6093ee9ebe4ac60b4b6f9efbdcfdd5 Mon Sep 17 00:00:00 2001
From: Daniel Llorens <daniel.llorens@bluewin.ch>
Date: Mon, 1 Apr 2013 18:43:58 +0200
Subject: [PATCH 2/2] Remove double indirection in element access in
array-copy!
* libguile/array-map.c: (racp): factor scm_generalized_vector_ref,
scm_generalized_vector_set_x out of the rank-1 loop.
---
libguile/array-map.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/libguile/array-map.c b/libguile/array-map.c
index 8c60f34..767e97d 100644
--- a/libguile/array-map.c
+++ b/libguile/array-map.c
@@ -350,21 +350,24 @@ scm_array_fill_int (SCM ra, SCM fill, SCM ignore SCM_UNUSED)
#undef FUNC_NAME
-
-static int
+static int
racp (SCM src, SCM dst)
{
long n = (SCM_I_ARRAY_DIMS (src)->ubnd - SCM_I_ARRAY_DIMS (src)->lbnd + 1);
- long inc_d, inc_s = SCM_I_ARRAY_DIMS (src)->inc;
- unsigned long i_d, i_s = SCM_I_ARRAY_BASE (src);
- dst = SCM_CAR (dst);
- inc_d = SCM_I_ARRAY_DIMS (dst)->inc;
- i_d = SCM_I_ARRAY_BASE (dst);
- src = SCM_I_ARRAY_V (src);
- dst = SCM_I_ARRAY_V (dst);
+ scm_t_array_handle h_s, h_d;
+ size_t i_s, i_d;
+ ssize_t inc_s, inc_d;
+ dst = SCM_CAR(dst);
+ scm_generalized_vector_get_handle (SCM_I_ARRAY_V (src), &h_s);
+ scm_generalized_vector_get_handle (SCM_I_ARRAY_V (dst), &h_d);
+ i_s = h_s.base + h_s.dims[0].lbnd + SCM_I_ARRAY_BASE (src)*h_s.dims[0].inc;
+ i_d = h_d.base + h_d.dims[0].lbnd + SCM_I_ARRAY_BASE (dst)*h_d.dims[0].inc;
+ inc_s = SCM_I_ARRAY_DIMS (src)->inc * h_s.dims[0].inc;
+ inc_d = SCM_I_ARRAY_DIMS (dst)->inc * h_d.dims[0].inc;
for (; n-- > 0; i_s += inc_s, i_d += inc_d)
- GVSET (dst, i_d, GVREF (src, i_s));
+ h_d.impl->vset (&h_d, i_d, h_s.impl->vref (&h_s, i_s));
+
return 1;
}
--
1.8.2
[-- Attachment #4: Type: text/plain, Size: 2 bytes --]
next parent reply other threads:[~2013-04-01 17:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <mailman.1257718.1364809945.854.guile-devel@gnu.org>
2013-04-01 17:15 ` Daniel Llorens [this message]
2013-04-02 10:19 ` array-copy! is slow & array-map.c (was: Extremly slow for format & string-join) Daniel Llorens
2013-04-02 14:06 ` Daniel Llorens
2013-04-03 12:50 ` array-copy! is slow & array-map.c Ludovic Courtès
2013-04-03 14:50 ` Daniel Llorens
2013-04-03 17:03 ` Ludovic Courtès
2013-04-03 17:06 ` Ludovic Courtès
2013-04-03 17:59 ` Daniel Llorens
2013-04-03 17:07 ` Ludovic Courtès
2013-04-03 19:36 ` Ludovic Courtès
[not found] ` <ECA152EF-A180-45EF-9E8F-D40DD28A2779@jast.ch>
[not found] ` <87mwtfmlap.fsf@gnu.org>
2013-04-03 21:04 ` Daniel Llorens
2013-04-05 17:20 ` Ludovic Courtès
2013-04-05 17:29 ` Daniel Llorens
2013-04-05 20:32 ` Ludovic Courtès
2013-04-05 20:36 ` Ludovic Courtès
2013-04-06 22:59 ` Daniel Llorens
2013-04-06 23:01 ` Fwd: " Daniel Llorens
2013-04-07 9:18 ` Ludovic Courtès
2013-04-03 19:42 ` Ludovic Courtès
2013-04-02 14:55 ` array-copy! is slow & array-map.c (was: Extremly slow for format & string-join) Daniel Llorens
2013-04-02 14:57 ` Daniel Llorens
2013-04-02 15:14 ` Daniel Llorens
2013-04-03 12:05 ` array-copy! is slow & array-map.c Ludovic Courtès
2013-04-03 12:23 ` Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=F525A557-2F6F-4204-9CB7-CBB96420D7C3@bluewin.ch \
--to=daniel.llorens@bluewin.ch \
--cc=guile-devel@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).