unofficial mirror of guile-devel@gnu.org 
* our benchmark-suite
@ 2012-04-23  9:22 Andy Wingo
  2012-04-24  8:26 ` Andy Wingo
  2012-04-25 20:39 ` Ludovic Courtès
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Wingo @ 2012-04-23  9:22 UTC (permalink / raw)
  To: guile-devel

Hi,

I was going to try to optimize vhash-assoc, but I wanted a good
benchmark first, so I started to look at our benchmark suite.  We have
some issues to deal with.

For those of you who are not familiar with the benchmark suite, we have
a bunch of benchmarks in benchmark-suite/benchmarks/: those files that
end in ".bm".  The format of a .bm file is like our .test files, except
that instead of `pass-if' and the like, we have `benchmark'.  You run
benchmarks via ./benchmark-guile in the $top_builddir.
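
To give a rough idea, a .bm file looks more or less like this (a
hand-written sketch; the module and benchmark names here are
illustrative rather than copied from the suite):

  (define-module (benchmarks example)
    #:use-module (benchmark-suite lib))

  (with-benchmark-prefix "example"
    ;; Run the body 100000 times; that count is the suggested number
    ;; of iterations mentioned below.
    (benchmark "small integer addition" 100000
      (+ 12 34)))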

The benchmarking framework tries to be appropriate for microbenchmarks,
as the `benchmark' form includes a suggested number of iterations.
Ideally when you create a benchmark, you give it a number of iterations
that makes it run approximately as long as the other benchmarks.

When the benchmarking suite was first made, 10 years ago, there was an
empty "reference" benchmark that was created to run for approximately 1
second.  Currently it runs in 0.012 seconds.  This is one problem: the
overall suite has old iteration counts.  There is a facility for scaling
the iteration counts of the suite as a whole, but it is unused.

Another problem is that the actual runtime of the various benchmarks
varies quite a lot, from 3.3 seconds for assoc (srfi-1) to 0.012
seconds for if.bm.

Short runtimes magnify imprecisions in measurement.  It used to be that
the measurement function was "times", but I just changed that to the
higher-precision get-internal-real-time / get-internal-run-time.  Still,
though, there is nothing you can do for a benchmark that runs in a few
milliseconds or less.
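
For reference, the kind of measurement involved is roughly the
following (an illustrative sketch, not the framework's actual code):

  ;; Time ITERATIONS calls of THUNK using the higher-precision clock;
  ;; illustrative only.
  (define (measure-seconds thunk iterations)
    (let ((start (get-internal-real-time)))
      (let loop ((i 0))
        (when (< i iterations)
          (thunk)
          (loop (+ i 1))))
      ;; Convert internal time units to seconds.
      (exact->inexact (/ (- (get-internal-real-time) start)
                         internal-time-units-per-second))))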

Another big problem is that some effect-free microbenchmarks optimize
away.  For example, the computations in arithmetic.bm fold entirely.
The same goes for if.bm.  These benchmarks do not measure anything
useful.
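
To illustrate with a hand-made example (not the literal contents of
arithmetic.bm): a body like the following is folded to a constant at
compile time, so the loop ends up measuring nothing but iteration
overhead.

  ;; Illustrative only.  The compiler reduces (+ (* 2 3) 4) to 10, so
  ;; the benchmark body does no arithmetic at run time.
  (benchmark "fixnum arithmetic" 100000
    (+ (* 2 3) 4))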

The benchmarking suite attempts to compensate for the overhead of the
test harness by providing a "core time": the time taken to run a
benchmark, minus the time taken to run an empty benchmark with the same
number of iterations.  The benchmark body is compiled as a thunk, and
the framework calls the thunk repeatedly.  In theory this sounds good.
In practice, however, for high-iteration microbenchmarks, the overhead
of the thunk call outweighs the work done by the micro-benchmark body
itself.
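
Schematically, the reported figure amounts to this (the names are
illustrative, not the framework's):

  ;; total-time     : time to run the benchmark thunk N times
  ;; reference-time : time to run an empty thunk N times
  (define (core-time total-time reference-time)
    (- total-time reference-time))

  (define (core-time/iteration total-time reference-time iterations)
    (/ (core-time total-time reference-time) iterations))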

For what it's worth, the current overhead of the benchmark appears to be
about 35 nanoseconds per iteration, on my laptop.  If we inline the
iteration into the benchmark itself, rather than calling a thunk
repeatedly, we can bring that down to around 13 nanoseconds.  However
it's probably best to leave it as it is, because if we inline the loop,
it's liable to be optimized out.
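
The two shapes under discussion look roughly like this (illustrative
sketches only, not the framework's literal code):

  ;; Current shape: the framework owns the loop and pays a thunk call
  ;; on every iteration.
  (define (call-thunk-n-times thunk iterations)
    (let loop ((i 0))
      (when (< i iterations)
        (thunk)
        (loop (+ i 1)))))

  ;; Inlined shape: the loop lives inside the benchmark body, avoiding
  ;; the per-iteration call; but an effect-free body lets the optimizer
  ;; delete the whole loop.
  (benchmark "inlined loop" 1
    (let loop ((i 0))
      (when (< i 100000)
        (+ 1 2)
        (loop (+ i 1)))))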

So, those are the problems: benchmarks running for inappropriate,
inconsistent durations; inappropriate benchmarks; and benchmarks being
optimized out.

My proposal is to rebase the iteration count in 0-reference.bm to run
for 0.5s on some modern machine, and adjust all benchmarks to match,
removing those benchmarks that do not measure anything useful.  Finally
we should perhaps enable automatic scaling of the iteration count.  What
do folks think about that?

On the positive side, all of our benchmarks are very clear that they are
a time per number of iterations, and so this change should not affect
users that measure time per iteration.

Regards,

Andy
-- 
http://wingolog.org/




* Re: our benchmark-suite
  2012-04-23  9:22 our benchmark-suite Andy Wingo
@ 2012-04-24  8:26 ` Andy Wingo
  2012-04-25 20:39 ` Ludovic Courtès
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2012-04-24  8:26 UTC (permalink / raw)
  To: Neil Jerram

Heya Neil,

I pushed a change to the format of the text logged to the console when
you do a ./benchmark-guile.  It seems that this affected your
benchmarking bot.  I was hoping that this would not be the case, because
the benchmark suite also writes a log to `guile-benchmark.log', and I
tried to avoid changing the format of that file.

Can you take a look at your bot and see if it's possible to switch to
using `guile-benchmark.log' instead of the console output?

Other suggestions as to a solution are also most welcome.

Thanks!

Andy

On Mon 23 Apr 2012 11:22, Andy Wingo <wingo@pobox.com> writes:

> [...]

-- 
http://wingolog.org/




* Re: our benchmark-suite
  2012-04-23  9:22 our benchmark-suite Andy Wingo
  2012-04-24  8:26 ` Andy Wingo
@ 2012-04-25 20:39 ` Ludovic Courtès
  2012-04-28 21:09   ` Neil Jerram
  2012-05-16 17:01   ` Andy Wingo
  1 sibling, 2 replies; 11+ messages in thread
From: Ludovic Courtès @ 2012-04-25 20:39 UTC (permalink / raw)
  To: guile-devel

Hi Andy!

Andy Wingo <wingo@pobox.com> skribis:

> For what it's worth, the current overhead of the benchmark appears to be
> about 35 nanoseconds per iteration, on my laptop.  If we inline the
> iteration into the benchmark itself, rather than calling a thunk
> repeatedly, we can bring that down to around 13 nanoseconds.

There are a few benchmarks doing it already.  See, for instance,
‘repeat’ in ‘arithmetic.bm’.

> So, those are the problems: benchmarks running for inappropriate,
> inconsistent durations;

I don’t really see such a problem.  It doesn’t matter to me if
‘arithmetic.bm’ takes 2 minutes while ‘vlists.bm’ takes 40 seconds,
since I’m not comparing them.

> inappropriate benchmarks;

I agree that things like ‘if.bm’ are not very relevant now.  But there
are also appropriate benchmarks, and benchmarks are always better than
a wild guess.  ;-)

> and benchmarks being optimized out.

That should be fixed.

> My proposal is to rebase the iteration count in 0-reference.bm to run
> for 0.5s on some modern machine, and adjust all benchmarks to match,
> removing those benchmarks that do not measure anything useful.

Sounds good.  However, adjusting iteration counts of the benchmarks
themselves should be done rarely, as it breaks performance tracking like
<http://ossau.homelinux.net/~neil/bm_master_i.html>.

> Finally we should perhaps enable automatic scaling of the iteration
> count.  What do folks think about that?
>
> On the positive side, all of our benchmarks are very clear that they are
> a time per number of iterations, and so this change should not affect
> users that measure time per iteration.

If the reported time is divided by the global iteration count, then
automatic scaling of the global iteration count would be good, yes.
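
Schematically, I mean something like this (illustrative only, not the
suite's current code):

  ;; With a global scaling factor applied to every benchmark, dividing
  ;; the measured time by the *effective* iteration count keeps
  ;; per-iteration figures comparable across runs.
  (define global-scale 2)               ; e.g. doubled on a fast machine

  (define (time-per-iteration elapsed suggested-iterations)
    (/ elapsed (* suggested-iterations global-scale)))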

Thanks,
Ludo’.





* Re: our benchmark-suite
  2012-04-25 20:39 ` Ludovic Courtès
@ 2012-04-28 21:09   ` Neil Jerram
  2012-05-02 21:24     ` Ludovic Courtès
  2012-05-16 17:01   ` Andy Wingo
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Jerram @ 2012-04-28 21:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

>> My proposal is to rebase the iteration count in 0-reference.bm to run
>> for 0.5s on some modern machine, and adjust all benchmarks to match,
>> removing those benchmarks that do not measure anything useful.
>
> Sounds good.  However, adjusting iteration counts of the benchmarks
> themselves should be done rarely, as it breaks performance tracking like
> <http://ossau.homelinux.net/~neil/bm_master_i.html>.
>
>> Finally we should perhaps enable automatic scaling of the iteration
>> count.  What do folks think about that?
>>
>> On the positive side, all of our benchmarks are very clear that they are
>> a time per number of iterations, and so this change should not affect
>> users that measure time per iteration.
>
> If the reported time is divided by the global iteration count, then
> automatic scaling of the global iteration count would be good, yes.

For http://ossau.homelinux.net/~neil I do still have all of the raw data
including iteration counts, so I could easily implement dividing by the
iteration count, and hence allow for future iteration count changes.

Is there any downside to doing that?  (I don't think so.)

Regards,
        Neil




* Re: our benchmark-suite
  2012-04-28 21:09   ` Neil Jerram
@ 2012-05-02 21:24     ` Ludovic Courtès
  2012-05-04 21:43       ` Neil Jerram
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2012-05-02 21:24 UTC (permalink / raw)
  To: Neil Jerram; +Cc: guile-devel

Hi,

Neil Jerram <neil@ossau.homelinux.net> skribis:

> For http://ossau.homelinux.net/~neil I do still have all of the raw data
> including iteration counts, so I could easily implement dividing by the
> iteration count, and hence allow for future iteration count changes.
>
> Is there any downside to doing that?  (I don't think so.)

No, I guess not.  And as you show, having raw data instead of synthesized
figures gives more freedom.

Thanks,
Ludo’.




* Re: our benchmark-suite
  2012-05-02 21:24     ` Ludovic Courtès
@ 2012-05-04 21:43       ` Neil Jerram
  2012-05-07 14:38         ` Ludovic Courtès
  2012-05-15 20:48         ` Andy Wingo
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Jerram @ 2012-05-04 21:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

>> For http://ossau.homelinux.net/~neil I do still have all of the raw data
>> including iteration counts, so I could easily implement dividing by the
>> iteration count, and hence allow for future iteration count changes.
>>
>> Is there any downside to doing that?  (I don't think so.)
>
> No, I guess.  And as you show, having raw data instead of synthesized
> figures gives more freedom.

It turns out I'm already scaling by iteration count - in fact since
November 2009. :-)

Still, I wanted to do something new, so I've added further graphs
showing just the last 50 measurements for each benchmark (whereas the
existing graphs showed all measurements since my data collection
began).  The generation of those is still running at the moment, but
should be complete in an hour or so.

Regards,
        Neil




* Re: our benchmark-suite
  2012-05-04 21:43       ` Neil Jerram
@ 2012-05-07 14:38         ` Ludovic Courtès
  2012-05-15 20:48         ` Andy Wingo
  1 sibling, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2012-05-07 14:38 UTC (permalink / raw)
  To: Neil Jerram; +Cc: guile-devel

Hi Neil!

Neil Jerram <neil@ossau.homelinux.net> skribis:

> Still, I wanted to do something new, so I've added further graphs
> showing just the last 50 measurements for each benchmark (whereas the
> existing graphs showed all measurements since my data collection
> began).  The generation of those is still running at the moment, but
> should be complete in an hour or so.

In case you have spare time on your hands ;-), Flot [0] provides a very
nice UI for plots (Hydra uses it for its history charts).

Also, the GNUnet people have developed a complete tool for performance
tracking, Gauger [1].  I haven’t managed to display a single plot from
there, but the idea seems nice.

Thanks,
Ludo’.

[0] http://code.google.com/p/flot/
[1] https://gnunet.org/gauger/




* Re: our benchmark-suite
  2012-05-04 21:43       ` Neil Jerram
  2012-05-07 14:38         ` Ludovic Courtès
@ 2012-05-15 20:48         ` Andy Wingo
  2012-05-19 21:54           ` Neil Jerram
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2012-05-15 20:48 UTC (permalink / raw)
  To: Neil Jerram; +Cc: Ludovic Courtès, guile-devel

Heya Neil,

On Fri 04 May 2012 23:43, Neil Jerram <neil@ossau.homelinux.net> writes:

> It turns out I'm already scaling by iteration count - in fact since
> November 2009. :-)

Excellent, so we can scale iteration counts in Guile's git with impunity
:)

It would be nice for the graphs for individual benchmarks to have an
absolute Y axis, in terms of microseconds I guess.

> Still, I wanted to do something new, so I've added further graphs
> showing just the last 50 measurements for each benchmark (whereas the
> existing graphs showed all measurements since my data collection
> began).  The generation of those is still running at the moment, but
> should be complete in an hour or so.

Neat :)  (Do you pngcrush these?  They seem a little slow to serve.)

It would also be nice to have overview graphs from the last 50 days as
well, should you have time to hack that up.

Thanks for this tool, it's neat :)

Andy
-- 
http://wingolog.org/




* Re: our benchmark-suite
  2012-04-25 20:39 ` Ludovic Courtès
  2012-04-28 21:09   ` Neil Jerram
@ 2012-05-16 17:01   ` Andy Wingo
  2012-05-16 21:01     ` Ludovic Courtès
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2012-05-16 17:01 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Howdy!

On Wed 25 Apr 2012 22:39, ludo@gnu.org (Ludovic Courtès) writes:

>> So, those are the problems: benchmarks running for inappropriate,
>> inconsistent durations;
>
> I don’t really see such a problem.  It doesn’t matter to me if
> ‘arithmetic.bm’ takes 2 minutes while ‘vlists.bm’ takes 40 seconds,
> since I’m not comparing them.

Running a benchmark for 2 minutes is not harmful to the results, but it
is a bit needless.  One second is enough.

However, running a benchmark for just a few milliseconds is not very
interesting:

;; ("if.bm: if-<bool>-then: executing then" 330000 real 0.011994627 real/iteration 3.63473545454545e-8 run/iteration 3.62829060606061e-8 core/iteration 9.61427360606058e-10 gc 0.0)

That's 12 milliseconds.  The jitter there is too much.

>> inappropriate benchmarks;
>
> I agree that things like ‘if.bm’ are not very relevant now.  But there
> are also appropriate benchmarks, and benchmarks are always better than
> a wild guess.  ;-)

Agreed :-)

>> and benchmarks being optimized out.
>
> That should be fixed.

In what way?  It would make those benchmarks different.

Thesis: anything for which you would want to turn off the optimizer is
not a good benchmark anyway.

See also: http://www.azulsystems.com/presentations/art-of-java-benchmarking

>> My proposal is to rebase the iteration count in 0-reference.bm to run
>> for 0.5s on some modern machine, and adjust all benchmarks to match,
>> removing those benchmarks that do not measure anything useful.
>
> Sounds good.  However, adjusting iteration counts of the benchmarks
> themselves should be done rarely, as it breaks performance tracking like
> <http://ossau.homelinux.net/~neil/bm_master_i.html>.

I think we've established that this isn't the case -- modulo the effect
that such a change would have on GC (process image size, etc.).

>> Finally we should perhaps enable automatic scaling of the iteration
>> count.  What do folks think about that?
>>
>> On the positive side, all of our benchmarks are very clear that they are
>> a time per number of iterations, and so this change should not affect
>> users that measure time per iteration.
>
> If the reported time is divided by the global iteration count, then
> automatic scaling of the global iteration count would be good, yes.

OK, will do.

Speak now or be surprised by a commit!

;-)

Andy
-- 
http://wingolog.org/




* Re: our benchmark-suite
  2012-05-16 17:01   ` Andy Wingo
@ 2012-05-16 21:01     ` Ludovic Courtès
  0 siblings, 0 replies; 11+ messages in thread
From: Ludovic Courtès @ 2012-05-16 21:01 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> On Wed 25 Apr 2012 22:39, ludo@gnu.org (Ludovic Courtès) writes:
>
>>> So, those are the problems: benchmarks running for inappropriate,
>>> inconsistent durations;
>>
>> I don’t really see such a problem.  It doesn’t matter to me if
>> ‘arithmetic.bm’ takes 2 minutes while ‘vlists.bm’ takes 40 seconds,
>> since I’m not comparing them.
>
> Running a benchmark for 2 minutes is not harmful to the results, but it
> is a bit needless.  One second is enough.

Well, duration has to be chosen such that the jitter is small enough.
Sometimes it could be 2 minutes, sometimes 1 second.

[...]

>>> and benchmarks being optimized out.
>>
>> That should be fixed.
>
> In what way?  It would make those benchmarks different.
>
> Thesis: anything for which you would want to turn off the optimizer is
> not a good benchmark anyway.

Yes, it depends on the benchmarks.  For instance, I once added
benchmarks for ‘1+’ and ‘1-’, because I wanted to see the impact of an
optimization to the corresponding VM instructions.

Nowadays peval would optimize those benchmarks out.  Yet, the fact is
that I was interested in the performance of the underlying VM
instructions, regardless of what the compiler might be doing.
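
For such cases, one possible workaround (a sketch, not something the
suite does today) is to read the operand from a top-level variable, so
that the compiler cannot fold the operation away:

  ;; Illustrative sketch: peval cannot fold (1+ n) when N is a
  ;; top-level variable, so the VM instruction is still what gets
  ;; measured.
  (define n 42)

  (benchmark "1+ on a non-constant operand" 1000000
    (1+ n))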

Thanks,
Ludo’.




* Re: our benchmark-suite
  2012-05-15 20:48         ` Andy Wingo
@ 2012-05-19 21:54           ` Neil Jerram
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Jerram @ 2012-05-19 21:54 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

Andy Wingo <wingo@pobox.com> writes:

> Neat :)  (Do you pngcrush these?  They seem a little slow to serve.)

I just tried running pngcrush on all the .pngs, and didn't get more than
6-8% reduction.  So unfortunately it doesn't look like that would help
much.

Thanks for the idea though!

     Neil



