* data-crunching in guile
@ 2009-06-24 12:03 Andy Wingo
2009-06-25 7:26 ` Neil Jerram
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Andy Wingo @ 2009-06-24 12:03 UTC (permalink / raw)
To: guile-devel
Hey all,
I'm pretty displeased about the way that the brainfuck compilation
works. Check this:
brainfuck@(guile-user)> ,c ++]
(The trailing ] is to mark the end of the program. It's kind of a hack,
but it works fine.)
Disassembly of #<objcode b719c8e0>:
0 (make-int8:0) ;; 0
1 (load-symbol "make-vector") ;; make-vector
16 (link-now)
17 (variable-ref)
18 (make-int16 117 48) ;; 30000
21 (make-int8:0) ;; 0
22 (call 2)
24 (local-set 1)
26 (local-set 0)
OK, up to here we're good, it's a generic prolog to make the tape and
pointer.
28 (load-symbol "vector-set!") ;; vector-set!
43 (link-now)
44 (variable-ref)
45 (local-ref 1)
47 (local-ref 0)
49 (load-symbol "vector-ref") ;; vector-ref
63 (link-now)
64 (variable-ref)
65 (local-ref 1)
67 (local-ref 0)
69 (call 2)
71 (make-int8:1) ;; 1
72 (add)
73 (mv-call 3 :L32) ;; MV -> 81
77 (drop)
78 (br :L33) ;; -> 84
81 (truncate-values 0 0)
84 (load-symbol "vector-set!") ;; vector-set!
99 (link-now)
100 (variable-ref)
101 (local-ref 1)
103 (local-ref 0)
105 (load-symbol "vector-ref") ;; vector-ref
119 (link-now)
120 (variable-ref)
121 (local-ref 1)
123 (local-ref 0)
125 (call 2)
127 (make-int8:1) ;; 1
128 (add)
129 (goto/args 3)
Then all of this, just to add 2 to a cell of a vector!
The fact that we have to resolve vector-set! and vector-ref -- indeed at
every `+' -- is due to the fact that the code is not wrapped in a lambda,
and thus has no place to cache the looked-up variable locations. See
toplevel-ref in vm.texi for more information.
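The kind of caching that a lambda body makes possible can be sketched
roughly like this (a Python illustration of the general memoized-lookup
idea only, not Guile's actual mechanism -- the names here are made up):

```python
# Illustrative sketch (Python, hypothetical names): each call site in a
# compiled procedure gets a cache slot.  The first execution resolves
# the symbol to its variable; later executions skip the lookup.

module = {"vector-set!": lambda v, i, x: v.__setitem__(i, x)}

class CallSite:
    """One call site inside a compiled procedure body."""
    def __init__(self, name):
        self.name = name
        self.cached = None          # filled in on first execution

    def resolve(self):
        if self.cached is None:     # slow path: symbol -> variable
            self.cached = module[self.name]
        return self.cached          # fast path thereafter

site = CallSite("vector-set!")
tape = [0] * 10
site.resolve()(tape, 0, 42)         # first call: looks up and caches
site.resolve()(tape, 1, 7)          # second call: cache hit
```

Code compiled outside any lambda has no per-call-site slot to put the
cache in, so every execution takes the slow path.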
But then at every call to vector-set! (except the last one, of course),
we have to set up a multiple-values context, to allow for the
possibility that vector-ref returns multiple values. In brainfuck, which
is an imperative language, most calls are for effect -- not tail calls
or calls for value -- meaning that mv calls take up proportionally many
instructions.
And all this for vector-set!, which only returns a value because it has
to (in the C api).
But I was thinking, the brainfuck pattern (random-access reads and
writes to a vector) is really the pattern of data crunching. The
compiled output of brainfuck programs is the same as the compiled output
of a hashing algorithm or an image synthesizer. And these data-crunching
operations are precisely those that are closest to the underlying
hardware, for better or for worse.
What I'm getting at is that I think we should have VM ops for working on
vectors -- both generic vectors and specific ops for bytevectors -- and
probably an op for string-ref as well, and possibly string-set!. Then a
native code backend could effectively be implemented at the GLIL or
assembly level, relying on the Tree-IL compiler's prior resolution of
high-level operations (e.g., vector-set!) into low-level instructions. I
think we have the space in the VM, and if we group all of the vector
instructions at the end, we shouldn't affect the instruction cache too
much.
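To make the shape of the idea concrete, here is a toy dispatch loop (a
Python sketch with hypothetical opcode names, not Guile's real
instruction set) in which vector access is an inline VM op rather than a
call out to a resolved procedure:

```python
# Toy stack-VM sketch (Python; opcode names are hypothetical).  With
# dedicated VECTOR_REF/VECTOR_SET opcodes, the common data-crunching
# pattern needs no symbol resolution and no multiple-values plumbing.

VECTOR_REF, VECTOR_SET, PUSH, ADD, HALT = range(5)

def run(code, vec):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH:                  # immediate operand follows
            stack.append(code[pc]); pc += 1
        elif op == VECTOR_REF:          # idx on stack -> value on stack
            stack.append(vec[stack.pop()])
        elif op == VECTOR_SET:          # idx, value on stack -> nothing
            val, idx = stack.pop(), stack.pop()
            vec[idx] = val
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return vec

# One brainfuck `+' at cell 0: vec[0] = vec[0] + 1, all inline.
tape = [0] * 8
run([PUSH, 0, PUSH, 0, VECTOR_REF, PUSH, 1, ADD, VECTOR_SET, HALT], tape)
```

The point is structural: the set! is for effect, consumes its operands,
and leaves nothing behind -- no link-now, no variable-ref, no mv-call.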
So that's my plan. Thoughts?
Cheers,
Andy
--
http://wingolog.org/
* Re: data-crunching in guile
2009-06-24 12:03 data-crunching in guile Andy Wingo
@ 2009-06-25 7:26 ` Neil Jerram
2009-06-25 8:04 ` Ludovic Courtès
2009-06-26 12:09 ` Andy Wingo
2 siblings, 0 replies; 8+ messages in thread
From: Neil Jerram @ 2009-06-25 7:26 UTC (permalink / raw)
To: Andy Wingo; +Cc: guile-devel
Andy Wingo <wingo@pobox.com> writes:
> Hey all,
>
> I'm pretty displeased about the way that the brainfuck compilation
> works. Check this:
>
> brainfuck@(guile-user)> ,c ++]
>
> (The trailing ] is to mark the end of the program. It's kind of a hack,
> but it works fine.)
Yes, that seems OK. Presumably in real brainfuck the end of the program
is indicated by EOF?
> What I'm getting at is that I think we should have VM ops for working on
> vectors -- both generic vectors and specific ops for bytevectors -- and
> probably an op for string-ref as well, and possibly string-set!. Then a
> native code backend could effectively be implemented at the GLIL or
> assembly level, relying on the Tree-IL compiler's prior resolution of
> high-level operations (e.g., vector-set!) into low-level instructions. I
> think we have the space in the VM, and if we group all of the vector
> instructions at the end, we shouldn't affect the instruction cache too
> much.
>
> So that's my plan. Thoughts?
Sounds good to me. Perhaps the generic operation could be ref or set
at a fixed offset of an object's SMOB data?
However, I don't yet understand the possible downside, i.e. what you
say about space and the instruction cache.
Neil
* Re: data-crunching in guile
2009-06-24 12:03 data-crunching in guile Andy Wingo
2009-06-25 7:26 ` Neil Jerram
@ 2009-06-25 8:04 ` Ludovic Courtès
2009-06-25 21:08 ` Andy Wingo
2009-06-26 12:09 ` Andy Wingo
2 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2009-06-25 8:04 UTC (permalink / raw)
To: guile-devel
Hi,
Andy Wingo <wingo@pobox.com> writes:
> What I'm getting at is that I think we should have VM ops for working on
> vectors -- both generic vectors and specific ops for bytevectors -- and
> probably an op for string-ref as well, and possibly string-set!. Then a
> native code backend could effectively be implemented at the GLIL or
> assembly level, relying on the Tree-IL compiler's prior resolution of
> high-level operations (e.g., vector-set!) into low-level instructions. I
> think we have the space in the VM, and if we group all of the vector
> instructions at the end, we shouldn't affect the instruction cache too
> much.
Why not, but...
It looks to me like a failure to optimize the general case, which we
work around by specializing the instruction set.
Maybe we could have a more generic approach to the problem. For
instance, there could be a way to annotate primitive procedures with
information useful to the compiler, such as "returns-single-value",
"return-value-unspecified", etc., which would allow the generated code
to take shortcuts. But maybe this is just unrealistic. What do you
think?
Thanks,
Ludo'.
* Re: data-crunching in guile
2009-06-25 8:04 ` Ludovic Courtès
@ 2009-06-25 21:08 ` Andy Wingo
2009-06-25 22:47 ` Neil Jerram
0 siblings, 1 reply; 8+ messages in thread
From: Andy Wingo @ 2009-06-25 21:08 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel
Howdy,
I didn't mean to push those commits I just pushed -- my girlfriend was
looking for the web browser, but was on a full-screen Emacs window with
magit, and somehow clicking around pushed some commits! Perhaps this is
Marius' way of contributing to Guile these days ;-)
But I think they might be fine. I'll look and revert if necessary.
On Thu 25 Jun 2009 10:04, ludo@gnu.org (Ludovic Courtès) writes:
> Andy Wingo <wingo@pobox.com> writes:
>
>> What I'm getting at is that I think we should have VM ops for working on
>> vectors -- both generic vectors, and specific ops for bytevectors
>
> Why not, but...
>
> It looks to me like a failure to optimize the general case, which we
> work around by specializing the instruction set.
Yeah, I wonder about that sometimes. In this particular case, I think
we're OK -- and the reason is, it's about data, not about procedures.
We have a number of primitive data structures in Guile, and we need to
provide fast access to those structures. We do favors to no one if we
force the machine to call out to functions to access our basic data
structures.
That is even more the case with bytevectors, because it's the data
structure of the hardware. Code is fast when it is inlined, when it is
linear and close to hardware operations -- and that is particularly the
case for bytevector ops.
But, regarding the overhead of a mv-call instead of a call -- I think
that overhead is mostly in space and not in time -- or, that is to say,
it is in time to the degree that its space pushes instructions out of
the data cache. (But see below, regarding "dark arts".)
So given that we really support a more functional style of
programming...
> Maybe we could have a more generic approach to the problem. For
> instance, there could be a way to annotate primitive procedures with
> information useful to the compiler, such as "returns-single-value",
> "return-value-unspecified", etc., which would allow the generated code
> to take shortcuts. But maybe this is just unrealistic. What do you
> think?
I don't know about this. It's probably not worth it. If you're calling
out to a procedure bound in `(guile)' anyway, such annotations would do
strange things if the procedure is rebound. Depending on the
implementation, this could increase the size of the subrs. It's
certainly more complicated.
On the other hand, in the grander scheme, "the unspecified value" is as
stupid an idea as I've ever heard. I recognize its necessity given where
we're at now, but one would think that in principle such functions could
just return `void'...
I guess my point is: this is the right thing to do for vectors. They are
a fundamental data type, just like we have car and cdr instructions,
because such ops need to be inlined to go fast. Regarding the general
subr case... yes, I guess you're right regarding marking the number of
values a subr should return (0, 1, N, any), but as long as we're talking
about the Right Thing here, we probably are talking about a better FFI
than we have now.
That's my opinion as it stands now anyway, subject to modification of
course :)
>> I think we have the space in the VM, and if we group all
>> of the vector instructions at the end, we shouldn't affect the
>> instruction cache too much.
I don't have Neil's mail open here, but my thought was this: getting a
fast VM is a dark art of feeling and instinct. My feeling is that a VM
is fast if it fits in the CPU's caches: the instruction cache and the
data cache. The data cache means that smaller code is better, hence my
resistance to word-sized instructions. The instruction cache means that
the VM itself should be small -- but if the code for vector ops is all
"at the end" of the VM, then only code that uses vector ops pays for the
increased "cache footprint" of the VM.
But like I say, all this is instinct, which may well be wrong.
Regards,
Andy
--
http://wingolog.org/
* Re: data-crunching in guile
2009-06-25 21:08 ` Andy Wingo
@ 2009-06-25 22:47 ` Neil Jerram
2009-06-26 14:37 ` Andy Wingo
0 siblings, 1 reply; 8+ messages in thread
From: Neil Jerram @ 2009-06-25 22:47 UTC (permalink / raw)
To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel
Andy Wingo <wingo@pobox.com> writes:
> I don't have Neil's mail open here, but my thought was this: getting a
> fast VM is a dark art of feeling and instinct. My feeling is that a VM
> is fast if it fits in the CPU's cache: the instruction cache and the
> data cache. The data cache means that smaller code is better, hence my
> resistance to word-sized instructions. The instruction cache means that
> the VM itself should be small -- but if the code for vector ops is all
> "at the end" of the VM, then only code that uses vector ops pays for the
> increased "cache footprint" of the VM.
Thanks, I see now. But presumably even VM code will frequently call
out to primitives all over libguile, won't it?
I completely agree that small code size can be important for
performance, but I doubt that it is the size of the VM on its own that
matters.
(Or am I still not understanding what you mean?)
> But like I say, all this is instinct, which may well be wrong.
Me too!
Neil
* Re: data-crunching in guile
2009-06-24 12:03 data-crunching in guile Andy Wingo
2009-06-25 7:26 ` Neil Jerram
2009-06-25 8:04 ` Ludovic Courtès
@ 2009-06-26 12:09 ` Andy Wingo
2 siblings, 0 replies; 8+ messages in thread
From: Andy Wingo @ 2009-06-26 12:09 UTC (permalink / raw)
To: guile-devel
Hi,
I've gone ahead and added vector and bytevector ops to the VM. That
should provide a considerable speedup to programs that use those data
structures.
As far as brainfuck goes:
On Wed 24 Jun 2009 14:03, Andy Wingo <wingo@pobox.com> writes:
> brainfuck@(guile-user)> ,c ++]
brainfuck@(guile-user)> ,c ++]
Disassembly of #<objcode b7073dc0>:
0 (make-int8:0) ;; 0
1 (load-symbol "make-vector") ;; make-vector
16 (link-now)
17 (variable-ref)
18 (make-int16 117 48) ;; 30000
21 (make-int8:0) ;; 0
22 (call 2)
24 (local-set 1)
26 (local-set 0)
The prolog, as before.
28 (local-ref 1)
30 (local-ref 0)
32 (local-ref 1)
34 (local-ref 0)
36 (vector-ref)
37 (make-int8:1) ;; 1
38 (add)
39 (vector-set)
One +
40 (local-ref 1)
42 (local-ref 0)
44 (local-ref 1)
46 (local-ref 0)
48 (vector-ref)
49 (make-int8:1) ;; 1
50 (add)
51 (vector-set)
The other +.
52 (void)
53 (return)
Here we push "the unspecified value" and return.
As far as performance goes, I took the fibonacci program from
http://www.hevanet.com/cristofd/brainfuck/fib.b:
>++++++++++>+>+[
[+++++[>++++++++<-]>.<++++++[>--------<-]+<<<]>.>>[
[-]<[>+<-]>>[<<+>+>-]<[>+<-[>+<-[>+<-[>+<-[>+<-[>+<-
[>+<-[>+<-[>+<-[>[-]>+>+<<<-[>+<-]]]]]]]]]]]+>>>
]<<<
]
This program doesn't terminate; you will have to kill it.
Daniel B Cristofani (cristofdathevanetdotcom)
http://www.hevanet.com/cristofd/brainfuck/
and let it run for about 3 seconds, wall time. The last number that it
printed was:
271651113667355860622427102019076226843364138595316927370091609703323530354645264253440817026771386351610030043817601548972412867287338615983131
I then switched back to Scheme, did a (define x that-number), then:
(define (find-fibo n a b)
  (let ((c (+ a b)))
    (write c) (newline)
    (if (= c x)
        n
        (find-fibo (1+ n) b c))))
,t (find-fibo 3 1 1)
=> 688
clock utime stime cutime cstime gctime
0.15 0.01 0.00 0.00 0.00 0.00
That is to say, brainfuck computed and printed the first 688 elements of
the Fibonacci sequence in about 3 seconds.
The `write' in find-fibo is there to test output speed. Without it,
find-fibo executes in 0.01 s. With it, and my stupid gnome-terminal,
0.15 or 0.20 s.
I think that means the brainfuck implementation is doing pretty well, to
do bignum math at that speed, only having 1+ and 1- as its operators...
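For reference, the brainfuck semantics exercised above fit in a few
lines. This is an illustrative Python interpreter sketch, not the Guile
compiler -- cells are unbounded integers, matching the bignum behaviour
described:

```python
def bf(program, inp="", tape_len=30000):
    """Minimal brainfuck interpreter sketch.  Cells are unbounded
    Python ints; unmatched pointers/brackets are not checked."""
    tape, ptr, pc, out = [0] * tape_len, 0, 0, []
    jumps, stack = {}, []
    for i, c in enumerate(program):          # pre-match brackets
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    it = iter(inp)
    while pc < len(program):
        c = program[pc]
        if c == '+': tape[ptr] += 1
        elif c == '-': tape[ptr] -= 1
        elif c == '>': ptr += 1
        elif c == '<': ptr -= 1
        elif c == '.': out.append(chr(tape[ptr]))
        elif c == ',': tape[ptr] = ord(next(it, '\0'))
        elif c == '[' and tape[ptr] == 0: pc = jumps[pc]
        elif c == ']' and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
    return ''.join(out)
```

For example, `bf("++++++++[>++++++++<-]>+.")` computes 8*8+1 = 65 in
cell 1 and prints "A".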
Andy
--
http://wingolog.org/
* Re: data-crunching in guile
2009-06-25 22:47 ` Neil Jerram
@ 2009-06-26 14:37 ` Andy Wingo
2009-06-26 21:25 ` Neil Jerram
0 siblings, 1 reply; 8+ messages in thread
From: Andy Wingo @ 2009-06-26 14:37 UTC (permalink / raw)
To: Neil Jerram; +Cc: Ludovic Courtès, guile-devel
Hi Neil!
On Fri 26 Jun 2009 00:47, Neil Jerram <neil@ossau.uklinux.net> writes:
> Andy Wingo <wingo@pobox.com> writes:
>
>> I don't have Neil's mail open here, but my thought was this: getting a
>> fast VM is a dark art of feeling and instinct. My feeling is that a VM
>> is fast if it fits in the CPU's cache: the instruction cache and the
>> data cache. The data cache means that smaller code is better, hence my
>> resistance to word-sized instructions. The instruction cache means that
>> the VM itself should be small -- but if the code for vector ops is all
>> "at the end" of the VM, then only code that uses vector ops pays for the
>> increased "cache footprint" of the VM.
>
> Thanks, I see now. But presumably even VM code will frequently call
> out to primitives all over libguile, won't it?
Over time, I'd say no. I see functions written in C migrating over to be
written in Scheme, like PLT did recently with `map'. We should port
srfi-1 back to Scheme I think :) Not to mention silly things like
string-any being in boot-9...
There's no reason for Scheme to be slow. Of course, this is the "over
time" view, currently we're not there...
And yet, disassemble the functions that you use regularly, and often you
find they just use VM ops, and don't call out to primitives. That shows
that the VM is, while virtual, still quite a good machine for
computation.
> I completely agree that small code size can be important for
> performance, but I doubt that it is the size of the VM on its own that
> matters.
Depends on what percent of time is spent in the VM. On previous
profiling runs, this was 30-60% of program run time -- well worth
optimizing.
Cheers,
Andy
--
http://wingolog.org/
* Re: data-crunching in guile
2009-06-26 14:37 ` Andy Wingo
@ 2009-06-26 21:25 ` Neil Jerram
0 siblings, 0 replies; 8+ messages in thread
From: Neil Jerram @ 2009-06-26 21:25 UTC (permalink / raw)
To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel
Andy Wingo <wingo@pobox.com> writes:
> Hi Neil!
Hi Andy!
> On Fri 26 Jun 2009 00:47, Neil Jerram <neil@ossau.uklinux.net> writes:
>
>> Thanks, I see now. But presumably even VM code will frequently call
>> out to primitives all over libguile, won't it?
>
> Over time, I'd say no. I see functions written in C migrating over to be
> written in Scheme, like PLT did recently with `map'. We should port
> srfi-1 back to Scheme I think :) Not to mention silly things like
> string-any being in boot-9...
>
> There's no reason for Scheme to be slow. Of course, this is the "over
> time" view, currently we're not there...
>
> And yet, disassemble the functions that you use regularly, and often you
> find they just use VM ops, and don't call out to primitives. That shows
> that the VM is, while virtual, still quite a good machine for
> computation.
Fair enough, I can see that now. In other words, there will be long
enough passages of pure VM code to make the VM's code size important.
Neil