From: Stefan Israelsson Tampe
Newsgroups: gmane.lisp.guile.devel
Subject: vm ops and threading discussion for guile-log
Date: Tue, 24 Apr 2012 22:09:09 +0200
To: guile-devel
Hi,

I hope I am not boring you by continuing to talk about guile-log.

But one of the things brought up by Mark, and rightly so, is that basing the fundamentals of logic programming on a stack, compared to assq lists or similar more scalable functional data structures, makes it complicated to take advantage of multiple cores and threads.

To answer this I have now enabled the possibility of using multiple stacks, blending the two techniques together. The reasons for taking such a complicated approach are threefold.

1. Stack based techniques are faster, and considerably so if we can take advantage of special VM operations or are able to compile the code. I find that the assq based approach saturates, and for the uniprocessor case the speed difference would be around a factor of 10.

2.
Stack based approaches can take advantage of the fact that there is a stack, and implement variables that behave like global variables but also like stack variables, and that can be reconstructed from a stored state. These data structures look non-functional, but they are similar to variables living on the stack; e.g. for reentrancy to work they need to reference functional data structures. Anyway, I find that with them one can formulate many problems in a natural way that a functional approach cannot match, and hence we get an expressive framework that still preserves functional behavior: running stored code that does not reference global variables always returns the same result, all code can be put in tail position, and user space code can yield a return value simply by returning a value from the user space function.

3. We are able to trace the rebuilding and unwinding. This is nice for debugging, but it is also an algorithmic enabler, because you can design functionality that takes advantage of this fact.

There are some drawbacks, though:

1. Things are a little more complicated.

2. There can be some overhead in some situations, although selecting the right mix of the stack based approach and the assq list based one will give good performance in most cases.

-----------------------------------------------------------------------

How the simple multithreading works:

Each thread has a stack of its own, and each created variable has an id indicating which stack it belongs to. If a thread tries to set a variable belonging to another thread, it does so by consing onto an assq list instead of modifying the other stack, mimicking Kanren's strategy.

-----------------------------------------------------------------------

VM OPS.
--------------

Here one can work on different levels.
But I chose to look at just reducing the overhead of function calls out into C land. You might write a direct op code for each such function, but that means a lot of VM ops. I therefore settled on a smaller set of op-codes that use function pointers stored in an array reachable from the VM, and then used op-codes like

(fast-call-0 index)
(fast-call-1 index a1)
(fast-call-2 index a1 a2)

Then, when we load C functions like gp-car from the shared library, I add a stub like

(fast-call-set! gp-car 2 5)
(define-inlinable (gp-car x s)
  (fast-call-2 5 x s))

The first form is a VM op that transfers the gp-car function pointer into the function pointer array, with arity 2, at index 5; then I overwrite gp-car with a suitable inline function.

With this technique I can increase the speed of the stack based logic code by a factor of 2-3. The assq based one gets only marginally faster.

I think I could win a factor of 2 in the number of VM ops if I could design into this framework the possibility of returning several values. E.g. consider destructuring a cons cell into a car and a cdr. Currently this is done like

(let ((s (gp-pair!? x s)))
  (if s
      (let ((x-car (gp-car x s))
            (x-cdr (gp-cdr x s)))
        ...

I would like to be able to write

(let-values (((s x-car x-cdr) (gp-pair-fast!? x s)))
  (if s ...))

and the op-codes would align quite nicely and reduce the op count. There are further improvements that could be made, but then everything becomes special, special VM op-codes would be needed, and that bloat should instead be handled by implementing a JIT engine or native compilation.

My conclusion is that it is possible to implement a tool in Guile that would make it quite easy to improve the speed of VM based Guile code by a factor of 2-5 by using C based elements, without the need to be a VM wizard, and it would be nice to have something along the lines discussed above but with better integration into Guile.
I also redesigned the code, so I am currently at 80 ms (for stack based guile-log) for the Einstein case, which basically tests the speed of destructuring a pair into its elements.

Cheers,
Stefan