Compilers for Guile (related to roadmap and goals)

From: Tanel Tammet <tammet@staff.ttu.ee>
Subject: Compilers for Guile (related to roadmap and goals)
Date: Tue, 23 Apr 2002 09:25:18 +0300	[thread overview]
Message-ID: <3CC4FE4E.1B343860@staff.ttu.ee> (raw)

Hi,

Thanks a lot for the numerous replies to my question about the Guile
roadmap and priorities. I did learn a lot. 

What was important for my possible involvment:
It looks like people are still interested about compilation and various
aspects of it, so I'll attempt to be of some help there (not clear
I will actually succeed in that respect :).

I'll try to describe my understanding of the situation in respect
to Guile compilation, and give a small short-term personal roadmap 
for discussion: would it be sensible to pursue this kind of approach?

This is a pretty long email. 

NB! Please read at last the very last part of the mail
(a question about compiler developers), if you do not have time
to read it all!

First, some background experiences about using Guile:

1) Ca one month ago I downloaded the last Guile distro and one
of the earlier ones, tried to compile these on RedHat. Both
failed. 

2) The last distro compilation claimed smth about a duplicate
definition (cannot remember exactly): one of the internet-oriented
functions ins Guile .c clashed with a standard one in the Linux
(gcc?) library. I could fix it in source, though.

3) The early distro failed to compile for the same reason, PLUS
a similar problem. I could fix both in source.

4) I understand that Guile people know about this, and have known
about this for some time. Still, you cannot compile Guile out
of box on RedHat. 

5) I find this very disturbing.
IMHO, one of the main goals for Guile (which is NOT an early 
beta system) should be to guarantee out-of-box compilability
on most distros. It is clear that problems like this push 
potential users away from Guile.

6) Licencing issue: Guile is licenced under GPL PLUS EXCEPTION
which allows one to link against it without the end product
being GPL-d. IMHO it is an extremely liberal and user-friendly
licence. However, when you donwload Guile, look into the manual,
look into COPYING, it is nowhere to be seen that there is
such an exception (which, for practical purposes, completely
removes the standard GPL obstacles). The only place the
exception is stated, is in the .c and .h file headers.
Users won't find the exception! They will be under the false
impression that any link against Guile is GPL-d. IMHO
it would be important to clearly state the exception in
the manual and in the top-level directory.

The first and obvious thing for me would be taking up 
the existing Guile port of Hobbit, and:

0) find if other people are working on compilers for Guile.
1) check if it is still working together with Guile
2) if not, try to fix it
3) bring in the recent improvements of Hobbit 
   into the Guile Hobbit
4) then ask for new guidelines ..

Some comments and questions regarding compilation follow.

- Is speed necessary? Sometimes people claim it is not.

  Obviously, many Guile developers do not think
  speed is important (otherwise Guile would be at least
  as fast as SCM and would have a compiler).

  Hence I attempt to convince you that speed IS
  highly important:

0) Just imagine using a 10 Mhz machine instead of your
   1000 Mhz machine. Would you be happy? Using a slow
   language is EXACTLY that. A 10 Mhz machine is perfectly
   usable. Some people would claim you need no more speed
   than that :)
1) Many scripting languages like Python, Perl, PHP are
   not compiled and are extremely slow. Often the 
   language itself is such that is is very hard to compile.
2) Because of this, scalable web apps are often built in
   Java or C: in a serious setting (many users, heavy loads,
   nontrivial programs) it DOES matter if your program becomes
   100 times faster (or slower).
3) Observe that with scheme as a scipting language we COULD
   give speeds near C or Java: this would be a major
   advantage over Python, Perl etc. We do not have other
   big advantages, hence why not take seriously this major
   advantage we are in position to have?
4) Sometimes people claim that the scripting language need
   not be fast. It is OTHER people who have to write fast code
   (databases, web servers, etc etc) and we only GLUE fast
   code together with our slow glue. 
   This is sometimes true, but it effectively says that we
   cannot write nontrivial apps in our language: it is stuck
   in the glue role: do not try to write a server, a specific
   database, an XML handler, a complex app in our language:
   use other languages for serious work.
5) I do not think we should tell users that our language 
   is meant only as a glue for quick hacks, not serious
   projects! We would lose in comparison to Python, Perl etc.

- Basically, there are two major ways to compile scheme:

* byte codes
* direct machine code

One of the Guile roadmap docs states:
-----------
New developments have made it possible to create very fast byte code
interpreters.  We are willing to replace the current Guile evaluator
when a suitable candidate byte code interpreters turns up.
------------

I am an avid supporter of direct machine code, not byte code,
for scheme. Why so:

1) Byte codes historically appeared as a way to remove the
   syntax parsing from the interpretation process. For example,
   you do not want to parse Pascal source code strings during
   interpretation: it is a way too big overhead. Instead, you can
   analyse syntax beforehand and then run the interpreter on
   the already syntax-analysed source code. This is byte codes
   (but not the full story, of course).
2) In the scheme context byte codes play a smaller role than
   in Pascal, Python, etc: when a scheme source is read in
   by the interpreter, the memory image is NOT strings any
   more. The reading mechanism does it for us, fairly efficiently.
   In some respect, what the Guile interpreter and scm do
   is already very similar to byte codes (add to this the
   memoization feature of scm!). 
3) Byte code interpretation can be improved tremendously by
   having a JIT compiler doing real compilation in runtime.
   This is what Java interpreters do. Initial releases of Java
   did not have JIT and were tremendously slow. Now we have
   very good JIT compilers and Java is fast (IMHO this
   is what the cited paragraph above is about).
4) However, JIT-s are very complex beasts. Look at how long
   it has taken for Sun and IBM to produce good Jit compilers.
   It is not very likely that the Guile community will manage
   to create a good JIT machine in a realistic time frame, if
   ever, even if we try.
5) If we cannot create a good JIT, then byte codes will be
   much slower than native code. They won't give a significant
   speed advantage over, say scm speed (except in some specific
   cases, which are IMHO not critical in practice, 
   see later sections of the mail).
6) If that is the case in a realistic scenario, then the only
   ways to get really good speed are:
   (a) using Java byte codes and Java VM (there IS a JIT already)
   (b) using native compilation.
7) Using Java byte codes means that we'd have to
   (a) include Java VM in a distro or request for the
   installation from  the user. Observe that Java VM
   is huge and has a very long start up time. 
   That is why server-side Java is
   done almost exclusively with Java servlet engines
   (tomcat, jboss, websphere etc)
   (b) completely rewrite the underlying stuff of Guile
   to provide integration with Java byte codes.
8) To summarize: Java bytecodes will create several problems and
   are very hard to implement.   
9) As with JIT compilers, doing a full native compiler is
   a major work, not realistic in the Guile scope. Doing
   scheme-to-c compilation is much easier.
10) Hence we will need a Guile-to-c compiler, be it Hobbit
    or something different.

- Which scheme-to-c compiler to take, how to proceed with it.

1) For me personally it would be easy to base the work on
   Hobbit: I know it, it works nicely with SCM, is now
   included in the standard SCM distro, etc.
   It has been ported to Guile once (although I gather
   from the discussion and other information, that
   this port is no longer supported for Guile).

2) There are many scheme-to-c compilers. There is a 
   potential option to take some other scheme-to-c
   compiler instead of Hobbit and base the Guile
   compiler on this (I have no problems with that:
   hobbit is still in SCM distro :).
   IMHO the crucial aspect here is whether anybody
   is willing to (a) make the port (b) support the port.
   All the other aspects are less relevant than finding
   a competent developer willing to work with some 
   particular compiler.

- about Hobbit:

The roadmap/FAQ docs state:
----------
Hobbit doesn't support all of the Guile language, produces inefficient
code, and is a very unstructured program which can't be developed
further.

It iss very important that the compiler and interpreter agree as much
as possible on the language they're implementing.  Users should be
able to write code, run it in the interpreter, and then just switch
over to the compiler and have everything work just as it did before.

To make this possible, the compiler and interpreter should share as
much code as possible.  For example, the module system should be
designed to support both.  They should use the same parser and the
same macro expander.
-----------

I'd differ on some respects, but not all:

1) Hobbit not supporting all of the Guile language.

   This is correct. Here is how we could approach it:

   - It would probably be trivial to make Hobbit understand
     guile primitives. For example, Aubrey hacked Hobbit
     to support the last versions of SCM. I can do that too :)   

   - It becomes harder when we think about hygienic macros, 
     force-delay,  eval and other such stuff which is inherently 
     done during intepretation. 

     In many (but not all) such cases Hobbit reverts to calling
     interpreter from compiled code, which is a last resort
     approach.

     Many interpret-time special operations cannot be compiled
     efficiently, for inherent reasons. Think about eval.

     The more the language relies upon these, the worse are
     the perspectives for efficient compilation. IMHO it would
     be important for Guile to make a clear dictinction where
     such mechanisms are called and where not, and how to
     avoid them if you want to. It is pretty clear for SCM,
     I haven't looked into Guile deeply enough yet to understand
     if and how they differ in that respect.

   - Even if we sometimes use special interpretation mechanisms, 
     etc etc, we do not do it all the time, and normally we
     do not (should not) need to. 

     Hence it is possible to compile PARTS of your code and
     organise the whole system so that speed-critical parts
     get compiled.

   - Think about adding hand-crafted C functions to Guile.
     This is, after all, one of the main motivations behind
     Guile: that it should be easy to add such functions
     (easier than in SCM :)

     When you can do that, then of course you can code most
     of your program in scheme instead, and use a scheme-to-c
     compiler to produce the C code you want to integrate with
     Guile. This may be a tremendous help, even if you cannot
     compile full Guile (you would not hand-code force, delay,
     hygienic macros etc in c anyway :)

2) Hobbit produces inefficient code.

   This is simply not true. The scm Hobbit produces code
   which runs 10 to 100 times faster than interpreted
   scm (which is already faster than Guile). The speed
   approaches C speed.

   For example, I am using Hobbit and scm for my
   Gandalf theorem prover (one of the leading theorem
   provers in the world). Speed is absolutely critical.
   I have not seen a reason to start hand-coding in C.

   People have experimented with Bigloo vs Hobbit for
   Gandalf. Bigloo-compiled code has about the same
   speed, but is normally ca 20% slower than Hobbit-compiled
   code. Of course, Gandalf has been written in a way
   as to facilitate good scheme->c conversion.

   There are three kinds of code where Hobbit loses in a major
   way to Stalin etc:

   - code using call/cc in speed-critical parts. The slowness
     of this stems from the way scm (and guile) are interpreted
     (using stack), NOT hobbit. You can make such code compile
     nicely only with a completely different (non-stack) approach,
     and many byte-code compilers do this (hence being fast
     on call/cc).

   - code using complex higher-order functions in speed-critical
     parts. Hobbit loses here because it is optimised for
     code which is similar to C in some respects. It could
     be improved somewhat, though, but it is hard to do.
     "Ordinary" higher-order functions, like some, every,
     map, filters, sorters, etc are compiled to fast code, though.   

   - code relying on fast (proper) implementation of tailrecursion
     for mutually tail recursive functions. Hobbit (and Bigloo, for
     example) only recognise tail recursion inside a function, not
     mutual tail tail recursion between different functions.

3) Hobbit is a very unstructured program which 
   can't be developed further.

   I do not think so. It is unstructured in the sense that
   everything is in one big file (this is done on purpose).
   It would be trivial to split it into several files, should
   it help.

   Howbbit is quite nicely commented, uses long variable
   and function names, etc etc.

   A lot of people have successfully hacked Hobbit and 
   produced improvements in several areas, without
   asking me at all. So it can be done.

   Sure Hobbit is hard to develop, because a compiler is
   inherently a complex program, doing by necessity some
   fancy things, not easy to understand if you haven't
   worked with compilers before. This holds for any
   non-toy compiler. 

4) It is very important that the compiler and interpreter
   agree as much as possible on the language they're implementing.

   This is exactly how Hobbit works together with SCM: it
   calls SCM c functions. It allows interpretation calls from
   compiled code. It allows adding compiled functions to the
   interpreter. It is as tight an integration as you could get.

And now, the final question after the long story:
is anybody from the Guile community right now hacking
Hobbit or some other scheme-to-c compiler for Guile?
Has anybody done that last year? If yes, please take
contact so that we could perhaps coordinate future work.

Regards,
         Tanel Tammet

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel