From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Tanel Tammet Newsgroups: gmane.lisp.guile.devel Subject: Compilers for Guile (related to roadmap and goals) Date: Tue, 23 Apr 2002 09:25:18 +0300 Sender: guile-devel-admin@gnu.org Message-ID: <3CC4FE4E.1B343860@staff.ttu.ee> NNTP-Posting-Host: localhost.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Trace: main.gmane.org 1019543611 30540 127.0.0.1 (23 Apr 2002 06:33:31 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Tue, 23 Apr 2002 06:33:31 +0000 (UTC) Return-path: Original-Received: from fencepost.gnu.org ([199.232.76.164]) by main.gmane.org with esmtp (Exim 3.33 #1 (Debian)) id 16ztrm-0007wT-00 for ; Tue, 23 Apr 2002 08:33:31 +0200 Original-Received: from localhost ([127.0.0.1] helo=fencepost.gnu.org) by fencepost.gnu.org with esmtp (Exim 3.34 #1 (Debian)) id 16ztrO-0007H2-00; Tue, 23 Apr 2002 02:33:06 -0400 Original-Received: from staff.ttu.ee ([193.40.254.217]) by fencepost.gnu.org with smtp (Exim 3.34 #1 (Debian)) id 16ztp2-0005hy-00 for ; Tue, 23 Apr 2002 02:30:40 -0400 Original-Received: (qmail 8138 invoked from network); 23 Apr 2002 06:30:38 -0000 Original-Received: from tammet.cc.ttu.ee (HELO staff.ttu.ee) (193.40.254.59) by staff.ttu.ee with SMTP; 23 Apr 2002 06:30:38 -0000 X-Mailer: Mozilla 4.7 [en] (Win98; I) X-Accept-Language: et,en,cs Original-To: guile-devel@gnu.org Errors-To: guile-devel-admin@gnu.org X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.0.9 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Developers list for Guile, the GNU extensibility library List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.lisp.guile.devel:467 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:467 Hi, Thanks a lot for the numerous replies to my question about the Guile roadmap and priorities. I did learn a lot. What was important for my possible involvment: It looks like people are still interested about compilation and various aspects of it, so I'll attempt to be of some help there (not clear I will actually succeed in that respect :). I'll try to describe my understanding of the situation in respect to Guile compilation, and give a small short-term personal roadmap for discussion: would it be sensible to pursue this kind of approach? This is a pretty long email. NB! Please read at last the very last part of the mail (a question about compiler developers), if you do not have time to read it all! First, some background experiences about using Guile: 1) Ca one month ago I downloaded the last Guile distro and one of the earlier ones, tried to compile these on RedHat. Both failed. 2) The last distro compilation claimed smth about a duplicate definition (cannot remember exactly): one of the internet-oriented functions ins Guile .c clashed with a standard one in the Linux (gcc?) library. I could fix it in source, though. 3) The early distro failed to compile for the same reason, PLUS a similar problem. I could fix both in source. 4) I understand that Guile people know about this, and have known about this for some time. Still, you cannot compile Guile out of box on RedHat. 5) I find this very disturbing. IMHO, one of the main goals for Guile (which is NOT an early beta system) should be to guarantee out-of-box compilability on most distros. It is clear that problems like this push potential users away from Guile. 6) Licencing issue: Guile is licenced under GPL PLUS EXCEPTION which allows one to link against it without the end product being GPL-d. IMHO it is an extremely liberal and user-friendly licence. However, when you donwload Guile, look into the manual, look into COPYING, it is nowhere to be seen that there is such an exception (which, for practical purposes, completely removes the standard GPL obstacles). The only place the exception is stated, is in the .c and .h file headers. Users won't find the exception! They will be under the false impression that any link against Guile is GPL-d. IMHO it would be important to clearly state the exception in the manual and in the top-level directory. The first and obvious thing for me would be taking up the existing Guile port of Hobbit, and: 0) find if other people are working on compilers for Guile. 1) check if it is still working together with Guile 2) if not, try to fix it 3) bring in the recent improvements of Hobbit into the Guile Hobbit 4) then ask for new guidelines .. Some comments and questions regarding compilation follow. - Is speed necessary? Sometimes people claim it is not. Obviously, many Guile developers do not think speed is important (otherwise Guile would be at least as fast as SCM and would have a compiler). Hence I attempt to convince you that speed IS highly important: 0) Just imagine using a 10 Mhz machine instead of your 1000 Mhz machine. Would you be happy? Using a slow language is EXACTLY that. A 10 Mhz machine is perfectly usable. Some people would claim you need no more speed than that :) 1) Many scripting languages like Python, Perl, PHP are not compiled and are extremely slow. Often the language itself is such that is is very hard to compile. 2) Because of this, scalable web apps are often built in Java or C: in a serious setting (many users, heavy loads, nontrivial programs) it DOES matter if your program becomes 100 times faster (or slower). 3) Observe that with scheme as a scipting language we COULD give speeds near C or Java: this would be a major advantage over Python, Perl etc. We do not have other big advantages, hence why not take seriously this major advantage we are in position to have? 4) Sometimes people claim that the scripting language need not be fast. It is OTHER people who have to write fast code (databases, web servers, etc etc) and we only GLUE fast code together with our slow glue. This is sometimes true, but it effectively says that we cannot write nontrivial apps in our language: it is stuck in the glue role: do not try to write a server, a specific database, an XML handler, a complex app in our language: use other languages for serious work. 5) I do not think we should tell users that our language is meant only as a glue for quick hacks, not serious projects! We would lose in comparison to Python, Perl etc. - Basically, there are two major ways to compile scheme: * byte codes * direct machine code One of the Guile roadmap docs states: ----------- New developments have made it possible to create very fast byte code interpreters. We are willing to replace the current Guile evaluator when a suitable candidate byte code interpreters turns up. ------------ I am an avid supporter of direct machine code, not byte code, for scheme. Why so: 1) Byte codes historically appeared as a way to remove the syntax parsing from the interpretation process. For example, you do not want to parse Pascal source code strings during interpretation: it is a way too big overhead. Instead, you can analyse syntax beforehand and then run the interpreter on the already syntax-analysed source code. This is byte codes (but not the full story, of course). 2) In the scheme context byte codes play a smaller role than in Pascal, Python, etc: when a scheme source is read in by the interpreter, the memory image is NOT strings any more. The reading mechanism does it for us, fairly efficiently. In some respect, what the Guile interpreter and scm do is already very similar to byte codes (add to this the memoization feature of scm!). 3) Byte code interpretation can be improved tremendously by having a JIT compiler doing real compilation in runtime. This is what Java interpreters do. Initial releases of Java did not have JIT and were tremendously slow. Now we have very good JIT compilers and Java is fast (IMHO this is what the cited paragraph above is about). 4) However, JIT-s are very complex beasts. Look at how long it has taken for Sun and IBM to produce good Jit compilers. It is not very likely that the Guile community will manage to create a good JIT machine in a realistic time frame, if ever, even if we try. 5) If we cannot create a good JIT, then byte codes will be much slower than native code. They won't give a significant speed advantage over, say scm speed (except in some specific cases, which are IMHO not critical in practice, see later sections of the mail). 6) If that is the case in a realistic scenario, then the only ways to get really good speed are: (a) using Java byte codes and Java VM (there IS a JIT already) (b) using native compilation. 7) Using Java byte codes means that we'd have to (a) include Java VM in a distro or request for the installation from the user. Observe that Java VM is huge and has a very long start up time. That is why server-side Java is done almost exclusively with Java servlet engines (tomcat, jboss, websphere etc) (b) completely rewrite the underlying stuff of Guile to provide integration with Java byte codes. 8) To summarize: Java bytecodes will create several problems and are very hard to implement. 9) As with JIT compilers, doing a full native compiler is a major work, not realistic in the Guile scope. Doing scheme-to-c compilation is much easier. 10) Hence we will need a Guile-to-c compiler, be it Hobbit or something different. - Which scheme-to-c compiler to take, how to proceed with it. 1) For me personally it would be easy to base the work on Hobbit: I know it, it works nicely with SCM, is now included in the standard SCM distro, etc. It has been ported to Guile once (although I gather from the discussion and other information, that this port is no longer supported for Guile). 2) There are many scheme-to-c compilers. There is a potential option to take some other scheme-to-c compiler instead of Hobbit and base the Guile compiler on this (I have no problems with that: hobbit is still in SCM distro :). IMHO the crucial aspect here is whether anybody is willing to (a) make the port (b) support the port. All the other aspects are less relevant than finding a competent developer willing to work with some particular compiler. - about Hobbit: The roadmap/FAQ docs state: ---------- Hobbit doesn't support all of the Guile language, produces inefficient code, and is a very unstructured program which can't be developed further. It iss very important that the compiler and interpreter agree as much as possible on the language they're implementing. Users should be able to write code, run it in the interpreter, and then just switch over to the compiler and have everything work just as it did before. To make this possible, the compiler and interpreter should share as much code as possible. For example, the module system should be designed to support both. They should use the same parser and the same macro expander. ----------- I'd differ on some respects, but not all: 1) Hobbit not supporting all of the Guile language. This is correct. Here is how we could approach it: - It would probably be trivial to make Hobbit understand guile primitives. For example, Aubrey hacked Hobbit to support the last versions of SCM. I can do that too :) - It becomes harder when we think about hygienic macros, force-delay, eval and other such stuff which is inherently done during intepretation. In many (but not all) such cases Hobbit reverts to calling interpreter from compiled code, which is a last resort approach. Many interpret-time special operations cannot be compiled efficiently, for inherent reasons. Think about eval. The more the language relies upon these, the worse are the perspectives for efficient compilation. IMHO it would be important for Guile to make a clear dictinction where such mechanisms are called and where not, and how to avoid them if you want to. It is pretty clear for SCM, I haven't looked into Guile deeply enough yet to understand if and how they differ in that respect. - Even if we sometimes use special interpretation mechanisms, etc etc, we do not do it all the time, and normally we do not (should not) need to. Hence it is possible to compile PARTS of your code and organise the whole system so that speed-critical parts get compiled. - Think about adding hand-crafted C functions to Guile. This is, after all, one of the main motivations behind Guile: that it should be easy to add such functions (easier than in SCM :) When you can do that, then of course you can code most of your program in scheme instead, and use a scheme-to-c compiler to produce the C code you want to integrate with Guile. This may be a tremendous help, even if you cannot compile full Guile (you would not hand-code force, delay, hygienic macros etc in c anyway :) 2) Hobbit produces inefficient code. This is simply not true. The scm Hobbit produces code which runs 10 to 100 times faster than interpreted scm (which is already faster than Guile). The speed approaches C speed. For example, I am using Hobbit and scm for my Gandalf theorem prover (one of the leading theorem provers in the world). Speed is absolutely critical. I have not seen a reason to start hand-coding in C. People have experimented with Bigloo vs Hobbit for Gandalf. Bigloo-compiled code has about the same speed, but is normally ca 20% slower than Hobbit-compiled code. Of course, Gandalf has been written in a way as to facilitate good scheme->c conversion. There are three kinds of code where Hobbit loses in a major way to Stalin etc: - code using call/cc in speed-critical parts. The slowness of this stems from the way scm (and guile) are interpreted (using stack), NOT hobbit. You can make such code compile nicely only with a completely different (non-stack) approach, and many byte-code compilers do this (hence being fast on call/cc). - code using complex higher-order functions in speed-critical parts. Hobbit loses here because it is optimised for code which is similar to C in some respects. It could be improved somewhat, though, but it is hard to do. "Ordinary" higher-order functions, like some, every, map, filters, sorters, etc are compiled to fast code, though. - code relying on fast (proper) implementation of tailrecursion for mutually tail recursive functions. Hobbit (and Bigloo, for example) only recognise tail recursion inside a function, not mutual tail tail recursion between different functions. 3) Hobbit is a very unstructured program which can't be developed further. I do not think so. It is unstructured in the sense that everything is in one big file (this is done on purpose). It would be trivial to split it into several files, should it help. Howbbit is quite nicely commented, uses long variable and function names, etc etc. A lot of people have successfully hacked Hobbit and produced improvements in several areas, without asking me at all. So it can be done. Sure Hobbit is hard to develop, because a compiler is inherently a complex program, doing by necessity some fancy things, not easy to understand if you haven't worked with compilers before. This holds for any non-toy compiler. 4) It is very important that the compiler and interpreter agree as much as possible on the language they're implementing. This is exactly how Hobbit works together with SCM: it calls SCM c functions. It allows interpretation calls from compiled code. It allows adding compiled functions to the interpreter. It is as tight an integration as you could get. And now, the final question after the long story: is anybody from the Guile community right now hacking Hobbit or some other scheme-to-c compiler for Guile? Has anybody done that last year? If yes, please take contact so that we could perhaps coordinate future work. Regards, Tanel Tammet _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel