From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Martin Grabmueller Newsgroups: gmane.lisp.guile.devel Subject: Re: fyi, a reading of guile source Date: Thu, 16 May 2002 09:49:48 +0200 Sender: guile-devel-admin@gnu.org Message-ID: References: <02051521435901.26010@locke.free-expression.org> NNTP-Posting-Host: localhost.gmane.org X-Trace: main.gmane.org 1021538857 28847 127.0.0.1 (16 May 2002 08:47:37 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Thu, 16 May 2002 08:47:37 +0000 (UTC) Cc: guile-devel@gnu.org, guile-user@gnu.org Return-path: Original-Received: from fencepost.gnu.org ([199.232.76.164]) by main.gmane.org with esmtp (Exim 3.33 #1 (Debian)) id 178GvA-0007VA-00 for ; Thu, 16 May 2002 10:47:36 +0200 Original-Received: from localhost ([127.0.0.1] helo=fencepost.gnu.org) by fencepost.gnu.org with esmtp (Exim 3.34 #1 (Debian)) id 178GPz-0002ku-00; Thu, 16 May 2002 04:15:23 -0400 Original-Received: from mgrabmue.home.cs.tu-berlin.de ([130.149.148.77]) by fencepost.gnu.org with smtp (Exim 3.34 #1 (Debian)) id 178GMh-0002WS-00; Thu, 16 May 2002 04:12:00 -0400 Original-Received: from mgrabmue by mgrabmue.home.cs.tu-berlin.de with local (Exim 3.12 #1 (Debian)) id 178G1E-0000E2-00; Thu, 16 May 2002 09:49:48 +0200 Original-To: owinebar@free-expression.org In-Reply-To: <02051521435901.26010@locke.free-expression.org> (message from Lynn Winebarger on Wed, 15 May 2002 21:43:59 -0500) Errors-To: guile-devel-admin@gnu.org X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.0.9 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Developers list for Guile, the GNU extensibility library List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.lisp.guile.devel:619 X-Report-Spam: http://spam.gmane.org/gmane.lisp.guile.devel:619 > From: Lynn Winebarger > Date: Wed, 15 May 2002 21:43:59 -0500 > > On Wednesday 15 May 2002 19:01, Thien-Thi Nguyen wrote: > > this has been checked into $workbook/journal/owinebar.notes, btw. > > i learned a lot reading it. > > > Thanks. With no response, I had thought everyone else thought > it was too basic and obvious to mention - or that prospective guile > modifiers _should_ have to explore these things themselves. Or > both. For those interested in this: some time ago (when I was trying to understand this code myself), I wrote down some notes on the data representation in Guile. I have appended them, but can't guarantee whether they are still correct, but I guarantee that they are incomplete. 'martin ===File ~/data-layout.text================================== -*-outline-*- * Guile's internal representation ** Introduction The following code snippet was used for producing the internal representation dumps in the following sections. (use-modules (srfi srfi-13)) (define (bin o) (string-pad (number->string (object-address o) 2) 32 #\0)) ** General All Scheme values are represented by a 32-bit word. This word encodes the data type (at least the group of types) a value belongs to. Additionally, this word may contain information about the value, either directly in the data word, or in another memory region. In the letter case, the 32-bit word includes a pointer to that region. So for determining an object's type, it is first necessary to check the contents of this 32-bit word, and then the memory location pointed to, if it contains a pointer. This section focuses on this 32-bit word every value must have. The following sections will describe how to interpret the referenced memory regions. *** A word on Scheme objects. It is normally not allowed in a variable of type SCM to have a 1 in bit #0. There are exceptions for values in the CAR of cells, but more on that later. So if examining a value of type SCM, you can check if the lowest bit is 1. If it is, you don't have a valid Scheme value. **** Most objects 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' **** glocs and structs Values like this are only valid in the CAR of heap cells. 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |1| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' *** Scheme objects Immediates can be determined by checking the bits #1 and #2. If one of them is 1, it is an immediate, otherwise it is a heap pointer. **** Immediates integers (fixnums) Immediate integers are 30-bit two's-complement integers (aka fixnums), encoded as immediates for performance reasons. Access to them is very fast, and creating them does not involve allocation of heap storage. 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |1|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `---------------------------------------------------------' fixnum value in two's-complement **** Immediates (no heap memory associated) Characters, immediate codes etc. are represented as immediates with 100 in bit 2-0. 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |1|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------' type-dependent information **** Heap pointers Pointers into the Scheme heap can be determined by checking whether the three lowest bits are zero. The complete 32-bit value can then be dereferenced, and the two (or four for double cells) words pointed to can be examined to get more detailed type information. Normally, it is the first word of the heap region which contains the additional type information. The complete word is a valid pointer into heap storage because heap cells are aligned on 8-byte boundaries. 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer into the cell heap ** Immediates *** Fixnums Fixnums are integers limited to the range -2^{29} to 2^29-1. Larger integers are represented as `Bignums', described below. Fixnums are also known as INUMs (immediate numbers). They always have the value 10 in the two lowest bits. That is why the test for INUMs is as follows: #define SCM_INUMP(x) (2 & SCM_UNPACK (x)) guile> (bin 0) "00000000000000000000000000000010" guile> (bin 1) "00000000000000000000000000000110" guile> (bin -1) "11111111111111111111111111111110" guile> (bin 343434) "00000000000101001111011000101010" 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |1|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `---------------------------------------------------------' `--' fixnum fixnum value indicator *** Characters Characters are immediates with 100 in bits #2-#0, and 11110 in the bits #7-#3. The character code is stored in bits #15-#8 and the rest of the word is unused. Switching to Unicode representation could easily be done by simply using the upper 16 bits for the character code. The test for checking for characters tests whether the lowest 8 bits are equal to the 8-bit character type code, which is 0xf4. #define SCM_CHARP(x) (SCM_ITAG8(x) == scm_tc8_char) guile> (bin #\a) "00000000000000000110000111110100" guile> (bin #\b) "00000000000000000110001011110100" guile> (bin #\nul) "00000000000000000000000011110100" 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | |1|1|1|1|0|1|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-----------------------------' `-------------' `-------------' unused character character value indicator *** Ilocs Ilocs are used for specifying lexical variables. They are substituted for symbols which happen to be such variables by the evaluator. The test for checking for ilocs tests whether the lowest 8 bits are equal to the 8-bit iloc type code, which is 0xfc. 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | |1|1|1|1|1|1|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `---------------------------------------------' `-------------' FIXME: check out bit numbers iloc indicator *** Special symbols Special symbols are inserted into cons pairs when the evaluator finds them. FIXME: Details. ** Heap objects Heap objects are thos objects which occupy memory on the Scheme heap. The SCM values are tagged with three zeros in the lowest bits and the whole SCM value is to be interpreted as a pointer to a heap cell. There are two types of heap cells. The normal cells occupy two words each, and the double cells occupy four words. Double cells are used for objects which require more than two words, but for which malloc()ing another heap block is too expensive, for example real numbers. Other bigger objects, strings, vectors and bignums for example, store information in separate memory regions. *** Strings guile> (bin "") "00001000000001001010000000100000" guile> (bin (string)) "01000000001010010000000000101000" guile> (bin (string #\a)) "01000000001010001110010001001000" guile> (bin "foo") "01000000001010010100101110110000" The SCM of a string: 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer into cell heap The cell layout for strings: 31 24|23 16|15 8|7 0 string length string type code .---------------------------------------------. .-----------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|1|0|1|0|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer to malloc()ed region *** Symbols Symbols are represented similarly to strings, but they occupy double cells. The third word stores the hash value of the symbol name, and the fourth word is the head of a property list. The CAR of this list holds the symbol's function slot (only used for emulating other Lisp dialects) and the CDR an association list of the symbol's properties. The SCM of a symbol: 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer into cell heap The double cell layout for symbols: 31 24|23 16|15 8|7 0 symbol length symbol type code .---------------------------------------------. .-----------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0|0|1|0|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `-------------------------------------------------------------' . pointer to malloc()ed region . . . . symbol's hash value as a raw integer . .-------------------------------------------------------------. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' property list *** Vectors Vectors (normal and weak vectors) are represented similar to strings. The 7-bit type codes are 13 (normal vectors) and 15 (weak vectors). The SCM of a vector: 31 24|23 16|15 8|7 0 .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0| `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer into cell heap The cell layout for vectors: 31 24|23 16|15 8|7 0 vector length vector type code .---------------------------------------------. .-----------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0|1|1|0|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer to malloc()ed region There are several types of weak vectors. The exact type (weak vector, weak hash table, value/key/both) is encoded in the malloc()ed memory block, by allocating the vector one element bigger, and then pointing to the second element. The type is then stored one word before the vector contents (FIXME: how is it represented?) The cell layout for weak vectors: 31 24|23 16|15 8|7 0 vector length weak vector type code .---------------------------------------------. .-----------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. | | | | | | | | | | | | | | | | | | | | | | | | | |0|0|0|1|1|1|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' pointer to malloc()ed region *** Real numbers Real numbers are stored in double cells. In the first word of the cell, the type code for smobs is stored in the 8 lowest bits. Numbers (reals, complex numbers and bignums) are special smobs, because they occupy the first three smob codes. This makes it efficient to test whether a pointer into a heap points to a number. 31 24|23 16|15 8|7 0 real smob type code .-. .-------------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|1|1|1|1|1|1|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `-------------------------------------------------------------' . empty . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-' `-------------------------------------------------------------' real number as a C double *** Bignums 31 24|23 16|15 8|7 0 bignum smob type code .-. .-------------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|1|1|1|1|1|1|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `-------------------------------------------------------------' pointer to digit memory *** Complex numbers 31 24|23 16|15 8|7 0 complex smob type code .-. .-------------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `-------------------------------------------------------------' pointer to `struct complex' *** Smobs Smobs (abbreviation for ``small objects'') are used for adding data types to Guile dynamically. An extension library can request a new smob type code dynamically and can then create smobs of this type, passing these values around like all builtin types. The Guile library can determine all necessary information from the smob number stored in each smob. For example the procedures for marking used smobs and finalizing them can be retrieved from the smob table using this number. 31 24|23 16|15 8|7 0 smob number smob type code .-------------. .-------------. .-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-. |0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| | | | | | | | |1|1|1|1|1|1|1|1| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ `-------------------------------------------------------------' dependant on the smob type ============================================================ _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel