From: Tom Lord <lord@regexps.com>
Subject: for example
Date: Sat, 3 Aug 2002 19:20:13 -0700 (PDT)
Message-ID: <200208040220.TAA14265@morrowfield.regexps.com>
In-Reply-To: 200208040205.TAA14205@morrowfield.regexps.com
ok -- these are input to simple awk scripts (couple k lines) -- though
I'm probably sending you versions that don't currently compile:
And yes --- I'm being cryptic (a side effect of trying to compress a
lot of info down to a short message) --- but the basic message is
"bye", so don't waste time complaining.
-t
! Register Types
This file is part of the source code for the Hackerlab C library.
It's translated to C by the awk program "./register-tags.awk".
VM registers have small _external tags_ so they can hold a
limited selection of unboxed values.
This file declares the register tags and the union type
for (various kinds of) register.
*> core-registers register-type scm_register 2
**> holds scm s
**> holds scm_u u
**> holds scm_i i
**> holds scm_f f
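For readers without the rest of the source, here is a rough guess at the kind of C that "./register-tags.awk" could emit from the declaration above. It is a sketch only: the underlying types chosen for `scm', `scm_u', `scm_i' and `scm_f', the reading of the trailing "2" as a tag width in bits, and every identifier other than those four type names and `scm_register' are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed underlying types; the spec only gives the names. */
typedef uintptr_t scm;    /* tagged value */
typedef uintptr_t scm_u;  /* unboxed unsigned integer */
typedef intptr_t scm_i;   /* unboxed signed integer */
typedef double scm_f;     /* unboxed float */

/* One external tag per "holds" line (hypothetical names).
   Four alternatives fit in the assumed 2-bit tag. */
enum scm_register_tag
{
  scm_register_tag_s,
  scm_register_tag_u,
  scm_register_tag_i,
  scm_register_tag_f
};

struct scm_register
{
  unsigned tag : 2;                           /* external tag */
  union { scm s; scm_u u; scm_i i; scm_f f; } v; /* member names from the spec */
};

static inline struct scm_register
scm_register_from_i (scm_i n)
{
  struct scm_register r;
  r.tag = scm_register_tag_i;
  r.v.i = n;
  return r;
}

static inline struct scm_register
scm_register_from_f (scm_f x)
{
  struct scm_register r;
  r.tag = scm_register_tag_f;
  r.v.f = x;
  return r;
}

/* Reading a member is only valid when the external tag agrees. */
static inline scm_i
scm_register_as_i (struct scm_register r)
{
  assert (r.tag == scm_register_tag_i);
  return r.v.i;
}
```

The point of the external tag is that the value itself stays unboxed: nothing is stolen from the bits of the integer or the float.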
--------------
! The Bit Tag Spec File
A popular misconception is that a tagging system simply maps values
to a set of densely-packed, small integer tags, each tag representing
a type. You'll see people write:
<<<
struct generic_object_header
{
t_uint tag;
...
};
>>>
but that's really _oversimplified._ A good tagging system does
much more than that.
For example, here is an outline that is processed automatically to
produce `enum' declarations for a system of "staggered tags" (see
SCM, for example), predicate functions, and case labels.
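To make "staggered tags" concrete: the tag is not a fixed-width field. Some short bit patterns are complete tags, others mean "look at more bits". The sketch below uses a made-up layout (2-bit tags, with pattern 11 extended by one more bit); it is not the layout from this file, and all `demo_' names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up layout, for illustration only:
     ..00  ..01  ..10   complete 2-bit tags (a, b, c)
     .011  .111         3-bit tags (d, e): pattern 11 is "split" by one bit */
enum demo_tag { demo_tag_a, demo_tag_b, demo_tag_c, demo_tag_d, demo_tag_e };

static inline enum demo_tag
demo_tag_of (uint32_t val)
{
  switch (val & 0x3u)
    {
    case 0x0u: return demo_tag_a;
    case 0x1u: return demo_tag_b;
    case 0x2u: return demo_tag_c;
    default:   return (val & 0x4u) ? demo_tag_e : demo_tag_d;
    }
}

/* Width of the tag in bits; the payload gets whatever is left. */
static inline unsigned
demo_tag_width (enum demo_tag t)
{
  return (t == demo_tag_d || t == demo_tag_e) ? 3u : 2u;
}

static inline uint32_t
demo_payload (uint32_t val)
{
  return val >> demo_tag_width (demo_tag_of (val));
}

static inline uint32_t
demo_make (enum demo_tag t, uint32_t payload)
{
  static const uint32_t bits[] = { 0x0u, 0x1u, 0x2u, 0x3u, 0x7u };
  unsigned w = demo_tag_width (t);
  assert (payload <= (UINT32_MAX >> w));  /* payload must fit above the tag */
  return (payload << w) | bits[t];
}
```

Types that need many payload bits get short tags; types that can live with fewer get longer ones. That trade is what the outline format is meant to express.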
Since scheme is important enough that we should (at least casually)
worry about the length and complexity of the bootstrap path from
front-panel toggle-switches to full hosting environment, this file
is designed to be processed by a small `awk' script (i.e., a script for
a little language with hash tables, loops, conditionals, and regexps
but not much more).
CLS note: I'm not sure how clear this will be to people who aren't
looking at the rest of its context. I hope the notation is clear
enough to be puzzled out....
*> scm-tags tags scm
**: decodes-to scm_u
We're going to define tags with a basename "scm" for values
of type "scm".
**> split-tag val 2
The smallest in-line tag will be two bits.
***> tag bibop_object (00)
****: tags-by-mask
****: decodes-to t_scm_sptr;
Bibop objects are the lightest weight in terms of meta-data overhead
(e.g., they don't necessarily have reference counts) and (if a
direct pointer representation is used) alignment requirements (they
are 4-byte aligned).
Bibop objects share storage with page objects (see below).
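A small illustration of why the (00) tag and 4-byte alignment go together, assuming the direct pointer representation: an aligned pointer already has 00 in its low two bits, so tagging and untagging cost nothing. Only the (00) pattern and the 2-bit width come from the spec; the `uintptr_t' representation of `scm' and the function and macro names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t scm;  /* assumed representation of a tagged value */

#define SCM_VAL_TAG_MASK     ((scm)0x3)  /* the 2-bit "val" split */
#define SCM_TAG_BIBOP_OBJECT ((scm)0x0)  /* (00) in the spec */

/* tags-by-mask: the test is a single mask-and-compare. */
static inline int
scm_is_bibop_object (scm v)
{
  return (v & SCM_VAL_TAG_MASK) == SCM_TAG_BIBOP_OBJECT;
}

/* A 4-byte-aligned pointer is its own tagged value. */
static inline scm
scm_from_bibop_ptr (void *p)
{
  scm v = (scm)(uintptr_t)p;
  assert ((v & SCM_VAL_TAG_MASK) == 0);  /* alignment requirement */
  return v;
}

static inline void *
scm_bibop_ptr (scm v)
{
  assert (scm_is_bibop_object (v));
  return (void *)v;
}
```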
***> tag cow_bibop_object (..)
****: tags-by-mask
****: decodes-to t_scm_cow_sptr;
Lazy-linear bibop objects.
There is a one bit reference count for each bibop object.
When the first cow reference to an object is formed, that
reference count is 0. If the cow reference is copied (to
produce a second cow reference), the count is 1. If
yet another cow-copy is made, the new copy is in fact a
shallow-copy of the object with reference count 0 (references
in shallow copy are cow references). When fetching a
possibly cow field, programs can request a non-cow reference
to a stable object which the field will continue to hold
with a cow reference: if, before the fetch, the field held
a cow reference to an object with (possibly) more than one
cow reference, then a shallow copy is made and the field updated
before the fetch returns.
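One possible reading of the one-bit count rules above, as a sketch. The object layout, the `demo_' names, and the use of malloc are all assumptions made for illustration, and the fields are plain ints here rather than cow references.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_obj
{
  unsigned count : 1;  /* the one-bit reference count */
  int fields[2];       /* stand-in payload */
};

/* Shallow copy with a fresh count of 0; NULL if allocation fails. */
static inline struct demo_obj *
demo_shallow_copy (const struct demo_obj *o)
{
  struct demo_obj *c = malloc (sizeof *c);
  if (c == NULL)
    return NULL;
  *c = *o;
  c->count = 0;
  return c;
}

/* Copy a cow reference.  The first copy just bumps the count to 1.
   Once the count is saturated, a further copy yields a shallow copy. */
static inline struct demo_obj *
demo_cow_copy (struct demo_obj *o)
{
  assert (o != NULL);
  if (o->count == 0)
    {
      o->count = 1;
      return o;
    }
  return demo_shallow_copy (o);
}

/* Fetch a stable, non-cow reference through a field.  If the object
   may have more than one cow reference, copy it and update the field
   before returning. */
static inline struct demo_obj *
demo_cow_fetch_stable (struct demo_obj **field)
{
  assert (field != NULL && *field != NULL);
  if ((*field)->count != 0)
    {
      struct demo_obj *c = demo_shallow_copy (*field);
      if (c == NULL)
        return NULL;
      *field = c;
    }
  return *field;
}
```

With one bit there is no way to count back down, so a shared object stays "possibly shared" from then on; the price is an occasional unnecessary shallow copy.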
***> split-tag heavy_pointer 2 (..)
Bibop pointers have the nice property of being small (if implemented
as direct pointers) but the drawback that reclamation of objects
weakly held by bibop pointers can not occur until a scan has updated
all pointers to the object.
At the opposite extreme are object-table pointers and fat pointers:
more or less interchangeable ways to obtain cheap (in time)
weak references and even cheaply destroyable objects.
****> tag vm_object (....)
****: tags-by-mask
****: decodes-to t_scm_obj
The heap format of an object is quite complicated and is documented
in other files.
****> tag cow_vm_object (....)
****: tags-by-mask
****: decodes-to t_scm_cow_obj
Lazy-linear vm objects. Similar to cow bibop objects,
except that the reference count is larger.
****> tag vm_object_promise (....)
****: tags-by-mask
****: decodes-to t_scm_promise_obj
Lazy, memoized, and referencer-memoized vm objects.
****> split-tag vm_page 1 (....)
*****: tags-by-mask
*****: decodes-to t_scm_page
A modest pool of very-large-alignment (256 bytes) types.
*****> split-tag vm_direct_page 2 (....)
******> tag vm_page16 (......)
******> tag vm_page128 (......)
******> tag vm_page512 (......)
******> tag vm_page1024 (......)
*****> split-tag vm_cow_page 2 (....)
******> tag vm_cow_page16 (......)
******> tag vm_cow_page128 (......)
******> tag vm_cow_page512 (......)
******> tag vm_cow_page1024 (......)
***> split-tag immediate 1 (..)
Characters want to be "unicode+bucky bits" which adds up to _at
least_ 24 bits and more comfortably to 29.
Numbers are weird. Do we want one or two big-as-possible immediate
integer types? or do we want to cram in lots of little types
for tiny immediate rationals and complex numbers? How much
of it should make sense in 16-bit environments?
Atomic values: I don't care much about them. `nil' is the
0 non-immediate value. I wouldn't horribly miss `#t'
or seeing it become a non-immediate -- almost nothing low-level
ever dispatches on #t specifically. Indeed -- it's easy for an
allocator to create disjoint, immutable, non-referencing objects
that can be re-used across all VM instances and have well-known
fixed addresses per-process. Use values of that sort
for atomics: one extra memory fetch for eq? test (to look up
the well-known address) but otherwise just as good.
So it's a two way battle: numbers v. characters. Characters have
the stricter data-size demands: let's give them half the remaining
values:
****> tag character (...)
*****: decodes-to t_unicode
*****: decodes-exp (t_unicode)( ((val >> scm_tag_width_character) & (((scm)1 << scm_char_code_bits) - 1)) \
| (val & ((((scm)1 << scm_bucky_bits) - 1) << (scm_bits - scm_bucky_bits))))
In a 32-bit or larger environment, we get 29 bits for
immediate characters -- enough for 21 bits of
Unicode plus bucky-bits {left,right}x{shift,ctl,meta,alt}.
Sweet.
In the expanded (at least 32-bit) form, we keep the bucky bits in
the high-order 8 bits.
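The `decodes-exp' above is easier to follow with numbers plugged in. The sketch below assumes a 32-bit `scm', a 3-bit character tag, 21 code bits and 8 bucky bits (3 + 21 + 8 = 32). The tag pattern itself is elided in the spec, so the value used here is a placeholder, and the encode function is hypothetical; only the decode expression is taken from the file.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t scm;        /* assumed: a 32-bit environment */
typedef uint32_t t_unicode;  /* assumed: the expanded form is 32 bits */

#define scm_bits                32
#define scm_tag_width_character 3
#define scm_char_code_bits      21
#define scm_bucky_bits          8
#define DEMO_CHARACTER_TAG ((scm)0x3)  /* placeholder, not from the spec */

/* Immediate layout: [ bucky:8 | code:21 | tag:3 ] */
static inline scm
demo_make_character (uint32_t code, uint32_t bucky)
{
  assert (code < ((uint32_t)1 << scm_char_code_bits));
  assert (bucky < ((uint32_t)1 << scm_bucky_bits));
  return (bucky << (scm_bits - scm_bucky_bits))
         | (code << scm_tag_width_character)
         | DEMO_CHARACTER_TAG;
}

/* The decodes-exp from the spec: shift the code down over the tag,
   and leave the bucky bits where they are, in the high-order 8 bits. */
static inline t_unicode
demo_decode_character (scm val)
{
  return (t_unicode)(((val >> scm_tag_width_character)
                      & (((scm)1 << scm_char_code_bits) - 1))
                     | (val & ((((scm)1 << scm_bucky_bits) - 1)
                               << (scm_bits - scm_bucky_bits))));
}
```

For example, code point 0x1F600 with bucky bits 0x81 packs to 0x810FB003 under these assumptions and decodes to 0x8101F600.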
****> tag immediate_signed (111) signed!
*****: decodes-to scm_i
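What "signed!" has to mean in practice: decoding must sign-extend the payload rather than zero-fill it. A sketch under the same 32-bit assumption, with the (111) tag from the spec leaving 29 bits of two's-complement payload. The function names and the `int32_t' stand-in for `scm_i' are hypothetical. The decode avoids right-shifting a negative value, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t scm;  /* assumed: a 32-bit environment */

#define DEMO_IMM_TAG_WIDTH 3
#define DEMO_IMM_SIGNED_TAG ((scm)0x7)           /* (111) in the spec */
#define DEMO_IMM_MAX ((int32_t)0x0FFFFFFF)       /*  2^28 - 1 */
#define DEMO_IMM_MIN (-DEMO_IMM_MAX - 1)         /* -2^28 */

static inline scm
demo_make_immediate_signed (int32_t n)
{
  assert (n >= DEMO_IMM_MIN && n <= DEMO_IMM_MAX);
  return ((scm)n << DEMO_IMM_TAG_WIDTH) | DEMO_IMM_SIGNED_TAG;
}

static inline int32_t
demo_decode_immediate_signed (scm val)
{
  scm payload = val >> DEMO_IMM_TAG_WIDTH;  /* 29 bits, zero-filled */
  scm sign = (scm)1 << 28;                  /* sign bit of the payload */
  /* Flip the sign bit, then subtract its weight: sign extension
     without relying on arithmetic right shift. */
  return (int32_t)(payload ^ sign) - (int32_t)sign;
}
```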
*> scm-fast-dispatchers scm
**> dispatcher is_bibop
**> return 0 for bibop_object cow_bibop_object
**> return 1 otherwise
**> dispatcher is_vm_object
**> return 1 for vm_object cow_vm_object vm_object_promise
**> return 0 otherwise
**> dispatcher vm_obj_discipline ?
***> return cow for vm_cow_page16 vm_cow_page128 vm_cow_page512 vm_cow_page1024
***> return cow for cow_vm_object cow_bibop_object
***> return promise for vm_object_promise
***> return immediate for immediate
***> return regular otherwise
this should generate
[extern]enum scm_vm_obj_discipline scm_vm_obj_discipline(scm? obj) { switch (scm_tag(..)) ... }
and
inline t_uint8 scm_vm_obj_discipline_switch(scm? obj) { return scm_tag (...); }
#define SCM_VM_OBJ_DISCIPLINE_COW_CASE ...
#define SCM_VM_OBJ_DISCIPLINE_PROMISE_CASE ...
#define SCM_VM_OBJ_DISCIPLINE_IMMEDIATE_CASE ...
#define SCM_VM_OBJ_DISCIPLINE_REGULAR_CASE ...
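Filling in the dots, one shape the generated code could take. This is a guess, not the output of the real script: the dense `demo_tag' enum stands in for whatever `scm_tag (...)' returns (the bit patterns are elided in the outline), the dispatcher takes that tag directly instead of an `scm', and the choice to expand each `_CASE' macro to a list of case labels used as `MACRO:' is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for the decoded tag. */
enum demo_tag
{
  demo_tag_bibop_object,
  demo_tag_cow_bibop_object,
  demo_tag_vm_object,
  demo_tag_cow_vm_object,
  demo_tag_vm_object_promise,
  demo_tag_vm_page16,
  demo_tag_vm_cow_page16,
  demo_tag_vm_cow_page128,
  demo_tag_vm_cow_page512,
  demo_tag_vm_cow_page1024,
  demo_tag_immediate,
  demo_tag_count
};

enum scm_vm_obj_discipline
{
  scm_vm_obj_discipline_cow,
  scm_vm_obj_discipline_promise,
  scm_vm_obj_discipline_immediate,
  scm_vm_obj_discipline_regular
};

/* One macro per "return X for ..." line; "otherwise" becomes default. */
#define SCM_VM_OBJ_DISCIPLINE_COW_CASE \
  case demo_tag_vm_cow_page16: case demo_tag_vm_cow_page128: \
  case demo_tag_vm_cow_page512: case demo_tag_vm_cow_page1024: \
  case demo_tag_cow_vm_object: case demo_tag_cow_bibop_object
#define SCM_VM_OBJ_DISCIPLINE_PROMISE_CASE   case demo_tag_vm_object_promise
#define SCM_VM_OBJ_DISCIPLINE_IMMEDIATE_CASE case demo_tag_immediate
#define SCM_VM_OBJ_DISCIPLINE_REGULAR_CASE   default

static inline enum scm_vm_obj_discipline
demo_vm_obj_discipline (enum demo_tag tag)
{
  assert (tag < demo_tag_count);
  switch (tag)
    {
    SCM_VM_OBJ_DISCIPLINE_COW_CASE:
      return scm_vm_obj_discipline_cow;
    SCM_VM_OBJ_DISCIPLINE_PROMISE_CASE:
      return scm_vm_obj_discipline_promise;
    SCM_VM_OBJ_DISCIPLINE_IMMEDIATE_CASE:
      return scm_vm_obj_discipline_immediate;
    SCM_VM_OBJ_DISCIPLINE_REGULAR_CASE:
      return scm_vm_obj_discipline_regular;
    }
}
```

The case macros let hand-written code switch on the raw tag and handle each discipline inline, without a call to the dispatcher function.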
add a way to make pointer types for tags (e.g. vm_object) and for any
binary dispatcher + a conversion function that returns nil on "wrong
type" (`scm_as_vm_obj(scm val)' => `t_scm_vm_obj').
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel