* The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-03 15:06 UTC

Hi,

I just got confused about our two 'fundamental' types, SCM and
scm_t_bits.

Variables and function arguments are declared to be of type SCM, while
the words in a cell are of type scm_t_bits. This results in problems
when dealing with pointers: we cannot cleanly cast a pointer to SCM to
a pointer to scm_t_bits, but we might want to do so.

For example, consider a list that is pointed to by a global variable
and some fairly standard way of dealing with singly-linked lists in C:

    SCM head;

    void
    delete_some ()
    {
      SCM *node_ptr = &head;
      if (should_delete (*node_ptr))
        *node_ptr = SCM_CDR (*node_ptr);
      else
        node_ptr = SCM_CDRLOC (*node_ptr);
    }

What should the definition of SCM_CDRLOC be? Right now it is:

    #define SCM_CDRLOC(x) ((SCM *) SCM_CELL_WORD_LOC ((x), 1))

I.e., it casts a pointer to scm_t_bits to a pointer to SCM. That,
however, breaks the SCM_PACK/SCM_UNPACK abstraction. A scm_t_bits
value can only be converted to a SCM value via SCM_PACK, but by using
SCM_CDRLOC, you can sidestep this conversion.

Luckily, code like this works on the platforms that Guile is used on,
but it still is quite unclean, I'd say. For example, scm_t_bits might
be larger than SCM, or SCM_PACK and SCM_UNPACK might one day need to
become non-trivial on certain platforms (like they were on Crays, I
think.)

When the encodings of SCM and scm_t_bits do indeed differ, we should
allow only one of them to be the canonical encoding that is recognized
by the garbage collector. Right now, this is the SCM encoding (since
scm_mark_locations uses a pointer to SCM to read the stack, etc.)

I propose to remove the need to convert between scm_t_bits* and SCM*
and to allow only SCMs to be in memory.

The words in a scm_t_cell would be of type SCM. This would mean that
SCM_CELL_WORD_LOC would be removed and replaced with
SCM_CELL_OBJECT_LOC. Also, SCM_SET_SMOB_DATA (etc) might not be able
to store all scm_t_bits values that it is handed (because scm_t_bits
could be larger than a pointer). We could make a new guarantee that
says that SCM_SET_SMOB_DATA (etc) can store any pointer that is cast
to a scm_t_bits and any integer that fits into 'unsigned int', say.

The type scm_t_bits would be restricted to temporary values that are
mostly used to test tag bits etc. They would usually not be stored in
data structures and when they are, they cannot be expected to protect
the SCM value that they encode when they are scanned conservatively.

Should we (gradually and with deprecation and everything) remove
scm_t_bits from the smob API completely? I have not thought this
thru, but we might end up with something that is not really an
improvement, just different.

Opinions?

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-03 16:10 UTC

Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> [...] SCM_PACK and SCM_UNPACK might one day need to become
> non-trivial on certain platforms (like they were on Crays, I think.)

Ok, I just (re-)discovered that this is a non-issue: these encoding
changes are done by SCM2PTR and PTR2SCM.

Right now, I have a version of Guile that doesn't store scm_t_bits
into the heap, and it does feel cleaner since no scm_t_bits* <-> SCM*
casts are done.

Now, I'll try to completely remove the use of the scm_t_cell type
(only the use, not the definition). This one is rather confusing as
well since it is not used consistently (I think).

* Re: The relationship between SCM and scm_t_bits.
From: Paul Jarc @ 2004-05-03 16:21 UTC
Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> For example, consider a list that is pointed to by a global variable
> and some fairly standard way of dealing with singly-linked lists in C:

I guess the simplest solution would be "just don't do that". (I.e.,
the C code would use an scm_t_bits* variable with SCM_PACK/SCM_UNPACK
instead of using SCM directly.)

> Luckily, code like this works on the platforms that Guile is used on,
> but it still is quite unclean, I'd say.

I agree.

> We could make a new guarantee that says that SCM_SET_SMOB_DATA (etc)
> can store any pointer that is cast to a scm_t_bits and any integer
> that fits into 'unsigned int', say.

That would be enough for my code, I think. Of course it should remain
source-compatible, too:

  #define SCM_SET_SMOB_DATA(smob, bits) SCM_SET_SMOB_OBJECT(smob, SCM_PACK(bits))

> Should we (gradually and with deprecation and everything) remove
> scm_t_bits from the smob API completely? I have not thought this
> thru, but we might end up with something that is not really an
> improvement, just different.

I think it would be an improvement, in conceptual cleanliness if
nothing else. It'll make the C interface that much easier to work
with.

paul

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-04 13:53 UTC

Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> Hi,
>
> I just got confused about our two 'fundamental' types, SCM and
> scm_t_bits.

[More random thoughts follow...]

Right now, the documented distinction between scm_t_bits and SCM is
that...

    A variable of type `SCM' is guaranteed to hold a valid Scheme
    object. A variable of type `scm_t_bits', on the other hand, may
    hold a representation of a `SCM' value as a C integral type, but
    may also hold any C value, even if it does not correspond to a
    valid Scheme object.

This is a useful distinction, no doubt. However, it creates the
situation where user variables have a different type than the cell
slots in the heap.

Now, the distinction between scm_t_bits and SCM is only practical when
converting between them has zero cost. SCM_PACK and SCM_UNPACK can
really only be casts that reinterpret the bits. If this were not the
case, we would have to remove the distinction to get performance back.

Thus, scm_t_bits and SCM can be pretty much identical and we can allow
the casting of pointers to them, too.

Better yet, we provide some markup that does the casting for the user.
That is, I now propose to _add_ SCM_CELL_OBJECT_LOC and to _keep_
SCM_CELL_WORD_LOC. That way, we take over responsibility for getting
the casting right.

Still, I'd want to change the type of fields in the heap from
scm_t_bits to SCM, just so that pointers to SCM are guaranteed to work
across the heap and user variables. Pointers to scm_t_bits might still
fail on strange platforms and we might then consider removing
SCM_CELL_WORD_LOC on those platforms.

Of course, all these tedious ponderings can be avoided, as Paul points
out, by just disallowing pointers to SCM or to scm_t_bits entirely.
But we would need a very good reason for this: using pointers the way
delete_some does is completely reasonable right now.

One reason to disallow pointers would be that a precise garbage
collector that moves objects around cannot easily deal with random
pointers into the middle of objects. However, that's only a
theoretical reason, I don't see us getting there any time soon, if at
all.

So, unless a very good reason comes up, I want to allow pointers to
SCM.

> For example, consider a list that is pointed to by a global variable
> and some fairly standard way of dealing with singly-linked lists in C:
>
>     SCM head;
>
>     void
>     delete_some ()
>     {
>       SCM *node_ptr = &head;
>       if (should_delete (*node_ptr))
>         *node_ptr = SCM_CDR (*node_ptr);
>       else
>         node_ptr = SCM_CDRLOC (*node_ptr);
>     }

[There should be a loop in that function of course.]
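
For readers who want to see the pointer-to-slot idiom with the missing
loop, here is a minimal sketch. It is not Guile code: the pair type,
MOCK_CDR, MOCK_CDRLOC, MOCK_EOL, mock_cons and the should_delete
predicate are all hypothetical stand-ins, and delete_some takes the
list head as a parameter instead of using a global. The slots are
typed SCM, so taking the address of a cdr needs no cast.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins, NOT Guile's definitions: a pair whose cdr slot is typed
   as SCM, so that `SCM *` can point at a variable or at a slot. */
typedef struct mock_pair *SCM;
struct mock_pair { int car; SCM cdr; };

#define MOCK_EOL       ((SCM) NULL)     /* hypothetical end-of-list */
#define MOCK_CDR(x)    ((x)->cdr)
#define MOCK_CDRLOC(x) (&(x)->cdr)      /* yields SCM *, no cast */

SCM
mock_cons (int car, SCM cdr)
{
  SCM pair = malloc (sizeof *pair);
  if (pair == NULL)
    abort ();
  pair->car = car;
  pair->cdr = cdr;
  return pair;
}

/* Hypothetical predicate: drop nodes that carry an odd number. */
int
should_delete (SCM node)
{
  return node->car % 2 != 0;
}

/* Walk the list through a pointer to the current link.  Unlinked
   nodes are simply dropped; this mock never frees them, whereas in
   Guile the garbage collector would reclaim them. */
void
delete_some (SCM *head)
{
  SCM *node_ptr = head;
  while (*node_ptr != MOCK_EOL)
    {
      if (should_delete (*node_ptr))
        *node_ptr = MOCK_CDR (*node_ptr);
      else
        node_ptr = MOCK_CDRLOC (*node_ptr);
    }
}
```

The same node_ptr first points at the caller's variable and later at
cdr slots inside pairs, which is exactly why the thread cares that
both have the same type.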

* Re: The relationship between SCM and scm_t_bits.
From: Paul Jarc @ 2004-05-04 17:16 UTC
Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> Now, the distinction between scm_t_bits and SCM is only practical
> when converting between them has zero cost. SCM_PACK and SCM_UNPACK
> can really only be casts that reinterpret the bits.

Looking at the case of SCM_DEBUG_TYPING_STRICTNESS == 2, I'd expect
that scm_pack might be optimized away, so it would have no run-time
cost. (At least, the compiler has enough information to do so, and
the C standard allows it.) If that isn't happening already, maybe
marking it as inline would help?

> Thus, scm_t_bits and SCM can be pretty much identical and we can allow
> the casting of pointers to them, too.

The C standard does not allow accessing a value through a pointer to a
different type. Newer versions of gcc have optimizations depending on
that restriction included in -O2. You can disable those optimizations
with -fno-strict-aliasing, but maybe those optimizations would
outweigh some nonzero-cost conversion between scm_t_bits and SCM.
Some profiling would be useful.

> Pointers to scm_t_bits might still fail on strange platforms and we
> might then consider removing SCM_CELL_WORD_LOC on those platforms.

Better to make Guile the same on all platforms, I think, and so remove
it on all platforms if it doesn't work on some.

Granted that it's useful to have both SCM and scm_t_bits, what exactly
is the advantage in using those two types to alias the same bytes in
memory? What do we gain here over your previous use-SCM-everywhere
suggestion?

> But we would need a very good reason for this: using pointers the
> way delete_some does is completely reasonable right now.

Well, it's expected to be reasonable, but turns out to be not quite
so, right? Hence the issue.

paul
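
The aliasing concern can be sidestepped by converting values rather
than pointers. The sketch below assumes stand-in definitions (SCM as a
pointer to an incomplete struct, scm_t_bits as uintptr_t, and the
hypothetical MOCK_PACK/MOCK_UNPACK casts); it is not how Guile defines
these. The slot is only ever read and written as scm_t_bits, so no
SCM * aliases it.

```c
#include <stdint.h>

/* Stand-ins, NOT Guile's definitions. */
typedef struct mock_opaque *SCM;   /* opaque handle type */
typedef uintptr_t scm_t_bits;      /* integer wide enough for a pointer */

/* Value conversions only; on mainstream platforms these are plain
   casts that reinterpret the bits. */
#define MOCK_PACK(bits) ((SCM) (bits))
#define MOCK_UNPACK(x)  ((scm_t_bits) (x))

/* Read a heap word as an object.  The memory is accessed through its
   declared type (scm_t_bits) and then converted by value. */
SCM
load_object (const scm_t_bits *slot)
{
  return MOCK_PACK (*slot);
}

/* Store an object into a heap word, again converting by value. */
void
store_object (scm_t_bits *slot, SCM value)
{
  *slot = MOCK_UNPACK (value);
}

/* The pattern under discussion would instead be
       SCM *p = (SCM *) slot;  ... *p ...
   which accesses scm_t_bits storage through an SCM lvalue. */
```

The cost of this style is that it cannot hand out a single pointer
type usable for both user variables and heap words, which is what
delete_some relies on.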

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-04 17:49 UTC

prj@po.cwru.edu (Paul Jarc) writes:

> Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
>> Now, the distinction between scm_t_bits and SCM is only practical
>> when converting between them has zero cost. SCM_PACK and SCM_UNPACK
>> can really only be casts that reinterpret the bits.
>
> Looking at the case of SCM_DEBUG_TYPING_STRICTNESS == 2, I'd expect
> that scm_pack might be optimized away, so it would have no run-time
> cost. (At least, the compiler has enough information to do so, and
> the C standard allows it.) If that isn't happening already, maybe
> marking it as inline would help?

SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
have to review it anyway, there has been a bug report about it not
working at all, I think), so when compiling with that setting, the
performance argument doesn't count.

>> Thus, scm_t_bits and SCM can be pretty much identical and we can allow
>> the casting of pointers to them, too.
>
> The C standard does not allow accessing a value through a pointer to a
> different type. Newer versions of gcc have optimizations depending on
> that restriction included in -O2.

Yes, we are pretty much non-standard when doing this, but it is OK
when it 'just works' in practice. The optimizations should only bite
when we access the same memory location through two differently typed
pointers, but we would not be doing this (I think). A given location
is always only accessed as one type.

Err. Or so I thought. We have

    SCM_CELL_OBJECT (x, n) === SCM_PACK (SCM_CELL_WORD (x, n))

which would cause trouble when used together with

    *(SCM_CELL_OBJECT_LOC (x, n)) === *((SCM *)(SCM_CELL_WORD_LOC (x, n)))

Well. I'm still confused...

> You can disable those optimizations with -fno-strict-aliasing, but
> maybe those optimizations would outweigh some nonzero-cost
> conversion between scm_t_bits and SCM. Some profiling would be
> useful.

We already aim to be -fstrict-aliasing safe: see for example
scm_double_cell in inline.h.

>> Pointers to scm_t_bits might still fail on strange platforms and we
>> might then consider removing SCM_CELL_WORD_LOC on those platforms.
>
> Better to make Guile the same on all platforms, I think, and so remove
> it on all platforms if it doesn't work on some.

Yes.

> Granted that it's useful to have both SCM and scm_t_bits, what exactly
> is the advantage in using those two types to alias the same bytes in
> memory? What do we gain here over your previous use-SCM-everywhere
> suggestion?

We would have to remove SCM_CELL_WORD_LOC from the API.

>> But we would need a very good reason for this: using pointers the
>> way delete_some does is completely reasonable right now.
>
> Well, it's expected to be reasonable, but turns out to be not quite
> so, right? Hence the issue.

Hmm. The (my) reasoning goes like this: first, we only had SCM and
everything was simple. Then we added scm_t_bits to emphasize the
opaqueness of SCM but still be able to interpret its bits.

That gave us the possibility to use these two types for the two kinds
of words in the heap: some are interpreted as a SCM (with tag bits and
everything), some are not interpreted and can contain anything that
fits into one word. This makes SCM a subtype of scm_t_bits: all SCM
values can be represented with a scm_t_bits, but not all scm_t_bits
values are valid SCM values.

Thus, the basic type of heap words is scm_t_bits.

That, however, creates the issue with having pointers to SCM.

When we can only have one type of pointer (either SCM* or
scm_t_bits*), I would want to have pointers to SCM since SCM is the
type that the user works with. That would mean that heap words are
of type SCM.

That then also means that SCM_PACK would be used to store 'invalid'
values into a SCM and can no longer be used to mark places where only
valid SCM values should appear. Maybe we should have a new
SCM_PACK_BITS for this, or maybe not.

Damn, I can't get my head around this...

My instincts tell me to just remove scm_t_bits from the public API
completely, but that is rather drastic, maybe for 1.10...

* Re: The relationship between SCM and scm_t_bits.
From: Paul Jarc @ 2004-05-04 18:35 UTC
Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
> have to review it anyway, there has been a bug report about it not
> working at all, I think), so when compiling with that setting, the
> performance argument doesn't count.

What I'm saying is that (ISTM) we might be able to get maximum
type-safety and still have zero run-time conversion cost. Isn't that
the ideal?

But actually, SCM_DEBUG_TYPING_STRICTNESS == 1 looks like it's just as
safe as 2 anyway. (I think it would be nice to have SCM_UNPACK
type-check its argument as SCM_PACK does; let the caller explicitly
cast to scm_t_bits to silence the warning.)

> A given location is always only accessed as one type.

If that's true, then a union containing SCM and scm_t_bits would give
us standard-conformance.

> We already aim to be -fstrict-aliasing safe

Ok, good.

>> What do we gain here over your previous use-SCM-everywhere
>> suggestion?
>
> We would have to remove SCM_CELL_WORD_LOC from the API.

Is there much code outside of Guile itself using that? I guess Guile
itself will always need something like scm_t_bits to check type tags,
but I wonder how useful WORD_LOC is for other code.

> When we can only have one type of pointer (either SCM* or
> scm_t_bits*), I would want to have pointers to SCM since SCM is the
> type that the user works with. That would mean that heap words are
> of type SCM.

I agree.

> That then also means that SCM_PACK would be used to store 'invalid'
> values into a SCM and can no longer be used to mark places where only
> valid SCM values should appear. Maybe we should have a new
> SCM_PACK_BITS for this, or maybe not.

SCM_MAKINUM? :) Or do we need the full range? More precisely, do we
need the full range and zero-cost conversion simultaneously? Even
more precisely, do we need that in the API, for code outside of Guile
itself?

> My instincts tell me to just remove scm_t_bits from the public API
> completely, but that is rather drastic, maybe for 1.10...

Yes, if it's to be done, there should certainly be a significant
deprecation period to give people time to adapt.

paul
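
To picture what a stricter typing level buys, here is a rough sketch
under stated assumptions. The struct-wrapped SCM, mock_pack,
mock_unpack and the MOCK_FALSE encoding are all made up for
illustration; Guile's real definitions live in tags.h and may differ.
Because mock_unpack is a function taking SCM, it also type-checks its
argument, which is the behaviour wished for above.

```c
#include <stdint.h>

/* Stand-in for scm_t_bits: an integer wide enough for a pointer. */
typedef uintptr_t scm_t_bits;

/* Hypothetical struct-wrapped SCM.  A struct cannot be used as a
   condition, in arithmetic, or where an integer is expected, so such
   slips are rejected at compile time. */
typedef struct { scm_t_bits n; } SCM;

/* Conversions as small inline functions; a compiler is free to
   reduce them to nothing. */
static inline SCM
mock_pack (scm_t_bits bits)
{
  SCM s;
  s.n = bits;
  return s;
}

static inline scm_t_bits
mock_unpack (SCM x)
{
  return x.n;
}

/* Made-up encoding of the false object, for this sketch only. */
#define MOCK_FALSE (mock_pack (0x04))

int
mock_is_true (SCM x)
{
  /* Writing `if (x)` here would not compile, because x is a struct;
     the comparison has to be spelled out. */
  return mock_unpack (x) != mock_unpack (MOCK_FALSE);
}
```

Whether a given compiler keeps such a one-member struct in a register
is a separate question and would need measuring.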

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-05 10:00 UTC

prj@po.cwru.edu (Paul Jarc) writes:

> Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
>> SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
>> have to review it anyway, there has been a bug report about it not
>> working at all, I think), so when compiling with that setting, the
>> performance argument doesn't count.
>
> What I'm saying is that (ISTM) we might be able to get maximum
> type-safety and still have zero run-time conversion cost. Isn't that
> the ideal?

Yes, that's the ideal. I haven't looked myself but I think that
STRICTNESS == 2 won't put SCM values into registers or something.

> But actually, SCM_DEBUG_TYPING_STRICTNESS == 1 looks like it's just as
> safe as 2 anyway.

Isn't there the added advantage with level 2 that it doesn't accept
SCM values in conditions?

>> A given location is always only accessed as one type.
>
> If that's true, then a union containing SCM and scm_t_bits would give
> us standard-conformance.

But not the best performance, I'm afraid. Also, I'm no longer sure
that "A given location is always only accessed as one type".

>>> What do we gain here over your previous use-SCM-everywhere
>>> suggestion?
>>
>> We would have to remove SCM_CELL_WORD_LOC from the API.
>
> Is there much code outside of Guile itself using that? I guess Guile
> itself will always need something like scm_t_bits to check type tags,
> but I wonder how useful WORD_LOC is for other code.

Since there is no OBJECT_LOC, people _have_ to use WORD_LOC (and
probably cast it to SCM* immediately). But yes, the usage should be
pretty rare, hopefully.

>> When we can only have one type of pointer (either SCM* or
>> scm_t_bits*), I would want to have pointers to SCM since SCM is the
>> type that the user works with. That would mean that heap words are
>> of type SCM.
>
> I agree.
>
>> That then also means that SCM_PACK would be used to store 'invalid'
>> values into a SCM and can no longer be used to mark places where only
>> valid SCM values should appear. Maybe we should have a new
>> SCM_PACK_BITS for this, or maybe not.
>
> SCM_MAKINUM? :) Or do we need the full range? More precisely, do we
> need the full range and zero-cost conversion simultaneously? Even
> more precisely, do we need that in the API, for code outside of Guile
> itself?

Yes, to all. We need it to store arbitrary pointers for smobs. When
you define a new smob type, you get to decide what to do with the
words of the smob cell: you can use them for SCMs, or for anything
else that fits into a SCM.

Well, the reason I brought this all up is that I'm adding new macros
for use with double smobs: SCM_SMOB_DATA_2 and SCM_SMOB_DATA_3. Then
I have converted all code in Guile to use these macros instead of
SCM_CELL_WORD_2, etc. But some code uses SCM_CELL_WORD_LOC, and then
I noticed that there is some kinkiness there...

I didn't feel like rewriting the code to not use pointers, but maybe
that's the cleanest thing after all.

* Re: The relationship between SCM and scm_t_bits.
From: Paul Jarc @ 2004-05-05 14:58 UTC
Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> I haven't looked myself but I think that STRICTNESS == 2 won't put
> SCM values into registers or something.

Ah, ok. Well, that depends on the compiler - maybe more recent
compilers are better with this. (Though we still have to think about
older ones too for a while.) The definition of SCM for STRICTNESS ==
2 also has the advantage that an SCM object can be accessed through an
scm_t_bits* pointer.

> Isn't there the added advantage with level 2 that it doesn't accept
> SCM values in conditions?

Oops, right. I had only been thinking of getting a diagnostic when
using one type where the other specifically was expected -
assignments, function arguments, etc.

> Yes, to all. We need it to store arbitrary pointers for smobs. When
> you define a new smob type, you get to decide what to do with the
> words of the smob cell: you can use them for SCMs, or for anything
> else that fits into a SCM.

Hmm. A union including void* would be perfect, if not for the
register problem.

paul
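
The guarantee floated earlier in the thread (a smob data word can hold
any pointer, and any integer that fits into unsigned int) can be
sketched like this. Everything here is a hypothetical mock: struct
mock_smob and the four accessors are not Guile API, and uintptr_t
merely stands in for whatever type the heap word ends up having.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for a heap word; uintptr_t can round-trip a void *. */
typedef uintptr_t mock_word;

/* The second half of the promised guarantee only holds if the word
   is at least as wide as unsigned int; check that at compile time. */
_Static_assert (sizeof (mock_word) >= sizeof (unsigned int),
                "heap word too narrow for unsigned int");

/* Hypothetical two-word smob cell: a type word and one data word. */
struct mock_smob
{
  mock_word type_word;
  mock_word data;
};

void
mock_set_data_ptr (struct mock_smob *s, void *p)
{
  s->data = (mock_word) p;
}

void *
mock_get_data_ptr (const struct mock_smob *s)
{
  return (void *) s->data;
}

void
mock_set_data_uint (struct mock_smob *s, unsigned int n)
{
  s->data = n;
}

unsigned int
mock_get_data_uint (const struct mock_smob *s)
{
  return (unsigned int) s->data;
}
```

A word typed as a pointer would give the same pointer guarantee, but
the integer half would then depend on pointer-to-integer conversions
that the C standard leaves implementation-defined.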

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-10 13:42 UTC

Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> Hi,
>
> I just got confused about our two 'fundamental' types, SCM and
> scm_t_bits.

Ok, so this is what I have done now:

* Changes to the C interface

** SCM_CELL_WORD_LOC has been deprecated.

Use the new macro SCM_CELL_OBJECT_LOC instead, which returns a pointer
to a SCM, as opposed to a pointer to a scm_t_bits.

This was done to allow the correct use of pointers into the Scheme
heap. Previously, the heap words were of type scm_t_bits and local
variables and function arguments were of type SCM, making it
non-standards-conformant to have a pointer that can point to both.

** New macros SCM_SMOB_DATA_2, SCM_SMOB_DATA_3, etc.

These macros should be used instead of SCM_CELL_WORD_2/3 to access the
second and third words of double smobs. Likewise for
SCM_SET_SMOB_DATA_2 and SCM_SET_SMOB_DATA_3.

Also, there is SCM_SMOB_FLAGS and SCM_SET_SMOB_FLAGS that should be
used to get and set the 16 extra bits in the zeroth word of a smob.

And finally, there is SCM_SMOB_OBJECT and SCM_SMOB_SET_OBJECT for
accessing the first immediate word of a smob as a SCM value, and there
is SCM_SMOB_OBJECT_LOC for getting a pointer to the first immediate
smob word. Likewise for SCM_SMOB_OBJECT_2, etc.

I've also updated the smob documentation in the manual.

* Re: The relationship between SCM and scm_t_bits.
From: Dirk Herrmann @ 2004-05-15 7:31 UTC
Cc: guile-devel

Marius Vollmer wrote:

>Hi,
>
>I just got confused about our two 'fundamental' types, SCM and
>scm_t_bits.
>
The two types were introduced in order to create a level of
abstraction between code that operates on the encoding of scheme types
and code that only relies on API elements to do so. The background
was that SCM formerly was just a typedef to "long". This had the
disadvantage that the compiler did not do proper type checking when
dealing with SCM types. For example, you could by mistake write "if
(some_SCM_value) ..." if instead you had intended to write "if
(!SCM_FALSEP (some_SCM_value))". Guile had a number of bugs of this
or a similar type at that time. Thus, we wanted to introduce better
type checking, but on the other hand be able to deal with the bit
representation in low level code. Thus the separation into SCM and
scm_t_bits.

The idea is to have SCM normally defined to be some
performance-efficient type, like a pointer to an unknown struct. This
will inhibit most errors, like passing an integer where a SCM is
expected, or performing arithmetic operations on a SCM value. It does
not, however, inhibit the "if (some_SCM_value)" bug, since the
compiler will accept a pointer there. Thus, from time to time, just
for type-checking purposes, we can have SCM defined as a struct or
union, thus inhibiting even those problems. You will find more
information about that if you look at the use of
SCM_DEBUG_TYPING_STRICTNESS in tags.h and the explanation in __scm.h.

Unfortunately, guile does not compile without errors when
SCM_DEBUG_TYPING_STRICTNESS is set to 2, due to some tricky problems
in some files. Nevertheless it should still be quite helpful for most
parts of guile. Using it, I remember having fixed about 5 to 10 bugs
in guile at that time, which would not have been easy to find without
thorough code inspection (or, if someone had run into them).

>For example, consider a list that is pointed to by a global variable
>and some fairly standard way of dealing with singly-linked lists in C:
>
>    SCM head;
>
>    void
>    delete_some ()
>    {
>      SCM *node_ptr = &head;
>      if (should_delete (*node_ptr))
>        *node_ptr = SCM_CDR (*node_ptr);
>      else
>        node_ptr = SCM_CDRLOC (*node_ptr);
>    }
>
Such code should be avoided. Otherwise I think it is difficult to
move to generational garbage collection: Generational garbage
collection is based on the assumption that newer objects can only
point to older objects. This is true as long as you don't modify
existing objects. Therefore, write accesses to existing objects must
pass through a write barrier, which is some code that helps the gc to
work correctly even in case of references from old to new.

By modifying SCM values through SCM pointers you skip the write
barrier. When switching to generational GC, such code would have to
be fixed, either to avoid using the pointer access, or to add some
operations that perform what the write barrier would have done. Thus,
in order to keep the number of places small which require such fixing
later, I try to avoid using SCM_CDRLOC etc.

Best regards,
Dirk Herrmann
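
The write-barrier argument can be made concrete with a toy model. This
is only a sketch of one common scheme (a remembered set of old objects
that point at young ones), not a description of any collector Guile
has or plans; struct mock_obj, the generation numbers, the fixed-size
remembered array and both functions are invented for illustration.

```c
#include <stddef.h>

/* Toy object: generation 0 is young, 1 is old. */
struct mock_obj
{
  int generation;
  struct mock_obj *cdr;
};

/* Toy remembered set: old objects known to reference young ones. */
#define MOCK_REMEMBERED_MAX 16
struct mock_obj *mock_remembered[MOCK_REMEMBERED_MAX];
size_t mock_remembered_count = 0;

/* Setter with a write barrier: when an older object is made to point
   at a younger one, record the older object.  If the toy table is
   full the entry is silently dropped, which a real collector could
   not afford to do. */
void
mock_set_cdr (struct mock_obj *target, struct mock_obj *value)
{
  if (value != NULL
      && target->generation > value->generation
      && mock_remembered_count < MOCK_REMEMBERED_MAX)
    mock_remembered[mock_remembered_count++] = target;
  target->cdr = value;
}

/* CDRLOC-style access: hands out the raw slot address, so any store
   through it bypasses mock_set_cdr and its bookkeeping. */
struct mock_obj **
mock_cdr_loc (struct mock_obj *target)
{
  return &target->cdr;
}
```

A store through mock_cdr_loc leaves the remembered set untouched, so a
collector that scans only young objects plus the remembered set would
miss that reference.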

* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-05-17 18:09 UTC
Cc: guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> Marius Vollmer wrote:
>
>>Hi,
>>
>>I just got confused about our two 'fundamental' types, SCM and
>>scm_t_bits.
>>
> The two types were introduced in order to create a level of
> abstraction between code that operates on the encoding of scheme
> types and code that only relies on API elements to do so.
> [...]

Yep, thanks for reinforcing this. The purpose of scm_t_bits as it is
currently documented in the manual is different, however: it is the
type that can hold arbitrary words, not just valid SCM values. The
first word of a smob would be a scm_t_bits since it is not a valid SCM
value, for example.

I was confused by these two distinctions between the two types.

>>For example, consider a list that is pointed to by a global variable
>>and some fairly standard way of dealing with singly-linked lists in C:
>>
>>    SCM head;
>>
>>    void
>>    delete_some ()
>>    {
>>      SCM *node_ptr = &head;
>>      if (should_delete (*node_ptr))
>>        *node_ptr = SCM_CDR (*node_ptr);
>>      else
>>        node_ptr = SCM_CDRLOC (*node_ptr);
>>    }
>>
> Such code should be avoided. [...]

Yes, probably. But we currently support this style and Guile uses it
itself. However, the way we were casting pointers around did not
properly respect the distinction between SCM and scm_t_bits: we
didn't make the guarantee that you could meaningfully cast a
scm_t_bits* to a SCM*, yet we would rely on it.

Instead of disallowing working with SCM* pointers, I have fixed the
type confusion. This was the smaller change by far.

But of course, this does not mean that we cannot at some point
disallow the use of SCM* pointers. But that would be a much bigger
decision than I would want to make now, while (slowly) zooming in on
1.8.

> By modifying SCM values through SCM pointers you skip the write
> barrier. When switching to generational GC, such code would have to
> be fixed, either to avoid using the pointer access, or to add some
> operations that perform what the write barrier would have
> done. Thus, in order to keep the number of places small which
> require such fixing later, I try to avoid using SCM_CDRLOC etc.

Yes, I fully agree here.

--
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405
* Re: The relationship between SCM and scm_t_bits. 2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer ` (4 preceding siblings ...) 2004-05-15 7:31 ` Dirk Herrmann @ 2004-05-15 15:00 ` Dirk Herrmann 2004-05-15 16:42 ` Dirk Herrmann 2004-05-17 19:22 ` Marius Vollmer 5 siblings, 2 replies; 25+ messages in thread From: Dirk Herrmann @ 2004-05-15 15:00 UTC (permalink / raw) Cc: guile-devel Marius Vollmer wrote: > I propose to remove the need to convert between scm_t_bits* and SCM* > and to allow only SCMs to be in memory. > > > The words in a scm_t_cell would be of type SCM. This would mean that > SCM_CELL_WORD_LOC would be removed and replaced with > SCM_CELL_OBJECT_LOC. Also, SCM_SET_SMOB_DATA (etc) might not be able > to store all scm_t_bits values that it is handed (because scm_t_bits > could be larger than a pointer). We could make a new guarantee that > says that SCM_SET_SMOB_DATA (etc) can store any pointer that is cast > to a scm_t_bits and any integer that fits into 'unsigned int', say. > > The type scm_t_bits would be restricted to temporary values that are > mostly used to test tag bits etc. They would usually not stored in > data structures and when they are, they can not be expected to > protected the SCM value that they encode when they are scanned > conservatively. > > Should we (gradually and with deprecation and everyhing) remove > scm_t_bits from the smob API completely? I have not thought this > thru, but we might and with something that is not really an > improvement, just different. I have tried to give it some more thought: Since the introduction of scm_t_bits, it has been stated about SCM variables, that every SCM variable is known to hold a valid scheme object, while scm_t_bits variables may also hold arbitrary data. On the heap, however, cells do typically _not_ hold valid scheme objects. One of the exceptions is the pair object, which has the property, that both of its cell entries happen to hold valid scheme objects. 
This is the reason why SCM_CARLOC and SCM_CDRLOC work on pairs. If you use SCM_CARLOC or SCM_CDRLOC on anything else than a pair object, you will most likely crash the system, because the SCM value that you get will not hold a valid scheme object.

From this perspective, it is inconsistent to define scm_t_cell to hold SCM objects. I have not yet given it a try, but I found the suggestion to use a union quite appaling:

    typedef struct scm_t_cell
    {
      union {
        scm_t_bits word_0;
        SCM object_0;
      } element_0;
      union {
        scm_t_bits word_1;
        SCM object_1;
      } element_1;
    } scm_t_cell;

or even a more generic version, which would cover single cells as well as double cells and even allow to cleanly iterate over all cell elements in a loop:

    typedef struct scm_t_cell
    {
      union {
        scm_t_bits word;
        SCM object;
      } elements[];
    } scm_t_cell;

I see no reason why either of the above solutions should have a negative influence on performance. All of our cell accessing macros should be easily re-definable with those types, in particular:

    #define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x)) [n]).word)
    #define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x)) [n]).object)

which does not require any casting to be performed any more :-)

Best regards,
Dirk Herrmann
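The union layout proposed above can be tried out in isolation. What follows is only a minimal sketch under stated assumptions: scm_t_bits and SCM are stand-in definitions rather than Guile's real ones, the array has a fixed size of two, and the CELL_* macros and cell_roundtrip are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Stand-ins for Guile's types (assumptions, not the real definitions). */
typedef uintptr_t scm_t_bits;
typedef struct scm_unused_struct *SCM;

#define SCM_PACK(b)   ((SCM) (b))
#define SCM_UNPACK(o) ((scm_t_bits) (o))

/* Each cell element holds either raw bits or a Scheme object. */
typedef union cell_element
{
  scm_t_bits word;
  SCM object;
} cell_element;

typedef struct scm_t_cell
{
  cell_element elements[2];
} scm_t_cell;

/* Hypothetical accessors: no pointer casts are needed, because each
   union member already has the type we want a pointer to. */
#define CELL_WORD(c, n)       ((c)->elements[n].word)
#define CELL_OBJECT(c, n)     ((c)->elements[n].object)
#define CELL_WORD_LOC(c, n)   (&CELL_WORD ((c), (n)))
#define CELL_OBJECT_LOC(c, n) (&CELL_OBJECT ((c), (n)))

/* Store raw bits in element 0 and an object in element 1, then read
   each one back through the same member it was written with.
   Returns 1 when both values survive the round trip. */
int
cell_roundtrip (scm_t_cell *c, scm_t_bits tag, SCM obj)
{
  CELL_WORD (c, 0) = tag;
  CELL_OBJECT (c, 1) = obj;
  return CELL_WORD (c, 0) == tag && CELL_OBJECT (c, 1) == obj;
}
```

In this sketch CELL_WORD_LOC yields a scm_t_bits * and CELL_OBJECT_LOC yields a SCM *, so neither needs to be derived from the other by a cast.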
* Re: The relationship between SCM and scm_t_bits. 2004-05-15 15:00 ` Dirk Herrmann @ 2004-05-15 16:42 ` Dirk Herrmann 2004-05-17 19:22 ` Marius Vollmer 1 sibling, 0 replies; 25+ messages in thread From: Dirk Herrmann @ 2004-05-15 16:42 UTC (permalink / raw) Cc: guile-devel Dirk Herrmann wrote: > I have not yet given it a try, but I found the suggestion to use a > union quite appaling: > [some syntactically incorrect examples deleted :-)] However, I have now given it a try, and it seems to work quite nicely. I enclose a patch, which I would be glad to apply, if you, Marius, agree. I did not do it yet, though, in order to give everybody the chance to check whether it has any negative influence on performance. Best regards, Dirk Diff agains the latest CVS: Index: NEWS =================================================================== RCS file: /cvsroot/guile/guile/guile-core/NEWS,v retrieving revision 1.427 diff -u -b -r1.427 NEWS --- NEWS 6 May 2004 17:04:04 -0000 1.427 +++ NEWS 15 May 2004 16:02:26 -0000 @@ -584,16 +584,6 @@ * Changes to the C interface -** SCM_CELL_WORD_LOC has been deprecated. - -Use the new macro SCM_CELL_OBJECT_LOC instead, which return a pointer -to a SCM, as opposed to a pointer to a scm_t_bits. - -This was done to allow the correct use of pointers into the Scheme -heap. Previously, the heap words were of type scm_t_bits and local -variables and function arguments were of type SCM, making it -non-standards-conformant to have a pointer that can point to both. - ** New macros SCM_SMOB_DATA_2, SM_SMOB_DATA_3, etc. 
These macros should be used instead of SCM_CELL_WORD_2/3 to access the Index: libguile/ChangeLog =================================================================== RCS file: /cvsroot/guile/guile/guile-core/libguile/ChangeLog,v retrieving revision 1.2070 diff -u -b -r1.2070 ChangeLog --- libguile/ChangeLog 15 May 2004 15:50:24 -0000 1.2070 +++ libguile/ChangeLog 15 May 2004 16:02:26 -0000 @@ -1,5 +1,23 @@ 2004-05-15 Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> + * deprecated.h (SCM_CELL_WORD_LOC): Un-deprecated. + + * gc.h (scm_t_cell): Redefined to hold a union, since each cell + element either holds a scm_t_bits value or a SCM value. + + (SCM_GC_CARD_BVEC, SCM_GC_SET_CARD_BVEC, SCM_GC_GET_CARD_FLAGS, + SCM_GC_SET_CARD_FLAGS, SCM_GC_CELL_OBJECT, SCM_GC_CELL_WORD, + SCM_GC_SET_CELL_OBJECT, SCM_GC_SET_CELL_WORD): Modified to work + with the new scm_t_cell. + + (SCM_CELL_WORD_LOC): Un-deprecated. + + * numbers.h (SCM_I_BIG_MPZ): Use SCM_CELL_WORD_LOC instead of + SCM_CELL_OBJECT_LOC, since we are not dealing with scheme objects + here. + +2004-05-15 Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> + * numbers.h (SCM_SLOPPY_FRACTIONP): Removed. It was not used throughout guile, has not been part of an official release yet, and the concept of sloppy predicates has never been a good idea. 
Index: libguile/deprecated.h =================================================================== RCS file: /cvsroot/guile/guile/guile-core/libguile/deprecated.h,v retrieving revision 1.19 diff -u -b -r1.19 deprecated.h --- libguile/deprecated.h 6 May 2004 16:42:50 -0000 1.19 +++ libguile/deprecated.h 15 May 2004 16:02:26 -0000 @@ -300,11 +300,6 @@ #define SCM_VALIDATE_OPDIR(pos, port) SCM_MAKE_VALIDATE (pos, port, OPDIRP) -/* Deprecated because we can not safely cast a SCM* to a scm_t_bits* - */ - -#define SCM_CELL_WORD_LOC(x, n) ((scm_t_bits*)SCM_CELL_OBJECT_LOC((x),(n))) - void scm_i_init_deprecated (void); #endif Index: libguile/gc.h =================================================================== RCS file: /cvsroot/guile/guile/guile-core/libguile/gc.h,v retrieving revision 1.116 diff -u -b -r1.116 gc.h --- libguile/gc.h 6 May 2004 16:41:08 -0000 1.116 +++ libguile/gc.h 15 May 2004 16:02:26 -0000 @@ -36,8 +36,10 @@ typedef struct scm_t_cell { - SCM word_0; - SCM word_1; + union { + scm_t_bits word; + SCM object; + } elements[2]; } scm_t_cell; /* @@ -73,12 +75,14 @@ #define SCM_GC_CARD_N_CELLS 256 #define SCM_GC_SIZEOF_CARD SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell) -#define SCM_GC_CARD_BVEC(card) ((scm_t_c_bvec_long *) ((card)->word_0)) +#define SCM_GC_CARD_BVEC(card) \ + ((scm_t_c_bvec_long *) ((card)->elements[0].word)) #define SCM_GC_SET_CARD_BVEC(card, bvec) \ - ((card)->word_0 = (SCM) (bvec)) -#define SCM_GC_GET_CARD_FLAGS(card) ((long) ((card)->word_1)) + ((card)->elements[0].word = (scm_t_bits) (bvec)) +#define SCM_GC_GET_CARD_FLAGS(card) \ + ((long) ((card)->elements[1].word)) #define SCM_GC_SET_CARD_FLAGS(card, flags) \ - ((card)->word_1 = (SCM) (flags)) + ((card)->elements[1].word = (flags)) #define SCM_GC_GET_CARD_FLAG(card, shift) \ (SCM_GC_GET_CARD_FLAGS (card) & (1L << (shift))) @@ -141,12 +145,13 @@ * in debug mode. In particular these macros will even work for free cells, * which should never be encountered by user code. 
*/ -#define SCM_GC_CELL_OBJECT(x, n) (((SCM *)SCM2PTR (x)) [n]) -#define SCM_GC_CELL_WORD(x, n) (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n)))) +#define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x))->elements[n]).word) +#define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x))->elements[n]).object) -#define SCM_GC_SET_CELL_OBJECT(x, n, v) ((((SCM *)SCM2PTR (x)) [n]) = (v)) +#define SCM_GC_SET_CELL_OBJECT(x, n, v) \ + (((SCM2PTR (x))->elements[n]).object = (v)) #define SCM_GC_SET_CELL_WORD(x, n, v) \ - (SCM_GC_SET_CELL_OBJECT ((x), (n), SCM_PACK (v))) + (((SCM2PTR (x))->elements[n]).word = (v)) #define SCM_GC_CELL_TYPE(x) (SCM_GC_CELL_OBJECT ((x), 0)) @@ -214,6 +219,7 @@ (SCM_GC_SET_CELL_OBJECT ((x), 1, (v))) +#define SCM_CELL_WORD_LOC(x, n) (&SCM_CELL_WORD ((x), (n))) #define SCM_CELL_OBJECT_LOC(x, n) (&SCM_CELL_OBJECT ((x), (n))) #define SCM_CARLOC(x) (SCM_CELL_OBJECT_LOC ((x), 0)) #define SCM_CDRLOC(x) (SCM_CELL_OBJECT_LOC ((x), 1)) Index: libguile/numbers.h =================================================================== RCS file: /cvsroot/guile/guile/guile-core/libguile/numbers.h,v retrieving revision 1.82 diff -u -b -r1.82 numbers.h --- libguile/numbers.h 15 May 2004 15:50:24 -0000 1.82 +++ libguile/numbers.h 15 May 2004 16:02:26 -0000 @@ -144,7 +144,7 @@ #define SCM_COMPLEX_IMAG(x) (SCM_COMPLEX_MEM (x)->imag) /* Each bignum is just an mpz_t stored in a double cell starting at word 1. */ -#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1)))) +#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_WORD_LOC ((x), 1)))) #define SCM_BIGP(x) (!SCM_IMP (x) && SCM_TYP16 (x) == scm_tc16_big) #define SCM_NUMBERP(x) (SCM_INUMP(x) || SCM_NUMP(x)) _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://mail.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: The relationship between SCM and scm_t_bits. 2004-05-15 15:00 ` Dirk Herrmann 2004-05-15 16:42 ` Dirk Herrmann @ 2004-05-17 19:22 ` Marius Vollmer 2004-05-17 20:17 ` Paul Jarc 2004-05-21 19:37 ` Dirk Herrmann 1 sibling, 2 replies; 25+ messages in thread From: Marius Vollmer @ 2004-05-17 19:22 UTC (permalink / raw) Cc: guile-devel Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes: > I have not yet given it a try, but I found the suggestion to use a > union quite appaling: [ I think you mean "appealing". :-) I used to mix up the adjectives "content" and "contempt"... [ And its "appalling" with double-el. I hope you don't mind this little public correction. [ I think I can get away with it since I make a ton of mistakes myself... ] ] ] > [...] > > typedef struct scm_t_cell > { > union { > scm_t_bits word; > SCM object; > } elements[]; > } scm_t_cell; Yes, but consider how we use the heap: we fetch a word and then must decide whether it is a SCM or a scm_t_bits, we don't know this in advance in every case. This is not really supported by a union: I don't think you can store into one member and then (portably) assume anything about the value read from a different member. This is very much like storing into one memory location thru one pointer and reading that same location through a differently-typed pointer. I therefore don't think that using a union is an improvement. (It will work on all 'normal' platforms, of course, just like casting the pointers worked, but I would like to get away from relying on this non-portable behavior.) Thus, I think we are better off by just declaring the heap words to be of type SCM and always accessing them as this type. Converting between SCM and scm_t_bits will happen with SCM_PACK and SCM_UNPACK. That way, we don't need to assume that a SCM and a scm_t_bits are stored identically in memory. (And incidentally, we don't use scm_t_cell that much in Guile anyway. We should maybe remove it. 
I tried recently but didn't finish since it turned out to be quite some amount of work to get all size statistics in the GC correct.)

-- GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405
* Re: The relationship between SCM and scm_t_bits. 2004-05-17 19:22 ` Marius Vollmer @ 2004-05-17 20:17 ` Paul Jarc 2004-05-21 19:37 ` Dirk Herrmann 1 sibling, 0 replies; 25+ messages in thread From: Paul Jarc @ 2004-05-17 20:17 UTC (permalink / raw) Cc: guile-devel

Marius Vollmer <mvo@zagadka.de> wrote:
> I don't think you can store into one member and then (portably)
> assume anything about the value read from a different member. This
> is very much like storing into one memory location thru one pointer
> and reading that same location through a differently-typed pointer.

True in general, although some cases are legal. Any type can be accessed as an array of unsigned char; two structure types with a common initial sequence of members can have those members accessed; the first element of a structure, or any element of a union, can be accessed directly through the element type instead of explicitly as a member of the structure or union.

So, for example, if SCM is simply a struct containing scm_t_bits, then you can access that scm_t_bits value without the SCM wrapping, and given a union of scm_t_bits and SCM, you can access the scm_t_bits member, even if the SCM member was the last one assigned to.

paul
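The legal cases listed above can be sketched as follows. This is an illustration only, assuming that SCM is a struct whose sole member is a scm_t_bits (one of the possibilities mentioned in the message, not necessarily Guile's actual definition); the type slot and the three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t scm_t_bits;            /* stand-in for Guile's type */
typedef struct { scm_t_bits n; } SCM;    /* assumed struct-wrapped SCM */

typedef union { scm_t_bits word; SCM object; } slot;

/* Case 1: a pointer to a struct, suitably converted, points to its
   first member, so the wrapped scm_t_bits can be read directly. */
scm_t_bits
bits_via_first_member (SCM *p)
{
  return *(scm_t_bits *) p;
}

/* Case 2: any object may be inspected as an array of unsigned char.
   Returns 1 when the first n bytes of a and b are identical. */
int
same_bytes (const void *a, const void *b, size_t n)
{
  const unsigned char *pa = a, *pb = b;
  for (size_t i = 0; i < n; i++)
    if (pa[i] != pb[i])
      return 0;
  return 1;
}

/* Case 3: assign the SCM member of a union, then read the
   scm_t_bits member of the same union. */
scm_t_bits
bits_via_union (SCM v)
{
  slot s;
  s.object = v;
  return s.word;
}
```

None of these relies on SCM and scm_t_bits being the same type; they rely only on SCM containing a scm_t_bits as its first member.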
* Re: The relationship between SCM and scm_t_bits. 2004-05-17 19:22 ` Marius Vollmer 2004-05-17 20:17 ` Paul Jarc @ 2004-05-21 19:37 ` Dirk Herrmann 2004-05-21 20:30 ` Paul Jarc 1 sibling, 1 reply; 25+ messages in thread From: Dirk Herrmann @ 2004-05-21 19:37 UTC (permalink / raw) Cc: guile-devel Marius Vollmer wrote: > Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes: > > > I have not yet given it a try, but I found the suggestion to use a > > union quite appaling: > > [ I think you mean "appealing". :-) I used to mix up the adjectives > "content" and "contempt"... [ And its "appalling" with double-el. I > hope you don't mind this little public correction. [ I think I can > get away with it since I make a ton of mistakes myself... ] ] ] :-) No problem, thanks for the hint. In fact, I did not even know the word "appalling". Reminds me of a situation when I read the word "hostile" and thought it came from "host". Think about someone thanking his host for their "hostility" :-) > > [...] > > > > typedef struct scm_t_cell { union { scm_t_bits word; SCM object; } > > elements[]; } scm_t_cell; > > Yes, but consider how we use the heap: we fetch a word and then must > decide whether it is a SCM or a scm_t_bits, we don't know this in > advance in every case. This is not really supported by a union: I > don't think you can store into one member and then (portably) assume > anything about the value read from a different member. This is very > much like storing into one memory location thru one pointer and > reading that same location through a differently-typed pointer. I > therefore don't think that using a union is an improvement. I don't see a problem here: The rule is, if you don't know better in advance, always access your memory as a scm_t_bits variable. This is exactly the way we determine, whether a cell really holds a pair: As long as it is just a cell, we check the bits. Only if we know its a pair, we dare to access it as a pair of SCM values. 
> Thus, I think we are better off by just declaring the heap words to > be of type SCM and always accessing them as this type. Converting > between SCM and scm_t_bits will happen with SCM_PACK and SCM_UNPACK. > That way, we don't need to assume that a SCM and a scm_t_bits are > stored identically in memory. Then, again, we have to rather stay on the safe side and assume to have only scm_t_bits variables on the heap: If a variable of type SCM and a variable of type scm_t_bits would _really_ look different, then the heap _must_ hold elements of type scm_t_bits, since all non-pair objects can store arbitrary data in their cells. Thus, in such a case accessing the heap via SCM pointers would be plain wrong. However, I would not be too restrictive: I don't think that the distinction between SCM and scm_t_bits should go in the direction that SCM and scm_t_bits might be represented in completely different ways: It was introduced as a means to provide better type checking in guile. On that way it brought (almost coincidentally) a nice distinction between code that operates on higher levels and code that doesn't. The fact that some code does not yet use that abstraction barrier correctly (it may be that this is the case for scm_mark_locations, which you gave as an example) could also mean that this code needs to be fixed. Another, more general note: The whole discussion only came up since there are places in guile or in client code where people want to access the heap via pointers. Before we adapt one of our central structures for such uses, we should first think, whether that usage is correct or not. In the context of generational gc, I think we should be very careful about such uses. Let's rather try to get rid of such code, and encourage users to do the same. 
Note that ..._WORD_LOC write accesses may be perfectly safe: If the data that is being pointed to does not hold scheme objects and also no other data that introduces gc-relevant dependencies, you can safely write to the heap in this way. The access in numbers.h that I modified in my patch, for example, is no problem: The heap holds only references to gmp-data, no references back into the heap. In contrast, ..._OBJECT_LOC write accesses are always a problem with respect to generational gc.

Best regards
Dirk
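The write-barrier concern raised above can be illustrated with a toy model. This is a hypothetical sketch, not Guile code: toy_cell, its dirty flag, toy_set_cdr and toy_cdrloc are all invented for illustration, and the flag merely stands in for whatever bookkeeping a generational collector would need when a heap slot is overwritten.

```c
#include <stddef.h>

/* Toy cell: two object slots plus a flag that a (hypothetical)
   generational collector would consult to find modified cells. */
typedef struct toy_cell
{
  void *car;
  void *cdr;
  int dirty;
} toy_cell;

/* Setter with a write barrier: every store is recorded. */
void
toy_set_cdr (toy_cell *c, void *v)
{
  c->cdr = v;
  c->dirty = 1;
}

/* Location accessor in the style of SCM_CDRLOC: the caller gets a
   raw pointer, so later stores through it cannot be recorded. */
void **
toy_cdrloc (toy_cell *c)
{
  return &c->cdr;
}
```

A store such as `*toy_cdrloc (&c) = v;` changes the slot but leaves `dirty` untouched, which is the kind of access that would have to be found and fixed before a collector could rely on the barrier.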
* Re: The relationship between SCM and scm_t_bits. 2004-05-21 19:37 ` Dirk Herrmann @ 2004-05-21 20:30 ` Paul Jarc 2004-05-22 6:48 ` Dirk Herrmann 0 siblings, 1 reply; 25+ messages in thread From: Paul Jarc @ 2004-05-21 20:30 UTC (permalink / raw) Cc: guile-devel, Marius Vollmer

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote:
> As long as it is just a cell, we check the bits. Only if we know its
> a pair, we dare to access it as a pair of SCM values.

But according to the C standard, we shouldn't dare to access it as scm_t_bits either. unsigned char is the only universally safe aliasing access type. If it wouldn't be too much trouble to access the type bits through unsigned char, then we could store SCM values on the heap and work with pointers to SCM (even if only for reading, not writing). We would use the type bits to decide whether to access a given word/cell as SCM or scm_t_bits. (It'd be nice for smobs if void* could be another option there.) That would conform to the C standard.

paul
* Re: The relationship between SCM and scm_t_bits. 2004-05-21 20:30 ` Paul Jarc @ 2004-05-22 6:48 ` Dirk Herrmann 2004-05-23 15:03 ` Paul Jarc 2004-08-09 21:09 ` Marius Vollmer 0 siblings, 2 replies; 25+ messages in thread From: Dirk Herrmann @ 2004-05-22 6:48 UTC (permalink / raw) Cc: guile-devel, Marius Vollmer Paul Jarc wrote: > Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote: > > > As long as it is just a cell, we check the bits. Only if we know > > its a pair, we dare to access it as a pair of SCM values. > > But according to the C standard, we shouldn't dare to access it as > scm_t_bits either. unsigned char is the only universally safe > aliasing access type. This is highly academic. Within guile there are so many places where we perform operations that are not defined according to the C standard: conservative garbage collection, copying stacks for continuations, assembly code, ... Certainly, the way we convert between scm_t_bits and SCM is implementation dependent. However, the definitions for scm_t_bits and SCM are IMO a very good way to provide an abstraction of some of this uncleanlyness. And, with today's definitions of scm_t_bits and SCM, the heap _must_ hold scm_t_bits variables. Please explain, why you think that it is cleaner to say it only holds scheme objects if in fact it does not. I am still missing the background for the whole discussion: The standard says that the result of casting pointers to scm_t_bits to pointers to SCM gives implementation dependent results, right? Or does the standard explicitly forbid such casts? What is the reason to change a paradigm, which has for several years worked quite nicely, is easily understood, and has helped to find and probably also to avoid a bunch of errors? Is there any _real_ reason that any of you is facing, like that guile does not compile or work on one of your machines? 
Best regards
Dirk
* Re: The relationship between SCM and scm_t_bits. 2004-05-22 6:48 ` Dirk Herrmann @ 2004-05-23 15:03 ` Paul Jarc 2004-08-09 21:09 ` Marius Vollmer 1 sibling, 0 replies; 25+ messages in thread From: Paul Jarc @ 2004-05-23 15:03 UTC (permalink / raw) Cc: Marius Vollmer, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote:
> This is highly academic. Within guile there are so many places where
> we perform operations that are not defined according to the C
> standard: conservative garbage collection, copying stacks for
> continuations, assembly code, ...

Well, my impression is that Marius wants to get rid of at least this particular bit of nonconformance. Marius?

> Please explain, why you think that it is cleaner to say it only holds
> scheme objects if in fact it does not.

That's not exactly it. The words that hold Scheme values would be accessed using the SCM type, instead of converting to/from scm_t_bits. Other words, holding non-Scheme values, would still be accessed as scm_t_bits. (AIUI.)

> The standard says that the result of casting pointers to scm_t_bits
> to pointers to SCM gives implementation dependent results, right? Or
> does the standard explicitly forbid such casts?

Casting the pointer is ok in itself (well, unless there is an alignment mismatch). Accessing the pointed-to memory gives undefined behavior if there isn't actually an object there of the type you're using to access it. If SCM were defined as a struct containing scm_t_bits, then it would certainly be legal to access an SCM object via scm_t_bits*.

> What is the reason to change a paradigm, which has for several
> years worked quite nicely, is easily understood, and has helped to
> find and probably also to avoid a bunch of errors? Is there any
> _real_ reason that any of you is facing, like that guile does not
> compile or work on one of your machines?
I think Marius is worried that there may be systems where the bits of an SCM value are different from the bits of the corresponding scm_t_bits value. I don't know whether any have actually been found.

paul
* Re: The relationship between SCM and scm_t_bits. 2004-05-22 6:48 ` Dirk Herrmann 2004-05-23 15:03 ` Paul Jarc @ 2004-08-09 21:09 ` Marius Vollmer 2004-08-20 19:17 ` Dirk Herrmann 1 sibling, 1 reply; 25+ messages in thread From: Marius Vollmer @ 2004-08-09 21:09 UTC (permalink / raw) Cc: Paul Jarc, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> Certainly, the way we convert between scm_t_bits and SCM is
> implementation dependent. However, the definitions for scm_t_bits
> and SCM are IMO a very good way to provide an abstraction of some of
> this uncleanlyness. And, with today's definitions of scm_t_bits and
> SCM, the heap _must_ hold scm_t_bits variables. Please explain, why
> you think that it is cleaner to say it only holds scheme objects if
> in fact it does not.

The reason is that there exists code that does essentially this:

    scm_t_bits heap_field;

    SCM value = whatever ();
    SCM *ptr = (SCM *)&heap_field;
    *ptr = value;

This is quite unclean. This variant is cleaner and standards conformant:

    SCM heap_field;

    SCM value = whatever ();
    SCM *ptr = &heap_field;
    *ptr = value;

> What is the reason to change a paradigm, which has for several years
> worked quite nicely, is easily understood, and has helped to find
> and probably also to avoid a bunch of errors?

I don't think that the paradigm has changed fundamentally. It has been strengthened, if you will. The distinction between scm_t_bits and SCM is still there.

We don't just cast between SCM and scm_t_bits, we use SCM_PACK and SCM_UNPACK. Except sometimes a scm_t_bits variable is stored into via a SCM* pointer, totally ruining the careful PACK/UNPACK abstraction. That exception has now been removed. I see that as an unconditional improvement, don't you?
-- GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405
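The arrangement argued for above, with heap words declared as SCM and every conversion going through a pack/unpack step, might look roughly like this. It is a sketch under assumptions: SCM is modelled as a struct wrapper so that mixing it up with scm_t_bits fails to compile, and toy_pack, toy_unpack, store_object, store_bits and load_bits are hypothetical helpers standing in for SCM_PACK, SCM_UNPACK and the cell accessors.

```c
#include <stdint.h>

typedef uintptr_t scm_t_bits;           /* stand-in for Guile's type */
typedef struct { scm_t_bits n; } SCM;   /* assumed: distinct from scm_t_bits */

/* Hypothetical equivalents of SCM_PACK and SCM_UNPACK. */
static inline SCM
toy_pack (scm_t_bits b)
{
  SCM s = { b };
  return s;
}

static inline scm_t_bits
toy_unpack (SCM s)
{
  return s.n;
}

/* A heap word is always an SCM, so a SCM * into it needs no cast. */
void
store_object (SCM *loc, SCM value)
{
  *loc = value;
}

/* Raw bits enter the heap only after being packed... */
void
store_bits (SCM *loc, scm_t_bits bits)
{
  *loc = toy_pack (bits);
}

/* ...and leave it only after being unpacked. */
scm_t_bits
load_bits (const SCM *loc)
{
  return toy_unpack (*loc);
}
```

With this shape, code never needs a scm_t_bits * that aliases a SCM, so nothing depends on the two types having the same representation in memory.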
* Re: The relationship between SCM and scm_t_bits. 2004-08-09 21:09 ` Marius Vollmer @ 2004-08-20 19:17 ` Dirk Herrmann 2004-08-21 16:16 ` Marius Vollmer 0 siblings, 1 reply; 25+ messages in thread From: Dirk Herrmann @ 2004-08-20 19:17 UTC (permalink / raw) Cc: Paul Jarc, guile-devel [-- Attachment #1: Type: text/plain, Size: 3790 bytes --] Marius Vollmer wrote: > Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes: > > > Certainly, the way we convert between scm_t_bits and SCM is > > implementation dependent. However, the definitions for scm_t_bits > > and SCM are IMO a very good way to provide an abstraction of some > > of this uncleanlyness. And, with today's definitions of scm_t_bits > > and SCM, the heap _must_ hold scm_t_bits variables. Please > > explain, why you think that it is cleaner to say it only holds > > scheme objects if in fact it does not. > > The reason is that there exits code that does essentially this: > > scm_t_bits heap_field; > > SCM value = whatever (); SCM *ptr = (SCM *)&heap_field; *ptr = value; I assume that you mean that heap_field is actually an element of the heap. We already had the discussion that I suggest to discourage this style of coding since it violates a potential write barrier and will lead to problems if we ever switch to a generational garbage collection. Despite of this discussion, you nevertheless seem to have decided that you do not want to discourage this style of coding. That's OK, it's perfectly fair to make that decision if you are aware of the consequences. I would just prefer if such decisions were explicitly stated, in order to avoid confusion for both guile developers and users. Please consider that, as long as such a decision is not official, people who have ever followed discussions about the implications of such coding on generational gc might put some effort into avoiding such code. We can well spare them and ourselves this effort then. 
> > What is the reason to change a paradigm, which has for several > > years worked quite nicely, is easily understood, and has helped to > > find and probably also to avoid a bunch of errors? > > I don't think that the paradigm has changed fundamentally. It has > been strengthened, if you will. The distinction between scm_t_bits > and SCM is still there. > > We don't just cast between SCM and scm_t_bits, we use SCM_PACK and > SCM_UNPACK. Except sometimes a scm_t_bits variable is stored into > via a SCM* pointer, totally ruining the care PACk/UNPACK abstraction. > That exception has now been removed. I see that as an unconditional > improvement, don't you? As said above, I accept if you decide to allow such a coding style. Then, you are right, people should have the option to have SCM pointers into the heap. I just doubt that the current solution is elegant. On the contrary: I think that it introduces an uncleanlyness for exactly the other type of scenario, namely if someone needs to have a scm_t_bits pointer into the heap. In particular, I have a problem with the following lines of code. In gc.h: #define SCM_GC_CELL_WORD(x, n) (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n)))) This expression has a SCM value as an intermediate result, which is definitely unclean, since the SCM value might (in contrast to the definition of SCM) not represent a valid scheme object. In numbers.h: #define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1)))) This expression has a SCM* as an intermediate result, although in this case we _know_ that we are actually pointing to a scm_t_bits value. My suggestion is just, to remove this uncleanlyness by going one step further, namely to make scm_t_cell a union of scm_t_bits and SCM values. IMO, this is the most elegant approach anyway, since it reflects the actual situation best. I enclose a patch to accomplish this. Please consider the patch and, if you like, comment on it. 
I had it presented before, but if I remember correctly, there has not been a definite decision on it. Thus, I would just go ahead and apply it within the next couple of days. Best regards, Dirk [-- Attachment #2: diff --] [-- Type: text/plain, Size: 6278 bytes --] Index: guile-core/NEWS =================================================================== RCS file: /cvsroot/guile/guile/guile-core/NEWS,v retrieving revision 1.438 diff -u -r1.438 NEWS --- guile-core/NEWS 19 Aug 2004 17:54:37 -0000 1.438 +++ guile-core/NEWS 20 Aug 2004 17:44:57 -0000 @@ -744,16 +744,6 @@ SCM_SYMBOL_HASH -> scm_hashq SCM_SYMBOL_INTERNED_P -> scm_symbol_interned_p -** SCM_CELL_WORD_LOC has been deprecated. - -Use the new macro SCM_CELL_OBJECT_LOC instead, which return a pointer -to a SCM, as opposed to a pointer to a scm_t_bits. - -This was done to allow the correct use of pointers into the Scheme -heap. Previously, the heap words were of type scm_t_bits and local -variables and function arguments were of type SCM, making it -non-standards-conformant to have a pointer that can point to both. - ** New macros SCM_SMOB_DATA_2, SCM_SMOB_DATA_3, etc. These macros should be used instead of SCM_CELL_WORD_2/3 to access the Index: guile-core/libguile/ChangeLog =================================================================== RCS file: /cvsroot/guile/guile/guile-core/libguile/ChangeLog,v retrieving revision 1.2128 diff -u -r1.2128 ChangeLog --- guile-core/libguile/ChangeLog 20 Aug 2004 13:33:39 -0000 1.2128 +++ guile-core/libguile/ChangeLog 20 Aug 2004 17:45:05 -0000 @@ -1,3 +1,21 @@ +2004-05-24 Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> + + * deprecated.h (SCM_CELL_WORD_LOC): Un-deprecated. + + * gc.h (scm_t_cell): Redefined to hold a union, since each cell + element either holds a scm_t_bits value or a SCM value. 
+
+        (SCM_GC_CARD_BVEC, SCM_GC_SET_CARD_BVEC, SCM_GC_GET_CARD_FLAGS,
+        SCM_GC_SET_CARD_FLAGS, SCM_GC_CELL_OBJECT, SCM_GC_CELL_WORD,
+        SCM_GC_SET_CELL_OBJECT, SCM_GC_SET_CELL_WORD): Modified to work
+        with the new scm_t_cell.
+
+        (SCM_CELL_WORD_LOC): Un-deprecated.
+
+        * numbers.h (SCM_I_BIG_MPZ): Use SCM_CELL_WORD_LOC instead of
+        SCM_CELL_OBJECT_LOC, since we are not dealing with scheme objects
+        here.
+
 2004-08-20  Marius Vollmer  <marius.vollmer@uni-dortmund.de>

         * eval.c (scm_lookupcar1): Report "Variable used before given a
Index: guile-core/libguile/deprecated.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/deprecated.h,v
retrieving revision 1.29
diff -u -r1.29 deprecated.h
--- guile-core/libguile/deprecated.h 19 Aug 2004 16:49:42 -0000 1.29
+++ guile-core/libguile/deprecated.h 20 Aug 2004 17:45:05 -0000
@@ -291,11 +291,6 @@

 #define SCM_VALIDATE_OPDIR(pos, port) SCM_MAKE_VALIDATE (pos, port, OPDIRP)

-/* Deprecated because we can not safely cast a SCM* to a scm_t_bits*
- */
-
-#define SCM_CELL_WORD_LOC(x, n) ((scm_t_bits*)SCM_CELL_OBJECT_LOC((x),(n)))
-
 /* Users shouldn't know about INUMs.
  */

Index: guile-core/libguile/gc.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/gc.h,v
retrieving revision 1.118
diff -u -r1.118 gc.h
--- guile-core/libguile/gc.h 19 Aug 2004 16:48:37 -0000 1.118
+++ guile-core/libguile/gc.h 20 Aug 2004 17:45:05 -0000
@@ -36,8 +36,10 @@

 typedef struct scm_t_cell
 {
-  SCM word_0;
-  SCM word_1;
+  union {
+    scm_t_bits word;
+    SCM object;
+  } elements[2];
 } scm_t_cell;

 /*
@@ -70,15 +72,17 @@

 #define SCM_GC_CARD_N_HEADER_CELLS 1

-#define SCM_GC_CARD_N_CELLS 256
-#define SCM_GC_SIZEOF_CARD SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell)
+#define SCM_GC_CARD_N_CELLS 256
+#define SCM_GC_SIZEOF_CARD SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell)

-#define SCM_GC_CARD_BVEC(card) ((scm_t_c_bvec_long *) ((card)->word_0))
+#define SCM_GC_CARD_BVEC(card) \
+  ((scm_t_c_bvec_long *) ((card)->elements[0].word))
 #define SCM_GC_SET_CARD_BVEC(card, bvec) \
-    ((card)->word_0 = (SCM) (bvec))
-#define SCM_GC_GET_CARD_FLAGS(card) ((long) ((card)->word_1))
+  ((card)->elements[0].word = (scm_t_bits) (bvec))
+#define SCM_GC_GET_CARD_FLAGS(card) \
+  ((long) ((card)->elements[1].word))
 #define SCM_GC_SET_CARD_FLAGS(card, flags) \
-    ((card)->word_1 = (SCM) (flags))
+  ((card)->elements[1].word = (flags))

 #define SCM_GC_GET_CARD_FLAG(card, shift) \
   (SCM_GC_GET_CARD_FLAGS (card) & (1L << (shift)))
@@ -141,12 +145,13 @@
  * in debug mode. In particular these macros will even work for free cells,
  * which should never be encountered by user code. */

-#define SCM_GC_CELL_OBJECT(x, n) (((SCM *)SCM2PTR (x)) [n])
-#define SCM_GC_CELL_WORD(x, n) (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n))))
+#define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x))->elements[n]).word)
+#define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x))->elements[n]).object)

-#define SCM_GC_SET_CELL_OBJECT(x, n, v) ((((SCM *)SCM2PTR (x)) [n]) = (v))
+#define SCM_GC_SET_CELL_OBJECT(x, n, v) \
+  (((SCM2PTR (x))->elements[n]).object = (v))
 #define SCM_GC_SET_CELL_WORD(x, n, v) \
-  (SCM_GC_SET_CELL_OBJECT ((x), (n), SCM_PACK (v)))
+  (((SCM2PTR (x))->elements[n]).word = (v))

 #define SCM_GC_CELL_TYPE(x) (SCM_GC_CELL_OBJECT ((x), 0))

@@ -214,7 +219,10 @@
   (SCM_GC_SET_CELL_OBJECT ((x), 1, (v)))


-#define SCM_CELL_OBJECT_LOC(x, n) (SCM_VALIDATE_CELL((x), &SCM_GC_CELL_OBJECT ((x), (n))))
+#define SCM_CELL_WORD_LOC(x, n) \
+  (SCM_VALIDATE_CELL((x), &SCM_GC_CELL_WORD ((x), (n))))
+#define SCM_CELL_OBJECT_LOC(x, n) \
+  (SCM_VALIDATE_CELL((x), &SCM_CELL_OBJECT ((x), (n))))
 #define SCM_CARLOC(x) (SCM_CELL_OBJECT_LOC ((x), 0))
 #define SCM_CDRLOC(x) (SCM_CELL_OBJECT_LOC ((x), 1))

Index: guile-core/libguile/numbers.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/numbers.h,v
retrieving revision 1.93
diff -u -r1.93 numbers.h
--- guile-core/libguile/numbers.h 9 Aug 2004 23:32:14 -0000 1.93
+++ guile-core/libguile/numbers.h 20 Aug 2004 17:45:05 -0000
@@ -142,7 +142,7 @@
 #define SCM_COMPLEX_IMAG(x) (SCM_COMPLEX_MEM (x)->imag)

 /* Each bignum is just an mpz_t stored in a double cell starting at word 1. */
-#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))
+#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_WORD_LOC ((x), 1))))
 #define SCM_BIGP(x) (!SCM_IMP (x) && SCM_TYP16 (x) == scm_tc16_big)
 #define SCM_NUMBERP(x) (SCM_I_INUMP(x) || SCM_NUMP(x))

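The cell layout introduced by the patch above can be tried out in isolation. What follows is only a minimal sketch, not Guile code: every `sketch_` / `SKETCH_` name is a hypothetical stand-in, `sketch_scm` is assumed to be an opaque pointer type and `sketch_bits` is assumed to be `uintptr_t`. It shows how a union element lets a word location be taken as a `sketch_bits *` and an object location as a `sketch_scm *` without casting one pointer type to the other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for SCM and scm_t_bits (assumptions, not the
   Guile definitions): an opaque pointer type and a plain integer type. */
typedef struct sketch_scm_unused *sketch_scm;
typedef uintptr_t sketch_bits;

#define SKETCH_PACK(b)   ((sketch_scm) (b))
#define SKETCH_UNPACK(s) ((sketch_bits) (s))

/* Same shape as the patched scm_t_cell: each element can be viewed
   either as raw bits or as an object. */
typedef struct sketch_cell
{
  union {
    sketch_bits word;
    sketch_scm object;
  } elements[2];
} sketch_cell;

#define SKETCH_CELL_WORD(c, n)       ((c)->elements[n].word)
#define SKETCH_CELL_OBJECT(c, n)     ((c)->elements[n].object)
#define SKETCH_CELL_WORD_LOC(c, n)   (&SKETCH_CELL_WORD ((c), (n)))
#define SKETCH_CELL_OBJECT_LOC(c, n) (&SKETCH_CELL_OBJECT ((c), (n)))

/* Store a raw word through a sketch_bits*; no sketch_scm* cast is needed. */
void
sketch_store_word (sketch_cell *cell, int n, sketch_bits value)
{
  assert (n == 0 || n == 1);
  sketch_bits *loc = SKETCH_CELL_WORD_LOC (cell, n);
  *loc = value;
}

/* Store an object through a sketch_scm*; no sketch_bits* cast is needed. */
void
sketch_store_object (sketch_cell *cell, int n, sketch_scm value)
{
  assert (n == 0 || n == 1);
  sketch_scm *loc = SKETCH_CELL_OBJECT_LOC (cell, n);
  *loc = value;
}
```

The sketch only reads back through the same union member that was written. Writing one member and reading the other is the case the thread disagrees about.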
* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-08-21 16:16 UTC
Cc: Paul Jarc, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

>> The reason is that there exists code that does essentially this:
>>
>>     scm_t_bits heap_field;
>>
>>     SCM value = whatever ();
>>     SCM *ptr = (SCM *)&heap_field;
>>     *ptr = value;
>
> I assume that you mean that heap_field is actually an element of the heap.

Yes.

> We already had the discussion that I suggest to discourage this
> style of coding since it violates a potential write barrier and will
> lead to problems if we ever switch to a generational garbage
> collection.

Yes, that is the bigger issue. What we are discussing here are quite
minor points, I'd say. There might be a time when we do want to have a
write-barrier and then we can revisit whether to provide the *LOC
accessors or not. Right now, removing them is not necessary. We
should only remove them when there is an immediate benefit.

> In particular, I have a problem with the following lines of code.
>
> In gc.h:
>
>   #define SCM_GC_CELL_WORD(x, n) (SCM_UNPACK (SCM_GC_CELL_OBJECT
>   ((x), (n))))
>
> This expression has a SCM value as an intermediate result, which
> is definitely unclean, since the SCM value might (in contrast to the
> definition of SCM) not represent a valid scheme object.

Yes, that troubles me also a bit. But I get over it by realizing that
we only really have one type, the type 'machine word', and SCM and
scm_t_bits are essentially this same type, used to provide markup for
different uses of the basic type 'machine word'.

(In my view, it is essential that Scheme values are represented as a
machine word. Using some other type that doesn't fit into a machine
register, for example, would not be good enough.)

As far as the ordinary user is concerned, we only have one type to
represent a Scheme value, SCM. We don't say what a SCM is (whether it
is a pointer, an integer, a struct, etc), only that you can assign it
with '='.

The internals of Guile, and unfortunately also a user that works with
smobs, need to know more about SCM: that it really is a machine word
and can be treated as an integral type. To treat it as such, a SCM is
reinterpreted as a scm_t_bits.

I think we need to make the following guarantees:

 - a SCM and a scm_t_bits have the same size in the sense that they
   can store exactly the same things. We always have

       SCM scm;
       scm_is_eq (SCM_PACK (SCM_UNPACK (scm)), scm)

   and

       scm_t_bits bits;
       SCM_UNPACK (SCM_PACK (bits)) == bits                  (*)

 - a size_t can be cast to scm_t_bits and back without losing
   information. (This is for storing integers in heap words.)

 - a void* can be cast to scm_t_bits and back without losing
   information. (This is for storing pointers in heap words.)

 - a scm_t_bits can be cast to void* and back without losing
   information. (This is for storing SCMs in void* locations provided
   by external code.)

This is not as elegant and clean as dropping the guarantee (*), but it
allows heap words to be declared as type SCM which is desirable since
local variables and function arguments are also declared to be of type
SCM.

The reason that SCM is distinct from scm_t_bits at all is to get some
help from the C compiler in type checking.

> In numbers.h:
>
>   #define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))
>
> This expression has a SCM* as an intermediate result, although in
> this case we _know_ that we are actually pointing to a scm_t_bits
> value.

No, we point at an array of three SCMs... ;)

This is actually a separate issue: the memory used by SCM_I_BIG_MPZ is
always used as only one type, as an mpz_t.

The reason that I changed all heap words to be declared as SCM was
that previously some heap words would be written as a SCM and then
read as a scm_t_bits. This is also the reason why I think that a
union does not help at all: with such a union, we would write into one
member and then read from the other. This is just as unclean as
casting a pointer to scm_t_bits to a pointer to SCM.

> Thus, I would just go ahead and apply it within the next couple of
> days.

Please do not apply it. We are not completely clean, true, but I
doubt that we can attain perfect cleanliness anyway. Using a union
would just complicate the issue without giving any benefit (that I
could see).

Things started out simple, and got more complicated with the
introduction of scm_t_bits as an alias of SCM. Let's not continue
this trend by pretending that SCM and scm_t_bits are actually separate
types. They are not, they are the same type essentially, but one
allows certain low-level operations that the other prevents.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405

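The round-trip guarantees listed in the message above can be written down as small checks. This is a minimal sketch under stated assumptions, not Guile's implementation: `sketch_scm`, `sketch_bits`, `SKETCH_PACK` and `SKETCH_UNPACK` are hypothetical stand-ins, with `sketch_scm` assumed to be an opaque pointer type and `sketch_bits` assumed to be `uintptr_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins (assumptions, not the Guile definitions). */
typedef struct sketch_scm_unused *sketch_scm;
typedef uintptr_t sketch_bits;

#define SKETCH_PACK(b)   ((sketch_scm) (b))
#define SKETCH_UNPACK(s) ((sketch_bits) (s))

/* Size requirements behind the guarantees, checked at compile time (C11). */
_Static_assert (sizeof (sketch_scm) == sizeof (sketch_bits),
                "object and bits types must have the same size");
_Static_assert (sizeof (size_t) <= sizeof (sketch_bits),
                "a size_t must fit into the bits type");

/* Guarantee (*): bits -> object -> bits gives back the same bits.
   Integer -> pointer -> integer conversion is implementation-defined in
   ISO C, so this relies on the platform behaving like a flat machine word. */
int
sketch_bits_round_trip (sketch_bits bits)
{
  return SKETCH_UNPACK (SKETCH_PACK (bits)) == bits;
}

/* void* -> bits -> void* gives back the same pointer. */
int
sketch_pointer_round_trip (void *p)
{
  return (void *) (sketch_bits) p == p;
}

/* size_t -> bits -> size_t gives back the same integer. */
int
sketch_size_round_trip (size_t n)
{
  return (size_t) (sketch_bits) n == n;
}
```

Only the pointer-to-integer-and-back direction is promised by the C standard for `uintptr_t`. The bits-to-object-and-back direction holds on common platforms, but it is the part that has to be stated as an extra guarantee.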
* Re: The relationship between SCM and scm_t_bits.
From: Dirk Herrmann @ 2004-10-03 9:09 UTC
Cc: guile-devel

Marius Vollmer wrote:

> Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:
>
> >> The reason is that there exists code that does essentially this:
> >>
> >>     scm_t_bits heap_field;
> >>
> >>     SCM value = whatever ();
> >>     SCM *ptr = (SCM*)&heap_field;
> >>     *ptr = value;
> >
> > I assume that you mean that heap_field is actually an element of
> > the heap.
>
> Yes.
>
> > We already had the discussion that I suggest to discourage this
> > style of coding since it violates a potential write barrier and
> > will lead to problems if we ever switch to a generational garbage
> > collection.
>
> Yes, that is the bigger issue. What we are discussing here are quite
> minor points, I'd say. There might be a time when we do want to
> have a write-barrier and then we can revisit whether to provide the
> *LOC accessors or not. Right now, removing them is not necessary.
> We should only remove them when there is an immediate benefit.

This did not answer my question from my previous mail, where I asked
you for a clear statement about whether this style of coding is to be
considered discouraged: Is there a strategy towards a generational gc,
or not? Shall new code make use of the *LOC accessors or shall
developers avoid it?

> I think we need to make the following guarantees:
>
> - a SCM and a scm_t_bits have the same size in the sense that they
>   can store exactly the same things. We always have
>
>       SCM scm;
>       scm_is_eq (SCM_PACK (SCM_UNPACK (scm)), scm)
>
>   and
>
>       scm_t_bits bits;
>       SCM_UNPACK (SCM_PACK (bits)) == bits                  (*)
>
> - a size_t can be cast to scm_t_bits and back without losing
>   information. (This is for storing integers in heap words.)
>
> - a void* can be cast to scm_t_bits and back without losing
>   information. (This is for storing pointers in heap words.)
>
> - a scm_t_bits can be cast to void* and back without losing
>   information. (This is for storing SCMs in void* locations provided
>   by external code.)
>
> This is not as elegant and clean as dropping the guarantee (*), but
> it allows heap words to be declared as type SCM which is desirable
> since local variables and function arguments are also declared to be
> of type SCM.

I am not quite sure what you really want to state with (*):

A) For a variable "bits" of type "scm_t_bits" that holds a
   representation of a SCM value, the expression
   "SCM_UNPACK (SCM_PACK (bits)) == bits" is guaranteed to evaluate to
   true.

B) For any variable "bits" of type "scm_t_bits", the expression
   "SCM_UNPACK (SCM_PACK (bits)) == bits" is guaranteed to evaluate to
   true.

If you only meant to say A), then I agree that we should make the
above guarantees, including (*), if it was more accurately formulated.
If you meant to say B), then I disagree. It is one thing if Guile
internally makes use of the fact that things happen to work that way
(like you have done with your modification to scm_t_cell). It is
another thing to make this an official guarantee.

> Please do not apply it. We are not completely clean, true, but I
> doubt that we can attain perfect cleanliness anyway. Using a union
> would just complicate the issue without giving any benefit (that I
> could see).

The effects of the patch are quite local. No other line of code in
guile apart from the ones seen in the patch has to be modified.
Benefits are:

- Ease of documentation, since the current implementation needs to be
  documented in gc.h as a local uncleanliness in the otherwise quite
  consistently used distinction between SCM and scm_t_bits. Maybe I
  am assuming too much here, but you had planned to document this,
  right? :-)

- Possibility to modify SCM_PACK such that it does an assertion check
  on the value before actually converting it.

> Things started out simple, and got more complicated with the
> introduction of scm_t_bits as an alias of SCM. Let's not continue
> this trend by pretending that SCM and scm_t_bits are actually
> separate types. They are not, they are the same type essentially,
> but one allows certain low-level operations that the other prevents.

I think you are confusing a type as a collection of values, and the
underlying representation. The type SCM is distinct from the type
scm_t_bits, since only a subset of the values of type scm_t_bits can
be used to create a valid SCM. It is directly comparable to having a
type "natural numbers" implemented using a C unsigned int, and a type
"even natural numbers", also implemented as a C unsigned int. Both
have the same underlying representation, but are different types.
Despite the poor type checking offered by standard C, by using tools
like splint it is even possible to get warnings about undesired mixes
between different typedefs in C code.

Best regards,
Dirk

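The "even natural numbers" analogy and the idea of an asserting pack operation can be combined in a small sketch. Everything here is hypothetical illustration and none of it is Guile API: `natural` plays the role of scm_t_bits (any bit pattern), `even_natural` plays the role of SCM (only a subset is valid), and `even_pack` is a checked counterpart of SCM_PACK. The subset type is assumed to be wrapped in a struct so that the C compiler refuses to mix it with the plain integer type.

```c
#include <assert.h>

/* Hypothetical illustration types (assumptions, not Guile API). */
typedef unsigned int natural;                          /* any value is valid */
typedef struct { unsigned int value; } even_natural;   /* only even values   */

/* The validity predicate that separates the two types. */
int
natural_is_even (natural n)
{
  return (n % 2u) == 0u;
}

/* Checked conversion: asserts that the value belongs to the subset
   before converting, as proposed for SCM_PACK in the message above. */
even_natural
even_pack (natural n)
{
  assert (natural_is_even (n));
  even_natural e = { n };
  return e;
}

/* The reverse direction is always valid, so it needs no check. */
natural
even_unpack (even_natural e)
{
  return e.value;
}

/* Non-asserting variant: reports failure instead of aborting. */
int
even_try_pack (natural n, even_natural *out)
{
  if (!natural_is_even (n))
    return 0;
  out->value = n;
  return 1;
}
```

With this shape, `even_unpack (even_pack (n)) == n` is only promised for values that pass the predicate, which corresponds to reading A) of the guarantee (*) rather than reading B).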
* Re: The relationship between SCM and scm_t_bits.
From: Marius Vollmer @ 2004-10-04 14:12 UTC
Cc: guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> This did not answer my question from my previous mail, where I asked
> you for a clear statement about whether this style of coding is to be
> considered discouraged: Is there a strategy towards a generational gc,
> or not? Shall new code make use of the *LOC accessors or shall
> developers avoid it?

There is no strategy towards a generational GC or any other scheme
that requires a write-barrier.

I'd say we could 'discourage' the *LOC accessors. We then need to
have Guile itself stop using them. (See NEWS for an explanation of
'discouraged'.)

I will answer the rest later. I really don't want to put much more
thought into this, anyway. To me, SCM and scm_t_bits are the same
type (the machine word), only that SCM is rigged to trick the compiler
into not allowing certain operations that we don't want to allow.

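Why location accessors and a write barrier do not mix can be shown with a toy model. This is a purely hypothetical sketch: the thread states that Guile has no write barrier, and the `sketch_cell` type with its `dirty` flag, `sketch_set_word` and `sketch_word_loc` are invented for illustration. The barrier is assumed to be a flag that a setter raises on every store.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_word;

/* Hypothetical cell with a one-bit "modified" record, standing in for
   whatever bookkeeping a generational collector would need. */
typedef struct sketch_cell
{
  sketch_word words[2];
  int dirty;                    /* raised by the barrier on every store */
} sketch_cell;

/* Setter-style access: the store and the barrier happen together. */
void
sketch_set_word (sketch_cell *cell, int n, sketch_word value)
{
  assert (n == 0 || n == 1);
  cell->words[n] = value;
  cell->dirty = 1;
}

/* *LOC-style access: hands out the raw address.  Stores made later
   through the returned pointer never reach the barrier. */
sketch_word *
sketch_word_loc (sketch_cell *cell, int n)
{
  assert (n == 0 || n == 1);
  return &cell->words[n];
}
```

A store through `sketch_set_word` leaves `dirty` set, while the same store through `*sketch_word_loc (cell, n) = value` changes the word and leaves `dirty` untouched.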
Thread overview: 25+ messages

2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
2004-05-03 16:10 ` Marius Vollmer
2004-05-03 16:21 ` Paul Jarc
2004-05-04 13:53 ` Marius Vollmer
2004-05-04 17:16 ` Paul Jarc
2004-05-04 17:49 ` Marius Vollmer
2004-05-04 18:35 ` Paul Jarc
2004-05-05 10:00 ` Marius Vollmer
2004-05-05 14:58 ` Paul Jarc
2004-05-10 13:42 ` Marius Vollmer
2004-05-15 7:31 ` Dirk Herrmann
2004-05-17 18:09 ` Marius Vollmer
2004-05-15 15:00 ` Dirk Herrmann
2004-05-15 16:42 ` Dirk Herrmann
2004-05-17 19:22 ` Marius Vollmer
2004-05-17 20:17 ` Paul Jarc
2004-05-21 19:37 ` Dirk Herrmann
2004-05-21 20:30 ` Paul Jarc
2004-05-22 6:48 ` Dirk Herrmann
2004-05-23 15:03 ` Paul Jarc
2004-08-09 21:09 ` Marius Vollmer
2004-08-20 19:17 ` Dirk Herrmann
2004-08-21 16:16 ` Marius Vollmer
2004-10-03 9:09 ` Dirk Herrmann
2004-10-04 14:12 ` Marius Vollmer