unofficial mirror of guile-devel@gnu.org 
* The relationship between SCM and scm_t_bits.
@ 2004-05-03 15:06 Marius Vollmer
  2004-05-03 16:10 ` Marius Vollmer
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-05-03 15:06 UTC (permalink / raw)


Hi,

I just got confused about our two 'fundamental' types, SCM and
scm_t_bits.

Variables and function arguments are declared to be of type SCM, while
the words in a cell are of type scm_t_bits.  This results in problems
when dealing with pointers: we can not cleanly cast a pointer to SCM
to a pointer to scm_t_bits, but we might want to do so.

For example, consider a list that is pointed to by a global variable
and some fairly standard way of dealing with singly-linked lists in C:

  SCM head;

  void
  delete_some ()
  {
    SCM *node_ptr = &head;
    if (should_delete (*node_ptr))
      *node_ptr = SCM_CDR (*node_ptr);
    else
      node_ptr = SCM_CDRLOC (*node_ptr);
  }

What should the definition of SCM_CDRLOC be?  Right now it is:

  #define SCM_CDRLOC(x) ((SCM *) SCM_CELL_WORD_LOC ((x), 1))

I.e., it casts a pointer to scm_t_bits to a pointer to SCM.  That,
however, breaks the SCM_PACK/SCM_UNPACK abstraction.  A scm_t_bits
value can only be converted to a SCM value via SCM_PACK, but by using
SCM_CDRLOC, you can sidestep this conversion.

Luckily, code like this works on the platforms that Guile is used on,
but it still is quite unclean, I'd say.
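
To make the shortcut concrete, here is a minimal sketch with stand-in
definitions.  The typedefs, mock_cell, mock_cdr and mock_cdrloc are
hypothetical and are not Guile's real definitions; the only point is
to show where SCM_PACK gets bypassed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Guile's types; not the real definitions. */
typedef uintptr_t scm_t_bits;            /* raw bits of a heap word */
typedef struct scm_unused_struct *SCM;   /* opaque object handle */

#define SCM_PACK(b)   ((SCM) (b))
#define SCM_UNPACK(x) ((scm_t_bits) (x))

/* A two-word cell whose words are declared as scm_t_bits, as in the
   layout described above. */
typedef struct { scm_t_bits word_0, word_1; } mock_cell;

/* Clean access: read the word as bits, then convert with SCM_PACK. */
SCM mock_cdr (mock_cell *c)
{
  return SCM_PACK (c->word_1);
}

/* The SCM_CDRLOC style: reinterpret the address of a scm_t_bits as
   the address of a SCM.  No SCM_PACK is involved. */
SCM *mock_cdrloc (mock_cell *c)
{
  return (SCM *) &c->word_1;
}

/* Both routes name the same word; only the second bypasses SCM_PACK. */
void mock_check (void)
{
  mock_cell c = { 0, 42 };
  assert (SCM_UNPACK (mock_cdr (&c)) == 42);
  assert ((void *) mock_cdrloc (&c) == (void *) &c.word_1);
}
```

Note that the sketch never dereferences the result of mock_cdrloc;
doing so is exactly the step whose cleanliness is in question.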

For example, scm_t_bits might be larger than SCM, or SCM_PACK and
SCM_UNPACK might one day need to become non-trivial on certain
platforms (like they were on Crays, I think.)

When the encodings of SCM and scm_t_bits do indeed differ, we should
allow only one of them to be the canonical encoding that is recognized
by the garbage collector.  Right now, this is the SCM encoding (since
scm_mark_locations uses a pointer to SCM to read the stack, etc.)


I propose to remove the need to convert between scm_t_bits* and SCM*
and to allow only SCMs to be in memory.


The words in a scm_t_cell would be of type SCM.  This would mean that
SCM_CELL_WORD_LOC would be removed and replaced with
SCM_CELL_OBJECT_LOC.  Also, SCM_SET_SMOB_DATA (etc) might not be able
to store all scm_t_bits values that it is handed (because scm_t_bits
could be larger than a pointer).  We could make a new guarantee that
says that SCM_SET_SMOB_DATA (etc) can store any pointer that is cast
to a scm_t_bits and any integer that fits into 'unsigned int', say.
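
As a rough sketch of what such a guarantee would amount to, assuming a
pointer-sized SCM and a flat address space: mock_smob and the mock_*
functions below are invented for illustration and are not part of the
smob API.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins, not Guile's real definitions. */
typedef struct scm_unused_struct *SCM;   /* pointer-sized object handle */
typedef uintptr_t scm_t_bits;            /* integer view of the same bits */

#define SCM_PACK(b)   ((SCM) (b))
#define SCM_UNPACK(x) ((scm_t_bits) (x))

/* The guarantee only makes sense if a SCM is at least as wide as an
   unsigned int. */
_Static_assert (sizeof (SCM) >= sizeof (unsigned int),
                "SCM too narrow for the proposed guarantee");

/* A smob data slot that can hold nothing but a SCM, as proposed. */
typedef struct { SCM data; } mock_smob;

void mock_set_smob_data (mock_smob *s, scm_t_bits bits)
{
  s->data = SCM_PACK (bits);
}

scm_t_bits mock_smob_data (const mock_smob *s)
{
  return SCM_UNPACK (s->data);
}

/* Round-trip a pointer and the largest unsigned int through the slot.
   The integer round trip relies on implementation-defined conversions
   that behave as expected on mainstream platforms. */
void mock_smob_demo (void)
{
  static int payload;
  mock_smob s;

  mock_set_smob_data (&s, (scm_t_bits) (void *) &payload);
  assert ((void *) mock_smob_data (&s) == (void *) &payload);

  mock_set_smob_data (&s, (scm_t_bits) UINT_MAX);
  assert (mock_smob_data (&s) == UINT_MAX);
}
```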

The type scm_t_bits would be restricted to temporary values that are
mostly used to test tag bits etc.  They would usually not be stored in
data structures and when they are, they cannot be expected to protect
the SCM value that they encode when they are scanned conservatively.

Should we (gradually and with deprecation and everything) remove
scm_t_bits from the smob API completely?  I have not thought this
through, but we might end up with something that is not really an
improvement, just different.


Opinions?


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel



* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
@ 2004-05-03 16:10 ` Marius Vollmer
  2004-05-03 16:21 ` Paul Jarc
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-05-03 16:10 UTC (permalink / raw)


Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> [...] SCM_PACK and SCM_UNPACK might one day need to become
> non-trivial on certain platforms (like they were on Crays, I think.)

Ok, I just (re-)discovered that this is a non-issue: these encoding
changes are done by SCM2PTR and PTR2SCM.


Right now, I have a version of Guile that doesn't store scm_t_bits
into the heap, and it does feel a bit cleaner since no scm_t_bits* <->
SCM* casts are done.

Now, I'll try to completely remove the use of the scm_t_cell type
(only the use, not the definition).  This one is rather confusing as
well since it is not used consistently (I think).



* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
  2004-05-03 16:10 ` Marius Vollmer
@ 2004-05-03 16:21 ` Paul Jarc
  2004-05-04 13:53 ` Marius Vollmer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Paul Jarc @ 2004-05-03 16:21 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> For example, consider a list that is pointed to by a global variable
> and some fairly standard way of dealing with singly-linked lists in C:

I guess the simplest solution would be "just don't do that".  (I.e.,
the C code would use an scm_t_bits* variable with
SCM_PACK/SCM_UNPACK instead of using SCM directly.)

> Luckily, code like this works on the platforms that Guile is used on,
> but it still is quite unclean, I'd say.

I agree.

> We could make a new guarantee that says that SCM_SET_SMOB_DATA (etc)
> can store any pointer that is cast to a scm_t_bits and any integer
> that fits into 'unsigned int', say.

That would be enough for my code, I think.  Of course it should remain
source-compatible, too:
#define SCM_SET_SMOB_DATA(smob, bits) SCM_SET_SMOB_OBJECT(smob, SCM_PACK(bits))

> Should we (gradually and with deprecation and everything) remove
> scm_t_bits from the smob API completely?  I have not thought this
> through, but we might end up with something that is not really an
> improvement, just different.

I think it would be an improvement, in conceptual cleanliness if
nothing else.  It'll make the C interface that much easier to work
with.


paul



* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
  2004-05-03 16:10 ` Marius Vollmer
  2004-05-03 16:21 ` Paul Jarc
@ 2004-05-04 13:53 ` Marius Vollmer
  2004-05-04 17:16   ` Paul Jarc
  2004-05-10 13:42 ` Marius Vollmer
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Marius Vollmer @ 2004-05-04 13:53 UTC (permalink / raw)


Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> Hi,
>
> I just got confused about our two 'fundamental' types, SCM and
> scm_t_bits.

[More random thoughts follow...]

Right now, the documented distinction between scm_t_bits and SCM is
that...

     A variable of type `SCM' is guaranteed to hold a valid Scheme
  object.  A variable of type `scm_t_bits', on the other hand, may
  hold a representation of a `SCM' value as a C integral type, but may
  also hold any C value, even if it does not correspond to a valid
  Scheme object.

This is a useful distinction, no doubt.  However, it creates the
situation where user variables have a different type from the cell
slots in the heap.

Now, the distinction between scm_t_bits and SCM is only practical
when converting between them has zero cost.  SCM_PACK and SCM_UNPACK
can really only be casts that reinterpret the bits.  If this were not
the case, we would have to remove the distinction to get performance
back.

Thus, scm_t_bits and SCM can be pretty much identical and we can allow
the casting of pointers to them, too.

Better yet, we provide some markup that does the casting for the user.

That is, I now propose to _add_ SCM_CELL_OBJECT_LOC and to _keep_
SCM_CELL_WORD_LOC.  That way, we take over responsibility for getting
the casting right.

Still, I'd want to change the type of fields in the heap from
scm_t_bits to SCM, just so that pointers to SCM are guaranteed to work
across the heap and user variables.  Pointers to scm_t_bits might
still fail on strange platforms and we might then consider removing
SCM_CELL_WORD_LOC on those platforms.

Of course, all of this tedious pondering can be avoided, as Paul
points out, by just disallowing pointers to SCM or to scm_t_bits
entirely.  But we would need a very good reason for this: using
pointers the way delete_some does is completely reasonable right now.

One reason to disallow pointers would be that a precise garbage
collector that moves objects around can not easily deal with random
pointers into the middle of objects.  However, that's only a
theoretical reason; I don't see us getting there any time soon, if at
all.

So, unless a very good reason comes up, I want to allow pointers to
SCM.



> For example, consider a list that is pointed to by a global variable
> and some fairly standard way of dealing with singly-linked lists in C:
>
>   SCM head;
>
>   void
>   delete_some ()
>   {
>     SCM *node_ptr = &head;
>     if (should_delete (*node_ptr))
>       *node_ptr = SCM_CDR (*node_ptr);
>     else
>       node_ptr = SCM_CDRLOC (*node_ptr);
>   }

[There should be a loop in that function of course. ]
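
For reference, a looped version could look roughly like this.  It is
a self-contained sketch: mock_pair, MOCK_EOL, MOCK_CDRLOC and the
should_delete policy are invented for illustration, and the pair
words are modelled as SCM already so that no cast is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a pair whose cdr word already has the handle
   type, so taking its address needs no cast. */
typedef struct mock_pair *SCM;
struct mock_pair { int car; SCM cdr; };

#define MOCK_EOL       ((SCM) NULL)     /* stand-in for the empty list */
#define MOCK_CDRLOC(x) (&(x)->cdr)

/* Made-up policy: delete nodes with an odd car. */
static int should_delete (SCM node)
{
  return node->car % 2 != 0;
}

SCM head;

void delete_some (void)
{
  SCM *node_ptr = &head;
  while (*node_ptr != MOCK_EOL)
    {
      if (should_delete (*node_ptr))
        *node_ptr = (*node_ptr)->cdr;        /* unlink; stay on this slot */
      else
        node_ptr = MOCK_CDRLOC (*node_ptr);  /* advance to the next slot */
    }
}

/* Usage: the list (1 2 3) becomes (2). */
void delete_some_demo (void)
{
  struct mock_pair c = { 3, MOCK_EOL };
  struct mock_pair b = { 2, &c };
  struct mock_pair a = { 1, &b };

  head = &a;
  delete_some ();
  assert (head == &b);
  assert (head->cdr == MOCK_EOL);
  head = MOCK_EOL;   /* do not leave a pointer to a dead local behind */
}
```

The pointer-to-slot idiom is what makes the head and the interior
nodes uniform: the same assignment unlinks either.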



* Re: The relationship between SCM and scm_t_bits.
  2004-05-04 13:53 ` Marius Vollmer
@ 2004-05-04 17:16   ` Paul Jarc
  2004-05-04 17:49     ` Marius Vollmer
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Jarc @ 2004-05-04 17:16 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> Now, the distinction between scm_t_bits and SCM is only practical
> when converting between them has zero cost.  SCM_PACK and SCM_UNPACK
> can really only be casts that reinterpret the bits.

Looking at the case of SCM_DEBUG_TYPING_STRICTNESS == 2, I'd expect
that scm_pack might be optimized away, so it would have no run-time
cost.  (At least, the compiler has enough information to do so, and
the C standard allows it.)  If that isn't happening already, maybe
marking it as inline would help?

> Thus, scm_t_bits and SCM can be pretty much identical and we can allow
> the casting of pointers to them, too.

The C standard does not allow accessing a value through a pointer to a
different type.  Newer versions of gcc have optimizations depending on
that restriction included in -O2.  You can disable those optimizations
with -fno-strict-aliasing, but maybe those optimizations would
outweigh some nonzero-cost conversion between scm_t_bits and SCM.
Some profiling would be useful.
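
As a rough illustration of the rule (which has exceptions, e.g. for
character types), here is a sketch of reading the bits of a SCM
without going through a differently typed pointer.  The typedefs and
the mock_* helpers are assumptions for illustration, not Guile code,
and whether the copy is really free on a given compiler is something
only profiling would show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not Guile's real definitions. */
typedef struct scm_unused_struct *SCM;   /* object handle */
typedef uintptr_t scm_t_bits;            /* integer view of the bits */

/* Not allowed under strict aliasing: the object has type SCM but
   would be read through an lvalue of type scm_t_bits.

     scm_t_bits bad = *(scm_t_bits *) &obj;
*/

/* Well-defined alternative: copy the representation.  Compilers
   commonly reduce a fixed-size memcpy like this to a plain move. */
scm_t_bits mock_bits_of (const SCM *loc)
{
  scm_t_bits bits;
  _Static_assert (sizeof (scm_t_bits) == sizeof (SCM),
                  "sizes must match");
  memcpy (&bits, loc, sizeof bits);
  return bits;
}

void mock_store_bits (SCM *loc, scm_t_bits bits)
{
  memcpy (loc, &bits, sizeof bits);
}

/* Store some bits into a SCM variable and read them back. */
void mock_alias_demo (void)
{
  SCM obj = 0;
  mock_store_bits (&obj, 0x1234);
  assert (mock_bits_of (&obj) == 0x1234);
}
```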

> Pointers to scm_t_bits might still fail on strange platforms and we
> might then consider removing SCM_CELL_WORD_LOC on those platforms.

Better to make Guile the same on all platforms, I think, and so remove
it on all platforms if it doesn't work on some.

Granted that it's useful to have both SCM and scm_t_bits, what exactly
is the advantage in using those two types to alias the same bytes in
memory?  What do we gain here over your previous use-SCM-everywhere
suggestion?

> But we would need a very good reason for this: using pointers the
> way delete_some does is completely reasonable right now.

Well, it's expected to be reasonable, but turns out to be not quite
so, right?  Hence the issue.


paul



* Re: The relationship between SCM and scm_t_bits.
  2004-05-04 17:16   ` Paul Jarc
@ 2004-05-04 17:49     ` Marius Vollmer
  2004-05-04 18:35       ` Paul Jarc
  0 siblings, 1 reply; 25+ messages in thread
From: Marius Vollmer @ 2004-05-04 17:49 UTC (permalink / raw)


prj@po.cwru.edu (Paul Jarc) writes:

> Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
>> Now, the distinction between scm_t_bits and SCM is only practical
>> when converting between them has zero cost.  SCM_PACK and SCM_UNPACK
>> can really only be casts that reinterpret the bits.
>
> Looking at the case of SCM_DEBUG_TYPING_STRICTNESS == 2, I'd expect
> that scm_pack might be optimized away, so it would have no run-time
> cost.  (At least, the compiler has enough information to do so, and
> the C standard allows it.)  If that isn't happening already, maybe
> marking it as inline would help?

SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
have to review it anyway, there has been a bug report about it not
working at all, I think), so when compiling with that setting, the
performance argument doesn't count.

>> Thus, scm_t_bits and SCM can be pretty much identical and we can allow
>> the casting of pointers to them, too.
>
> The C standard does not allow accessing a value through a pointer to a
> different type.  Newer versions of gcc have optimizations depending on
> that restriction included in -O2.

Yes, we are pretty much non-standard when doing this, but it is OK
when it 'just works' in practice.  The optimizations should only bite
when we access the same memory location through two differently typed
pointers, but we would not be doing this (I think).  A given location
is always only accessed as one type.

Err.  Or so I thought.  We have

   SCM_CELL_OBJECT (x, n) === SCM_PACK (SCM_CELL_WORD (x, n))

which would cause trouble when used together with

   *(SCM_CELL_OBJECT_LOC (x, n)) === *((SCM *)(SCM_CELL_WORD_LOC (x, n)))

Well.  I'm still confused...

> You can disable those optimizations with -fno-strict-aliasing, but
> maybe those optimizations would outweigh some nonzero-cost
> conversion between scm_t_bits and SCM.  Some profiling would be
> useful.

We already aim to be -fstrict-aliasing safe: see for example
scm_double_cell in inline.h.

>> Pointers to scm_t_bits might still fail on strange platforms and we
>> might then consider removing SCM_CELL_WORD_LOC on those platforms.
>
> Better to make Guile the same on all platforms, I think, and so remove
> it on all platforms if it doesn't work on some.

Yes.

> Granted that it's useful to have both SCM and scm_t_bits, what exactly
> is the advantage in using those two types to alias the same bytes in
> memory?  What do we gain here over your previous use-SCM-everywhere
> suggestion?

We would have to remove SCM_CELL_WORD_LOC from the API.

>> But we would need a very good reason for this: using pointers the
>> way delete_some does is completely reasonable right now.
>
> Well, it's expected to be reasonable, but turns out to be not quite
> so, right?  Hence the issue.

Hmm.  The (my) reasoning goes like this: first, we only had SCM and
everything was simple.  Then we added scm_t_bits to emphasize the
opaqueness of SCM but still be able to interpret its bits.  That gave
us the possibility to use these two types for the two kinds of words
in the heap: some are interpreted as a SCM (with tag bits and
everything), some are not interpreted and can contain anything that
fits into one word.  This makes SCM a subtype of scm_t_bits: all SCM
values can be represented with a scm_t_bits, but not all scm_t_bits
values are valid SCM values.  Thus, the basic type of heap words is
scm_t_bits.

That, however, creates the issue with having pointers to SCM.


When we can only have one type of pointer (either SCM* or
scm_t_bits*), I would want to have pointers to SCM since SCM is the
type that the user works with.  That would mean that heap words are
of type SCM.

That then also means that SCM_PACK would be used to store 'invalid'
values into a SCM and can no longer be used to mark places where only
valid SCM values should appear.  Maybe we should have a new
SCM_PACK_BITS for this, or maybe not.

Damn, I can't get my head around this...  My instincts tell me to just
remove scm_t_bits from the public API completely, but that is rather
drastic, maybe for 1.10...



* Re: The relationship between SCM and scm_t_bits.
  2004-05-04 17:49     ` Marius Vollmer
@ 2004-05-04 18:35       ` Paul Jarc
  2004-05-05 10:00         ` Marius Vollmer
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Jarc @ 2004-05-04 18:35 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
> have to review it anyway, there has been a bug report about it not
> working at all, I think), so when compiling with that setting, the
> performance argument doesn't count.

What I'm saying is that (ISTM) we might be able to get maximum
type-safety and still have zero run-time conversion cost.  Isn't that
the ideal?

But actually, SCM_DEBUG_TYPING_STRICTNESS == 1 looks like it's just as
safe as 2 anyway.  (I think it would be nice to have SCM_UNPACK
type-check its argument as SCM_PACK does; let the caller explicitly
cast to scm_t_bits to silence the warning.)

> A given location is always only accessed as one type.

If that's true, then a union containing SCM and scm_t_bits would give
us standard-conformance.

> We already aim to be -fstrict-aliasing safe

Ok, good.

>> What do we gain here over your previous use-SCM-everywhere
>> suggestion?
>
> We would have to remove SCM_CELL_WORD_LOC from the API.

Is there much code outside of Guile itself using that?  I guess Guile
itself will always need something like scm_t_bits to check type tags,
but I wonder how useful WORD_LOC is for other code.

> When we can only have one type of pointer (either SCM* or
> scm_t_bits*), I would want to have pointers to SCM since SCM is the
> type that the user works with.  That would mean that heap words are
> of type SCM.

I agree.

> That then also means that SCM_PACK would be used to store 'invalid'
> values into a SCM and can no longer be used to mark places where only
> valid SCM values should appear.  Maybe we should have a new
> SCM_PACK_BITS for this, or maybe not.

SCM_MAKINUM? :)  Or do we need the full range?  More precisely, do we
need the full range and zero-cost conversion simultaneously?  Even
more precisely, do we need that in the API, for code outside of Guile
itself?

> My instincts tell me to just remove scm_t_bits from the public API
> completely, but that is rather drastic, maybe for 1.10...

Yes, if it's to be done, there should certainly be a significant
deprecation period to give people time to adapt.


paul



* Re: The relationship between SCM and scm_t_bits.
  2004-05-04 18:35       ` Paul Jarc
@ 2004-05-05 10:00         ` Marius Vollmer
  2004-05-05 14:58           ` Paul Jarc
  0 siblings, 1 reply; 25+ messages in thread
From: Marius Vollmer @ 2004-05-05 10:00 UTC (permalink / raw)


prj@po.cwru.edu (Paul Jarc) writes:

> Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
>> SCM_DEBUG_TYPING_STRICTNESS == 2 is not meant for production code (I
>> have to review it anyway, there has been a bug report about it not
>> working at all, I think), so when compiling with that setting, the
>> performance argument doesn't count.
>
> What I'm saying is that (ISTM) we might be able to get maximum
> type-safety and still have zero run-time conversion cost.  Isn't that
> the ideal?

Yes, that's the ideal.  I haven't looked myself but I think that
STRICTNESS == 2 won't put SCM values into registers or something.

> But actually, SCM_DEBUG_TYPING_STRICTNESS == 1 looks like it's just as
> safe as 2 anyway.

Isn't there the added advantage with level 2 that it doesn't accept
SCM values in conditions?

>> A given location is always only accessed as one type.
>
> If that's true, then a union containing SCM and scm_t_bits would give
> us standard-conformance.

But not the best performance, I'm afraid.  Also, I'm no longer sure
that "A given location is always only accessed as one type".

>>> What do we gain here over your previous use-SCM-everywhere
>>> suggestion?
>>
>> We would have to remove SCM_CELL_WORD_LOC from the API.
>
> Is there much code outside of Guile itself using that?  I guess Guile
> itself will always need something like scm_t_bits to check type tags,
> but I wonder how useful WORD_LOC is for other code.

Since there is no OBJECT_LOC, people _have_ to use WORD_LOC (and
probably cast it to SCM* immediately).  But yes, the usage should be
pretty rare, hopefully.

>> When we can only have one type of pointer (either SCM* or
>> scm_t_bits*), I would want to have pointers to SCM since SCM is the
>> type that the user works with.  That would mean that heap words are
>> of type SCM.
>
> I agree.
>
>> That then also means that SCM_PACK would be used to store 'invalid'
>> values into a SCM and can no longer be used to mark places where only
>> valid SCM values should appear.  Maybe we should have a new
>> SCM_PACK_BITS for this, or maybe not.
>
> SCM_MAKINUM? :)  Or do we need the full range?  More precisely, do we
> need the full range and zero-cost conversion simultaneously?  Even
> more precisely, do we need that in the API, for code outside of Guile
> itself?

Yes, to all.  We need it to store arbitrary pointers for smobs.  When
you define a new smob type, you get to decide what to do with the
words of the smob cell: you can use them for SCMs, or for anything
else that fits into a SCM.


Well, the reason I brought this all up is that I'm adding new macros
for use with double smobs: SCM_SMOB_DATA2 and SCM_SMOB_DATA3.
Then I converted all code in Guile to use these macros instead of
SCM_CELL_WORD_2, etc.  But some code uses SCM_CELL_WORD_LOC, and then
I noticed that there is some kinkiness there...  I didn't feel like
rewriting the code to not use pointers, but maybe that's the cleanest
thing after all.



* Re: The relationship between SCM and scm_t_bits.
  2004-05-05 10:00         ` Marius Vollmer
@ 2004-05-05 14:58           ` Paul Jarc
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Jarc @ 2004-05-05 14:58 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer <marius.vollmer@uni-dortmund.de> wrote:
> I haven't looked myself but I think that STRICTNESS == 2 won't put
> SCM values into registers or something.

Ah, ok.  Well, that depends on the compiler - maybe more recent
compilers are better with this.  (Though we still have to think about
older ones too for a while.)  The definition of SCM for STRICTNESS ==
2 also has the advantage that an SCM object can be accessed through an
scm_t_bits* pointer.

> Isn't there the added advantage with level 2 that it doesn't accept
> SCM values in conditions?

Oops, right.  I had only been thinking of getting a diagnostic when
using one type where the other specifically was expected -
assignments, function arguments, etc.

> Yes, to all.  We need it to store arbitrary pointers for smobs.  When
> you define a new smob type, you get to decide what to do with the
> words of the smob cell: you can use them for SCMs, or for anything
> else that fits into a SCM.

Hmm.  A union including void* would be perfect, if not for the
register problem.


paul



* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
                   ` (2 preceding siblings ...)
  2004-05-04 13:53 ` Marius Vollmer
@ 2004-05-10 13:42 ` Marius Vollmer
  2004-05-15  7:31 ` Dirk Herrmann
  2004-05-15 15:00 ` Dirk Herrmann
  5 siblings, 0 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-05-10 13:42 UTC (permalink / raw)


Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> Hi,
>
> I just got confused about our two 'fundamental' types, SCM and
> scm_t_bits.

Ok, so this is what I have done now:

    * Changes to the C interface

    ** SCM_CELL_WORD_LOC has been deprecated.

    Use the new macro SCM_CELL_OBJECT_LOC instead, which returns a
    pointer to a SCM, as opposed to a pointer to a scm_t_bits.

    This was done to allow the correct use of pointers into the Scheme
    heap.  Previously, the heap words were of type scm_t_bits and
    local variables and function arguments were of type SCM, making it
    non-standards-conformant to have a pointer that can point to both.

    ** New macros SCM_SMOB_DATA_2, SCM_SMOB_DATA_3, etc.

    These macros should be used instead of SCM_CELL_WORD_2/3 to access
    the second and third words of double smobs.  Likewise for
    SCM_SET_SMOB_DATA_2 and SCM_SET_SMOB_DATA_3.

    Also, there are SCM_SMOB_FLAGS and SCM_SET_SMOB_FLAGS, which should
    be used to get and set the 16 extra bits in the zeroth word of a
    smob.

    And finally, there are SCM_SMOB_OBJECT and SCM_SMOB_SET_OBJECT for
    accessing the first immediate word of a smob as a SCM value, and
    there is SCM_SMOB_OBJECT_LOC for getting a pointer to the first
    immediate smob word.  Likewise for SCM_SMOB_OBJECT_2, etc.

I've also updated the smob documentation in the manual.
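
A rough model of how accessors of this kind could be laid out is
sketched here.  The MOCK_ macros and mock_smob are hypothetical, and
the split of word 0 into a 16-bit type tag plus 16 flag bits is an
assumption based on the description above, not copied from the Guile
headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t scm_t_bits;   /* hypothetical stand-in */

/* Mock double smob: word 0 carries type tag and flags, the remaining
   words are data. */
typedef struct { scm_t_bits words[4]; } mock_smob;

/* Assumed split of word 0: low 16 bits type tag, next 16 bits flags. */
#define MOCK_SMOB_TYPE(s)   ((s)->words[0] & 0xffffu)
#define MOCK_SMOB_FLAGS(s)  (((s)->words[0] >> 16) & 0xffffu)
#define MOCK_SET_SMOB_FLAGS(s, f) \
  ((s)->words[0] = MOCK_SMOB_TYPE (s) \
                   | ((scm_t_bits) ((f) & 0xffffu) << 16))

/* Named accessors instead of indexing raw cell words. */
#define MOCK_SMOB_DATA_2(s)        ((s)->words[2])
#define MOCK_SET_SMOB_DATA_2(s, v) ((s)->words[2] = (scm_t_bits) (v))

/* Setting the flags must leave the type tag alone. */
void mock_smob_flags_demo (void)
{
  mock_smob s = { { 0x007f, 0, 0, 0 } };

  MOCK_SET_SMOB_FLAGS (&s, 0xabcd);
  assert (MOCK_SMOB_TYPE (&s) == 0x007f);
  assert (MOCK_SMOB_FLAGS (&s) == 0xabcd);

  MOCK_SET_SMOB_DATA_2 (&s, 99);
  assert (MOCK_SMOB_DATA_2 (&s) == 99);
}
```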



* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
                   ` (3 preceding siblings ...)
  2004-05-10 13:42 ` Marius Vollmer
@ 2004-05-15  7:31 ` Dirk Herrmann
  2004-05-17 18:09   ` Marius Vollmer
  2004-05-15 15:00 ` Dirk Herrmann
  5 siblings, 1 reply; 25+ messages in thread
From: Dirk Herrmann @ 2004-05-15  7:31 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer wrote:

>Hi,
>
>I just got confused about our two 'fundamental' types, SCM and
>scm_t_bits.
>
The two types were introduced in order to create a level of abstraction
between code that operates on the encoding of Scheme types and code
that only relies on API elements to do so.  The background was that
SCM formerly was just a typedef to "long".  This had the disadvantage
that the compiler did not do proper type checking when dealing with SCM
types.  For example, you could by mistake write "if (some_SCM_value) ..."
when you had intended to write "if (!SCM_FALSEP (some_SCM_value))".
Guile had a number of bugs of this or a similar type at that time.
Thus, we wanted to introduce better type checking, but on the other
hand be able to deal with the bit representation in low-level code.
Hence the separation into SCM and scm_t_bits.

The idea is to have SCM normally defined to be some performance-efficient
type, like a pointer to an unknown struct.  This will inhibit most
errors, like passing an integer where a SCM is expected, or performing
arithmetic operations on a SCM value.  It does not, however, inhibit
the "if (some_SCM_value)" bug, since the compiler will accept a pointer
there.  Thus, from time to time, just for type-checking purposes, we
can have SCM defined as a struct or union, thus inhibiting even those
problems.  You will find more information about that if you look at
the use of SCM_DEBUG_TYPING_STRICTNESS in tags.h and the explanation
in __scm.h.  Unfortunately, Guile does not compile without errors when
SCM_DEBUG_TYPING_STRICTNESS is set to 2, due to some tricky problems in
some files.  Nevertheless it should still be quite helpful for most
parts of Guile.  Using it, I remember having fixed about 5 to 10 bugs
in Guile at that time, which would not have been easy to find without
thorough code inspection (or until someone ran into them).
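
The three levels can be sketched roughly as follows.  This is a
simplified model rather than the actual contents of tags.h:
MOCK_TYPING_STRICTNESS, MOCK_BOOL_F and MOCK_FALSEP are made-up names,
and the encoding chosen for false is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t scm_t_bits;   /* hypothetical stand-in */

#ifndef MOCK_TYPING_STRICTNESS
#define MOCK_TYPING_STRICTNESS 2
#endif

#if MOCK_TYPING_STRICTNESS == 0
/* Plain integer: everything compiles, nothing is checked. */
typedef scm_t_bits SCM;
#define SCM_PACK(b)   ((SCM) (b))
#define SCM_UNPACK(x) ((scm_t_bits) (x))
#elif MOCK_TYPING_STRICTNESS == 1
/* Pointer to an incomplete struct: arithmetic and integer/SCM mixups
   are diagnosed, but "if (x)" still compiles. */
typedef struct scm_unused_struct *SCM;
#define SCM_PACK(b)   ((SCM) (b))
#define SCM_UNPACK(x) ((scm_t_bits) (x))
#else
/* Struct wrapper: "if (x)" is a compile error as well. */
typedef struct { scm_t_bits n; } SCM;
static inline SCM scm_pack (scm_t_bits b) { SCM s = { b }; return s; }
#define SCM_PACK(b)   (scm_pack ((scm_t_bits) (b)))
#define SCM_UNPACK(x) ((x).n)
#endif

#define MOCK_BOOL_F    SCM_PACK (0x4)   /* arbitrary made-up encoding */
#define MOCK_FALSEP(x) (SCM_UNPACK (x) == SCM_UNPACK (MOCK_BOOL_F))

int mock_is_true (SCM x)
{
  /* "if (x)" would be rejected here at strictness 2. */
  return !MOCK_FALSEP (x);
}

/* The same source works unchanged at all three levels. */
void mock_strictness_demo (void)
{
  assert (!mock_is_true (MOCK_BOOL_F));
  assert (mock_is_true (SCM_PACK (0x104)));
}
```

Code that sticks to SCM_PACK, SCM_UNPACK and the predicates compiles
at every level, which is what makes an occasional strict build
possible as a checking pass.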

>For example, consider a list that is pointed to by a global variable
>and some fairly standard way of dealing with singly-linked lists in C:
>
>  SCM head;
>
>  void
>  delete_some ()
>  {
>    SCM *node_ptr = &head;
>    if (should_delete (*node_ptr))
>      *node_ptr = SCM_CDR (*node_ptr);
>    else
>      node_ptr = SCM_CDRLOC (*node_ptr);
>  }
>
Such code should be avoided.  Otherwise I think it is difficult to move
to generational garbage collection: generational garbage collection is
based on the assumption that newer objects can only point to older
objects.  This is true as long as you don't modify existing objects.
Therefore, write accesses to existing objects must pass through a write
barrier, which is some code that helps the GC to work correctly even in
case of references from old to new.

By modifying SCM values through SCM pointers you skip the write barrier.
When switching to generational GC, such code would have to be fixed,
either to avoid using the pointer access, or to add some operations that
perform what the write barrier would have done.  Thus, in order to keep
the number of places small that require such fixing later, I try to
avoid using SCM_CDRLOC etc.
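
To illustrate the difference, here is a much simplified sketch.  Guile
has no such barrier at present; mock_pair, mock_set_cdr, MOCK_CDRLOC
and the remembered flag are all invented, and a real barrier would
look at object generations rather than remember every mutated cell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair with a flag for the collector's remembered set. */
typedef struct mock_pair *SCM;
struct mock_pair { SCM car, cdr; int remembered; };

/* Number of cells entered into the mock remembered set. */
size_t mock_remembered_count;

/* Write barrier: note the mutated cell so that a generational
   collector could rescan it for old-to-new references. */
void mock_set_cdr (SCM cell, SCM value)
{
  if (!cell->remembered)
    {
      cell->remembered = 1;
      mock_remembered_count++;
    }
  cell->cdr = value;
}

/* Raw location access: hands out the slot itself. */
#define MOCK_CDRLOC(x) (&(x)->cdr)

/* The setter is seen by the collector; the raw store is not. */
void mock_barrier_demo (void)
{
  struct mock_pair young = { NULL, NULL, 0 };
  struct mock_pair old_a = { NULL, NULL, 0 };
  struct mock_pair old_b = { NULL, NULL, 0 };

  mock_remembered_count = 0;

  mock_set_cdr (&old_a, &young);
  assert (old_a.remembered && mock_remembered_count == 1);

  *MOCK_CDRLOC (&old_b) = &young;   /* bypasses the barrier */
  assert (old_b.cdr == &young);
  assert (!old_b.remembered && mock_remembered_count == 1);
}
```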

Best regards,
Dirk Herrmann




* Re: The relationship between SCM and scm_t_bits.
  2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
                   ` (4 preceding siblings ...)
  2004-05-15  7:31 ` Dirk Herrmann
@ 2004-05-15 15:00 ` Dirk Herrmann
  2004-05-15 16:42   ` Dirk Herrmann
  2004-05-17 19:22   ` Marius Vollmer
  5 siblings, 2 replies; 25+ messages in thread
From: Dirk Herrmann @ 2004-05-15 15:00 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer wrote:

>  I propose to remove the need to convert between scm_t_bits* and SCM*
>  and to allow only SCMs to be in memory.
>
>
>  The words in a scm_t_cell would be of type SCM. This would mean that
>  SCM_CELL_WORD_LOC would be removed and replaced with
>  SCM_CELL_OBJECT_LOC. Also, SCM_SET_SMOB_DATA (etc) might not be able
>  to store all scm_t_bits values that it is handed (because scm_t_bits
>  could be larger than a pointer). We could make a new guarantee that
>  says that SCM_SET_SMOB_DATA (etc) can store any pointer that is cast
>  to a scm_t_bits and any integer that fits into 'unsigned int', say.
>
>  The type scm_t_bits would be restricted to temporary values that are
>  mostly used to test tag bits etc. They would usually not be stored in
>  data structures and when they are, they cannot be expected to protect
>  the SCM value that they encode when they are scanned conservatively.
>
>  Should we (gradually and with deprecation and everything) remove
>  scm_t_bits from the smob API completely? I have not thought this
>  through, but we might end up with something that is not really an
>  improvement, just different.

I have tried to give it some more thought:

Since the introduction of scm_t_bits, it has been stated about SCM 
variables,
that every SCM variable is known to hold a valid scheme object, while 
scm_t_bits
variables may also hold arbitrary data.

On the heap, however, cells do typically _not_ hold valid scheme objects.
One of the exceptions is the pair object, which has the property that both
of its cell entries happen to hold valid scheme objects. This is the reason
why SCM_CARLOC and SCM_CDRLOC work on pairs. If you use SCM_CARLOC or
SCM_CDRLOC on anything other than a pair object, you will most likely crash
the system, because the SCM value that you get will not hold a valid scheme
object.

From this perspective, it is inconsistent to define scm_t_cell to hold
SCM objects.


I have not yet given it a try, but I found the suggestion to use a union
quite appaling:

typedef struct scm_t_cell
{
  union {
    scm_t_bits word_0;
    SCM object_0;
  } element_0;
  union {
    scm_t_bits word_1;
    SCM object_1;
  } element_1;
} scm_t_cell;

or even a more generic version, which would cover single cells as well as
double cells and even allow cleanly iterating over all cell elements in a
loop:

typedef struct scm_t_cell
{
  union {
    scm_t_bits word;
    SCM object;
  } elements[];
} scm_t_cell;

I see no reason why either of the above solutions should have a negative
influence on performance. All of our cell accessing macros should be easily
re-definable with those types, in particular:

#define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x)) [n]).word)
#define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x)) [n]).object)

which does not require any casting to be performed any more :-)
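
A rough, self-contained sketch of the union layout described above. The names
demo_bits, demo_scm and the DEMO_* macros are made-up stand-ins for
scm_t_bits, SCM and the cell accessors, and the struct wrapper around the
word is an assumption for illustration, not Guile's actual definition:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real Guile types are defined differently. */
typedef uintptr_t demo_bits;                        /* role of scm_t_bits */
typedef struct demo_scm { demo_bits n; } demo_scm;  /* role of SCM */

/* Each cell element is either a raw word or a Scheme object. */
typedef union demo_element
{
  demo_bits word;
  demo_scm object;
} demo_element;

typedef struct demo_cell
{
  demo_element elements[2];
} demo_cell;

/* Accessors pick the union member; no pointer casts are involved. */
#define DEMO_CELL_WORD(c, n)        ((c)->elements[n].word)
#define DEMO_CELL_OBJECT(c, n)      ((c)->elements[n].object)
#define DEMO_CELL_WORD_LOC(c, n)    (&DEMO_CELL_WORD ((c), (n)))
#define DEMO_CELL_OBJECT_LOC(c, n)  (&DEMO_CELL_OBJECT ((c), (n)))

/* Store a raw word in slot 0 and an object in slot 1 through typed
   pointers, then read each slot back through the member it was written
   through.  Returns the sum of both values.  */
demo_bits
demo_roundtrip (demo_bits tag, demo_scm obj)
{
  demo_cell cell;
  demo_bits *word_loc = DEMO_CELL_WORD_LOC (&cell, 0);
  demo_scm *object_loc = DEMO_CELL_OBJECT_LOC (&cell, 1);

  *word_loc = tag;
  *object_loc = obj;
  assert (DEMO_CELL_WORD (&cell, 0) == tag);
  return DEMO_CELL_WORD (&cell, 0) + DEMO_CELL_OBJECT (&cell, 1).n;
}
```

Both ..._LOC macros yield a pointer of the matching type, so neither a
demo_bits* nor a demo_scm* ever has to be produced by a cast.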

Best regards,
Dirk Herrmann




_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-15 15:00 ` Dirk Herrmann
@ 2004-05-15 16:42   ` Dirk Herrmann
  2004-05-17 19:22   ` Marius Vollmer
  1 sibling, 0 replies; 25+ messages in thread
From: Dirk Herrmann @ 2004-05-15 16:42 UTC (permalink / raw)
  Cc: guile-devel

Dirk Herrmann wrote:

>  I have not yet given it a try, but I found the suggestion to use a
>  union quite appaling:
 > [some syntactically incorrect examples deleted :-)]

However, I have now given it a try, and it seems to work quite nicely.
I enclose a patch, which I would be glad to apply, if you, Marius, agree.
I did not do it yet, though, in order to give everybody the chance to
check whether it has any negative influence on performance.

Best regards,
Dirk


Diff against the latest CVS:

Index: NEWS
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/NEWS,v
retrieving revision 1.427
diff -u -b -r1.427 NEWS
--- NEWS    6 May 2004 17:04:04 -0000    1.427
+++ NEWS    15 May 2004 16:02:26 -0000
@@ -584,16 +584,6 @@
 
 * Changes to the C interface
 
-** SCM_CELL_WORD_LOC has been deprecated.
-
-Use the new macro SCM_CELL_OBJECT_LOC instead, which return a pointer
-to a SCM, as opposed to a pointer to a scm_t_bits.
-
-This was done to allow the correct use of pointers into the Scheme
-heap.  Previously, the heap words were of type scm_t_bits and local
-variables and function arguments were of type SCM, making it
-non-standards-conformant to have a pointer that can point to both.
-
 ** New macros SCM_SMOB_DATA_2, SM_SMOB_DATA_3, etc.
 
 These macros should be used instead of SCM_CELL_WORD_2/3 to access the
Index: libguile/ChangeLog
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/ChangeLog,v
retrieving revision 1.2070
diff -u -b -r1.2070 ChangeLog
--- libguile/ChangeLog    15 May 2004 15:50:24 -0000    1.2070
+++ libguile/ChangeLog    15 May 2004 16:02:26 -0000
@@ -1,5 +1,23 @@
 2004-05-15  Dirk Herrmann  <dirk@dirk-herrmanns-seiten.de>
 
+    * deprecated.h (SCM_CELL_WORD_LOC): Un-deprecated.
+
+    * gc.h (scm_t_cell): Redefined to hold a union, since each cell
+    element either holds a scm_t_bits value or a SCM value.
+
+    (SCM_GC_CARD_BVEC, SCM_GC_SET_CARD_BVEC, SCM_GC_GET_CARD_FLAGS,
+    SCM_GC_SET_CARD_FLAGS, SCM_GC_CELL_OBJECT, SCM_GC_CELL_WORD,
+    SCM_GC_SET_CELL_OBJECT, SCM_GC_SET_CELL_WORD): Modified to work
+    with the new scm_t_cell.
+
+    (SCM_CELL_WORD_LOC): Un-deprecated.
+
+    * numbers.h (SCM_I_BIG_MPZ): Use SCM_CELL_WORD_LOC instead of
+    SCM_CELL_OBJECT_LOC, since we are not dealing with scheme objects
+    here.
+
+2004-05-15  Dirk Herrmann  <dirk@dirk-herrmanns-seiten.de>
+
     * numbers.h (SCM_SLOPPY_FRACTIONP): Removed.  It was not used
     throughout guile, has not been part of an official release yet,
     and the concept of sloppy predicates has never been a good idea.
Index: libguile/deprecated.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/deprecated.h,v
retrieving revision 1.19
diff -u -b -r1.19 deprecated.h
--- libguile/deprecated.h    6 May 2004 16:42:50 -0000    1.19
+++ libguile/deprecated.h    15 May 2004 16:02:26 -0000
@@ -300,11 +300,6 @@
 
 #define SCM_VALIDATE_OPDIR(pos, port) SCM_MAKE_VALIDATE (pos, port, OPDIRP)
 
-/* Deprecated because we can not safely cast a SCM* to a scm_t_bits*
- */
-
-#define SCM_CELL_WORD_LOC(x, n)   ((scm_t_bits*)SCM_CELL_OBJECT_LOC((x),(n)))
-
 void scm_i_init_deprecated (void);
 
 #endif
Index: libguile/gc.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/gc.h,v
retrieving revision 1.116
diff -u -b -r1.116 gc.h
--- libguile/gc.h    6 May 2004 16:41:08 -0000    1.116
+++ libguile/gc.h    15 May 2004 16:02:26 -0000
@@ -36,8 +36,10 @@
 
 typedef struct scm_t_cell
 {
-  SCM word_0;
-  SCM word_1;
+  union {
+    scm_t_bits word;
+    SCM object;
+  } elements[2];
 } scm_t_cell;
 
 /*
@@ -73,12 +75,14 @@
 #define SCM_GC_CARD_N_CELLS        256
 #define SCM_GC_SIZEOF_CARD         SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell)
 
-#define SCM_GC_CARD_BVEC(card)  ((scm_t_c_bvec_long *) ((card)->word_0))
+#define SCM_GC_CARD_BVEC(card) \
+  ((scm_t_c_bvec_long *) ((card)->elements[0].word))
 #define SCM_GC_SET_CARD_BVEC(card, bvec) \
-    ((card)->word_0 = (SCM) (bvec))
-#define SCM_GC_GET_CARD_FLAGS(card) ((long) ((card)->word_1))
+  ((card)->elements[0].word = (scm_t_bits) (bvec))
+#define SCM_GC_GET_CARD_FLAGS(card) \
+  ((long) ((card)->elements[1].word))
 #define SCM_GC_SET_CARD_FLAGS(card, flags) \
-    ((card)->word_1 = (SCM) (flags))
+  ((card)->elements[1].word = (flags))
 
 #define SCM_GC_GET_CARD_FLAG(card, shift) \
  (SCM_GC_GET_CARD_FLAGS (card) & (1L << (shift)))
@@ -141,12 +145,13 @@
  * in debug mode.  In particular these macros will even work for free cells,
  * which should never be encountered by user code.  */
 
-#define SCM_GC_CELL_OBJECT(x, n) (((SCM *)SCM2PTR (x)) [n])
-#define SCM_GC_CELL_WORD(x, n)   (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n))))
+#define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x))->elements[n]).word)
+#define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x))->elements[n]).object)
 
-#define SCM_GC_SET_CELL_OBJECT(x, n, v) ((((SCM *)SCM2PTR (x)) [n]) = (v))
+#define SCM_GC_SET_CELL_OBJECT(x, n, v) \
+  (((SCM2PTR (x))->elements[n]).object = (v))
 #define SCM_GC_SET_CELL_WORD(x, n, v)  \
-  (SCM_GC_SET_CELL_OBJECT ((x), (n), SCM_PACK (v)))
+  (((SCM2PTR (x))->elements[n]).word = (v))
 
 #define SCM_GC_CELL_TYPE(x) (SCM_GC_CELL_OBJECT ((x), 0))
 
@@ -214,6 +219,7 @@
   (SCM_GC_SET_CELL_OBJECT ((x), 1, (v)))
 
 
+#define SCM_CELL_WORD_LOC(x, n)   (&SCM_CELL_WORD ((x), (n)))
 #define SCM_CELL_OBJECT_LOC(x, n) (&SCM_CELL_OBJECT ((x), (n)))
 #define SCM_CARLOC(x)             (SCM_CELL_OBJECT_LOC ((x), 0))
 #define SCM_CDRLOC(x)             (SCM_CELL_OBJECT_LOC ((x), 1))
Index: libguile/numbers.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/numbers.h,v
retrieving revision 1.82
diff -u -b -r1.82 numbers.h
--- libguile/numbers.h    15 May 2004 15:50:24 -0000    1.82
+++ libguile/numbers.h    15 May 2004 16:02:26 -0000
@@ -144,7 +144,7 @@
 #define SCM_COMPLEX_IMAG(x) (SCM_COMPLEX_MEM (x)->imag)
 
 /* Each bignum is just an mpz_t stored in a double cell starting at word 1. */
-#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))
+#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_WORD_LOC ((x), 1))))
 #define SCM_BIGP(x) (!SCM_IMP (x) && SCM_TYP16 (x) == scm_tc16_big)
 
 #define SCM_NUMBERP(x) (SCM_INUMP(x) || SCM_NUMP(x))






^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-15  7:31 ` Dirk Herrmann
@ 2004-05-17 18:09   ` Marius Vollmer
  0 siblings, 0 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-05-17 18:09 UTC (permalink / raw)
  Cc: guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> Marius Vollmer wrote:
>
>>Hi,
>>
>>I just got confused about our two 'fundamental' types, SCM and
>>scm_t_bits.
>>
> The two types were introduced in order to create a level of
> abstraction between code, that operates on the encoding of scheme
> types, and code that only requires on API elements to do so.
> [...]

Yep, thanks for reinforcing this.  The purpose of scm_t_bits as it is
currently documented in the manual is different, however: it is the
type that can hold arbitrary words, not just valid SCM values.  The
first word of a smob would be a scm_t_bits since it is not a valid SCM
value, for example.  I was confused by these two distinctions between
the two types.

>>For example, consider a list that is pointed to by a global variable
>>and some fairly standard way of dealing with singly-linked lists in C:
>>
>>  SCM head;
>>
>>  void
>>  delete_some ()
>>  {
>>    SCM *node_ptr = &head;
>>    if (should_delete (*node_ptr))
>>      *node_ptr = SCM_CDR (*node_ptr);
>>    else
>>      node_ptr = SCM_CDRLOC (*node_ptr);
>>  }
>>
> Such code should be avoided. [...]

Yes, probably.  But we currently support this style and Guile uses it
itself.  However, the way we were casting pointers around did not
properly respect the distinction between SCM and scm_t_bits: we
didn't make the guarantee that you could meaningfully cast a
scm_t_bits* to a SCM*, yet we would rely on it.

Instead of disallowing working with SCM* pointers, I have fixed the
type confusion.  This was the smaller change by far.
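
For reference, the pointer-to-pointer style in question can be sketched on a
toy list. demo_node and demo_delete_negative are made-up names, the list
holds plain ints instead of SCM values, and the loop is an addition that the
quoted delete_some does not have:

```c
#include <assert.h>
#include <stddef.h>

/* Toy singly-linked list node; stands in for a pair on the Scheme heap. */
typedef struct demo_node
{
  int value;
  struct demo_node *next;
} demo_node;

/* Unlink every node with a negative value.  node_ptr always points at the
   location that holds the current node (first the head variable, then
   some node's next field), so the head needs no special case.  Returns
   the number of unlinked nodes.  */
int
demo_delete_negative (demo_node **head)
{
  demo_node **node_ptr = head;
  int removed = 0;

  assert (head != NULL);
  while (*node_ptr != NULL)
    {
      if ((*node_ptr)->value < 0)
        {
          *node_ptr = (*node_ptr)->next;   /* write through the location */
          removed++;
        }
      else
        node_ptr = &(*node_ptr)->next;     /* analogous to SCM_CDRLOC */
    }
  return removed;
}
```

The point of the style is that a single pointer type has to be able to
address both the variable holding the head and the link field inside a heap
cell.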

But of course, this does not mean that we cannot at some point disallow
the use of SCM* pointers.  But that would be a much bigger decision
than I would want to make now, while (slowly) zooming in on 1.8.

> By modifying SCM values through SCM pointers you skip the write
> barrier. When switching to generational GC, such code would have to
> be fixed, either to avoid using the pointer access, or to add some
> operations that perform, what the write barrier would have
> done. Thus, in order to keep the number of places small which
> require such fixing later, I try to avoid using SCM_CDRLOC etc.

Yes, I fully agree here.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-15 15:00 ` Dirk Herrmann
  2004-05-15 16:42   ` Dirk Herrmann
@ 2004-05-17 19:22   ` Marius Vollmer
  2004-05-17 20:17     ` Paul Jarc
  2004-05-21 19:37     ` Dirk Herrmann
  1 sibling, 2 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-05-17 19:22 UTC (permalink / raw)
  Cc: guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> I have not yet given it a try, but I found the suggestion to use a
> union quite appaling:

[ I think you mean "appealing". :-) I used to mix up the adjectives
  "content" and "contempt"...  [ And its "appalling" with double-el.
  I hope you don't mind this little public correction.  [ I think I
  can get away with it since I make a ton of mistakes myself... ] ]
]

> [...]
>
> typedef struct scm_t_cell
> {
>   union {
>     scm_t_bits word;
>     SCM object;
>   } elements[];
> } scm_t_cell;

Yes, but consider how we use the heap: we fetch a word and then must
decide whether it is a SCM or a scm_t_bits, we don't know this in
advance in every case.  This is not really supported by a union: I
don't think you can store into one member and then (portably) assume
anything about the value read from a different member.  This is very
much like storing into one memory location thru one pointer and
reading that same location through a differently-typed pointer.  I
therefore don't think that using a union is an improvement.

(It will work on all 'normal' platforms, of course, just like casting
the pointers worked, but I would like to get away from relying on this
non-portable behavior.)

Thus, I think we are better off by just declaring the heap words to be
of type SCM and always accessing them as this type.  Converting
between SCM and scm_t_bits will happen with SCM_PACK and SCM_UNPACK.
That way, we don't need to assume that a SCM and a scm_t_bits are
stored identically in memory.
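
A minimal sketch of this arrangement, under the assumption of a deliberately
non-trivial encoding. All demo_* names are hypothetical, and the bitwise
complement in demo_pack/demo_unpack is chosen only so that bypassing the
conversion would be visible:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for scm_t_bits and SCM. */
typedef uintptr_t demo_bits;
typedef struct demo_scm { demo_bits n; } demo_scm;

/* Stand-ins for SCM_PACK / SCM_UNPACK with a non-trivial encoding. */
demo_scm
demo_pack (demo_bits b)
{
  demo_scm s;
  s.n = ~b;
  return s;
}

demo_bits
demo_unpack (demo_scm s)
{
  return ~s.n;
}

/* Heap words are declared as demo_scm and only ever accessed as such. */
typedef struct demo_cell
{
  demo_scm word_0;
  demo_scm word_1;
} demo_cell;

/* The only pointer into a cell is a pointer to demo_scm. */
demo_scm *
demo_cell_object_loc (demo_cell *c, int n)
{
  assert (n == 0 || n == 1);
  return n == 0 ? &c->word_0 : &c->word_1;
}

/* Raw words go through pack/unpack on every access. */
void
demo_set_cell_word (demo_cell *c, int n, demo_bits v)
{
  *demo_cell_object_loc (c, n) = demo_pack (v);
}

demo_bits
demo_cell_word (demo_cell *c, int n)
{
  return demo_unpack (*demo_cell_object_loc (c, n));
}
```

Because every raw-word access is routed through the conversion functions,
nothing here depends on the two types sharing a memory representation.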

(And incidentally, we don't use scm_t_cell that much in Guile anyway.
We should maybe remove it.  I tried recently but didn't finish since
it turned out to be quite some amount of work to get all size
statistics in the GC correct.)

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-17 19:22   ` Marius Vollmer
@ 2004-05-17 20:17     ` Paul Jarc
  2004-05-21 19:37     ` Dirk Herrmann
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Jarc @ 2004-05-17 20:17 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer <mvo@zagadka.de> wrote:
> I don't think you can store into one member and then (portably)
> assume anything about the value read from a different member.  This
> is very much like storing into one memory location thru one pointer
> and reading that same location through a differently-typed pointer.

True in general, although some cases are legal.  Any type can be
accessed as an array of unsigned char; two structure types with a
common initial sequence of members can have those members accessed;
the first element of a structure, or any element of a union, can be
accessed directly through the element type instead of explicitly as a
member of the structure or union.

So, for example, if SCM is simply a struct containing scm_t_bits, then
you can access that scm_t_bits value without the SCM wrapping, and
given a union of scm_t_bits and SCM, you can access the scm_t_bits
member, even if the SCM member was the last one assigned to.
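
The cases listed above can be sketched as follows, assuming (hypothetically)
that SCM is a struct wrapping a single scm_t_bits. The demo_* names are
invented for illustration and are not Guile definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for scm_t_bits and a struct-wrapped SCM. */
typedef uintptr_t demo_bits;
typedef struct demo_scm { demo_bits n; } demo_scm;

typedef union demo_element
{
  demo_bits word;
  demo_scm object;
} demo_element;

/* A pointer to a struct, suitably converted, points to its first member,
   so the wrapped word can be read through a demo_bits pointer.  */
demo_bits
demo_bits_of (const demo_scm *s)
{
  const demo_bits *first = (const demo_bits *) s;

  assert (s != NULL);
  return *first;
}

/* Any object may be inspected as an array of unsigned char.  Only the
   leading sizeof (demo_bits) bytes are compared, which is where the first
   member of the struct lives.  */
int
demo_same_bytes (const demo_scm *s, const demo_bits *b)
{
  const unsigned char *ps = (const unsigned char *) s;
  const unsigned char *pb = (const unsigned char *) b;

  return memcmp (ps, pb, sizeof *b) == 0;
}

/* Store through the object member, read through the word member.  The
   word member has the same type as the struct's first member.  */
demo_bits
demo_word_after_object_store (demo_bits raw)
{
  demo_element e;

  e.object.n = raw;
  return e.word;
}
```

None of these accesses carry over automatically if SCM is instead an
unrelated pointer type.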


paul




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-17 19:22   ` Marius Vollmer
  2004-05-17 20:17     ` Paul Jarc
@ 2004-05-21 19:37     ` Dirk Herrmann
  2004-05-21 20:30       ` Paul Jarc
  1 sibling, 1 reply; 25+ messages in thread
From: Dirk Herrmann @ 2004-05-21 19:37 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer wrote:

>  Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:
>
> > I have not yet given it a try, but I found the suggestion to use a
> > union quite appaling:
>
>  [ I think you mean "appealing". :-) I used to mix up the adjectives
>  "content" and "contempt"... [ And its "appalling" with double-el. I
>  hope you don't mind this little public correction. [ I think I can
>  get away with it since I make a ton of mistakes myself... ] ] ]

:-) No problem, thanks for the hint. In fact, I did not even know the
word "appalling". Reminds me of a situation when I read the word
"hostile" and thought it came from "host". Think about someone
thanking his host for their "hostility" :-)

> > [...]
> >
> > typedef struct scm_t_cell { union { scm_t_bits word; SCM object; }
> > elements[]; } scm_t_cell;
>
>  Yes, but consider how we use the heap: we fetch a word and then must
>  decide whether it is a SCM or a scm_t_bits, we don't know this in
>  advance in every case. This is not really supported by a union: I
>  don't think you can store into one member and then (portably) assume
>  anything about the value read from a different member. This is very
>  much like storing into one memory location thru one pointer and
>  reading that same location through a differently-typed pointer. I
>  therefore don't think that using a union is an improvement.

I don't see a problem here: The rule is, if you don't know better in
advance, always access your memory as a scm_t_bits variable. This is exactly
the way we determine whether a cell really holds a pair: As long as it is
just a cell, we check the bits. Only if we know it's a pair do we dare to
access it as a pair of SCM values.
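
The rule can be sketched like this. The tag convention (low bit of word 0
clear means "pair") and all demo_* names are assumptions made for the
example, not a statement about Guile's real tagging scheme:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for scm_t_bits, SCM and scm_t_cell. */
typedef uintptr_t demo_bits;
typedef struct demo_scm { demo_bits n; } demo_scm;

typedef union demo_element
{
  demo_bits word;
  demo_scm object;
} demo_element;

typedef struct demo_cell
{
  demo_element elements[2];
} demo_cell;

/* Step 1: while it is "just a cell", look only at the raw bits.
   Assumed convention: low bit of word 0 clear means the cell is a pair.  */
int
demo_cell_is_pair (const demo_cell *c)
{
  return (c->elements[0].word & 1) == 0;
}

/* Step 2: only after the check succeeds, hand out a pointer to the cdr
   slot as an object.  Returns NULL for anything that is not a pair.  */
demo_scm *
demo_cdrloc_checked (demo_cell *c)
{
  assert (c != NULL);
  if (!demo_cell_is_pair (c))
    return NULL;
  return &c->elements[1].object;
}
```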

>  Thus, I think we are better off by just declaring the heap words to
>  be of type SCM and always accessing them as this type. Converting
>  between SCM and scm_t_bits will happen with SCM_PACK and SCM_UNPACK.
>  That way, we don't need to assume that a SCM and a scm_t_bits are
>  stored identically in memory.

Then, again, we have to stay on the safe side and assume that we have only
scm_t_bits variables on the heap: If a variable of type SCM and a variable
of type scm_t_bits would _really_ look different, then the heap _must_ hold
elements of type scm_t_bits, since all non-pair objects can store arbitrary
data in their cells. Thus, in such a case accessing the heap via SCM
pointers would be plain wrong.

However, I would not be too restrictive:
I don't think that the distinction between SCM and scm_t_bits should go in
the direction that SCM and scm_t_bits might be represented in completely
different ways: It was introduced as a means to provide better type checking
in guile. Along the way it brought (almost coincidentally) a nice distinction
between code that operates on higher levels and code that doesn't. The fact
that some code does not yet use that abstraction barrier correctly (it may
be that this is the case for scm_mark_locations, which you gave as an
example) could also mean that this code needs to be fixed.

Another, more general note:

The whole discussion only came up since there are places in guile or in
client code where people want to access the heap via pointers. Before we
adapt one of our central structures for such uses, we should first think,
whether that usage is correct or not. In the context of generational gc,
I think we should be very careful about such uses. Let's rather try to get
rid of such code, and encourage users to do the same.

Note that ..._WORD_LOC write accesses may be perfectly safe: If the
data that is being pointed to does not hold scheme objects and also no
other data that introduces gc-relevant dependencies, you can safely write
to the heap in this way. The access in numbers.h that I modified in my
patch, for example, is no problem: The heap holds only references to
gmp-data, no references back into the heap.

On the contrary, ..._OBJECT_LOC write accesses are always a problem
with respect to generational gc.
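
To illustrate why, here is a toy write barrier. It is only a sketch: Guile
has no generational collector at this point, and the remembered-set design
and every demo_* name are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap cell whose fields refer to other cells. */
typedef struct demo_cell demo_cell;
struct demo_cell
{
  demo_cell *car;
  demo_cell *cdr;
  int remembered;               /* already recorded by the barrier? */
};

/* Toy remembered set: cells that were mutated since the last collection. */
#define DEMO_REMEMBERED_MAX 16
static demo_cell *demo_remembered[DEMO_REMEMBERED_MAX];
static size_t demo_remembered_count;

/* Setter with a write barrier: the mutated cell is recorded before the
   store, so a generational collector could rescan it later.  */
void
demo_set_cdr (demo_cell *cell, demo_cell *value)
{
  assert (cell != NULL);
  if (!cell->remembered)
    {
      assert (demo_remembered_count < DEMO_REMEMBERED_MAX);
      cell->remembered = 1;
      demo_remembered[demo_remembered_count++] = cell;
    }
  cell->cdr = value;
}

/* Counterpart of SCM_CDRLOC: hands out the raw location.  Any store made
   through the returned pointer is invisible to the barrier.  */
demo_cell **
demo_cdrloc (demo_cell *cell)
{
  return &cell->cdr;
}

size_t
demo_remembered_size (void)
{
  return demo_remembered_count;
}
```

A store through demo_cdrloc changes the cell just as demo_set_cdr does, but
leaves the remembered set untouched, which is exactly the kind of write a
generational collector would miss.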

Best regards
Dirk





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-21 19:37     ` Dirk Herrmann
@ 2004-05-21 20:30       ` Paul Jarc
  2004-05-22  6:48         ` Dirk Herrmann
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Jarc @ 2004-05-21 20:30 UTC (permalink / raw)
  Cc: guile-devel, Marius Vollmer

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote:
> As long as it is just a cell, we check the bits. Only if we know its
> a pair, we dare to access it as a pair of SCM values.

But according to the C standard, we shouldn't dare to access it as
scm_t_bits either.  unsigned char is the only universally safe
aliasing access type.

If it wouldn't be too much trouble to access the type bits through
unsigned char, then we could store SCM values on the heap and work
with pointers to SCM (even if only for reading, not writing).  We
would use the type bits to decide whether to access a given word/cell
as SCM or scm_t_bits.  (It'd be nice for smobs if void* could be
another option there.)  That would conform to the C standard.
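
One way to read the type bits bytewise is sketched below. It assumes,
hypothetically, that both representations have the same size and keep the
tag in the low three bits of the word; the demo_* names are invented.
memcpy accesses the storage as bytes, so it does not matter which type the
word was stored as:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for scm_t_bits and SCM. */
typedef uintptr_t demo_bits;
typedef struct demo_scm { demo_bits n; } demo_scm;

/* Copy the bytes of a heap word into a demo_bits, whatever type the word
   was last stored as.  */
demo_bits
demo_raw_bits (const void *heap_word)
{
  demo_bits raw;

  assert (heap_word != NULL);
  memcpy (&raw, heap_word, sizeof raw);
  return raw;
}

/* Assumed convention: the tag lives in the low three bits. */
int
demo_tag (const void *heap_word)
{
  return (int) (demo_raw_bits (heap_word) & 7);
}
```

After inspecting the tag this way, the caller can decide whether to access
the word as an object or as raw bits.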


paul




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-21 20:30       ` Paul Jarc
@ 2004-05-22  6:48         ` Dirk Herrmann
  2004-05-23 15:03           ` Paul Jarc
  2004-08-09 21:09           ` Marius Vollmer
  0 siblings, 2 replies; 25+ messages in thread
From: Dirk Herrmann @ 2004-05-22  6:48 UTC (permalink / raw)
  Cc: guile-devel, Marius Vollmer

Paul Jarc wrote:

>  Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote:
>
> > As long as it is just a cell, we check the bits. Only if we know
> > its a pair, we dare to access it as a pair of SCM values.
>
>  But according to the C standard, we shouldn't dare to access it as
>  scm_t_bits either. unsigned char is the only universally safe
>  aliasing access type.

This is highly academic. Within guile there are so many places where
we perform operations that are not defined according to the C
standard: conservative garbage collection, copying stacks for
continuations, assembly code, ...

Certainly, the way we convert between scm_t_bits and SCM is
implementation dependent. However, the definitions for scm_t_bits
and SCM are IMO a very good way to provide an abstraction of
some of this uncleanliness. And, with today's definitions of
scm_t_bits and SCM, the heap _must_ hold scm_t_bits variables.
Please explain, why you think that it is cleaner to say it only holds
scheme objects if in fact it does not.

I am still missing the background for the whole discussion: The
standard says that the result of casting pointers to scm_t_bits
to pointers to SCM gives implementation dependent results,
right? Or does the standard explicitly forbid such casts?

What is the reason to change a paradigm, which has for several
years worked quite nicely, is easily understood, and has helped to
find and probably also to avoid a bunch of errors? Is there any
_real_ reason that any of you is facing, like that guile does not
compile or work on one of your machines?

Best regards
Dirk





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-22  6:48         ` Dirk Herrmann
@ 2004-05-23 15:03           ` Paul Jarc
  2004-08-09 21:09           ` Marius Vollmer
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Jarc @ 2004-05-23 15:03 UTC (permalink / raw)
  Cc: Marius Vollmer, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> wrote:
> This is highly academic. Within guile there are so many places where
> we perform operations that are not defined according to the C
> standard: conservative garbage collection, copying stacks for
> continuations, assembly code, ...

Well, my impression is that Marius wants to get rid of at least this
particular bit of nonconformance.  Marius?

> Please explain, why you think that it is cleaner to say it only holds
> scheme objects if in fact it does not.

That's not exactly it.  The words that hold Scheme values would be
accessed using the SCM type, instead of converting to/from
scm_t_bits.  Other words, holding non-Scheme values, would still be
accessed as scm_t_bits.  (AIUI.)

> The standard says that the result of casting pointers to scm_t_bits
> to pointers to SCM gives implementation dependent results, right? Or
> does the standard explicitly forbid such casts?

Casting the pointer is ok in itself (well, unless there is an
alignment mismatch).  Accessing the pointed-to memory gives undefined
behavior if there isn't actually an object there of the type you're
using to access it.

If SCM were defined as a struct containing scm_t_bits, then it would
certainly be legal to access an SCM object via scm_t_bits*.

> What is the reason to change a paradigm, which has for several
> years worked quite nicely, is easily understood, and has helped to
> find and probably also to avoid a bunch of errors? Is there any
> _real_ reason that any of you is facing, like that guile does not
> compile or work on one of your machines?

I think Marius is worried that there may be systems where the bits of
an SCM value are different from the bits of the corresponding
scm_t_bits value.  I don't know whether any have actually been found.


paul




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-05-22  6:48         ` Dirk Herrmann
  2004-05-23 15:03           ` Paul Jarc
@ 2004-08-09 21:09           ` Marius Vollmer
  2004-08-20 19:17             ` Dirk Herrmann
  1 sibling, 1 reply; 25+ messages in thread
From: Marius Vollmer @ 2004-08-09 21:09 UTC (permalink / raw)
  Cc: Paul Jarc, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> Certainly, the way we convert between scm_t_bits and SCM is
> implementation dependent. However, the definitions for scm_t_bits
> and SCM are IMO a very good way to provide an abstraction of some of
> this uncleanlyness. And, with today's definitions of scm_t_bits and
> SCM, the heap _must_ hold scm_t_bits variables.  Please explain, why
> you think that it is cleaner to say it only holds scheme objects if
> in fact it does not.

The reason is that there exists code that does essentially this:

    scm_t_bits heap_field;

    SCM value = whatever ();
    SCM *ptr = (SCM *)&heap_field;
    *ptr = value;

This is quite unclean.  This variant is cleaner and standards
conformant:

    SCM heap_field;
    
    SCM value = whatever ();
    SCM *ptr = &heap_field;
    *ptr = value;

> What is the reason to change a paradigm, which has for several years
> worked quite nicely, is easily understood, and has helped to find
> and probably also to avoid a bunch of errors?

I don't think that the paradigm has changed fundamentally.  It has
been strengthened, if you will.  The distinction between scm_t_bits
and SCM is still there.

We don't just cast between SCM and scm_t_bits, we use SCM_PACK and
SCM_UNPACK.  Except sometimes a scm_t_bits variable is stored into via
a SCM* pointer, totally ruining the careful PACK/UNPACK abstraction.
That exception has now been removed.  I see that as an unconditional
improvement, don't you?

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: The relationship between SCM and scm_t_bits.
  2004-08-09 21:09           ` Marius Vollmer
@ 2004-08-20 19:17             ` Dirk Herrmann
  2004-08-21 16:16               ` Marius Vollmer
  0 siblings, 1 reply; 25+ messages in thread
From: Dirk Herrmann @ 2004-08-20 19:17 UTC (permalink / raw)
  Cc: Paul Jarc, guile-devel

[-- Attachment #1: Type: text/plain, Size: 3790 bytes --]

Marius Vollmer wrote:

>  Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:
>
> > Certainly, the way we convert between scm_t_bits and SCM is
> > implementation dependent. However, the definitions for scm_t_bits
> > and SCM are IMO a very good way to provide an abstraction of some
> > of this uncleanlyness. And, with today's definitions of scm_t_bits
> > and SCM, the heap _must_ hold scm_t_bits variables. Please
> > explain, why you think that it is cleaner to say it only holds
> > scheme objects if in fact it does not.
>
>  The reason is that there exits code that does essentially this:
>
>  scm_t_bits heap_field;
>
>  SCM value = whatever (); SCM *ptr = (SCM *)&heap_field; *ptr = value;

I assume that you mean that heap_field is actually an element of the heap.

We already had the discussion in which I suggested discouraging this style
of coding, since it violates a potential write barrier and will lead to
problems if we ever switch to a generational garbage collection. Despite
this discussion, you nevertheless seem to have decided that you do not
want to discourage this style of coding. That's OK, it's perfectly fair to
make that decision if you are aware of the consequences. I would just
prefer if such decisions were explicitly stated, in order to avoid
confusion for both guile developers and users. Please consider that, as
long as such a decision is not official, people who have ever followed
discussions about the implications of such coding on generational gc
might put some effort into avoiding such code. We can well spare them
and ourselves this effort then.

> > What is the reason to change a paradigm, which has for several
> > years worked quite nicely, is easily understood, and has helped to
> > find and probably also to avoid a bunch of errors?
>
>  I don't think that the paradigm has changed fundamentally. It has
>  been strengthened, if you will. The distinction between scm_t_bits
>  and SCM is still there.
>
>  We don't just cast between SCM and scm_t_bits, we use SCM_PACK and
>  SCM_UNPACK. Except sometimes a scm_t_bits variable is stored into
>  via a SCM* pointer, totally ruining the care PACk/UNPACK abstraction.
>  That exception has now been removed. I see that as an unconditional
>  improvement, don't you?

As said above, I accept it if you decide to allow such a coding style.
Then, you are right, people should have the option to have SCM pointers
into the heap. I just doubt that the current solution is elegant. On the
contrary: I think that it introduces an uncleanliness for exactly the
other type of scenario, namely if someone needs to have a scm_t_bits
pointer into the heap. In particular, I have a problem with the
following lines of code.

  In gc.h:

    #define SCM_GC_CELL_WORD(x, n)   (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n))))

    This expression has a SCM value as an intermediate result, which is
    definitely unclean, since the SCM value might (in contrast to the
    definition of SCM) not represent a valid scheme object.

  In numbers.h:

    #define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))

    This expression has a SCM* as an intermediate result, although in
    this case we _know_ that we are actually pointing to a scm_t_bits value.

My suggestion is just to remove this uncleanliness by going one step
further, namely to make scm_t_cell a union of scm_t_bits and SCM values.
IMO, this is the most elegant approach anyway, since it reflects the
actual situation best. I enclose a patch to accomplish this. Please
consider the patch and, if you like, comment on it. I have presented it
before, but if I remember correctly, there has not been a definite
decision on it. Thus, I would just go ahead and apply it within the next
couple of days.

Best regards,
Dirk

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 6278 bytes --]

Index: guile-core/NEWS
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/NEWS,v
retrieving revision 1.438
diff -u -r1.438 NEWS
--- guile-core/NEWS	19 Aug 2004 17:54:37 -0000	1.438
+++ guile-core/NEWS	20 Aug 2004 17:44:57 -0000
@@ -744,16 +744,6 @@
     SCM_SYMBOL_HASH           -> scm_hashq
     SCM_SYMBOL_INTERNED_P     -> scm_symbol_interned_p
 
-** SCM_CELL_WORD_LOC has been deprecated.
-
-Use the new macro SCM_CELL_OBJECT_LOC instead, which return a pointer
-to a SCM, as opposed to a pointer to a scm_t_bits.
-
-This was done to allow the correct use of pointers into the Scheme
-heap.  Previously, the heap words were of type scm_t_bits and local
-variables and function arguments were of type SCM, making it
-non-standards-conformant to have a pointer that can point to both.
-
 ** New macros SCM_SMOB_DATA_2, SCM_SMOB_DATA_3, etc.
 
 These macros should be used instead of SCM_CELL_WORD_2/3 to access the
Index: guile-core/libguile/ChangeLog
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/ChangeLog,v
retrieving revision 1.2128
diff -u -r1.2128 ChangeLog
--- guile-core/libguile/ChangeLog	20 Aug 2004 13:33:39 -0000	1.2128
+++ guile-core/libguile/ChangeLog	20 Aug 2004 17:45:05 -0000
@@ -1,3 +1,21 @@
+2004-05-24  Dirk Herrmann  <dirk@dirk-herrmanns-seiten.de>
+
+	* deprecated.h (SCM_CELL_WORD_LOC): Un-deprecated.
+
+	* gc.h (scm_t_cell): Redefined to hold a union, since each cell
+	element either holds a scm_t_bits value or a SCM value.
+
+	(SCM_GC_CARD_BVEC, SCM_GC_SET_CARD_BVEC, SCM_GC_GET_CARD_FLAGS,
+	SCM_GC_SET_CARD_FLAGS, SCM_GC_CELL_OBJECT, SCM_GC_CELL_WORD,
+	SCM_GC_SET_CELL_OBJECT, SCM_GC_SET_CELL_WORD): Modified to work
+	with the new scm_t_cell.
+
+	(SCM_CELL_WORD_LOC): Un-deprecated.
+
+	* numbers.h (SCM_I_BIG_MPZ): Use SCM_CELL_WORD_LOC instead of
+	SCM_CELL_OBJECT_LOC, since we are not dealing with scheme objects
+	here.
+
 2004-08-20  Marius Vollmer  <marius.vollmer@uni-dortmund.de>
 
 	* eval.c (scm_lookupcar1): Report "Variable used before given a
Index: guile-core/libguile/deprecated.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/deprecated.h,v
retrieving revision 1.29
diff -u -r1.29 deprecated.h
--- guile-core/libguile/deprecated.h	19 Aug 2004 16:49:42 -0000	1.29
+++ guile-core/libguile/deprecated.h	20 Aug 2004 17:45:05 -0000
@@ -291,11 +291,6 @@
 
 #define SCM_VALIDATE_OPDIR(pos, port) SCM_MAKE_VALIDATE (pos, port, OPDIRP)
 
-/* Deprecated because we can not safely cast a SCM* to a scm_t_bits*
- */
-
-#define SCM_CELL_WORD_LOC(x, n)   ((scm_t_bits*)SCM_CELL_OBJECT_LOC((x),(n)))
-
 /* Users shouldn't know about INUMs.
  */
 
Index: guile-core/libguile/gc.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/gc.h,v
retrieving revision 1.118
diff -u -r1.118 gc.h
--- guile-core/libguile/gc.h	19 Aug 2004 16:48:37 -0000	1.118
+++ guile-core/libguile/gc.h	20 Aug 2004 17:45:05 -0000
@@ -36,8 +36,10 @@
 
 typedef struct scm_t_cell
 {
-  SCM word_0;
-  SCM word_1;
+  union {
+    scm_t_bits word;
+    SCM object;
+  } elements[2];
 } scm_t_cell;
 
 /*
@@ -70,15 +72,17 @@
 
 
 #define SCM_GC_CARD_N_HEADER_CELLS 1
-#define SCM_GC_CARD_N_CELLS        256
-#define SCM_GC_SIZEOF_CARD 		SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell)
+#define SCM_GC_CARD_N_CELLS 256
+#define SCM_GC_SIZEOF_CARD SCM_GC_CARD_N_CELLS * sizeof (scm_t_cell)
 
-#define SCM_GC_CARD_BVEC(card)  ((scm_t_c_bvec_long *) ((card)->word_0))
+#define SCM_GC_CARD_BVEC(card) \
+  ((scm_t_c_bvec_long *) ((card)->elements[0].word))
 #define SCM_GC_SET_CARD_BVEC(card, bvec) \
-    ((card)->word_0 = (SCM) (bvec))
-#define SCM_GC_GET_CARD_FLAGS(card) ((long) ((card)->word_1))
+  ((card)->elements[0].word = (scm_t_bits) (bvec))
+#define SCM_GC_GET_CARD_FLAGS(card) \
+  ((long) ((card)->elements[1].word))
 #define SCM_GC_SET_CARD_FLAGS(card, flags) \
-    ((card)->word_1 = (SCM) (flags))
+  ((card)->elements[1].word = (flags))
 
 #define SCM_GC_GET_CARD_FLAG(card, shift) \
  (SCM_GC_GET_CARD_FLAGS (card) & (1L << (shift)))
@@ -141,12 +145,13 @@
  * in debug mode.  In particular these macros will even work for free cells,
  * which should never be encountered by user code.  */
 
-#define SCM_GC_CELL_OBJECT(x, n) (((SCM *)SCM2PTR (x)) [n])
-#define SCM_GC_CELL_WORD(x, n)   (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n))))
+#define SCM_GC_CELL_WORD(x, n) (((SCM2PTR (x))->elements[n]).word)
+#define SCM_GC_CELL_OBJECT(x, n) (((SCM2PTR (x))->elements[n]).object)
 
-#define SCM_GC_SET_CELL_OBJECT(x, n, v) ((((SCM *)SCM2PTR (x)) [n]) = (v))
+#define SCM_GC_SET_CELL_OBJECT(x, n, v) \
+  (((SCM2PTR (x))->elements[n]).object = (v))
 #define SCM_GC_SET_CELL_WORD(x, n, v)  \
-  (SCM_GC_SET_CELL_OBJECT ((x), (n), SCM_PACK (v)))
+  (((SCM2PTR (x))->elements[n]).word = (v))
 
 #define SCM_GC_CELL_TYPE(x) (SCM_GC_CELL_OBJECT ((x), 0))
 
@@ -214,7 +219,10 @@
   (SCM_GC_SET_CELL_OBJECT ((x), 1, (v)))
 
 
-#define SCM_CELL_OBJECT_LOC(x, n) (SCM_VALIDATE_CELL((x), &SCM_GC_CELL_OBJECT ((x), (n))))
+#define SCM_CELL_WORD_LOC(x, n) \
+  (SCM_VALIDATE_CELL((x), &SCM_GC_CELL_WORD ((x), (n))))
+#define SCM_CELL_OBJECT_LOC(x, n) \
+  (SCM_VALIDATE_CELL((x), &SCM_CELL_OBJECT ((x), (n))))
 #define SCM_CARLOC(x)             (SCM_CELL_OBJECT_LOC ((x), 0))
 #define SCM_CDRLOC(x)             (SCM_CELL_OBJECT_LOC ((x), 1))
 
Index: guile-core/libguile/numbers.h
===================================================================
RCS file: /cvsroot/guile/guile/guile-core/libguile/numbers.h,v
retrieving revision 1.93
diff -u -r1.93 numbers.h
--- guile-core/libguile/numbers.h	9 Aug 2004 23:32:14 -0000	1.93
+++ guile-core/libguile/numbers.h	20 Aug 2004 17:45:05 -0000
@@ -142,7 +142,7 @@
 #define SCM_COMPLEX_IMAG(x) (SCM_COMPLEX_MEM (x)->imag)
 
 /* Each bignum is just an mpz_t stored in a double cell starting at word 1. */
-#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))
+#define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_WORD_LOC ((x), 1))))
 #define SCM_BIGP(x) (!SCM_IMP (x) && SCM_TYP16 (x) == scm_tc16_big)
 
 #define SCM_NUMBERP(x) (SCM_I_INUMP(x) || SCM_NUMP(x))

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel


* Re: The relationship between SCM and scm_t_bits.
  2004-08-20 19:17             ` Dirk Herrmann
@ 2004-08-21 16:16               ` Marius Vollmer
  2004-10-03  9:09                 ` Dirk Herrmann
  0 siblings, 1 reply; 25+ messages in thread
From: Marius Vollmer @ 2004-08-21 16:16 UTC (permalink / raw)
  Cc: Paul Jarc, guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

>>  The reason is that there exists code that does essentially this:
>>
>>  scm_t_bits heap_field;
>>
>>  SCM value = whatever ();
>>  SCM *ptr = (SCM *)&heap_field;
>>  *ptr = value;
>
> I assume that you mean that heap_field is actually an element of the heap.

Yes.

> We already had the discussion that I suggest to discourage this
> style of coding since it violates a potential write barrier and will
> lead to problems if we ever switch to a generational garbage
> collection.

Yes, that is the bigger issue.  What we are discussing here are quite
minor points, I'd say.  There might be a time when we do want to have
a write-barrier and then we can revisit whether to provide the *LOC
accessors or not.  Right now, removing them is not necessary.  We
should only remove them when there is an immediate benefit.
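
To make the write-barrier concern concrete, here is a minimal sketch.
It is not Guile code: toy_bits, toy_cell, toy_set_word, toy_word_loc
and the counter are hypothetical names, and the "barrier" is only a
counter standing in for whatever bookkeeping a real collector would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not Guile's definitions. */
typedef uintptr_t toy_bits;
typedef struct toy_cell { toy_bits words[2]; } toy_cell;

/* Number of stores that the imaginary write barrier has observed. */
static unsigned long toy_barrier_hits = 0;

/* Setter-style access: every store passes through one place, so a
   collector could record old-to-young references here. */
static void
toy_set_word (toy_cell *c, size_t n, toy_bits v)
{
  toy_barrier_hits++;
  c->words[n] = v;
}

/* *LOC-style access: the caller gets the raw address of the word. */
static toy_bits *
toy_word_loc (toy_cell *c, size_t n)
{
  return &c->words[n];
}

/* Performs one store of each kind and returns how many of them the
   barrier noticed. */
static unsigned long
toy_count_observed_stores (void)
{
  toy_cell c = { { 0, 0 } };

  toy_barrier_hits = 0;
  toy_set_word (&c, 1, 42);      /* observed by the barrier */
  *toy_word_loc (&c, 1) = 43;    /* bypasses the barrier */
  assert (c.words[1] == 43);     /* the second store still happened */
  return toy_barrier_hits;
}
```

Only the store made through the setter is counted; the store through
the address returned by the *LOC-style accessor is not, which is the
reason such accessors would get in the way of a write barrier.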

> In particular, I have a problem with the following lines of code.
>
>   In gc.h:
>
>     #define SCM_GC_CELL_WORD(x, n)   (SCM_UNPACK (SCM_GC_CELL_OBJECT ((x), (n))))
>
>     This expression has a SCM value as an intermediate result, which
> is definitely unclean, since the SCM value might (in contrast to the
> definition of SCM) not represent a valid scheme object.

Yes, that troubles me also a bit.  But I get over it by realizing that
we only really have one type, the type 'machine word', and SCM and
scm_t_bits are essentially this same type, used to provide markup for
different uses of the basic type 'machine word'.  (In my view, it is
essential that Scheme values are represented as a machine word.  Using
some other type that doesn't fit into a machine register, for example,
would not be good enough.)

As far as the ordinary user is concerned, we only have one type to
represent a Scheme value, SCM.  We don't say what a SCM is (whether it
is a pointer, an integer, a struct, etc), only that you can assign it
with '='.

The internals of Guile, and unfortunately also a user that works with
smobs, need to know more about SCM: that it really is a machine word
and can be treated as an integral type.  To treat it as such, a SCM is
reinterpreted as a scm_t_bits.

I think we need to make the following guarantees:

  - a SCM and a scm_t_bits have the same size in the sense that they
    can store exactly the same things.  We always have

       SCM scm;
       scm_is_eq (SCM_PACK (SCM_UNPACK (scm)), scm)

    and

       scm_t_bits bits;
       SCM_UNPACK (SCM_PACK (bits)) == bits                     (*)

  - a size_t can be cast to scm_t_bits and back without losing
    information.  (This is for storing integers in heap words.)

  - a void* can be cast to scm_t_bits and back without losing
    information.  (This is for storing pointers in heap words.)

  - a scm_t_bits can be cast to void* and back without losing
    information.  (This is for storing SCMs in void* locations
    provided by external code.)

This is not as elegant and clean as dropping the guarantee (*), but it
allows heap words to be declared as type SCM which is desirable since
local variables and function arguments are also declared to be of type
SCM.
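
The guarantees above can be sketched as checks in C.  This is only an
illustration under assumptions: TOY_SCM, toy_bits, TOY_PACK and
TOY_UNPACK are hypothetical stand-ins (an opaque pointer and a
uintptr_t), not Guile's actual definitions, and the integer/pointer
round trips rely on behaviour that is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not copied from Guile's headers. */
typedef struct toy_opaque *TOY_SCM;   /* plays the role of SCM */
typedef uintptr_t toy_bits;           /* plays the role of scm_t_bits */

#define TOY_PACK(b)   ((TOY_SCM) (b))
#define TOY_UNPACK(s) ((toy_bits) (s))

/* Compile-time checks: a negative array size stops the build if the
   word sizes do not line up. */
typedef char toy_check_same_size
  [(sizeof (TOY_SCM) == sizeof (toy_bits)) ? 1 : -1];
typedef char toy_check_size_t_fits
  [(sizeof (size_t) <= sizeof (toy_bits)) ? 1 : -1];

/* Returns 1 if every round trip listed above preserves its value. */
static int
toy_roundtrips_hold (void)
{
  int object = 7;
  void *p = &object;
  size_t n = (size_t) -1;
  toy_bits bits = (toy_bits) 0x1234;
  TOY_SCM scm = TOY_PACK (bits);

  return TOY_PACK (TOY_UNPACK (scm)) == scm     /* SCM -> bits -> SCM */
    && TOY_UNPACK (TOY_PACK (bits)) == bits     /* guarantee (*) */
    && (size_t) (toy_bits) n == n               /* size_t -> bits -> size_t */
    && (void *) (toy_bits) p == p               /* void* -> bits -> void* */
    && (toy_bits) (void *) bits == bits;        /* bits -> void* -> bits */
}
```

On a platform where one of these does not hold, either the build
fails at the size checks or toy_roundtrips_hold returns 0.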


The reason that SCM is distinct from scm_t_bits at all is to get some
help from the C compiler in type checking.
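
A small sketch of the kind of help meant here, assuming SCM is
modelled as a pointer to an incomplete struct and scm_t_bits as an
unsigned integer.  TOY_SCM, toy_bits and the macros are hypothetical
stand-ins, not Guile's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins; Guile's real definitions may differ. */
typedef struct toy_opaque *TOY_SCM;  /* opaque pointer: few operators apply */
typedef uintptr_t toy_bits;          /* plain integer: all operators apply */

#define TOY_PACK(b)   ((TOY_SCM) (b))
#define TOY_UNPACK(s) ((toy_bits) (s))

/* Extracts the low three bits of the word behind X. */
static unsigned int
toy_low_bits (TOY_SCM x)
{
  /* return x & 7;   would not compile: '&' needs integer operands */
  return (unsigned int) (TOY_UNPACK (x) & 7);
}
```

The bit operation has to go through the explicit unpacking step, so
low-level uses of a value stand out in the source.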

>   In numbers.h:
>
>     #define SCM_I_BIG_MPZ(x) (*((mpz_t *) (SCM_CELL_OBJECT_LOC((x),1))))
>
>     This expression has a SCM* as an intermediate result, although in
> this case we _know_ that we are actually pointing to a scm_t_bits
> value.

No, we point at an array of three SCMs... ;) This is actually a
separate issue: the memory used by SCM_I_BIG_MPZ is always used as
only one type, as an mpz_t.

The reason that I changed all heap words to be declared as SCM was
that previously some heap words would be written as a SCM and then
read as a scm_t_bits.  This is also the reason why I think that a
union does not help at all: with such a union, we would write into one
member and then read from the other.  This is just as unclean as
casting a pointer to scm_t_bits to a pointer to SCM.
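
A minimal sketch of the pattern in question, with hypothetical
stand-in names (toy_element, TOY_SCM, toy_bits) rather than the types
from the patch.  The value read back matches what was stored only
because both members share one representation, as they do on common
platforms.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not Guile's definitions. */
typedef struct toy_opaque *TOY_SCM;
typedef uintptr_t toy_bits;

#define TOY_PACK(b) ((TOY_SCM) (b))

/* One heap word that may be viewed either way. */
typedef union toy_element
{
  toy_bits word;
  TOY_SCM object;
} toy_element;

/* Stores through one member and loads through the other. */
static toy_bits
toy_store_object_read_word (toy_bits raw)
{
  toy_element e;

  e.object = TOY_PACK (raw);   /* written as a TOY_SCM ... */
  return e.word;               /* ... read back as a toy_bits */
}
```

The union gives each view a name, but the conversion between the two
views still happens without going through PACK or UNPACK.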

> Thus, I would just go ahead and apply it within the next couple of
> days.

Please do not apply it.  We are not completely clean, true, but I
doubt that we can attain perfect cleanliness anyway.  Using a union
would just complicate the issue without giving any benefit (that I
could see).


Things started out simple, and got more complicated with the
introduction of scm_t_bits as an alias of SCM.  Let's not continue
this trend by pretending that SCM and scm_t_bits are actually separate
types.  They are not, they are the same type essentially, but one
allows certain low-level operations that the other prevents.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405



* Re: The relationship between SCM and scm_t_bits.
  2004-08-21 16:16               ` Marius Vollmer
@ 2004-10-03  9:09                 ` Dirk Herrmann
  2004-10-04 14:12                   ` Marius Vollmer
  0 siblings, 1 reply; 25+ messages in thread
From: Dirk Herrmann @ 2004-10-03  9:09 UTC (permalink / raw)
  Cc: guile-devel

Marius Vollmer wrote:

>  Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:
>
> >> The reason is that there exists code that does essentially this:
> >>
> >> scm_t_bits heap_field;
> >>
> >> SCM value = whatever ();
> >> SCM *ptr = (SCM *)&heap_field;
> >> *ptr = value;
> >
> > I assume that you mean that heap_field is actually an element of
> > the heap.
>
>  Yes.
>
> > We already had the discussion that I suggest to discourage this
> > style of coding since it violates a potential write barrier and
> > will lead to problems if we ever switch to a generational garbage
> > collection.
>
>  Yes, that is the bigger issue. What we are discussing here are quite
>  minor points, I'd say. There might be a time when we do want to
>  have a write-barrier and then we can revisit whether to provide the
>  *LOC accessors or not. Right now, removing them is not necessary.
>  We should only remove them when there is an immediate benefit.

This did not answer my question from my previous mail, where I asked you
for a clear statement about whether this style of coding is to be
considered discouraged: Is there a strategy towards a generational GC,
or not? Shall new code make use of the *LOC accessors or shall
developers avoid them?

>  I think we need to make the following guarantees:
>
>  - a SCM and a scm_t_bits have the same size in the sense that they
>    can store exactly the same things. We always have
>
>       SCM scm;
>       scm_is_eq (SCM_PACK (SCM_UNPACK (scm)), scm)
>
>    and
>
>       scm_t_bits bits;
>       SCM_UNPACK (SCM_PACK (bits)) == bits                     (*)
>
>  - a size_t can be cast to scm_t_bits and back without losing
>    information. (This is for storing integers in heap words.)
>
>  - a void* can be cast to scm_t_bits and back without losing
>    information. (This is for storing pointers in heap words.)
>
>  - a scm_t_bits can be cast to void* and back without losing
>    information. (This is for storing SCMs in void* locations provided
>    by external code.)
>
>  This is not as elegant and clean as dropping the guarantee (*), but
>  it allows heap words to be declared as type SCM which is desirable
>  since local variables and function arguments are also declared to be
>  of type SCM.

I am not quite sure what you really want to state with (*):

A) For a variable "bits" of type "scm_t_bits" that holds a 
representation of a SCM value the expression "SCM_UNPACK (SCM_PACK 
(bits)) == bits" is guaranteed to evaluate to true.

B) For any variable "bits" of type "scm_t_bits" the expression 
"SCM_UNPACK (SCM_PACK (bits)) == bits" is guaranteed to evaluate to true.

If you only meant to say A), then I agree that we should make the above
guarantees, including (*), provided it is formulated more accurately.

If you meant to say B), then I disagree.  It is one thing if Guile 
internally makes use of the fact that things happen to work that way 
(like you have done with your modification to scm_t_cell).  It is 
another thing to make this an official guarantee.

>  Please do not apply it. We are not completely clean, true, but I
>  doubt that we can attain perfect cleanliness anyway. Using a union
>  would just complicate the issue without giving any benefit (that I
>  could see).

The effects of the patch are quite local.  No lines of code in Guile
apart from the ones seen in the patch have to be modified.

Benefits are:
- Ease of documentation, since the current implementation needs to be
documented in gc.h as a local uncleanliness in the otherwise quite
consistently applied distinction between SCM and scm_t_bits.  Maybe I am
assuming too much here, but you had planned to document this, right? :-)
- Possibility to modify SCM_PACK such that it does an assertion check on
the value before actually converting it.
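
A minimal sketch of what such a checking conversion could look like.
Everything here is hypothetical: TOY_SCM and toy_bits stand in for SCM
and scm_t_bits, and the validity rule in toy_bits_valid_p is made up
for the example; it is not Guile's tagging scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not Guile's definitions. */
typedef struct toy_opaque *TOY_SCM;
typedef uintptr_t toy_bits;

/* Made-up validity rule for this sketch only: the tag pattern 7 in the
   low three bits is treated as reserved and never names an object. */
static int
toy_bits_valid_p (toy_bits b)
{
  return (b & 7) != 7;
}

/* Converts B to a TOY_SCM, aborting first if B is not a valid encoding. */
static TOY_SCM
toy_pack_checked (toy_bits b)
{
  assert (toy_bits_valid_p (b));
  return (TOY_SCM) b;
}
```

A check of this kind is only usable if arbitrary words are never
routed through the packing macro, which is presumably why it depends
on heap words having a scm_t_bits view of their own.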

>  Things started out simple, and got more complicated with the
>  introduction of scm_t_bits as an alias of SCM. Let's not continue
>  this trend by pretending that SCM and scm_t_bits are actually
>  separate types. They are not, they are the same type essentially,
>  but one allows certain low-level operations that the other prevents.

I think you are conflating a type, as a collection of values, with its
underlying representation.  The type SCM is distinct from the type
scm_t_bits, since only a subset of the values of type scm_t_bits can be 
used to create a valid SCM.  It is directly comparable to having a type 
"natural numbers" implemented using a C unsigned int, and a type "even 
natural numbers", also implemented as a C unsigned int.  Both have the 
same underlying representation, but are different types.  Despite the 
poor type checking offered by standard C, by using tools like splint it 
is even possible to get warnings about undesired mixes between different 
typedefs in C code.
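
The analogy can be sketched in plain C by wrapping the shared
representation in two distinct struct types.  The names toy_natural,
toy_even_natural, toy_make_even and toy_even_to_natural are
hypothetical and exist only for this illustration.

```c
/* Both types wrap an unsigned int, yet the compiler keeps them apart. */
typedef struct { unsigned int value; } toy_natural;
typedef struct { unsigned int value; } toy_even_natural;

/* Stores RAW in *OUT and returns 1 if RAW is even; returns 0 otherwise. */
static int
toy_make_even (unsigned int raw, toy_even_natural *out)
{
  if (raw % 2 != 0)
    return 0;
  out->value = raw;
  return 1;
}

/* Widening is always safe: every even natural is a natural. */
static toy_natural
toy_even_to_natural (toy_even_natural e)
{
  toy_natural n = { e.value };
  return n;
}
```

Passing a toy_natural where a toy_even_natural is expected is a
compile error, even though both hold nothing but an unsigned int.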

Best regards,
Dirk




* Re: The relationship between SCM and scm_t_bits.
  2004-10-03  9:09                 ` Dirk Herrmann
@ 2004-10-04 14:12                   ` Marius Vollmer
  0 siblings, 0 replies; 25+ messages in thread
From: Marius Vollmer @ 2004-10-04 14:12 UTC (permalink / raw)
  Cc: guile-devel

Dirk Herrmann <dirk@dirk-herrmanns-seiten.de> writes:

> This did not answer my question from my previous mail, where I asked
> you for a clear statement about whether this style of coding is to be
> considered discouraged: Is there a strategy towards a generational GC,
> or not? Shall new code make use of the *LOC accessors or shall
> developers avoid them?

There is no strategy towards a generational GC or any other scheme
that requires a write-barrier.

I'd say we could 'discourage' the *LOC accessors.  We then need to
have Guile itself stop using them.  (See NEWS for an explanation of
'discouraged'.)

I will answer the rest later.  I really don't want to put much more
thought into this, anyway.  To me, SCM and scm_t_bits are the same
type (the machine word), only that SCM is rigged to trick the compiler
into not allowing certain operations that we don't want to allow.



end of thread, other threads:[~2004-10-04 14:12 UTC | newest]

Thread overview: 25+ messages
2004-05-03 15:06 The relationship between SCM and scm_t_bits Marius Vollmer
2004-05-03 16:10 ` Marius Vollmer
2004-05-03 16:21 ` Paul Jarc
2004-05-04 13:53 ` Marius Vollmer
2004-05-04 17:16   ` Paul Jarc
2004-05-04 17:49     ` Marius Vollmer
2004-05-04 18:35       ` Paul Jarc
2004-05-05 10:00         ` Marius Vollmer
2004-05-05 14:58           ` Paul Jarc
2004-05-10 13:42 ` Marius Vollmer
2004-05-15  7:31 ` Dirk Herrmann
2004-05-17 18:09   ` Marius Vollmer
2004-05-15 15:00 ` Dirk Herrmann
2004-05-15 16:42   ` Dirk Herrmann
2004-05-17 19:22   ` Marius Vollmer
2004-05-17 20:17     ` Paul Jarc
2004-05-21 19:37     ` Dirk Herrmann
2004-05-21 20:30       ` Paul Jarc
2004-05-22  6:48         ` Dirk Herrmann
2004-05-23 15:03           ` Paul Jarc
2004-08-09 21:09           ` Marius Vollmer
2004-08-20 19:17             ` Dirk Herrmann
2004-08-21 16:16               ` Marius Vollmer
2004-10-03  9:09                 ` Dirk Herrmann
2004-10-04 14:12                   ` Marius Vollmer
