* the new gc asserts in master
@ 2008-08-26 20:12 Andy Wingo
2008-08-27 2:12 ` Han-Wen Nienhuys
0 siblings, 1 reply; 20+ messages in thread
From: Andy Wingo @ 2008-08-26 20:12 UTC (permalink / raw)
To: guile-devel
Hi,
I just merged master to guile-vm, but I'm not sure if I really wanted to
do that now. Normal test suites are failing:
lt-guile: gc.c:604: scm_i_gc: Assertion `scm_cells_allocated == scm_i_marked_count ()' failed.
/home/wingo/src/guile/vm/test-suite/standalone/test-use-srfi: line 27: 29507 Aborted guile -q --use-srfi=1,10 > /dev/null <<EOF
(if (and (defined? 'partition)
(defined? 'define-reader-ctor))
(exit 0) ;; good
(exit 1)) ;; bad
EOF
guile --use-srfi=1,10 fails to run
FAIL: test-use-srfi
This is on a core 2 duo, in 32-bit mode, configured as:
CFLAGS="-g -O2" ./configure --with-threads --enable-maintainer-mode --prefix=/opt/guile-vm
So it seems that the new gc "cleanups" don't want you to touch mark bits
outside the mark phase. This is incompatible with other uses inside
guile itself, e.g. the SCM_DEBUG_CELL_ACCESSES == 1 case in inline.h, or
even the lazy smob case I wrote about here:
http://thread.gmane.org/gmane.lisp.guile.user/6372
There are more cases, rgrep for SCM_SET_GC_MARK in libguile/*.[ch].
I'm going to commit an #if 0 around those asserts in the vm branch,
because I don't have the brain power to deal with it. (It is irritating
that I have to even write this mail.) Certainly if the near-term choice
is between inaccurate statistics and calls to abort(), I know which
choice I prefer...
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-26 20:12 the new gc asserts in master Andy Wingo
@ 2008-08-27 2:12 ` Han-Wen Nienhuys
2008-08-27 2:41 ` Han-Wen Nienhuys
2008-08-27 7:43 ` Ludovic Courtès
0 siblings, 2 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-27 2:12 UTC (permalink / raw)
To: guile-devel
Andy Wingo escreveu:
> Hi,
>
> I just merged master to guile-vm, but I'm not sure if I really wanted to
> do that now. Normal test suites are failing:
>
> lt-guile: gc.c:604: scm_i_gc: Assertion `scm_cells_allocated == scm_i_marked_count ()' failed.
> /home/wingo/src/guile/vm/test-suite/standalone/test-use-srfi: line 27: 29507 Aborted guile -q --use-srfi=1,10 > /dev/null <<EOF
> (if (and (defined? 'partition)
> (defined? 'define-reader-ctor))
> (exit 0) ;; good
> (exit 1)) ;; bad
> EOF
>
> guile --use-srfi=1,10 fails to run
> FAIL: test-use-srfi
>
> This is on a core 2 duo, in 32-bit mode, configured as:
>
> CFLAGS="-g -O2" ./configure --with-threads --enable-maintainer-mode --prefix=/opt/guile-vm
>
> So it seems that the new gc "cleanups" don't want you to touch mark bits
> outside the mark phase. This is incompatible with other uses inside
> guile itself, e.g. the SCM_DEBUG_CELL_ACCESSES == 1 case in inline.h, or
I am pushing a fix for this to master.
> even the lazy smob case I wrote about here:
>
> http://thread.gmane.org/gmane.lisp.guile.user/6372
I would classify the use of mark bits outside of the mark phase as outside
of the defined API. If you want to have weak pointer semantics, use
a weak hashtable, or implement reference counting on the C side.
I am actuallly inclined to add add abort() for anyone who calls scm_gc_mark()
outside the marking phase.
> There are more cases, rgrep for SCM_SET_GC_MARK in libguile/*.[ch].
I looked through all of these, and these all happen during the mark phase.
> I'm going to commit an #if 0 around those asserts in the vm branch,
> because I don't have the brain power to deal with it. (It is irritating
> that I have to even write this mail.) Certainly if the near-term choice
> is between inaccurate statistics and calls to abort(), I know which
> choice I prefer...
The statistics form the basis of the allocation strategy, so they are not
a 'cute' feature. If these statistics go off, we overallocate, or spend too
much time trying garbage collect when there is nothing to reclaim.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 2:12 ` Han-Wen Nienhuys
@ 2008-08-27 2:41 ` Han-Wen Nienhuys
2008-08-27 7:43 ` Ludovic Courtès
1 sibling, 0 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-27 2:41 UTC (permalink / raw)
To: guile-devel
Han-Wen Nienhuys escreveu:
>> even the lazy smob case I wrote about here:
>>
>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>
> I would classify the use of mark bits outside of the mark phase as outside
> of the defined API. If you want to have weak pointer semantics, use
> a weak hashtable, or implement reference counting on the C side.
>
> I am actuallly inclined to add add abort() for anyone who calls scm_gc_mark()
> outside the marking phase.
Also, you're creating a race condition: the mark bits are not protected by a lock,
so you will be screwed in still more interesting ways if more threads or types
would start doing this.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 2:12 ` Han-Wen Nienhuys
2008-08-27 2:41 ` Han-Wen Nienhuys
@ 2008-08-27 7:43 ` Ludovic Courtès
2008-08-27 14:00 ` Han-Wen Nienhuys
1 sibling, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2008-08-27 7:43 UTC (permalink / raw)
To: guile-devel
Hi,
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> I am pushing a fix for this to master.
Will you care to post and discuss your patches before pushing them?
This is all the more important that the patches don't seem to have any
relation with the problem at hand:
f85ea2a85fcdd051f432964806f044c0301d0945 Merge branch 'master' of git://git.sv.gnu.org/guile into nits
487b9dec2ea6b88ddbc6fbd17f445ddb197aebc5 Only sanity check numbers if SCM_DEBUG_CELL_ACCESSES is unset.
80237dcc7783b4d94ecf1d987deb9306d61735a0 Set SRCPROP{PLIST,COPY} through a macro, so SCM_DEBUG_CELL_ACCESSES compiles.
Can you please describe them and add ChangeLog entries (yes, we still
use that)?
In addition, they don't fix anything on x86-64:
$ ./pre-inst-guile
lt-guile: gc.c:610: scm_i_gc: Assertion `scm_i_gc_sweep_stats.collected + scm_cells_allocated == scm_i_gc_sweep_stats.swept' failed.
Aborted (core dumped)
Do you think you can come up with a fix within the next few days?
Otherwise, I'm inclined to revert the offending commits in `master' and
wait for a signal from you (i.e., a patch or merge request posted to the
mailing list, *not* a commit on `master'). It would make it easier for
us to play with `master' in the meantime.
Besides, avoid pushing from an non-up-to-date repo: this yields to
automatic merges like the one above, which is annoying as it makes
history harder to follow. Better pull first, then merge your changes,
then push.
>> even the lazy smob case I wrote about here:
>>
>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>
> I would classify the use of mark bits outside of the mark phase as outside
> of the defined API. If you want to have weak pointer semantics, use
> a weak hashtable, or implement reference counting on the C side.
That's a reasonable argument, but it's something we should not change
without discussing it first. For instance, it may be important to study
why Guile-GNOME had to resort to this, and how it could avoid it,
instead of just gratuitously breaking it.
Thanks,
Ludo'.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 7:43 ` Ludovic Courtès
@ 2008-08-27 14:00 ` Han-Wen Nienhuys
2008-08-27 15:38 ` Ludovic Courtès
2008-08-27 19:00 ` Andy Wingo
0 siblings, 2 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-27 14:00 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
> Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>> I am pushing a fix for this to master.
>
> Will you care to post and discuss your patches before pushing them?
Over the last weeks I have seen little discussion on your patches, eg.
commit 582a4997abc8b34ac6caf374fda8ea3ac65bd571
Author: Ludovic Courtès <ludo@gnu.org>
Date: Mon Aug 25 11:20:02 2008 +0200
Use $(GCC_CFLAGS) for `-Werror' et al. so that it's not used to compile
Gnulib code.
commit c95514b3b41c8e335ada863f8abb99cc4af9abe1
Author: Ludovic Courtès <ludo@gnu.org>
Date: Thu Aug 14 00:15:03 2008 +0200
Remove the now useless `qthreads.m4'.
Were pushed without review. There was a post on
commit 450be18dfffd496ef14e1c921953e6f179727ab4
Author: Ludovic Courtès <ludo@gnu.org>
Date: Thu Jul 17 00:17:56 2008 +0200
Handle lack of `struct dirent64' and `readdir64_r ()' on HP-UX 11.11.
but it was after the fact
Hi,
FYI, I committed the attached patch, which handles the lack of `struct
dirent64' and `readdir64_r ()' on HP-UX 11.11 (and possibly other
versions).
I'm assuming here that you -in good community spirit- don't consider
yourself to be above your own rules.
In other words: you can't do this. If you want people to discuss
before pushing you should set the good example.
> This is all the more important that the patches don't seem to have any
> relation with the problem at hand:
>
> f85ea2a85fcdd051f432964806f044c0301d0945 Merge branch 'master' of git://git.sv.gnu.org/guile into nits
> 487b9dec2ea6b88ddbc6fbd17f445ddb197aebc5 Only sanity check numbers if SCM_DEBUG_CELL_ACCESSES is unset.
> 80237dcc7783b4d94ecf1d987deb9306d61735a0 Set SRCPROP{PLIST,COPY} through a macro, so SCM_DEBUG_CELL_ACCESSES compiles.
This in reference to GUILE not compiling with SCM_DEBUG_CELL_ACCESSES
The commit message says,
Set SRCPROP{PLIST,COPY} through a macro, so SCM_DEBUG_CELL_ACCESSES compiles.
how much clearer do you want this message to be? It is fixing a
compilation problem for a certain preprocessor define. It doesn't
pretend to fix x86-64. You were complaining before that my changes
were too large and should have been split up. This patch is split
up.
Can you make up your mind?
> Do you think you can come up with a fix within the next few days?
In the spirit of your undocumented development and
community standards, I am including below a patch for this
problem. Let's discuss this complex change first to decide whether it
is worthy for inclusion in the oh-so-active GUILE repository.
> Otherwise, I'm inclined to revert the offending commits in `master' and
> wait for a signal from you (i.e., a patch or merge request posted to the
> mailing list, *not* a commit on `master'). It would make it easier for
> us to play with `master' in the meantime.
> Besides, avoid pushing from an non-up-to-date repo: this yields to
> automatic merges like the one above, which is annoying as it makes
> history harder to follow. Better pull first, then merge your changes,
> then push.
Can you document your requirements upfront instead complaining after the fact?
[lilydev@haring guile]$ grep -i merge HACKING
[lilydev@haring guile]$
[lilydev@haring guile]$ grep -i pull HACKING
[lilydev@haring guile]$
>>> even the lazy smob case I wrote about here:
>>>
>>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>> I would classify the use of mark bits outside of the mark phase as outside
>> of the defined API. If you want to have weak pointer semantics, use
>> a weak hashtable, or implement reference counting on the C side.
>
> That's a reasonable argument, but it's something we should not change
> without discussing it first. For instance, it may be important to study
> why Guile-GNOME had to resort to this, and how it could avoid it,
> instead of just gratuitously breaking it.
I'm not suggesting to change without discussing; this message rather
is the start of the discussion. I think reference counting is the
correct solution for this, as far as I understand the problem from the
quoted message.
The use of scm_gc_mark() outside of GC is fundamentally broken, since it
creates race conditions in the presence of threads.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
From ccd010e15ec0ddf285b75911739e85866d2d865c Mon Sep 17 00:00:00 2001
From: Han-Wen Nienhuys <hanwen@lilypond.org>
Date: Wed, 27 Aug 2008 10:48:06 -0300
Subject: [PATCH] Kludge around x86-64 GC runtime checks.
2008-08-27 Han-Wen Nienhuys <hanwen@lilypond.org>
* gc.c (scm_i_gc): Don't sanity check numbers on x64, while we
investigate a real fix.
---
libguile/gc.c | 4 ++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/libguile/gc.c b/libguile/gc.c
index 8e8769c..a0b3080 100644
--- a/libguile/gc.c
+++ b/libguile/gc.c
@@ -597,9 +597,9 @@ scm_i_gc (const char *what)
scm_i_sweep_all_segments ("GC", &scm_i_gc_sweep_stats);
scm_check_deprecated_memory_return ();
+#if (SCM_DEBUG_CELL_ACCESSES == 0 && SCM_SIZEOF_UNSIGNED_LONG == 4)
/* Sanity check our numbers. */
-
-#if (SCM_DEBUG_CELL_ACCESSES == 0)
+ /* TODO(hanwen): figure out why the stats are off on x64_64. */
/* If this was not true, someone touched mark bits outside of the
mark phase. */
assert (scm_cells_allocated == scm_i_marked_count ());
--
1.5.5.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 14:00 ` Han-Wen Nienhuys
@ 2008-08-27 15:38 ` Ludovic Courtès
2008-08-28 4:08 ` Han-Wen Nienhuys
2008-08-28 4:14 ` Han-Wen Nienhuys
2008-08-27 19:00 ` Andy Wingo
1 sibling, 2 replies; 20+ messages in thread
From: Ludovic Courtès @ 2008-08-27 15:38 UTC (permalink / raw)
To: guile-devel
Hello Han-Wen,
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> Over the last weeks I have seen little discussion on your patches, eg.
Please, refrain from resorting to personal attacks.
> commit 582a4997abc8b34ac6caf374fda8ea3ac65bd571
> Author: Ludovic Courtès <ludo@gnu.org>
> Date: Mon Aug 25 11:20:02 2008 +0200
>
> Use $(GCC_CFLAGS) for `-Werror' et al. so that it's not used to compile
> Gnulib code.
>
> commit c95514b3b41c8e335ada863f8abb99cc4af9abe1
> Author: Ludovic Courtès <ludo@gnu.org>
> Date: Thu Aug 14 00:15:03 2008 +0200
>
> Remove the now useless `qthreads.m4'.
These patches are trivial IMO and contain a descriptive git log and
ChangeLog entry. Compare this with the ten-or-so patches that you
committed, which included a far-from-trivial rework of the GC.
From my observations, Neil, Kevin and I have always worked this way,
although this was never formally specified. Perhaps Neil can comment?
> The commit message says,
>
> Set SRCPROP{PLIST,COPY} through a macro, so SCM_DEBUG_CELL_ACCESSES compiles.
>
> how much clearer do you want this message to be?
Sorry if I missed something but my understanding was that you were
referring to a GC fix, which it isn't. Re-reading the thread, it seems
I indeed missed the point, and I apologize. For my defense, I'd say
that I didn't expect a "GC cleanup" to touch all that.
>>>> even the lazy smob case I wrote about here:
>>>>
>>>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>>> I would classify the use of mark bits outside of the mark phase as outside
>>> of the defined API. If you want to have weak pointer semantics, use
>>> a weak hashtable, or implement reference counting on the C side.
>>
>> That's a reasonable argument, but it's something we should not change
>> without discussing it first. For instance, it may be important to study
>> why Guile-GNOME had to resort to this, and how it could avoid it,
>> instead of just gratuitously breaking it.
>
> I'm not suggesting to change without discussing; this message rather
> is the start of the discussion.
Good. However, my understanding of Andy's message was that the "GC
cleanups" already changed that, making Guile-GNOME's recipe fail.
> From ccd010e15ec0ddf285b75911739e85866d2d865c Mon Sep 17 00:00:00 2001
> From: Han-Wen Nienhuys <hanwen@lilypond.org>
> Date: Wed, 27 Aug 2008 10:48:06 -0300
> Subject: [PATCH] Kludge around x86-64 GC runtime checks.
>
> 2008-08-27 Han-Wen Nienhuys <hanwen@lilypond.org>
>
> * gc.c (scm_i_gc): Don't sanity check numbers on x64, while we
> investigate a real fix.
>
> ---
> libguile/gc.c | 4 ++--
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/libguile/gc.c b/libguile/gc.c
> index 8e8769c..a0b3080 100644
> --- a/libguile/gc.c
> +++ b/libguile/gc.c
> @@ -597,9 +597,9 @@ scm_i_gc (const char *what)
> scm_i_sweep_all_segments ("GC", &scm_i_gc_sweep_stats);
> scm_check_deprecated_memory_return ();
>
> +#if (SCM_DEBUG_CELL_ACCESSES == 0 && SCM_SIZEOF_UNSIGNED_LONG == 4)
x86-64 is not the only arch with 4-byte long long integers.
I'm still in favor of "git revert" since the log message makes it clear
which patch was reverted and why. "We" can then take our time and work
out a proper fix, and finally re-merge the patch plus its fix.
Furthermore, in the eventuality where none of us eventually finds a fix,
`master' is left in the previous state, which is better IMO.
Would you like that solution?
Thanks,
Ludo'.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 14:00 ` Han-Wen Nienhuys
2008-08-27 15:38 ` Ludovic Courtès
@ 2008-08-27 19:00 ` Andy Wingo
2008-08-27 20:22 ` Andy Wingo
2008-08-28 4:04 ` Han-Wen Nienhuys
1 sibling, 2 replies; 20+ messages in thread
From: Andy Wingo @ 2008-08-27 19:00 UTC (permalink / raw)
To: hanwen; +Cc: guile-devel
Hi,
On Wed 27 Aug 2008 07:00, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>>>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>
> I think reference counting is the correct solution for this, as far as
> I understand the problem from the quoted message.
I don't think so; the use case is that (1) we don't want to prevent the
C object from being freed, so we don't want to hold a reference on the C
object; but (2) we do want to know when it is freed, so we can release
our cache; but (3) we want to get the scheme object back if the object
has not in fact been swept.
But the laziness of the sweeper prevents us from knowing whether the
cache that we have is in fact accessible, because there is a time
between the mark phase (in which the object might become sweepable) and
when the smob's free function is called in the sweep phase (which would
invalidate the cache).
> The use of scm_gc_mark() outside of GC is fundamentally broken, since it
> creates race conditions in the presence of threads.
I was not aware that this was the case.
My impression was that the mark phase is global; it requires all threads
that were in guile mode to go dormant, and those that were not in guile
mode cannot enter guile mode until the mark is complete.
So if I have a thread in guile mode, it is not in the mark phase, hence
no race. Also, it would not be sweeping; I can check the cache and
retrieve and mark the object without the thread of interest doing a
sweep(). But perhaps some other thread would sweep that card, in which
case I guess I can see where the problem would come in.
It's very irksome that I missed this bit in the documentation.
So, my proposal to fix this is to expose the sweep mutex as part of the
API somehow, perhaps as e.g.
void* scm_with_sweep_mutex (void* (*with_mutex_func)(void*), void*);
or so. How do you feel about this? I know it constrains your GC
implementation, but threads + lazy sweeping + integrating with C
libraries = exposing some minimal amount of low-level details.
Regards,
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 19:00 ` Andy Wingo
@ 2008-08-27 20:22 ` Andy Wingo
2008-08-28 4:33 ` Han-Wen Nienhuys
2008-08-28 14:17 ` Ludovic Courtès
2008-08-28 4:04 ` Han-Wen Nienhuys
1 sibling, 2 replies; 20+ messages in thread
From: Andy Wingo @ 2008-08-27 20:22 UTC (permalink / raw)
To: hanwen; +Cc: guile-devel
Hi again!
On Wed 27 Aug 2008 12:00, Andy Wingo <wingo@pobox.com> writes:
> On Wed 27 Aug 2008 07:00, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>>>>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>>
>> I think reference counting is the correct solution for this, as far as
>> I understand the problem from the quoted message.
>
> I don't think so; the use case is that (1) we don't want to prevent the
> C object from being freed, so we don't want to hold a reference on the C
> object; but (2) we do want to know when it is freed, so we can release
> our cache; but (3) we want to get the scheme object back if the object
> has not in fact been swept.
I don't think this is exactly right. I was discussing this on IRC with
Ludovic and he made me come out with a better characterization of the
problem.
Consider a C object, `C'. We wrap it in scheme with a SCM object, `S'. S
has a reference on C, using reference counting. C has some kind of API
to associate S with it: set_ptr() and get_ptr(). Cool.
So what happens if we get C back from a callback at some time in the
future? Well we call get_ptr(C) and return that if it's non-null.
Otherwise we make a new smob and call set_ptr (C, S) and then return S.
So what happens if the scheme object becomes collectable? Well S has a
free function which will unref the C object and set_ptr (C, NULL). This
is also OK.
But what if it goes like this:
S becomes collectable in theory
mark phase: S is indeed marked as collectable
C is returned from a callback: get_ptr() return S
at some later time the card containing S is swept; S's free function
is run, and S is marked as a free cell
at some later point maybe S gets reused for some other purpose
however S was already alive in scheme, and we are using it as a smob!
The point is:
You cannot do C->Scheme mapping reliably in the presence of lazy
sweeping, because there is a time in which the object is marked as
sweepable but not swept, but the C->Scheme code has no way of knowing
this.
(While talking with Ludovic we realized that his code has this problem.)
My solution was to re-mark when you do get_ptr(), but I see now where
the problem with that is.
> void* scm_with_sweep_mutex (void* (*with_mutex_func)(void*), void*);
I think this is not so elegant, but it is correct.
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 19:00 ` Andy Wingo
2008-08-27 20:22 ` Andy Wingo
@ 2008-08-28 4:04 ` Han-Wen Nienhuys
2008-08-28 13:20 ` Han-Wen Nienhuys
1 sibling, 1 reply; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-28 4:04 UTC (permalink / raw)
To: guile-devel
Andy Wingo escreveu:
>> I think reference counting is the correct solution for this, as far as
>> I understand the problem from the quoted message.
>
> I don't think so; the use case is that (1) we don't want to prevent the
> C object from being freed, so we don't want to hold a reference on the C
> object; but (2) we do want to know when it is freed, so we can release
> our cache; but (3) we want to get the scheme object back if the object
> has not in fact been swept.
I don't understand how reference counting would prevent you from doing
this. Reference counting only is broken if you have (lots of) cyclical
references. Do you have those?
With ref-counting you would loose the unique C object <-> SCM object
mapping, since you could create another reference (another SCM object)
for a given C object if a SCM reference was lost, but a new one
created in the same GC cycle. From what I've read, this should not be
a problem. Are you relying on the SCM representation of the C object
being unique? If not, there should not be a problem.
> But the laziness of the sweeper prevents us from knowing whether the
> cache that we have is in fact accessible, because there is a time
> between the mark phase (in which the object might become sweepable) and
> when the smob's free function is called in the sweep phase (which would
> invalidate the cache).
>> The use of scm_gc_mark() outside of GC is fundamentally broken, since it
>> creates race conditions in the presence of threads.
>
> I was not aware that this was the case.
>
> My impression was that the mark phase is global; it requires all threads
> that were in guile mode to go dormant, and those that were not in guile
> mode cannot enter guile mode until the mark is complete.
Yes, the mark phase is global, but the thread locking is done in
scm_i_gc; once the marking starts, there is only one thread. Since
scm_gc_mark is called from the smob mark functions, it does not force
other threads to go dormant. It could, but I suspect the lock would
be a contention point.
If you call scm_gc_mark() on your own schedule, you're venturing
outside the established model. From what I understand from your
description, you detect that an object is live outside of the mark
phase, which is what triggers the assertion, and which does not force
thread correctness.
> So if I have a thread in guile mode, it is not in the mark phase, hence
> no race. Also, it would not be sweeping; I can check the cache and
> retrieve and mark the object without the thread of interest doing a
> sweep(). But perhaps some other thread would sweep that card, in which
> case I guess I can see where the problem would come in.
An alternative is to have a GC hook, like struct does. It does tie you
to the GC implementation. I think something that works regardless of
the GC details is better for a user of GUILE.
> It's very irksome that I missed this bit in the documentation.
>
> So, my proposal to fix this is to expose the sweep mutex as part of the
> API somehow, perhaps as e.g.
>
> void* scm_with_sweep_mutex (void* (*with_mutex_func)(void*), void*);
>
> or so. How do you feel about this? I know it constrains your GC
> implementation, but threads + lazy sweeping + integrating with C
> libraries = exposing some minimal amount of low-level details.
I think it is best to expose as little of the GC implementation as
possible. The fact that there is a mutex already gives away
implementation details.
I'm a bit miffed that the current interface already gives away that we
won't rewrite (ie. compact) the current heap. Let's not paint
ourselves in more corners.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 15:38 ` Ludovic Courtès
@ 2008-08-28 4:08 ` Han-Wen Nienhuys
2008-08-28 4:14 ` Han-Wen Nienhuys
1 sibling, 0 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-28 4:08 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
> Sorry if I missed something but my understanding was that you were
> referring to a GC fix, which it isn't. Re-reading the thread, it seems
> I indeed missed the point, and I apologize.
I hope you do realize that every time you miss the point and send out a
reply you have me trembling with anger, and contemplating whether I should
stop contributing to GUILE right away or after it's patched up enough for LilyPond?
After some years of doing lilypond, I have become wise enough to delete the
expletives before I send an email, but please think and reread before you
send out something.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 15:38 ` Ludovic Courtès
2008-08-28 4:08 ` Han-Wen Nienhuys
@ 2008-08-28 4:14 ` Han-Wen Nienhuys
2008-08-28 7:15 ` Ludovic Courtès
1 sibling, 1 reply; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-28 4:14 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
>> +#if (SCM_DEBUG_CELL_ACCESSES == 0 && SCM_SIZEOF_UNSIGNED_LONG == 4)
>
> x86-64 is not the only arch with 4-byte long long integers.
I'm not pretending it is the end-all fix. - we (I) need to understand
what is happening and make the numbers match up exactly. This is just
a kludge.
> I'm still in favor of "git revert" since the log message makes it clear
> which patch was reverted and why. "We" can then take our time and work
> out a proper fix, and finally re-merge the patch plus its fix.
> Furthermore, in the eventuality where none of us eventually finds a fix,
> `master' is left in the previous state, which is better IMO.
'master' in its previous states grows the heap to 600M doing the 1000-fold
version of srfi-18 test I posted. I think it's not a good solution.
Commenting out the assert for x86-64 should yield better behavior.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 20:22 ` Andy Wingo
@ 2008-08-28 4:33 ` Han-Wen Nienhuys
2008-08-28 14:17 ` Ludovic Courtès
1 sibling, 0 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-28 4:33 UTC (permalink / raw)
To: guile-devel
Andy Wingo escreveu:
> Hi again!
>
> On Wed 27 Aug 2008 12:00, Andy Wingo <wingo@pobox.com> writes:
>
>> On Wed 27 Aug 2008 07:00, Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>>
>>>>>> http://thread.gmane.org/gmane.lisp.guile.user/6372
>>> I think reference counting is the correct solution for this, as far as
>>> I understand the problem from the quoted message.
>> I don't think so; the use case is that (1) we don't want to prevent the
>> C object from being freed, so we don't want to hold a reference on the C
>> object; but (2) we do want to know when it is freed, so we can release
>> our cache; but (3) we want to get the scheme object back if the object
>> has not in fact been swept.
As far as I understand, you are splitting the decision whether an
object is live (reachable) between C and SCM - it's reachable when
there is an SCM reference, but it is also reachable when there is no
SCM reference anymore, but it is still reachable by looking at a C
table and returning the SCM reference inside the C struct.
You can never get a consistent model if you try to split it up like that.
Either the memory management is entirely in SCM (using (weak) hash tables
in SCM and GC which decides whether an object is reachable) or the memory
management is entirely in C (eg. using ref counting).
If you try to mix both, you'll get a frankenstein with cloudy semantics
and edgy corner cases.
> I don't think this is exactly right. I was discussing this on IRC with
> Ludovic and he made me come out with a better characterization of the
> problem.
>
> Consider a C object, `C'. We wrap it in scheme with a SCM object, `S'. S
> has a reference on C, using reference counting. C has some kind of API
> to associate S with it: set_ptr() and get_ptr(). Cool.
>
> So what happens if we get C back from a callback at some time in the
> future? Well we call get_ptr(C) and return that if it's non-null.
> Otherwise we make a new smob and call set_ptr (C, S) and then return S.
>
> So what happens if the scheme object becomes collectable? Well S has a
> free function which will unref the C object and set_ptr (C, NULL). This
> is also OK.
>
> But what if it goes like this:
>
> S becomes collectable in theory
>
> mark phase: S is indeed marked as collectable
>
> C is returned from a callback: get_ptr() return S
>
> at some later time the card containing S is swept; S's free function
> is run, and S is marked as a free cell
Here you are dividing the responsibity of liveness between SCM and C. SCM
decides the object is dead, but you hold on to it in in C, and then insert the
dead reference back into SCM, leading to crappage.
I think it is wrong to equate (destructor has run) with (object is
reachable). One of the important points of GC (RAII will say it's a
detractor) vs. C++ smart pointer is that going out of scope and
running the destructor to go with that can happen at different points
in time. Between these points, the objects are in a sort of limbo:
you can't do return them back to the 'live' part of the program, but
the destructror hasn't run yet.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-28 4:14 ` Han-Wen Nienhuys
@ 2008-08-28 7:15 ` Ludovic Courtès
2008-09-03 4:39 ` Han-Wen Nienhuys
0 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2008-08-28 7:15 UTC (permalink / raw)
To: guile-devel
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> Ludovic Courtès escreveu:
>> I'm still in favor of "git revert" since the log message makes it clear
>> which patch was reverted and why. "We" can then take our time and work
>> out a proper fix, and finally re-merge the patch plus its fix.
>> Furthermore, in the eventuality where none of us eventually finds a fix,
>> `master' is left in the previous state, which is better IMO.
>
> 'master' in its previous states grows the heap to 600M doing the 1000-fold
> version of srfi-18 test I posted. I think it's not a good solution.
>
> Commenting out the assert for x86-64 should yield better behavior.
Alright, then please go ahead.
Thanks,
Ludo'.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-28 4:04 ` Han-Wen Nienhuys
@ 2008-08-28 13:20 ` Han-Wen Nienhuys
0 siblings, 0 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-08-28 13:20 UTC (permalink / raw)
To: guile-devel
Han-Wen Nienhuys escreveu:
>>> The use of scm_gc_mark() outside of GC is fundamentally broken, since it
>>> creates race conditions in the presence of threads.
>> I was not aware that this was the case.
>>
>> My impression was that the mark phase is global; it requires all threads
>> that were in guile mode to go dormant, and those that were not in guile
>> mode cannot enter guile mode until the mark is complete.
>
> Yes, the mark phase is global, but the thread locking is done in
> scm_i_gc; once the marking starts, there is only one thread. Since
> scm_gc_mark is called from the smob mark functions, it does not force
> other threads to go dormant. It could, but I suspect the lock would
> be a contention point.
It would be very cool to have thread safe marking for a different reason:
marking it is the expensive step in GC, so if we can do that in N threads concurrently
(on a SMP machine) we have can speed it up by almost a factor N.
To do it properly, you could do the bitvector marking with
a compare & swap instruction.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-27 20:22 ` Andy Wingo
2008-08-28 4:33 ` Han-Wen Nienhuys
@ 2008-08-28 14:17 ` Ludovic Courtès
1 sibling, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2008-08-28 14:17 UTC (permalink / raw)
To: guile-devel
Hello,
Thanks for the nice summary!
Andy Wingo <wingo@pobox.com> writes:
> But what if it goes like this:
>
> S becomes collectable in theory
>
> mark phase: S is indeed marked as collectable
>
> C is returned from a callback: get_ptr() return S
>
> at some later time the card containing S is swept; S's free function
> is run, and S is marked as a free cell
>
> at some later point maybe S gets reused for some other purpose
>
> however S was already alive in scheme, and we are using it as a smob!
>
> The point is:
>
> You cannot do C->Scheme mapping reliably in the presence of lazy
> sweeping, because there is a time in which the object is marked as
> sweepable but not swept, but the C->Scheme code has no way of knowing
> this.
>
> (While talking with Ludovic we realized that his code has this problem.)
The code I had in mind is GnuTLS [0], but it's a slightly more specific
scenario and I now don't think the problem applies there.
In GnuTLS, there are "session" objects; session objects can have a
Scheme port attached to them, and the `session-record-port' procedure
returns that port. What we want is:
(eq? (session-record-port s) (session-record-port s))
IOW, there must be a mapping from the session to the port so that we
don't create a new port each time `session-record-port' is called.
Here's how it's achieved. Let `s' be the C session object, `S' the
corresponding SMOB, and `P' the port (an `SCM'). We have the following
object graph:
SCM_STREAM(P)
S <-------------------- P
| ^
|SCM_SMOB_DATA(S) |
| |
| _____________________'
| / gnutls_session_get_ptr(s)
v/
s
The mark procedure of `S' marks `P'. Thus, as long as `S' is reachable,
`P' is reachable. In addition, as long as `P' is reachable, `S' is
reachable.
The important difference with the generalized scheme you described is
that when `S' becomes unreferenced by Scheme code, there's no way `s'
can suddenly reappear at the Scheme level because GnuTLS doesn't have
any function that would return `s'. Thus, the race condition you
described cannot happen.
The key insight here is that `S' and `s' "aggregate" `P', i.e., the
lifetime of `S' and `s' is always greater than or equal to that of `P'.
I presume that this scheme is applicable to many (most?) object-oriented
APIs. It's actually what lead to the inception of the `aggregated'
typespec in G-Wrap [1].
Now, I haven't considered call-backs. But maybe a call-back can be seen
as a procedure that's aggregated by some object; in turn, the procedure
refer to other objects in its environment, such that the lifetime of the
objects involved is similarly hierarchical.
Sorry for the digression but I think it's important to know whether
Guile's API intrinsically makes it hard to handle such common use cases.
Thanks,
Ludo'.
[0] http://git.savannah.gnu.org/gitweb/?p=gnutls.git;a=blob;f=guile/src/core.c#l42
[1] http://www.nongnu.org/g-wrap/manual/Wrapping-a-C-Function.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-08-28 7:15 ` Ludovic Courtès
@ 2008-09-03 4:39 ` Han-Wen Nienhuys
2008-09-03 8:12 ` Ludovic Courtès
0 siblings, 1 reply; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-09-03 4:39 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
>>> I'm still in favor of "git revert" since the log message makes it clear
>>> which patch was reverted and why. "We" can then take our time and work
>>> out a proper fix, and finally re-merge the patch plus its fix.
>>> Furthermore, in the eventuality where none of us eventually finds a fix,
>>> `master' is left in the previous state, which is better IMO.
>> 'master' in its previous states grows the heap to 600M doing the 1000-fold
>> version of srfi-18 test I posted. I think it's not a good solution.
>>
>> Commenting out the assert for x86-64 should yield better behavior.
>
> Alright, then please go ahead.
Pushed (without changelog entry).
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-09-03 4:39 ` Han-Wen Nienhuys
@ 2008-09-03 8:12 ` Ludovic Courtès
2008-09-03 13:39 ` Han-Wen Nienhuys
0 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2008-09-03 8:12 UTC (permalink / raw)
To: guile-devel
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> Pushed (without changelog entry).
Note that what we agreed on was to provide ChangeLog-style comments in
the Git log entry, which this patch doesn't have.
Thanks,
Ludo'.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-09-03 8:12 ` Ludovic Courtès
@ 2008-09-03 13:39 ` Han-Wen Nienhuys
2008-09-03 14:49 ` Ludovic Courtès
0 siblings, 1 reply; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-09-03 13:39 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
> Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>> Pushed (without changelog entry).
>
> Note that what we agreed on was to provide ChangeLog-style comments in
> the Git log entry, which this patch doesn't have.
Can you explain me exactly what you want and why? I hope you're not
suggesting that we add filenames, because git already tracks those
for us.
In many cases (see eg. git.git), people feel free to put much longer
messages (entire emails) into the message.
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-09-03 13:39 ` Han-Wen Nienhuys
@ 2008-09-03 14:49 ` Ludovic Courtès
2008-09-04 2:17 ` Han-Wen Nienhuys
0 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2008-09-03 14:49 UTC (permalink / raw)
To: guile-devel
Hi,
Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> Ludovic Courtès escreveu:
>> Note that what we agreed on was to provide ChangeLog-style comments in
>> the Git log entry, which this patch doesn't have.
>
> Can you explain me exactly what you want and why?
I'm suggesting that we keep using ChangeLog-style entries, as per (info
"(standards) Change Logs").
This is in accordance with GCS and has proved to be a good auditing
tool.
Ludo'.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: the new gc asserts in master
2008-09-03 14:49 ` Ludovic Courtès
@ 2008-09-04 2:17 ` Han-Wen Nienhuys
0 siblings, 0 replies; 20+ messages in thread
From: Han-Wen Nienhuys @ 2008-09-04 2:17 UTC (permalink / raw)
To: guile-devel
Ludovic Courtès escreveu:
> Hi,
>
> Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
>
>> Ludovic Courtès escreveu:
>
>>> Note that what we agreed on was to provide ChangeLog-style comments in
>>> the Git log entry, which this patch doesn't have.
>> Can you explain me exactly what you want and why?
>
> I'm suggesting that we keep using ChangeLog-style entries, as per (info
> "(standards) Change Logs").
>
> This is in accordance with GCS and has proved to be a good auditing
> tool.
Auditing? how so?
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-09-04 2:17 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-26 20:12 the new gc asserts in master Andy Wingo
2008-08-27 2:12 ` Han-Wen Nienhuys
2008-08-27 2:41 ` Han-Wen Nienhuys
2008-08-27 7:43 ` Ludovic Courtès
2008-08-27 14:00 ` Han-Wen Nienhuys
2008-08-27 15:38 ` Ludovic Courtès
2008-08-28 4:08 ` Han-Wen Nienhuys
2008-08-28 4:14 ` Han-Wen Nienhuys
2008-08-28 7:15 ` Ludovic Courtès
2008-09-03 4:39 ` Han-Wen Nienhuys
2008-09-03 8:12 ` Ludovic Courtès
2008-09-03 13:39 ` Han-Wen Nienhuys
2008-09-03 14:49 ` Ludovic Courtès
2008-09-04 2:17 ` Han-Wen Nienhuys
2008-08-27 19:00 ` Andy Wingo
2008-08-27 20:22 ` Andy Wingo
2008-08-28 4:33 ` Han-Wen Nienhuys
2008-08-28 14:17 ` Ludovic Courtès
2008-08-28 4:04 ` Han-Wen Nienhuys
2008-08-28 13:20 ` Han-Wen Nienhuys
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).