* 1.8 make check failing in popen.test
@ 2006-07-23 8:37 Rob Browning
2006-07-23 22:36 ` Kevin Ryde
2006-08-15 8:35 ` Rob Browning
0 siblings, 2 replies; 30+ messages in thread
From: Rob Browning @ 2006-07-23 8:37 UTC (permalink / raw)
Right now 1.8's make check fails here in popen.test:
Running popen.test
FAIL: popen.test: open-input-pipe: echo hello
However, it's a fairly strange failure. For example, if I run
./check-guile directly, everything's fine, but if I create a trivial
makefile named foo containing "bar: ; ./check-guile" and then run
"make -f foo bar", the popen test fails again.
Also, if I disable environments.test, everything's fine, i.e.:
  mkdir test-suite/tests.disabled
  mv test-suite/tests/environments.test test-suite/tests.disabled/
  make check
However, if I put environments.test back and disable these tests
instead:
r4rs.test ramap.test receive.test threads.test
r5rs_pitfall.test reader.test regexp.test
then make check also runs correctly. So it's not specifically
environments.test.
By adding some display statements to popen.test, it looks like the
fundamental cause of the failure is the fact that in those cases,
(open-input-pipe "echo hello") is returning an empty port.
Does anyone have any idea what might be going on?
Also, while looking around, I noticed that many of our tests don't
place themselves in their own test module, and that raised two
questions:
1) Should they (to limit the chance that one test might affect
another inadvertently)?
2) Is there any reason I shouldn't consider just reworking the
scheme level tests to run each foo.test in a separate Guile
process?
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
* Re: 1.8 make check failing in popen.test
2006-07-23 8:37 1.8 make check failing in popen.test Rob Browning
@ 2006-07-23 22:36 ` Kevin Ryde
2006-07-23 23:12 ` Rob Browning
2006-08-15 8:35 ` Rob Browning
1 sibling, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-07-23 22:36 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> FAIL: popen.test: open-input-pipe: echo hello
Seems ok for me.
> Also, while looking around, I noticed that many of our tests don't
> place themselves in their own test module, and that raised two
> questions:
>
> 1) Should they (to limit the chance that one test might affect
> another inadvertently)?
Yes, I think so, those which load modules or make helper defines at
least. I changed a couple of the srfi ones a while back, ones that
loaded bindings extending core stuff.
> 2) Is there any reason I shouldn't consider just reworking the
> scheme level tests to run each foo.test in a separate Guile
> process?
I suppose you lose the aggregated test counts at the end. If all
tests work (which is presumably the normal case!) I guess it doesn't
matter much either way.
* Re: 1.8 make check failing in popen.test
2006-07-23 22:36 ` Kevin Ryde
@ 2006-07-23 23:12 ` Rob Browning
2006-07-24 7:25 ` Ludovic Courtès
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-07-23 23:12 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Seems ok for me.
I wonder if it might be a Debian unstable issue...
> I suppose you lose the aggregated test counts at the end. If all
> tests work (which is presumably the normal case!) I guess it doesn't
> matter much either way.
I wonder if those test counts are actually all that useful. I would
guess that most of the time you only care whether there was an error
or not.
* Re: 1.8 make check failing in popen.test
2006-07-23 23:12 ` Rob Browning
@ 2006-07-24 7:25 ` Ludovic Courtès
2006-07-30 1:59 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Ludovic Courtès @ 2006-07-24 7:25 UTC (permalink / raw)
Cc: guile-devel
Hi,
Rob Browning <rlb@defaultvalue.org> writes:
> Kevin Ryde <user42@zip.com.au> writes:
>> I suppose you lose the aggregated test counts at the end. If all
>> tests work (which is presumably the normal case!) I guess it doesn't
>> matter much either way.
>
> I wonder if those test counts are actually all that useful. I would
> guess that most of the time you only care whether there was an error
> or not.
Besides the test counts (which I find useful), running all the scripts in
a single process increases the chance of catching nasty bugs, as you
noticed. ;-) So I think it makes sense to keep it as is.
Thanks,
Ludovic.
* Re: 1.8 make check failing in popen.test
2006-07-24 7:25 ` Ludovic Courtès
@ 2006-07-30 1:59 ` Rob Browning
0 siblings, 0 replies; 30+ messages in thread
From: Rob Browning @ 2006-07-30 1:59 UTC (permalink / raw)
ludovic.courtes@laas.fr (Ludovic Courtès) writes:
> Besides the test counts (which I find useful), running all the
> scripts in a single process increases the chance of catching nasty
> bugs, as you noticed. ;-) So I think it makes sense to keep it as
> is.
Hmm. While I can see the point that running all of the tests in a
single process does exercise Guile as a whole more heavily, I suppose
I was looking at this more from the perspective of testing accuracy.
Running all of the tests in a single process increases the chance that
some unrelated action by a previous test might inadvertently break (or
just improperly skew) the current test, i.e. it makes it harder to
isolate your variables. Such a problem seems like the kind of thing
that might take a long time to track down, without providing any
useful diagnostics.
* Re: 1.8 make check failing in popen.test
2006-07-23 8:37 1.8 make check failing in popen.test Rob Browning
2006-07-23 22:36 ` Kevin Ryde
@ 2006-08-15 8:35 ` Rob Browning
2006-08-15 23:13 ` Kevin Ryde
1 sibling, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-15 8:35 UTC (permalink / raw)
Rob Browning <rlb@defaultvalue.org> writes:
> Right now 1.8's make check fails here in popen.test:
I have some further information, and it's quite surprising.
At this point, with a fresh guile-1.8.0 tree, popen.test no longer
fails during make check; it hangs, but only in *some* trees. If the
tree is named guile-1.8.0, the test will work. If the tree is named
foo-1.8.0, the test will hang.
For example, if I do this:
  tar xzf guile-1.8.0.tar.gz
  cd guile-1.8.0
  ./configure --host=i486-linux-gnu --build=i486-linux-gnu --with-threads=yes
  make
  make check
popen.test doesn't hang, but if I do this
  tar xzf guile-1.8.0.tar.gz
  mv guile-1.8.0 foo-1.8.0
  cd foo-1.8.0
  ./configure --host=i486-linux-gnu --build=i486-linux-gnu --with-threads=yes
  make
  make check
it does hang (repeatably).
I've looked at the process in gdb after it hangs, and it appears to be
blocked in a select waiting in (I think) the "no duplicate" test.
At first I thought this might just be a local issue, but then I asked
someone on irc to try the same thing, and they saw identical behavior
on a completely different architecture (powerpc).
I'd really like to get this tracked down. I noticed the problem
because the build tree for the forthcoming Debian Guile 1.8 packages
isn't named guile-1.8.0.
* Re: 1.8 make check failing in popen.test
2006-08-15 8:35 ` Rob Browning
@ 2006-08-15 23:13 ` Kevin Ryde
2006-08-16 1:18 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-08-15 23:13 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> mv guile-1.8.0 foo-1.8.0
Sounds bizarre.
> I've looked at the process in gdb after it hangs, and it appears to be
> blocked in a select waiting in (I think) the "no duplicate" test.
There are two of those tests; which does it seem to be? (The child
processes are different in each, "ps" might show which.)
* Re: 1.8 make check failing in popen.test
2006-08-15 23:13 ` Kevin Ryde
@ 2006-08-16 1:18 ` Rob Browning
2006-08-16 23:26 ` Kevin Ryde
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-16 1:18 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
>> I've looked at the process in gdb after it hangs, and it appears to be
>> blocked in a select waiting in (I think) the "no duplicate" test.
>
> There are two of those tests; which does it seem to be? (The child
> processes are different in each, "ps" might show which.)
Ahh, I didn't notice that. It's actually the second. Here's the tail
of check-guile.log:
PASS: popen.test: open-input-pipe: stdin==stderr
PASS: popen.test: open-input-pipe: stdout==stderr
PASS: popen.test: open-input-pipe: no duplicate
PASS: popen.test: open-output-pipe: no args
PASS: popen.test: open-output-pipe: port?
PASS: popen.test: open-output-pipe: stdin==stderr
PASS: popen.test: open-output-pipe: stdout==stderr
* Re: 1.8 make check failing in popen.test
2006-08-16 1:18 ` Rob Browning
@ 2006-08-16 23:26 ` Kevin Ryde
2006-08-18 3:01 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-08-16 23:26 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> Ahh, I didn't notice that. It's actually the second.
I guess a select hang is probably the read-char waiting for the child
to signal to the parent that it (the child) has closed stdout so the
parent can go ahead and test that connection (for eof). Maybe gdb or
some printfs can show if that's so.
It still works ok for me though, with a renamed directory. But I do
get a test fail in the "illegal (begin)" of syntax.test. Dunno what
that means ... "(if #t (begin))" is supposed to be illegal but it
passes, or something.
* Re: 1.8 make check failing in popen.test
2006-08-16 23:26 ` Kevin Ryde
@ 2006-08-18 3:01 ` Rob Browning
2006-08-21 23:38 ` Kevin Ryde
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-18 3:01 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> I guess a select hang is probably the read-char waiting for the
> child to signal to the parent that it (the child) has closed stdout
> so the parent can go ahead and test that connection (for eof).
> Maybe gdb or some printfs can show if that's so.
I haven't had a chance to investigate much further yet, but if I run
make check under gdb and hit Ctrl-C while popen.test is hung, there
are two threads that were running. The first thread's backtrace looks
like this:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x40281f41 in select () from /lib/tls/i686/cmov/libc.so.6
#2 0x400b481c in scm_std_select (nfds=29, readfds=0xbfa8621c, writefds=0x0,
exceptfds=0x0, timeout=0x0) at threads.c:1363
#3 0x4005f8ce in fport_fill_input (port=0x40647400) at fports.c:550
#4 0x40086aff in scm_fill_input (port=0x40647400) at ports.c:973
#5 0x40086c40 in scm_getc (port=0x40647400) at ports.c:991
#6 0x40088d7d in scm_read_char (port=0x0) at ports.c:948
#7 0x4005ad54 in ceval (x=0x404, env=0x40647188) at eval.c:4218
and the second thread's backtrace looks like this:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x401b681b in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x40091923 in do_read_without_guile (raw_data=0x40b310d8) at scmsigs.c:134
#3 0x400b42fc in scm_without_guile (func=0x400918f0 <do_read_without_guile>,
data=0x40b310d8) at threads.c:657
#4 0x40091875 in signal_delivery_thread (data=0x0) at scmsigs.c:145
#5 0x400b67b5 in scm_c_catch (tag=0x104,
body=0x400917f0 <signal_delivery_thread>, body_data=0x0,
handler=0x400b6080 <scm_handle_by_message>, handler_data=0x400d867e,
pre_unwind_handler=0, pre_unwind_handler_data=0x0) at throw.c:204
#6 0x400b69c5 in scm_internal_catch (tag=0x104,
body=0x400917f0 <signal_delivery_thread>, body_data=0x0,
handler=0x400b6080 <scm_handle_by_message>, handler_data=0x400d867e)
at throw.c:218
#7 0x400b55ad in really_spawn (d=0xbfa860c8) at threads.c:777
* Re: 1.8 make check failing in popen.test
2006-08-18 3:01 ` Rob Browning
@ 2006-08-21 23:38 ` Kevin Ryde
2006-08-22 2:03 ` Rob Browning
2006-08-30 7:50 ` Rob Browning
0 siblings, 2 replies; 30+ messages in thread
From: Kevin Ryde @ 2006-08-21 23:38 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> I haven't had a chance to investigate much further yet, but if I run
> make check under gdb and hit Ctrl-C while popen.test is hung, there
> are two threads that were running.
Looks about right. What's the child process doing? It's supposed to
be writing to the parent to say continue. (Unless it failed to fork
there should be some child, either running or a zombie.)
* Re: 1.8 make check failing in popen.test
2006-08-21 23:38 ` Kevin Ryde
@ 2006-08-22 2:03 ` Rob Browning
2006-08-25 1:16 ` Kevin Ryde
2006-08-30 7:50 ` Rob Browning
1 sibling, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-22 2:03 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Looks about right. What's the child process doing? It's supposed
> to be writing to the parent to say continue. (Unless it failed to
> fork there should be some child, either running or a zombie.)
From the output of ps auxf, it looks like the child is probably one of
these zombies. There are several zombie sh children:
\_ [sh] <defunct>
\_ [sh] <defunct>
\_ [sh] <defunct>
\_ [lt-guile] <defunct>
\_ [lt-guile] <defunct>
* Re: 1.8 make check failing in popen.test
2006-08-22 2:03 ` Rob Browning
@ 2006-08-25 1:16 ` Kevin Ryde
2006-08-25 2:39 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-08-25 1:16 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> From the output of ps auxf, it looks like the child is probably one of
> these zombies. There are several zombie sh children:
>
> \_ [sh] <defunct>
> \_ [sh] <defunct>
> \_ [sh] <defunct>
> \_ [lt-guile] <defunct>
> \_ [lt-guile] <defunct>
Actually, I see the signalling pipe is left in the parent, so if the
child has printed nothing it could hang. Obviously there's something
wrong if the child doesn't print what's expected, but the change below
should at least cause an error instead of a hang in that case.
  ;; After the child closes stdin (which it indicates here by writing
  ;; "closed" to stderr), the parent should see a broken pipe. We setup to
  ;; see this as EPIPE (rather than SIGPIPE). In Guile 1.6.4 and earlier a
  ;; duplicate of stdin existed in the child, preventing the broken pipe
  ;; occurring.
  (pass-if "no duplicate"
    (with-epipe
     (lambda ()
       (let* ((pair (pipe))
              (port (with-error-to-port (cdr pair)
                      (lambda ()
                        (open-output-pipe
                         "exec 0</dev/null; echo closed 1>&2; exec 2>/dev/null; sleep 999")))))
         (close-port (cdr pair)) ;; write side
         (and (char? (read-char (car pair))) ;; wait for child to do its thing
              (catch 'system-error
                (lambda ()
                  (write-char #\x port)
                  (force-output port)
                  #f)
                (lambda (key name fmt args errno-list)
                  (= (car errno-list) EPIPE))))))))
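A note on why closing the write side in the parent matters: a read on a pipe only reports end-of-file once every copy of the write end has been closed, so a parent that keeps its own copy open blocks forever if the child dies without writing. The C sketch below shows the same idea outside Guile. It is only an illustration, not the popen.test code, and the name wait_for_child_signal is invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that optionally writes one byte to a signalling pipe.
   Returns what the parent's read() saw: 1 if the child signalled,
   0 for end-of-file (child exited silently), -1 on error. */
ssize_t wait_for_child_signal(int child_writes)
{
    int fds[2];
    pid_t pid;
    char c = 0;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: only the write end is needed here. */
        close(fds[0]);
        if (child_writes && write(fds[1], "c", 1) != 1)
            _exit(2);
        _exit(0);
    }

    /* Parent: drop our copy of the write end. Without this close(),
       the read() below could never see EOF and would block if the
       child exits without writing. */
    close(fds[1]);

    n = read(fds[0], &c, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Calling wait_for_child_signal(1) should return 1, and wait_for_child_signal(0) should return 0, i.e. an EOF that the caller can treat as a failure rather than a hang.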
* Re: 1.8 make check failing in popen.test
2006-08-25 1:16 ` Kevin Ryde
@ 2006-08-25 2:39 ` Rob Browning
2006-08-27 20:06 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-25 2:39 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Actually, I see the signalling pipe is left in the parent, so if the
> child has printed nothing it could hang. Obviously there's something
> wrong if the child doesn't print what's expected, but the change below
> should at least cause an error instead of a hang in that case.
OK, here's what I see with that test code:
Running popen.test
FAIL: popen.test: open-output-pipe: no duplicate
FAIL: popen.test: close-pipe: exit 0
ERROR: popen.test: close-pipe: exit 1 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f))
ERROR: In procedure fport_flush:
ERROR: Bad file descriptor
ERROR: In procedure fport_flush:
ERROR: Bad file descriptor
Running ports.test
* Re: 1.8 make check failing in popen.test
2006-08-25 2:39 ` Rob Browning
@ 2006-08-27 20:06 ` Rob Browning
2006-08-27 20:22 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-27 20:06 UTC (permalink / raw)
Rob Browning <rlb@defaultvalue.org> writes:
> OK, here's what I see with that test code:
>
> Running popen.test
> FAIL: popen.test: open-output-pipe: no duplicate
> FAIL: popen.test: close-pipe: exit 0
> ERROR: popen.test: close-pipe: exit 1 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f))
> ERROR: In procedure fport_flush:
> ERROR: Bad file descriptor
> ERROR: In procedure fport_flush:
> ERROR: Bad file descriptor
> Running ports.test
Hmm. Just to be sure, I started off with a completely fresh tree, and
now it hangs with the original code, and fails like this with your
replacement code:
Running popen.test
FAIL: popen.test: open-output-pipe: no duplicate
Running ports.test
* Re: 1.8 make check failing in popen.test
2006-08-27 20:06 ` Rob Browning
@ 2006-08-27 20:22 ` Rob Browning
0 siblings, 0 replies; 30+ messages in thread
From: Rob Browning @ 2006-08-27 20:22 UTC (permalink / raw)
Rob Browning <rlb@defaultvalue.org> writes:
> Hmm. Just to be sure, I started off with a completely fresh tree, and
> now it hangs with the original code, and fails like this with your
> replacement code:
>
> Running popen.test
> FAIL: popen.test: open-output-pipe: no duplicate
> Running ports.test
OK, it turns out if I add a (sleep 1) just before your close-port, I
see the "Bad file descriptor" errors again, i.e.:
  (sleep 1)
  (close-port (cdr pair)) ;; write side
produces this, and the two subsequent tests also fail:
Running popen.test
FAIL: popen.test: open-output-pipe: no duplicate
ERROR: popen.test: close-pipe: exit 0 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f))
ERROR: In procedure fport_flush:
ERROR: Bad file descriptor
ERROR: In procedure fport_flush:
ERROR: Bad file descriptor
FAIL: popen.test: close-pipe: exit 0
FAIL: popen.test: close-pipe: exit 1
Running ports.test
* Re: 1.8 make check failing in popen.test
2006-08-21 23:38 ` Kevin Ryde
2006-08-22 2:03 ` Rob Browning
@ 2006-08-30 7:50 ` Rob Browning
2006-08-31 0:24 ` Kevin Ryde
1 sibling, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-30 7:50 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Looks about right. What's the child process doing? It's supposed to
> be writing to the parent to say continue. (Unless it failed to fork
> there should be some child, either running or a zombie.)
(Consider the following info preliminary. I haven't had any time to
try and figure out the actual cause, but since I just discovered this,
and I have to stop for the moment, I wanted to let everyone else have
a look.)
After further investigation, it appears that particular child might
not be running, at least not on some of the runs. I switched back to
the original code (the code that would hang), added some debug
statements, ran strace -p -s 100, etc. on "make check", and found that
the child appears to be segfaulting at least some of the time here (in
popen.scm):
  (port-for-each (lambda (pt-entry)
                   ;;(dbg-out (list 'pt-entry pt-entry))
                   (false-if-exception
                    (let ((pt-fileno (fileno pt-entry)))
                      (if (not (or (= pt-fileno input-fdes)
                                   (= pt-fileno output-fdes)
                                   (= pt-fileno error-fdes)))
                          (close-fdes pt-fileno))))))
When I uncomment the dbg-out statement above (which just writes the
arg and a newline to an output-port and then forces the output), I see
this on the console:
ERROR: popen.test: open-output-pipe: no duplicate - arguments:
((wrong-type-arg "list-copy" "Wrong type argument in position ~A: ~S"
(1 (pt-entry . #<freed cell 0x40305830; GC missed a reference>))
((pt-entry . #<freed cell 0x40305830; GC missed a reference>))))
this in the dbg-out output file:
...
(pt-entry #<output: string 81079e0>)
(pt-entry #<output: string 8106650>)
(pt-entry #<freed cell 0x40643c18; GC missed a reference>)
and this in the strace (1402 is the forked child process):
1402 write(7, "ERROR: popen.test: open-output-pipe: no duplicate - arguments: ((wrong-type-arg \"list-copy\" \"Wrong t"..., 263) = -1 EBADF (Bad file descriptor)
1402 write(2, "ERROR", 5) = -1 EBADF (Bad file descriptor)
1402 write(2, "\nException during displaying of ", 32) = -1 EBADF (Bad file descriptor)
1402 write(7, "ERROR: popen.test: open-output-pipe: no duplicate - arguments: ((wrong-type-arg \"list-copy\" \"Wrong t"..., 263) = -1 EBADF (Bad file descriptor)
1402 write(2, "ERROR", 5) = -1 EBADF (Bad file descriptor)
1402 write(2, "\nException during displaying of ", 32) = -1 EBADF (Bad file descriptor)
1402 exit_group(1) = ?
If I omit the dbg-out statement in the above code, then I can just see
the child die due to a SEGV in the strace log (2126 is the child):
2126 close(12 <unfinished ...>
2123 <... close resumed> ) = 0
2126 <... close resumed> ) = 0
2123 access("/etc/ld.so.nohwcap", F_OK <unfinished ...>
2126 close(10 <unfinished ...>
2123 <... access resumed> ) = -1 ENOENT (No such file or directory)
2126 <... close resumed> ) = 0
2123 open("/lib/tls/i686/cmov/libdl.so.2", O_RDONLY <unfinished ...>
2126 close(29 <unfinished ...>
2123 <... open resumed> ) = 5
2126 <... close resumed> ) = 0
2123 read(5, <unfinished ...>
2126 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
So I started from a clean tree, enabled core dumps, and here's what
gdb had to say about the resulting core:
Program terminated with signal 11, Segmentation fault.
#0 0x400729ca in scm_fileno (port=0x0) at ioext.c:180
180 port = SCM_COERCE_OUTPORT (port);
(gdb) where
#0 0x400729ca in scm_fileno (port=0x0) at ioext.c:180
#1 0x4005ad41 in ceval (x=0x404, env=0x40372710) at eval.c:4218
#2 0x4005b26e in ceval (x=<value optimized out>, env=0x40372710)
at eval.c:3634
In any case, as I said, consider all this preliminary. For everything
but the core dump, I wasn't working from a clean tree.
* Re: 1.8 make check failing in popen.test
2006-08-30 7:50 ` Rob Browning
@ 2006-08-31 0:24 ` Kevin Ryde
2006-08-31 6:28 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-08-31 0:24 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> ;;(dbg-out (list 'pt-entry pt-entry))
> ...
>
> ERROR: popen.test: open-output-pipe: no duplicate - arguments:
> ((wrong-type-arg "list-copy" "Wrong type argument in position ~A: ~S"
> (1 (pt-entry . #<freed cell 0x40305830; GC missed a reference>))
> ((pt-entry . #<freed cell 0x40305830; GC missed a reference>))))
It's possible the complaint is about the `list' call. I changed
`list' to be implemented using scm_list_copy, which of course reports
itself as `list-copy' if the list is improper. But I can't think how
an improper list could arrive there from a "(list x y)" call.
> (pt-entry #<freed cell 0x40643c18; GC missed a reference>)
I suppose that's the killer, a port gc-ed prematurely. Perhaps
there'd be some significance in which one it was. port-for-each looks
pretty safe, maybe the port table has gone bad earlier.
Not sure how this would relate to the following bit you posted
before. If this bit was taken literally it'd sound like something
evil in scm_call_1 I suppose.
> ERROR: popen.test: close-pipe: exit 0 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f))
* Re: 1.8 make check failing in popen.test
2006-08-31 0:24 ` Kevin Ryde
@ 2006-08-31 6:28 ` Rob Browning
2006-09-04 2:45 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-08-31 6:28 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> I suppose that's the killer, a port gc-ed prematurely. Perhaps
> there'd be some significance in which one it was. port-for-each
> looks pretty safe, maybe the port table has gone bad earlier.
I think I found the problem. I believe scm_c_port_for_each() needs a
call to scm_remember_upto_here_1(ports) at the end.
* Re: 1.8 make check failing in popen.test
2006-08-31 6:28 ` Rob Browning
@ 2006-09-04 2:45 ` Rob Browning
2006-09-07 20:46 ` Kevin Ryde
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-04 2:45 UTC (permalink / raw)
Rob Browning <rlb@defaultvalue.org> writes:
> Kevin Ryde <user42@zip.com.au> writes:
>
>> I suppose that's the killer, a port gc-ed prematurely. Perhaps
>> there'd be some significance in which one it was. port-for-each
>> looks pretty safe, maybe the port table has gone bad earlier.
>
> I think I found the problem. I believe scm_c_port_for_each() needs a
> call to scm_remember_upto_here_1(ports) at the end.
Apparently I spoke too soon, or rather, that wasn't the only problem.
Though the child is no longer segfaulting, popen.test still hangs
occasionally.
I caught it hanging between two debug statements with nothing between
them other than a (gc) statement I added.
So I tried attaching gdb to the child (which is still in open-process,
a bit after the fork), and saw:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x401b655e in __lll_mutex_lock_wait ()
from /lib/tls/i686/cmov/libpthread.so.0
#2 0x401b318d in _L_mutex_lock_29 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0xbf83b568 in ?? ()
#4 0x400b41b9 in scm_leave_guile () at threads.c:369
#5 0x400b423b in scm_i_thread_put_to_sleep () at threads.c:1538
#6 0x40060c69 in scm_i_gc (what=0x400d2dcb "call") at gc.c:548
#7 0x40060e81 in scm_gc () at gc.c:454
An strace also indicates that the child is blocked on a futex call.
Looking at threads.c, at that point Guile is calling
scm_i_pthread_mutex_unlock() from within scm_leave_guile(). I imagine
that pthread_mutex_unlock() is making the call to lock a lower level
mutex, which blocks.
At the moment, though, I don't know why it blocks. I wonder if the
parent left that mutex in an unexpected state when it forked.
* Re: 1.8 make check failing in popen.test
2006-09-04 2:45 ` Rob Browning
@ 2006-09-07 20:46 ` Kevin Ryde
2006-09-08 2:45 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-09-07 20:46 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> At the moment, though, I don't know why it blocks. I wonder if the parent
> left that mutex in an unexpected state when it forked.
Sounds bizarre. Maybe hacking scm_i_pthread_mutex_init to force the
use of the "error checking" mutex attribute and then hacking
scm_i_pthread_mutex_lock and scm_i_pthread_mutex_unlock to check the
return and errno would show something. Dunno what though.
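A rough idea of what that instrumentation could look like, as a standalone C sketch rather than a patch to Guile. The names init_errorcheck_mutex, checked_lock and checked_unlock are hypothetical and are not the scm_i_pthread_* functions themselves. One detail worth keeping in mind: the pthread mutex functions report failure through their return value (an error number) rather than through errno, so the wrappers below inspect only the return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Initialise *m as an error-checking mutex.
   Returns 0 on success or an error number. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock and report any failure on stderr; 'where' labels the call site. */
int checked_lock(pthread_mutex_t *m, const char *where)
{
    int rc = pthread_mutex_lock(m);

    if (rc != 0)
        fprintf(stderr, "%s: pthread_mutex_lock: %s\n", where, strerror(rc));
    return rc;
}

/* Unlock and report any failure on stderr; 'where' labels the call site. */
int checked_unlock(pthread_mutex_t *m, const char *where)
{
    int rc = pthread_mutex_unlock(m);

    if (rc != 0)
        fprintf(stderr, "%s: pthread_mutex_unlock: %s\n", where, strerror(rc));
    return rc;
}
```

With an error-checking mutex, relocking from the owning thread returns EDEADLK instead of deadlocking, and unlocking a mutex the caller does not hold returns EPERM. Either report would narrow down where the lock state goes wrong.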
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-07 20:46 ` Kevin Ryde
@ 2006-09-08 2:45 ` Rob Browning
2006-09-08 23:05 ` Kevin Ryde
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-08 2:45 UTC (permalink / raw)
Kevin Ryde <user42@zip.com.au> writes:
> Sounds bizarre. Maybe hacking scm_i_pthread_mutex_init to force the
> use of the "error checking" mutex attribute and then hacking
> scm_i_pthread_mutex_lock and scm_i_pthread_mutex_unlock to check the
> return and errno would show something. Dunno what though.
See pthread_atfork(3):
To understand the purpose of pthread_atfork, recall that fork(2)
duplicates the whole memory space, including mutexes in their
current locking state, but only the calling thread: other threads
are not running in the child process. The mutexes are not usable
after the fork and must be initialized with pthread_mutex_init
in the child process. This is a limitation of the current
implementation and might or might not be present in future versions.
I believe a while back I wondered if we might have trouble with
pthreads and fork unless we were very careful, and I'm beginning to
wonder again.
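For reference, the pattern that man page describes can be sketched in C as follows. This is a minimal illustration under assumptions, not how Guile handles fork: guarded_lock and the handler names are hypothetical. The prepare handler takes the lock so that no other thread holds it at the moment of the fork, the parent handler releases it again, and the child handler re-initializes it as the quoted text suggests. Re-initializing an already initialized mutex is a common idiom but is not strictly blessed by POSIX; unlocking in the child handler is the usual alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock that must stay usable in forked children. */
static pthread_mutex_t guarded_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent just before fork(): take the lock so that no other
   thread owns it when the address space is copied. */
static void before_fork(void)
{
    pthread_mutex_lock(&guarded_lock);
}

/* Runs in the parent after fork(): release what before_fork() took. */
static void after_fork_parent(void)
{
    pthread_mutex_unlock(&guarded_lock);
}

/* Runs in the child after fork(): give the child a fresh, unlocked mutex. */
static void after_fork_child(void)
{
    int rc = pthread_mutex_init(&guarded_lock, NULL);

    assert(rc == 0);
    (void)rc;
}

/* Register the three handlers.  Returns 0 on success or an error number. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Fork and report what pthread_mutex_trylock() returned in the child
   (0 means the child could take the lock), or -1 on setup failure. */
int child_trylock_after_fork(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == 0)
        _exit(pthread_mutex_trylock(&guarded_lock));
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller would run install_fork_handlers() once at startup. After that, a child created by fork() should be able to lock guarded_lock regardless of what other parent threads were doing. A runtime with many internal mutexes would need handlers covering every one of them, which is part of why mixing fork with pthreads takes care.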
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-08 2:45 ` Rob Browning
@ 2006-09-08 23:05 ` Kevin Ryde
2006-09-09 12:52 ` Marius Vollmer
0 siblings, 1 reply; 30+ messages in thread
From: Kevin Ryde @ 2006-09-08 23:05 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
>
> I believe a while back I wondered if we might have trouble with
> pthreads and fork unless we were very careful, and I'm beginning to
> wonder again.
Ahh, that sounds likely. If mutexes are unusable once forked then
that'd hurt the gc trying to get all threads to rendezvous. Sounds
bad for any scheme program that wants to fork, not just ice-9 popen.
:-(
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-08 23:05 ` Kevin Ryde
@ 2006-09-09 12:52 ` Marius Vollmer
2006-09-09 16:39 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Marius Vollmer @ 2006-09-09 12:52 UTC (permalink / raw)
Cc: guile-devel
Kevin Ryde <user42@zip.com.au> writes:
> Rob Browning <rlb@defaultvalue.org> writes:
>>
>> I believe a while back I wondered if we might have trouble with
>> pthreads and fork unless we were very careful, and I'm beginning to
>> wonder again.
>
> Ahh, that sounds likely. If mutexes are unusable once forked then
> that'd hurt the gc trying to get all threads to rendezvous. Sounds
> bad for any scheme program that wants to fork, not just ice-9 popen.
> :-(
Yes. I think I have been here when investigating the popen blocking
problem, but then I compiled Guile without thread support and I could
still observe the blocking. (I stopped at that point, lazy me, but
debugging should have been simpler without threads...)
--
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3 331E FAF8 226A D5D4 E405
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-09 12:52 ` Marius Vollmer
@ 2006-09-09 16:39 ` Rob Browning
2006-09-10 16:24 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-09 16:39 UTC (permalink / raw)
Cc: guile-devel
Marius Vollmer <mvo@zagadka.de> writes:
> Yes. I think I have been here when investigating the popen blocking
> problem, but then I compiled Guile without thread support and I could
> still observe the blocking. (I stopped at that point, lazy me, but
> debugging should have been simpler without threads...)
How recent was your checkout when you ran the test? I ask because it
looks like there are two bugs, and I wanted to know if your tree was
recent enough to include the fix I committed for one of them.
In any case, I'll see if I can still get it to hang using
--without-threads.
Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-09 16:39 ` Rob Browning
@ 2006-09-10 16:24 ` Rob Browning
2006-09-11 1:16 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-10 16:24 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
> In any case, I'll see if I can still get it to hang using
> --without-threads.
I have a tree here where I added a (gc) call before and after the
port-for-each call in popen.scm (in the child). Given that and
--with-threads=yes, "make check" would hang every time I ran it.
I rebuilt that tree with --with-threads=no and I ran make check
repeatedly. After 20 iterations, it hadn't blocked.
So I removed the (gc) calls and rebuilt with --with-threads=yes, and
make check blocked after 7 iterations and then 183 iterations.
Though not conclusive, these results, when combined with the gdb
backtrace I posted earlier showing the blockage in scm_gc() while
trying to lock a mutex, seem to suggest that the remaining problem
is thread related.
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-10 16:24 ` Rob Browning
@ 2006-09-11 1:16 ` Rob Browning
2006-09-11 6:40 ` Neil Jerram
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-11 1:16 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
> Though not conclusive, these results, when combined with the gdb
> backtrace I posted earlier showing the blockage in scm_gc() while
> trying to lock a mutex, seem to suggest that the remaining problem
> is thread related.
On a related topic, in order for Guile 1.8 to make it into Debian etch
(the upcoming stable release), we're essentially out of time to fix
this problem. Because of this I'm contemplating uploading the initial
Guile 1.8 packages with --with-threads=no.
Does anyone see any major problems with this stopgap? It seems better
to have Guile 1.8 in etch without threads than not to have it at all.
Note that this issue doesn't just affect Guile. It's my understanding
that unless Guile 1.8 packages are uploaded very soon, the newer
version of lilypond won't make it into etch either.
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-11 1:16 ` Rob Browning
@ 2006-09-11 6:40 ` Neil Jerram
2006-09-11 15:57 ` Rob Browning
0 siblings, 1 reply; 30+ messages in thread
From: Neil Jerram @ 2006-09-11 6:40 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
> On a related topic, in order for Guile 1.8 to make it into Debian etch
> (the upcoming stable release), we're essentially out of time to fix
> this problem. Because of this I'm contemplating uploading the initial
> Guile 1.8 packages with --with-threads=no.
>
> Does anyone see any major problems with this stopgap? It seems better
> to have Guile 1.8 in etch without threads than not to have it at all.
Will etch have guile 1.6 as well, or would this upload replace guile
1.6?
In any case, I think without-threads is probably OK.
> Note that this issue doesn't just affect Guile. It's my understanding
> that unless Guile 1.8 packages are uploaded very soon, the newer
> version of lilypond won't make it into etch either.
That would be serious. Presumably lilypond doesn't require threads
then?
Regards,
Neil
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-11 6:40 ` Neil Jerram
@ 2006-09-11 15:57 ` Rob Browning
2006-09-11 17:40 ` Neil Jerram
0 siblings, 1 reply; 30+ messages in thread
From: Rob Browning @ 2006-09-11 15:57 UTC (permalink / raw)
Cc: guile-devel
Neil Jerram <neil@ossau.uklinux.net> writes:
> Will etch have guile 1.6 as well, or would this upload replace guile
> 1.6?
This won't affect the guile-1.6 packages.
> That would be serious. Presumably lilypond doesn't require threads
> then?
I would assume so, since the lilypond maintainer's very much in favor
of an immediate 1.8 upload with --with-threads=no.
--
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
2006-09-11 15:57 ` Rob Browning
@ 2006-09-11 17:40 ` Neil Jerram
0 siblings, 0 replies; 30+ messages in thread
From: Neil Jerram @ 2006-09-11 17:40 UTC (permalink / raw)
Cc: guile-devel
Rob Browning <rlb@defaultvalue.org> writes:
> Neil Jerram <neil@ossau.uklinux.net> writes:
>
>> Will etch have guile 1.6 as well, or would this upload replace guile
>> 1.6?
>
> This won't affect the guile-1.6 packages.
Thanks for confirming that.
Regards,
Neil
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread (newest message: 2006-09-11 17:40 UTC)
Thread overview: 30+ messages
2006-07-23 8:37 1.8 make check failing in popen.test Rob Browning
2006-07-23 22:36 ` Kevin Ryde
2006-07-23 23:12 ` Rob Browning
2006-07-24 7:25 ` Ludovic Courtès
2006-07-30 1:59 ` Rob Browning
2006-08-15 8:35 ` Rob Browning
2006-08-15 23:13 ` Kevin Ryde
2006-08-16 1:18 ` Rob Browning
2006-08-16 23:26 ` Kevin Ryde
2006-08-18 3:01 ` Rob Browning
2006-08-21 23:38 ` Kevin Ryde
2006-08-22 2:03 ` Rob Browning
2006-08-25 1:16 ` Kevin Ryde
2006-08-25 2:39 ` Rob Browning
2006-08-27 20:06 ` Rob Browning
2006-08-27 20:22 ` Rob Browning
2006-08-30 7:50 ` Rob Browning
2006-08-31 0:24 ` Kevin Ryde
2006-08-31 6:28 ` Rob Browning
2006-09-04 2:45 ` Rob Browning
2006-09-07 20:46 ` Kevin Ryde
2006-09-08 2:45 ` Rob Browning
2006-09-08 23:05 ` Kevin Ryde
2006-09-09 12:52 ` Marius Vollmer
2006-09-09 16:39 ` Rob Browning
2006-09-10 16:24 ` Rob Browning
2006-09-11 1:16 ` Rob Browning
2006-09-11 6:40 ` Neil Jerram
2006-09-11 15:57 ` Rob Browning
2006-09-11 17:40 ` Neil Jerram