* 1.8 make check failing in popen.test
From: Rob Browning @ 2006-07-23 8:37 UTC

Right now 1.8's make check fails here in popen.test:

    Running popen.test
    FAIL: popen.test: open-input-pipe: echo hello

However, it's a fairly strange failure. For example, if I run
./check-guile directly, everything's fine, but if I create a trivial
makefile named foo containing "bar: ; ./check-guile" and then run
"make -f foo bar", the popen test fails again.

Also, if I disable environments.test, everything's fine, i.e.:

    mkdir test-suite/tests.disabled
    mv test-suite/tests/environments.test test-suite/tests.disabled/
    make check

However, if I put environments.test back and disable these tests
instead:

    r4rs.test ramap.test receive.test threads.test
    r5rs_pitfall.test reader.test regexp.test

then make check also runs correctly. So it's not specifically
environments.test.

By adding some display statements to popen.test, it looks like the
fundamental cause of the failure is that in those cases
(open-input-pipe "echo hello") returns an empty port.

Does anyone have any idea what might be going on?

Also, while looking around, I noticed that many of our tests don't
place themselves in their own test module, and that raised two
questions:

  1) Should they (to limit the chance that one test might affect
     another inadvertently)?

  2) Is there any reason I shouldn't consider just reworking the
     scheme level tests to run each foo.test in a separate Guile
     process?

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel
* Re: 1.8 make check failing in popen.test 2006-07-23 8:37 1.8 make check failing in popen.test Rob Browning @ 2006-07-23 22:36 ` Kevin Ryde 2006-07-23 23:12 ` Rob Browning 2006-08-15 8:35 ` Rob Browning 1 sibling, 1 reply; 30+ messages in thread From: Kevin Ryde @ 2006-07-23 22:36 UTC (permalink / raw) Cc: guile-devel Rob Browning <rlb@defaultvalue.org> writes: > > FAIL: popen.test: open-input-pipe: echo hello Seems ok for me. > Also, while looking around, I noticed that many of our tests don't > place themselves in a their own test module, and that raised two > questions: > > 1) Should they (to limit the chance that one test might affect > another inadvertently)? Yes, I think so, those which load modules or make helper defines at least. I changed a couple of the srfi ones a while back, ones that loaded bindings extending core stuff. > 2) Is there any reason I shouldn't consider just reworking the > scheme level tests to run each foo.test in a separate Guile > process? I suppose you lose the aggregated tests counts at the end. If all tests work (which is presumably the normal case!) I guess it doesn't matter much either way. _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test 2006-07-23 22:36 ` Kevin Ryde @ 2006-07-23 23:12 ` Rob Browning 2006-07-24 7:25 ` Ludovic Courtès 0 siblings, 1 reply; 30+ messages in thread From: Rob Browning @ 2006-07-23 23:12 UTC (permalink / raw) Kevin Ryde <user42@zip.com.au> writes: > Seems ok for me. I wonder if it might be a Debian unstable issue... > I suppose you lose the aggregated tests counts at the end. If all > tests work (which is presumably the normal case!) I guess it doesn't > matter much either way. I wonder if those test counts are actually all that useful. I would guess that most of the time you only care whether there was an error or not. -- Rob Browning rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test 2006-07-23 23:12 ` Rob Browning @ 2006-07-24 7:25 ` Ludovic Courtès 2006-07-30 1:59 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Ludovic Courtès @ 2006-07-24 7:25 UTC (permalink / raw) Cc: guile-devel Hi, Rob Browning <rlb@defaultvalue.org> writes: > Kevin Ryde <user42@zip.com.au> writes: >> I suppose you lose the aggregated tests counts at the end. If all >> tests work (which is presumably the normal case!) I guess it doesn't >> matter much either way. > > I wonder if those test counts are actually all that useful. I would > guess that most of the time you only care whether there was an error > or not. Beside the test counts (which I find useful), running all the scripts in a single process increases the chance of catching nasty bugs, as you noticed. ;-) So I think it makes sense to keep it as is. Thanks, Ludovic. _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-07-30 1:59 UTC

ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> Beside the test counts (which I find useful), running all the
> scripts in a single process increases the chance of catching nasty
> bugs, as you noticed. ;-) So I think it makes sense to keep it as
> is.

Hmm. While I can see the point that running all of the tests in a
single process does exercise Guile as a whole more heavily, I suppose
I was looking at this more from the perspective of testing accuracy.

Running all of the tests in a single process increases the chance that
some unrelated action by a previous test might inadvertently break (or
just improperly skew) the current test, i.e. it makes it harder to
isolate your variables. Such a problem seems like the kind of thing
that might take a long time to track down, without providing any
useful diagnostics.
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-08-15 8:35 UTC

Rob Browning <rlb@defaultvalue.org> writes:

> Right now 1.8's make check fails here in popen.test:

I have some further information, and it's quite surprising. At this
point, with a fresh guile-1.8.0 tree, popen.test no longer fails
during make check; it hangs, but only in *some* trees. If the tree is
named guile-1.8.0, the test will work. If the tree is named foo-1.8.0,
the test will hang.

For example, if I do this:

    tar xzf guile-1.8.0.tar.gz
    cd guile-1.8.0
    ./configure --host=i486-linux-gnu --build=i486-linux-gnu --with-threads=yes
    make
    make check

popen.test doesn't hang, but if I do this

    tar xzf guile-1.8.0.tar.gz
    mv guile-1.8.0 foo-1.8.0
    cd foo-1.8.0
    ./configure --host=i486-linux-gnu --build=i486-linux-gnu --with-threads=yes
    make
    make check

it does hang (repeatably).

I've looked at the process in gdb after it hangs, and it appears to be
blocked in a select waiting in (I think) the "no duplicate" test.

At first I thought this might just be a local issue, but then I asked
someone on irc to try the same thing, and they saw identical behavior
on a completely different architecture (powerpc).

I'd really like to get this tracked down. The reason I noticed the
problem is that the build tree for the forthcoming Debian Guile 1.8
packages isn't named guile-1.8.0.
* Re: 1.8 make check failing in popen.test
From: Kevin Ryde @ 2006-08-15 23:13 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:
>
> mv guile-1.8.0 foo-1.8.0

Sounds bizarre.

> I've looked at the process in gdb after it hangs, and it appears to be
> blocked in a select waiting in (I think) the "no duplicate" test.

There are two of those tests; which does it seem to be? (The child
processes are different in each, so "ps" might show which.)
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-08-16 1:18 UTC

Kevin Ryde <user42@zip.com.au> writes:

>> I've looked at the process in gdb after it hangs, and it appears to be
>> blocked in a select waiting in (I think) the "no duplicate" test.
>
> There two of those tests, which does it seem to be? (The child
> processes are different in each, "ps" might show which.)

Ahh, I didn't notice that. It's actually the second. Here's the tail
of check-guile.log:

    PASS: popen.test: open-input-pipe: stdin==stderr
    PASS: popen.test: open-input-pipe: stdout==stderr
    PASS: popen.test: open-input-pipe: no duplicate
    PASS: popen.test: open-output-pipe: no args
    PASS: popen.test: open-output-pipe: port?
    PASS: popen.test: open-output-pipe: stdin==stderr
    PASS: popen.test: open-output-pipe: stdout==stderr
* Re: 1.8 make check failing in popen.test 2006-08-16 1:18 ` Rob Browning @ 2006-08-16 23:26 ` Kevin Ryde 2006-08-18 3:01 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Kevin Ryde @ 2006-08-16 23:26 UTC (permalink / raw) Cc: guile-devel Rob Browning <rlb@defaultvalue.org> writes: > > Ahh, I didn't notice that. It's actually the second. I guess a select hang is probably the read-char waiting for the child to signal to the parent that it (the child) has closed stdout so the parent can go ahead and test that connection (for eof). Maybe gdb or some printfs can show if that's so. It still works ok for me though, with a renamed directory. But I do get a test fail in the "illegal (begin)" of syntax.test. Dunno what that means ... "(if #t (begin))" is supposed to be illegal but it passes, or something. _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-08-18 3:01 UTC

Kevin Ryde <user42@zip.com.au> writes:

> I guess a select hang is probably the read-char waiting for the
> child to signal to the parent that it (the child) has closed stdout
> so the parent can go ahead and test that connection (for eof).
> Maybe gdb or some printfs can show if that's so.

I haven't had a chance to investigate much further yet, but if I run
make check under gdb and hit Ctrl-C while popen.test is hung, there
are two threads that were running.

The first thread's backtrace looks like this:

    #0  0xffffe410 in __kernel_vsyscall ()
    #1  0x40281f41 in select () from /lib/tls/i686/cmov/libc.so.6
    #2  0x400b481c in scm_std_select (nfds=29, readfds=0xbfa8621c,
            writefds=0x0, exceptfds=0x0, timeout=0x0) at threads.c:1363
    #3  0x4005f8ce in fport_fill_input (port=0x40647400) at fports.c:550
    #4  0x40086aff in scm_fill_input (port=0x40647400) at ports.c:973
    #5  0x40086c40 in scm_getc (port=0x40647400) at ports.c:991
    #6  0x40088d7d in scm_read_char (port=0x0) at ports.c:948
    #7  0x4005ad54 in ceval (x=0x404, env=0x40647188) at eval.c:4218

and the second thread's backtrace looks like this:

    #0  0xffffe410 in __kernel_vsyscall ()
    #1  0x401b681b in __read_nocancel ()
            from /lib/tls/i686/cmov/libpthread.so.0
    #2  0x40091923 in do_read_without_guile (raw_data=0x40b310d8)
            at scmsigs.c:134
    #3  0x400b42fc in scm_without_guile (
            func=0x400918f0 <do_read_without_guile>, data=0x40b310d8)
            at threads.c:657
    #4  0x40091875 in signal_delivery_thread (data=0x0) at scmsigs.c:145
    #5  0x400b67b5 in scm_c_catch (tag=0x104,
            body=0x400917f0 <signal_delivery_thread>, body_data=0x0,
            handler=0x400b6080 <scm_handle_by_message>,
            handler_data=0x400d867e, pre_unwind_handler=0,
            pre_unwind_handler_data=0x0) at throw.c:204
    #6  0x400b69c5 in scm_internal_catch (tag=0x104,
            body=0x400917f0 <signal_delivery_thread>, body_data=0x0,
            handler=0x400b6080 <scm_handle_by_message>,
            handler_data=0x400d867e) at throw.c:218
    #7  0x400b55ad in really_spawn (d=0xbfa860c8) at threads.c:777
* Re: 1.8 make check failing in popen.test 2006-08-18 3:01 ` Rob Browning @ 2006-08-21 23:38 ` Kevin Ryde 2006-08-22 2:03 ` Rob Browning 2006-08-30 7:50 ` Rob Browning 0 siblings, 2 replies; 30+ messages in thread From: Kevin Ryde @ 2006-08-21 23:38 UTC (permalink / raw) Cc: guile-devel Rob Browning <rlb@defaultvalue.org> writes: > > I haven't had a chance to investigate much further yet, but if I run > make check under gdb and hit Ctrl-C while popen.test is hung, there > are two threads that were running. Looks about right. What's the child process doing? It's supposed to be writing to the parent to say continue. (Unless it failed to fork there should be some child, either running or a zombie.) _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-08-22 2:03 UTC

Kevin Ryde <user42@zip.com.au> writes:

> Looks about right. What's the child process doing? It's supposed
> to be writing to the parent to say continue. (Unless it failed to
> fork there should be some child, either running or a zombie.)

From the output of ps auxf, it looks like the child is probably one of
these zombies. There are several zombie sh children:

    \_ [sh] <defunct>
    \_ [sh] <defunct>
    \_ [sh] <defunct>
    \_ [lt-guile] <defunct>
    \_ [lt-guile] <defunct>
* Re: 1.8 make check failing in popen.test
From: Kevin Ryde @ 2006-08-25 1:16 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:
>
> From the output of ps auxf, it looks like the child is probably one of
> these zombies. There are several zombie sh children:
>
>     \_ [sh] <defunct>
>     \_ [sh] <defunct>
>     \_ [sh] <defunct>
>     \_ [lt-guile] <defunct>
>     \_ [lt-guile] <defunct>

Actually, I see the signalling pipe is left in the parent, so if the
child has printed nothing it could hang. Obviously there's something
wrong if the child doesn't print what's expected, but the change below
should at least cause an error instead of a hang in that case.

    ;; After the child closes stdin (which it indicates here by writing
    ;; "closed" to stderr), the parent should see a broken pipe. We setup to
    ;; see this as EPIPE (rather than SIGPIPE). In Guile 1.6.4 and earlier a
    ;; duplicate of stdin existed in the child, preventing the broken pipe
    ;; occurring.
    (pass-if "no duplicate"
      (with-epipe
       (lambda ()
         (let* ((pair (pipe))
                (port (with-error-to-port (cdr pair)
                        (lambda ()
                          (open-output-pipe
                           "exec 0</dev/null; echo closed 1>&2; exec 2>/dev/null; sleep 999")))))
           (close-port (cdr pair))   ;; write side
           (and (char? (read-char (car pair))) ;; wait for child to do its thing
                (catch 'system-error
                  (lambda ()
                    (write-char #\x port)
                    (force-output port)
                    #f)
                  (lambda (key name fmt args errno-list)
                    (= (car errno-list) EPIPE))))))))
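The test above relies on a write to a pipe with no reader being reported as an EPIPE error rather than killing the process with SIGPIPE. As a rough C illustration of that behaviour (this is not Guile's code, and it only assumes that with-epipe does something comparable; the function name write_to_broken_pipe is made up for the example), ignoring SIGPIPE makes write() fail with errno set to EPIPE once the read end is gone:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to a pipe whose read end has
   already been closed.  Returns the errno reported by write(), 0 if
   the write unexpectedly succeeded, or -1 if the pipe could not be
   created. */
int write_to_broken_pipe(void)
{
    int fds[2];
    int err = 0;
    void (*saved)(int);

    if (pipe(fds) != 0)
        return -1;

    /* With SIGPIPE ignored, the failure surfaces as EPIPE from
       write() instead of terminating the process. */
    saved = signal(SIGPIPE, SIG_IGN);
    assert(saved != SIG_ERR);

    close(fds[0]);                  /* no reader is left */
    if (write(fds[1], "x", 1) < 0)
        err = errno;

    close(fds[1]);
    signal(SIGPIPE, saved);         /* restore the previous disposition */
    return err;
}
```

Calling write_to_broken_pipe() from a small main() and comparing the result with EPIPE is enough to try it. If a duplicate of the read end were still open anywhere, as the test comment describes for Guile 1.6.4 and earlier, the write would succeed instead.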
* Re: 1.8 make check failing in popen.test 2006-08-25 1:16 ` Kevin Ryde @ 2006-08-25 2:39 ` Rob Browning 2006-08-27 20:06 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Rob Browning @ 2006-08-25 2:39 UTC (permalink / raw) Kevin Ryde <user42@zip.com.au> writes: > Actually, I see the signalling pipe is left in the parent, so if the > child has printed nothing it could hang. Obviously there's something > wrong if the child doesn't print what's expected, but the change below > should at least cause an error instead of a hang in that case. OK, here's what I see with that test code: Running popen.test FAIL: popen.test: open-output-pipe: no duplicate FAIL: popen.test: close-pipe: exit 0 ERROR: popen.test: close-pipe: exit 1 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f)) ERROR: In procedure fport_flush: ERROR: Bad file descriptor ERROR: In procedure fport_flush: ERROR: Bad file descriptor Running ports.test -- Rob Browning rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test 2006-08-25 2:39 ` Rob Browning @ 2006-08-27 20:06 ` Rob Browning 2006-08-27 20:22 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Rob Browning @ 2006-08-27 20:06 UTC (permalink / raw) Rob Browning <rlb@defaultvalue.org> writes: > OK, here's what I see with that test code: > > Running popen.test > FAIL: popen.test: open-output-pipe: no duplicate > FAIL: popen.test: close-pipe: exit 0 > ERROR: popen.test: close-pipe: exit 1 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f)) > ERROR: In procedure fport_flush: > ERROR: Bad file descriptor > ERROR: In procedure fport_flush: > ERROR: Bad file descriptor > Running ports.test Hmm. Just to be sure, I started off with a completely fresh tree, and now it hangs with the original code, and fails like this with your replacement code: Running popen.test FAIL: popen.test: open-output-pipe: no duplicate Running ports.test -- Rob Browning rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test 2006-08-27 20:06 ` Rob Browning @ 2006-08-27 20:22 ` Rob Browning 0 siblings, 0 replies; 30+ messages in thread From: Rob Browning @ 2006-08-27 20:22 UTC (permalink / raw) Rob Browning <rlb@defaultvalue.org> writes: > Hmm. Just to be sure, I started off with a completely fresh tree, and > now it hangs with the original code, and fails like this with your > replacement code: > > Running popen.test > FAIL: popen.test: open-output-pipe: no duplicate > Running ports.test OK, it turns out if I add a (sleep 1) just before your close-port, I see the "Bad file descriptor" errors again, i.e.: (sleep 1) (close-port (cdr pair)) ;; write side produces this, and the two subsequent tests also fail: Running popen.test FAIL: popen.test: open-output-pipe: no duplicate ERROR: popen.test: close-pipe: exit 0 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f)) ERROR: In procedure fport_flush: ERROR: Bad file descriptor ERROR: In procedure fport_flush: ERROR: Bad file descriptor FAIL: popen.test: close-pipe: exit 0 FAIL: popen.test: close-pipe: exit 1 Running ports.test -- Rob Browning rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-08-30 7:50 UTC

Kevin Ryde <user42@zip.com.au> writes:

> Looks about right. What's the child process doing? It's supposed to
> be writing to the parent to say continue. (Unless it failed to fork
> there should be some child, either running or a zombie.)

(Consider the following info preliminary. I haven't had any time to
try to figure out the actual cause, but since I just discovered this,
and I have to stop for the moment, I wanted to let everyone else have
a look.)

After further investigation, it appears that particular child might
not be running, at least not on some of the runs. I switched back to
the original code (the code that would hang), added some debug
statements, ran strace -p -s 100, etc. on "make check", and found that
the child appears to be segfaulting at least some of the time here (in
popen.scm):

    (port-for-each
     (lambda (pt-entry)
       ;;(dbg-out (list 'pt-entry pt-entry))
       (false-if-exception
        (let ((pt-fileno (fileno pt-entry)))
          (if (not (or (= pt-fileno input-fdes)
                       (= pt-fileno output-fdes)
                       (= pt-fileno error-fdes)))
              (close-fdes pt-fileno))))))

When I uncomment the dbg-out statement above (which just writes the
arg and a newline to an output-port and then forces the output), I see
this on the console:

    ERROR: popen.test: open-output-pipe: no duplicate - arguments:
    ((wrong-type-arg "list-copy" "Wrong type argument in position ~A: ~S"
    (1 (pt-entry . #<freed cell 0x40305830; GC missed a reference>))
    ((pt-entry . #<freed cell 0x40305830; GC missed a reference>))))

this in the dbg-out output file:

    ...
    (pt-entry #<output: string 81079e0>)
    (pt-entry #<output: string 8106650>)
    (pt-entry #<freed cell 0x40643c18; GC missed a reference>)

and this in the strace (1402 is the forked child process):

    1402 write(7, "ERROR: popen.test: open-output-pipe: no duplicate - arguments: ((wrong-type-arg \"list-copy\" \"Wrong t"..., 263) = -1 EBADF (Bad file descriptor)
    1402 write(2, "ERROR", 5) = -1 EBADF (Bad file descriptor)
    1402 write(2, "\nException during displaying of ", 32) = -1 EBADF (Bad file descriptor)
    1402 write(7, "ERROR: popen.test: open-output-pipe: no duplicate - arguments: ((wrong-type-arg \"list-copy\" \"Wrong t"..., 263) = -1 EBADF (Bad file descriptor)
    1402 write(2, "ERROR", 5) = -1 EBADF (Bad file descriptor)
    1402 write(2, "\nException during displaying of ", 32) = -1 EBADF (Bad file descriptor)
    1402 exit_group(1) = ?

If I omit the dbg-out statement in the above code, then I can just see
the child die due to a SEGV in the strace log (2126 is the child):

    2126 close(12 <unfinished ...>
    2123 <... close resumed> ) = 0
    2126 <... close resumed> ) = 0
    2123 access("/etc/ld.so.nohwcap", F_OK <unfinished ...>
    2126 close(10 <unfinished ...>
    2123 <... access resumed> ) = -1 ENOENT (No such file or directory)
    2126 <... close resumed> ) = 0
    2123 open("/lib/tls/i686/cmov/libdl.so.2", O_RDONLY <unfinished ...>
    2126 close(29 <unfinished ...>
    2123 <... open resumed> ) = 5
    2126 <... close resumed> ) = 0
    2123 read(5, <unfinished ...>
    2126 --- SIGSEGV (Segmentation fault) @ 0 (0) ---

So I started from a clean tree, enabled core dumps, and here's what
gdb had to say about the resulting core:

    Program terminated with signal 11, Segmentation fault.
    #0  0x400729ca in scm_fileno (port=0x0) at ioext.c:180
    180       port = SCM_COERCE_OUTPORT (port);
    (gdb) where
    #0  0x400729ca in scm_fileno (port=0x0) at ioext.c:180
    #1  0x4005ad41 in ceval (x=0x404, env=0x40372710) at eval.c:4218
    #2  0x4005b26e in ceval (x=<value optimized out>, env=0x40372710)
            at eval.c:3634

In any case, as I said, consider all this preliminary. For everything
but the core dump, I wasn't working from a clean tree.
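The port-for-each loop quoted in this message runs in the forked child and closes every port's file descriptor except the three the new process is meant to keep. A loose C analogue of that step is sketched below, under clearly different assumptions: it scans descriptor numbers up to a fixed limit instead of walking a port table the way Guile does, and the names close_all_except and child_drops_fd, as well as the limit of 256, are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor below `limit` except
   the three that should survive.  Returns how many were closed. */
int close_all_except(int in_fd, int out_fd, int err_fd, int limit)
{
    int closed = 0;

    assert(limit >= 0);
    for (int fd = 0; fd < limit; fd++) {
        if (fd == in_fd || fd == out_fd || fd == err_fd)
            continue;
        /* EBADF for unused slots is expected and simply ignored. */
        if (close(fd) == 0)
            closed++;
    }
    return closed;
}

/* Fork, let the child drop its inherited descriptors, and report
   whether `extra_fd` really was closed there.  Returns 0 if it was,
   1 if it was still open, -1 on fork or wait failure. */
int child_drops_fd(int extra_fd)
{
    int status = -1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        close_all_except(STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, 256);
        /* F_GETFD fails with EBADF once the descriptor is closed. */
        _exit(fcntl(extra_fd, F_GETFD) == -1 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller could create a pipe, pass one end to child_drops_fd(), and expect 0 back while the descriptor stays open in the parent. The sketch has nothing like the freed-cell problem seen above, because it never touches garbage-collected objects; it only shows what the cleanup is trying to achieve.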
* Re: 1.8 make check failing in popen.test 2006-08-30 7:50 ` Rob Browning @ 2006-08-31 0:24 ` Kevin Ryde 2006-08-31 6:28 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Kevin Ryde @ 2006-08-31 0:24 UTC (permalink / raw) Cc: guile-devel Rob Browning <rlb@defaultvalue.org> writes: > > ;;(dbg-out (list 'pt-entry pt-entry)) > ... > > ERROR: popen.test: open-output-pipe: no duplicate - arguments: > ((wrong-type-arg "list-copy" "Wrong type argument in position ~A: ~S" > (1 (pt-entry . #<freed cell 0x40305830; GC missed a reference>)) > ((pt-entry . #<freed cell 0x40305830; GC missed a reference>)))) It's possible the complaint is about the `list' call. I changed `list' to be implemented using scm_list_copy, which of course reports itself as `list-copy' if the list is improper. But I can't think how an improper list could arrive there from a "(list x y)" call. > (pt-entry #<freed cell 0x40643c18; GC missed a reference>) I suppose that's the killer, a port gc-ed prematurely. Perhaps there'd be some significance in which one it was. port-for-each looks pretty safe, maybe the port table has gone bad earlier. Not sure how this would relate to the following bit you posted before. If this bit was taken literally it'd sound like something evil in scm_call_1 I suppose. > ERROR: popen.test: close-pipe: exit 0 - arguments: ((wrong-number-of-args #f "Wrong number of arguments to ~A" (#<procedure #f (pt-entry)>) #f)) _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test 2006-08-31 0:24 ` Kevin Ryde @ 2006-08-31 6:28 ` Rob Browning 2006-09-04 2:45 ` Rob Browning 0 siblings, 1 reply; 30+ messages in thread From: Rob Browning @ 2006-08-31 6:28 UTC (permalink / raw) Kevin Ryde <user42@zip.com.au> writes: > I suppose that's the killer, a port gc-ed prematurely. Perhaps > there'd be some significance in which one it was. port-for-each > looks pretty safe, maybe the port table has gone bad earlier. I think I found the problem. I believe scm_c_port_for_each() needs a call to scm_remember_upto_here_1(ports) at the end. -- Rob Browning rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu GPG starting 2002-11-03 = 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 _______________________________________________ Guile-devel mailing list Guile-devel@gnu.org http://lists.gnu.org/mailman/listinfo/guile-devel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-04 2:45 UTC

Rob Browning <rlb@defaultvalue.org> writes:

> Kevin Ryde <user42@zip.com.au> writes:
>
>> I suppose that's the killer, a port gc-ed prematurely. Perhaps
>> there'd be some significance in which one it was. port-for-each
>> looks pretty safe, maybe the port table has gone bad earlier.
>
> I think I found the problem. I believe scm_c_port_for_each() needs a
> call to scm_remember_upto_here_1(ports) at the end.

Apparently I spoke too soon, or rather, that wasn't the only problem.
Though the child is no longer segfaulting, popen.test still hangs
occasionally.

I caught it hanging between two debug statements with nothing between
them other than a (gc) statement I added. So I tried attaching gdb to
the child (which is still in open-process, a bit after the fork), and
saw:

    #0  0xffffe410 in __kernel_vsyscall ()
    #1  0x401b655e in __lll_mutex_lock_wait ()
            from /lib/tls/i686/cmov/libpthread.so.0
    #2  0x401b318d in _L_mutex_lock_29 ()
            from /lib/tls/i686/cmov/libpthread.so.0
    #3  0xbf83b568 in ?? ()
    #4  0x400b41b9 in scm_leave_guile () at threads.c:369
    #5  0x400b423b in scm_i_thread_put_to_sleep () at threads.c:1538
    #6  0x40060c69 in scm_i_gc (what=0x400d2dcb "call") at gc.c:548
    #7  0x40060e81 in scm_gc () at gc.c:454

An strace also indicates that the child is blocked on a futex call.

Looking at threads.c, at that point Guile is calling
scm_i_pthread_mutex_unlock() from within scm_leave_guile(). I imagine
that pthread_mutex_unlock() is making the call to lock a lower level
mutex, which blocks.

At the moment, though, I don't know why it blocks. I wonder if the
parent left that mutex in an unexpected state when it forked.
* Re: 1.8 make check failing in popen.test
From: Kevin Ryde @ 2006-09-07 20:46 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:
>
> At the moment, though, I don't know why it blocks. I wonder if the
> parent left that mutex in an unexpected state when it forked.

Sounds bizarre. Maybe hacking scm_i_pthread_mutex_init to force the
use of the "error checking" mutex attribute and then hacking
scm_i_pthread_mutex_lock and scm_i_pthread_mutex_unlock to check the
return and errno would show something. Dunno what though.
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-08 2:45 UTC

Kevin Ryde <user42@zip.com.au> writes:

> Sounds bizarre.  Maybe hacking scm_i_pthread_mutex_init to force the
> use of the "error checking" mutex attribute and then hacking
> scm_i_pthread_mutex_lock and scm_i_pthread_mutex_unlock to check the
> return and errno would show something.  Dunno what though.

See pthread_atfork(3):

  To understand the purpose of pthread_atfork, recall that fork(2)
  duplicates the whole memory space, including mutexes in their
  current locking state, but only the calling thread: other threads
  are not running in the child process.  The mutexes are not usable
  after the fork and must be initialized with pthread_mutex_init in
  the child process.  This is a limitation of the current
  implementation and might or might not be present in future
  versions.

I believe a while back I wondered if we might have trouble with
pthreads and fork unless we were very careful, and I'm beginning to
wonder again.
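The man page excerpt above describes reinitialising mutexes in the child. A minimal sketch of that pattern with pthread_atfork() follows. It is not a proposed Guile patch: state_lock and the handler names are hypothetical, and a real fix would have to cover every mutex the runtime owns. The prepare handler takes the lock so the protected state is consistent at the moment of the fork, the parent handler releases it, and the child handler reinitialises it as the excerpt suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for a runtime-internal mutex. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(). */
static void before_fork(void)
{
    int rc = pthread_mutex_lock(&state_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the parent after fork(): release what before_fork took. */
static void parent_after_fork(void)
{
    pthread_mutex_unlock(&state_lock);
}

/* Runs in the child after fork(): give the child a fresh mutex. */
static void child_after_fork(void)
{
    pthread_mutex_init(&state_lock, NULL);
}

/* Registers the three handlers; returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, parent_after_fork, child_after_fork);
}

/* Forks and reports whether the child could take state_lock:
   0 if it could, 1 if it could not, -1 if fork or wait failed. */
int child_can_lock_after_fork(void)
{
    int status = 0;
    pid_t pid = fork();   /* handlers run around this call */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&state_lock);
        if (rc == 0)
            pthread_mutex_unlock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling install_fork_handlers() once at startup and then child_can_lock_after_fork() should let the child take the lock. The approach only helps for mutexes the program knows about, which is part of why mixing threads and fork needs the care mentioned above.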
* Re: 1.8 make check failing in popen.test
From: Kevin Ryde @ 2006-09-08 23:05 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:
>
> I believe a while back I wondered if we might have trouble with
> pthreads and fork unless we were very careful, and I'm beginning to
> wonder again.

Ahh, that sounds likely.  If mutexes are unusable once forked then
that'd hurt the gc trying to get all threads to rendezvous.  Sounds
bad for any scheme program that wants to fork, not just ice-9 popen.
:-(
* Re: 1.8 make check failing in popen.test
From: Marius Vollmer @ 2006-09-09 12:52 UTC
Cc: guile-devel

Kevin Ryde <user42@zip.com.au> writes:

> Rob Browning <rlb@defaultvalue.org> writes:
>>
>> I believe a while back I wondered if we might have trouble with
>> pthreads and fork unless we were very careful, and I'm beginning to
>> wonder again.
>
> Ahh, that sounds likely.  If mutexes are unusable once forked then
> that'd hurt the gc trying to get all threads to rendezvous.  Sounds
> bad for any scheme program that wants to fork, not just ice-9 popen.
> :-(

Yes.  I think I have been here when investigating the popen blocking
problem, but then I compiled Guile without thread support and I could
still observe the blocking.  (I stopped at that point, lazy me, but
debugging should have been simpler without threads...)

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-09 16:39 UTC
Cc: guile-devel

Marius Vollmer <mvo@zagadka.de> writes:

> Yes.  I think I have been here when investigating the popen blocking
> problem, but then I compiled Guile without thread support and I could
> still observe the blocking.  (I stopped at that point, lazy me, but
> debugging should have been simpler without threads...)

How recent was your checkout when you ran the test?  I ask because it
looks like there are two bugs, and I wanted to know if your tree was
recent enough to include the fix I committed for one of them.

In any case, I'll see if I can still get it to hang using
--without-threads.

Thanks
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-10 16:24 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:

> In any case, I'll see if I can still get it to hang using
> --without-threads.

I have a tree here where I added a (gc) call before and after the
port-for-each call in popen.scm (in the child).  Given that and
--with-threads=yes, "make check" would hang every time I ran it.

I rebuilt that tree with --with-threads=no and I ran make check
repeatedly.  After 20 iterations, it hadn't blocked.

So I removed the (gc) calls and rebuilt with --with-threads=yes, and
make check blocked after 7 iterations and then 183 iterations.

Though not conclusive, these results, when combined with the gdb
backtrace I posted earlier showing the blockage in scm_gc() while
trying to lock a mutex, seem to suggest that the remaining problem
is thread related.
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-11 1:16 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:

> Though not conclusive, these results, when combined with the gdb
> backtrace I posted earlier showing the blockage in scm_gc() while
> trying to lock a mutex, seem to suggest that the remaining problem
> is thread related.

On a related topic, in order for Guile 1.8 to make it into Debian etch
(the upcoming stable release), we're essentially out of time to fix
this problem.  Because of this I'm contemplating uploading the initial
Guile 1.8 packages with --with-threads=no.

Does anyone see any major problems with this stopgap?  It seems better
to have Guile 1.8 in etch without threads than not to have it at all.

Note that this issue doesn't just affect Guile.  It's my understanding
that unless Guile 1.8 packages are uploaded very soon, the newer
version of lilypond won't make it into etch either.
* Re: 1.8 make check failing in popen.test
From: Neil Jerram @ 2006-09-11 6:40 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:

> On a related topic, in order for Guile 1.8 to make it into Debian etch
> (the upcoming stable release), we're essentially out of time to fix
> this problem.  Because of this I'm contemplating uploading the initial
> Guile 1.8 packages with --with-threads=no.
>
> Does anyone see any major problems with this stopgap?  It seems better
> to have Guile 1.8 in etch without threads than not to have it at all.

Will etch have guile 1.6 as well, or would this upload replace guile
1.6?  In any case, I think without-threads is probably OK.

> Note that this issue doesn't just affect Guile.  It's my understanding
> that unless Guile 1.8 packages are uploaded very soon, the newer
> version of lilypond won't make it into etch either.

That would be serious.  Presumably lilypond doesn't require threads
then?

Regards,
     Neil
* Re: 1.8 make check failing in popen.test
From: Rob Browning @ 2006-09-11 15:57 UTC
Cc: guile-devel

Neil Jerram <neil@ossau.uklinux.net> writes:

> Will etch have guile 1.6 as well, or would this upload replace guile
> 1.6?

This won't affect the guile-1.6 packages.

> That would be serious.  Presumably lilypond doesn't require threads
> then?

I would assume so, since the lilypond maintainer's very much in favor
of an immediate 1.8 upload with --with-threads=no.
* Re: 1.8 make check failing in popen.test
From: Neil Jerram @ 2006-09-11 17:40 UTC
Cc: guile-devel

Rob Browning <rlb@defaultvalue.org> writes:

> Neil Jerram <neil@ossau.uklinux.net> writes:
>
>> Will etch have guile 1.6 as well, or would this upload replace guile
>> 1.6?
>
> This won't affect the guile-1.6 packages.

Thanks for confirming that.

Regards,
     Neil