From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ken Raeburn
Newsgroups: gmane.lisp.guile.devel
Subject: Re: intermittent segfaults in master
Date: Sat, 24 Oct 2009 17:56:52 -0400
Message-ID: <7B1681AC-62AE-4F28-9286-4F25BDEB5D0F@raeburn.org>
To: guile-devel
Mime-Version: 1.0 (Apple Message framework v936)
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
List-Id: "Developers list for Guile, the GNU extensibility library"
Xref: news.gmane.org gmane.lisp.guile.devel:9567

On Oct 24, 2009, at 09:30, Andy Wingo wrote:
> I have been experiencing intermittent segfaults recently, as I worked on
> wip-case-lambda. They would almost always go away immediately -- as in,
> while rebuilding guile, the process would stop because of a segfault,
> but I could type make again and it would succeed.

I've been seeing intermittent faults too, while working on the trunk and
building with -DSCM_DEBUG=1.

> For the meantime I could just make this a key-weak hash table. But this
> seems like the kind of problem that could hit user code. Ludovic I think
> you will start to see these crashes now that case-lambda was merged (and
> specifically 56164a5a). Would you be on the lookout for this kind of
> problem, and in contact with the libgc list? If this analysis is correct
> anyway, it's very possibly I have misinterpreted things.

My guess is we want key-weak for that hash table anyways. But, I've been
able to generate a crash even with this patch in. This is on Mac OS X
(10.5.8), libgc 7.1 (as installed by macports), guile commit id 15ab466,
plus the SCM_DEBUG patches I submitted before. (This particular set of
stack traces is from binaries built without SCM_DEBUG, though the
SCM_DEBUG version also shows the bug intermittently.)

The code:

(call-with-new-thread
 (lambda () (while #t (gc)))
 (lambda () #f))

(let ((h (make-doubly-weak-hash-table 0)))
  (while #t
    (hashq-set! h 'proc
                (assq-set! (hashq-ref h 'proc '()) 'akey (list 1)))
    (hashq-set! h 'proc
                (assq-set! (hashq-ref h 'proc '()) 'akey2 (list 1)))
    (assq-ref (hashq-ref h 'proc '()) 'akey)
    (assq-ref (hashq-ref h 'proc '()) 'akey3)
    (display ".")))

It can take a while to trigger the problem, and I'm not sure it even
happens on every invocation; I usually quit the test after several
minutes if it hasn't shown the problem, but sometimes simply starting it
again triggers it fairly quickly. It wouldn't surprise me if it's also
OS-, CPU-, and compiler-dependent.

I don't know if the separate GC thread is necessary. It wasn't in my
original test case, but simplifying the test case seems to have made it
harder to actually trigger the problem; I thought forcing excessive GC
invocations might help, and I think it has, though that's just a
subjective impression.

A trace of the crashing thread:

(gdb) bt 10
#0  0x0014af5b in scm_is_pair [inlined] () at inline.h:61
#1  0x0014af5b in scm_sloppy_assq (key=0x10b78d0, alist=0x0)
    at ../../guile/libguile/alist.c:61
#2  0x0014b3b4 in scm_is_pair [inlined] () at inline.h:272
#3  0x0014b3b4 in scm_assq_set_x (alist=0x10858e0, key=0x10b78d0,
    val=0x1085fd0) at ../../guile/libguile/alist.c:61
#4  0x0016dcda in scm_dapply (proc=, arg1=0x1085fb0, args=)
    at ../../guile/libguile/alist.c:61
#5  0x001e90ac in vm_debug_engine (vp=0x597fa0, program=, argv=0x0,
    nargs=) at ../../guile/libguile/alist.c:61
#6  0x001732f1 in scm_call_0 (proc=0x10b7790)
    at ../../guile/libguile/alist.c:61
#7  0x001d9ad5 in scm_c_catch (tag=0x10858d8, body=0x1da070 ,
    body_data=0xbfffe9b8, handler=0x1da090 , handler_data=0xbfffe9d8,
    pre_unwind_handler=0x10858d8, pre_unwind_handler_data=0x10858d8)
    at ../../guile/libguile/alist.c:61
#8  0x001da229 in scm_catch_with_pre_unwind_handler (key=0x10b7950,
    thunk=0x10b7790, handler=0x10b7740, pre_unwind_handler=0x204)
    at ../../guile/libguile/alist.c:61
#9  0x00182261 in gsubr_apply_raw (proc=0x56ff50, argc=,
    argv=0xbfffea5c) at ../../guile/libguile/alist.c:61
[...]

The data being examined:

(gdb) fr 3
#3  scm_assq_set_x (alist=0x10858e0, key=0x10b78d0, val=0x1085fd0)
    at ../../guile/libguile/alist.c:61
273       if (scm_is_pair (handle))
(gdb) p alist
$5 = (SCM) 0x10858e0
(gdb) p (SCM*)$5
$6 = (SCM *) 0x10858e0
(gdb) p $6[0]
$7 = (SCM) 0x10858d8
(gdb) p $6[1]
$8 = (SCM) 0x0

The garbage collection thread:

#0  0x9186729e in semaphore_signal_trap ()
#1  0x9186f04d in pthread_mutex_unlock ()
#2  0x0029707e in GC_try_to_collect ()
#3  0x002970db in GC_gcollect ()
#4  0x00178a0e in scm_gc () at ../../guile/libguile/gc.c:390
#5  0x0016d7c2 in scm_dapply (proc=, arg1=0xa01cd584, args=)
    at eval.i.c:1754
#6  0x001e96c0 in vm_debug_engine (vp=0x597dc0, program=, argv=0x0,
    nargs=) at vm-i-system.c:919
#7  0x001732f1 in scm_call_0 (proc=0x1000a30)
    at ../../guile/libguile/eval.c:3113
#8  0x001d9ad5 in scm_c_catch (tag=0x0, body=0x1da070 ,
    body_data=0xb00807c8, handler=0x1da090 , handler_data=0xb00807e8,
    pre_unwind_handler=0, pre_unwind_handler_data=0x0)
    at ../../guile/libguile/throw.c:243
#9  0x001da229 in scm_catch_with_pre_unwind_handler (key=0x1000a40,
    thunk=0x1000a30, handler=0x10b7ae0, pre_unwind_handler=0x204)
    at ../../guile/libguile/throw.c:627
[...]

Ken