From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: The 2.0.9 VM cores in enqueue (threads.c:309)
Date: Mon, 29 Apr 2013 06:10:01 -0400
Message-ID: <87sj29zlza.fsf@tines.lan>
References: <517C2DBF.7050304@computer.org> <517E1990.3090302@computer.org>
In-Reply-To: <517E1990.3090302@computer.org> (Andrew Gaylard's message of "Mon, 29 Apr 2013 08:56:16 +0200")
To: Andrew Gaylard
Cc: guile-devel@gnu.org
List-Id: "Developers list for Guile, the GNU extensibility library"

Hi Andrew,

Andrew Gaylard writes:

> On 04/28/13 03:07, Daniel Hartwig wrote:
>> On 28 April 2013 03:57, Andrew Gaylard wrote:
>>> Those 0x304 values look dodgy to me, and explain why the
>>> SCM_SETCDR causes an invalid memory access.
>>>
>> 0x304 is SCM_EOL.
> Hi Daniel,
>
> Thanks for the feedback.
>
> Are you saying that the 0x304 values are fine, and the problem lies
> elsewhere?

As Daniel pointed out, 0x304 is SCM_EOL, i.e. the empty list '().

> #0 0xffffffff7e77b5f4 in enqueue (q=0x1010892c0, t=0x1018aac20) at
> threads.c:309
> #1 0xffffffff7e77bc20 in block_self (queue=0x1010892c0,
> sleep_object=0x1010892d0, mutex=0x1019eef00, waittime=0x0) at
> threads.c:452
> #2 0xffffffff7e77df50 in fat_mutex_lock (mutex=0x1010892d0,
> timeout=0x0, owner=0x904, ret=0xffffffff734f92ac) at threads.c:1473
[...]
> (gdb) list
> 304   SCM c = scm_cons (t, SCM_EOL);
> 305   SCM_CRITICAL_SECTION_START;
> 306   if (scm_is_null (SCM_CDR (q)))
> 307     SCM_SETCDR (q, c);
> 308   else
> 309     SCM_SETCDR (SCM_CAR (q), c);
> 310   SCM_SETCAR (q, c);
> 311   SCM_CRITICAL_SECTION_END;
> 312   return c;
> 313 }
[...]
> (gdb) p *SCM2PTR(q)
> $26 = {word_0 = 0x304, word_1 = 0x1039c4c20}

What's happening here is that the wait queue (m->waiting in fat_mutex)
is somehow getting corrupted.
The code above ('enqueue' in threads.c) is trying to add a new element
to the queue.  The queue is represented as a pair whose CDR is the list
of items in the queue, and whose CAR points to the last pair of that
list.  Somehow, the CAR is becoming null even though the CDR is
non-empty.  This should never happen.

I looked through the relevant code, and it's not obvious to me how this
could happen.  The only functions I see that manipulate this queue are
'enqueue', 'remqueue', and 'dequeue', all static functions in threads.c.
As far as I can see, these functions maintain the invariant that the CAR
is null if and only if the CDR is null.  All queue manipulation is done
between SCM_CRITICAL_SECTION_START and SCM_CRITICAL_SECTION_END (defined
in async.h), which lock a single global pthread mutex.

Any ideas?

    Thanks,
      Mark
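
For reference, the queue layout and invariant described above can be sketched as a small stand-alone C model. This is not the Guile source: 'struct pair', 'queue_invariant_holds' and 'model_enqueue' are hypothetical names, plain pointers stand in for SCM values, NULL plays the role of SCM_EOL, and the critical-section locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Scheme pair; NULL plays the role of SCM_EOL. */
struct pair
{
  void *car;
  struct pair *cdr;
};

/* The queue header is itself a pair:
   cdr -> first pair of the list of items,
   car -> last pair of that list.  */
static bool
queue_invariant_holds (const struct pair *q)
{
  const struct pair *p;

  /* CAR is null if and only if CDR is null.  */
  if ((q->car == NULL) != (q->cdr == NULL))
    return false;
  if (q->cdr == NULL)
    return true;

  /* For a non-empty queue, CAR must be the last pair of the list.  */
  for (p = q->cdr; p->cdr != NULL; p = p->cdr)
    ;
  return q->car == (const void *) p;
}

/* Same shape as the quoted 'enqueue', minus the locking.  */
static struct pair *
model_enqueue (struct pair *q, void *t)
{
  struct pair *c = malloc (sizeof *c);

  if (c == NULL)
    return NULL;
  c->car = t;
  c->cdr = NULL;

  if (q->cdr == NULL)
    q->cdr = c;                         /* empty queue: c is the first pair */
  else
    ((struct pair *) q->car)->cdr = c;  /* append after the current last pair */
  q->car = c;                           /* c is now the last pair */
  return c;
}
```

In this model, a header with a null CAR and a non-empty CDR (the state shown in the gdb output) fails the check, and calling 'model_enqueue' on it would take the else branch and write through the null CAR, which is consistent with the invalid access reported at threads.c:309.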