From: Ken Raeburn
Newsgroups: gmane.emacs.devel
Subject: Re: allocate_string_data memory corruption
Date: Wed, 18 Jan 2006 16:35:52 -0500
To: Chong Yidong
Cc: emacs-devel@gnu.org
References: <87vewha2zl.fsf@stupidchicken.com>
In-Reply-To: <87vewha2zl.fsf@stupidchicken.com>

On Jan 18, 2006, at 11:57, Chong Yidong wrote:

> There's been some progress tracking down the hyperthreading /
> allocate_string related crash.  We can now reproduce a crash reliably.

That's good news... sort of... :-)

> In this function, data->string is set to s, and nbytes is set to
> nbytes.  If check_sblock is a no-op, there should be no change.

By "no-op", do you mean, for example, a macro or previously-defined
empty function, such that the compiler will produce different code for
allocate_string_data?

I don't know if you're fluent in x86 assembly, but I'd check to see if
the function's code differs between the two cases.

If it doesn't, I think the next thing I'd try would be a watchpoint
under gdb to see what happens during check_sblock.  If you need to run
some unpredictable large number of invocations of the function to
trigger the problem, commands can be run at a breakpoint to enable the
watchpoint right before the first check, and disable it after the
second.  You can use convenience variables to store copies of s, data,
etc.

If the assembly code does differ, I'd inspect the failing version more
carefully.
And maybe try to tweak the source or build options such that
check_sblock doesn't influence how allocate_string_data is compiled.

Is this consistent across OSes?  E.g., Linux and *BSD or Solaris?  How
about compiler versions?  Could be a subtle OS bug in task switching or
something.  Anything interesting going on with signal handlers at the
time?

> #1  0x0817499e in allocate_string_data (s=0x8d18778, nchars=8,
> nbytes=8) at alloc.c:2013
>
> s == (struct Lisp_String *) 0x8d18778
> data->string == (struct Lisp_String *) 0x8d18788   <-- off by 16
>
> nbytes == 8
> data->nbytes == 200   <-- off by 192
>
> nchars == 8
> needed == 20

And you've checked, for example, that data hasn't changed, that s and
nchars still accurately reflect what the caller passed in, etc.?
Sometimes gdb can get confused if the compiler is too clever.

My Red Hat system at work has hyperthreading in its CPU; perhaps I
could help if you've got a portable test case set up?

Ken
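
The "has anything changed?" question above can also be asked from inside the program rather than from gdb. What follows is only a rough sketch in C, not code from alloc.c: the struct layouts are simplified stand-ins that model just the fields quoted in the thread, and take_snapshot, compare_snapshots and the CHANGED_* flags are hypothetical names invented for illustration. The idea is to copy s, data, data->string, nbytes and data->nbytes before the first check and compare them after the second, so that the comparison does not depend on how gdb interprets optimized code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structures in alloc.c.
   Only the fields mentioned in the thread are modelled. */
struct lisp_string;

struct sdata {
    struct lisp_string *string;  /* back-pointer to the owning string */
    ptrdiff_t nbytes;            /* byte count stored with the data */
};

struct lisp_string {
    struct sdata *data;
};

/* Everything expected to stay constant across the consistency check. */
struct snapshot {
    struct lisp_string *s;
    struct sdata *data;
    struct lisp_string *data_string;
    ptrdiff_t nbytes;
    ptrdiff_t data_nbytes;
};

/* One bit per field, so several changes can be reported at once. */
enum {
    CHANGED_S           = 1 << 0,
    CHANGED_DATA        = 1 << 1,
    CHANGED_DATA_STRING = 1 << 2,
    CHANGED_NBYTES      = 1 << 3,
    CHANGED_DATA_NBYTES = 1 << 4
};

/* Copy the current values of the locals and the fields behind data. */
static struct snapshot
take_snapshot(struct lisp_string *s, struct sdata *data, ptrdiff_t nbytes)
{
    struct snapshot snap;

    assert(data != NULL);
    snap.s = s;
    snap.data = data;
    snap.data_string = data->string;
    snap.nbytes = nbytes;
    snap.data_nbytes = data->nbytes;
    return snap;
}

/* Return a bitmask of the fields that differ between two snapshots;
   0 means nothing changed. */
static unsigned
compare_snapshots(const struct snapshot *before, const struct snapshot *after)
{
    unsigned changed = 0;

    if (before->s != after->s)
        changed |= CHANGED_S;
    if (before->data != after->data)
        changed |= CHANGED_DATA;
    if (before->data_string != after->data_string)
        changed |= CHANGED_DATA_STRING;
    if (before->nbytes != after->nbytes)
        changed |= CHANGED_NBYTES;
    if (before->data_nbytes != after->data_nbytes)
        changed |= CHANGED_DATA_NBYTES;
    return changed;
}
```

In use, one snapshot would be taken just before the first check_sblock call and another just after the second, with a breakpoint or abort on a nonzero result from compare_snapshots. Note that adding such code may itself change how the surrounding function is compiled, so it complements the assembly comparison rather than replacing it.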