From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Kastrup
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#19883: Correction for backtrace
Date: Thu, 26 Feb 2015 13:32:00 +0100
Message-ID: <874mq8dfb3.fsf@fencepost.gnu.org>
References: <87twyln70f.fsf@fencepost.gnu.org> <873865n2rr.fsf@fencepost.gnu.org> <87twy91vtc.fsf@gnu.org> <878ufld1iw.fsf@fencepost.gnu.org> <87k2z4dhx7.fsf@gnu.org>
In-Reply-To: <87k2z4dhx7.fsf@gnu.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: 19883@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> David Kastrup skribis:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> David Kastrup skribis:
>>>
>>>> This is embarrassing: I used the wrong executable in connection with the
>>>> core dump.  With the matching executable, the coredump makes a lot more
>>>> sense:
>>>>
>>>> #0  0x00000000 in ?? ()
>>>> #1  0x0804aee0 in Smob_base::mark_trampoline (arg=0x9fbb000)
>>>>     at smobs.tcc:34
>>>> #2  0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22
>>>> #3  0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1
>>>
>>> Could you try commenting out all the SMOB mark functions in LilyPond?
>>>
>>> This doesn’t fix the bug, of course, but it’s probably a good
>>> workaround: user-provided mark functions are not needed in Guile 2.0
>>> since libgc scans the whole heap for live pointers.
>>
>> Even the test program crashes at the end (when `count' is called in
>> order to traverse the created hierarchy) when you disable the setting of
>> the mark function in the init method in smobs.tcc.
>
> Could you add debugging symbols for libguile?  I don’t understand how
> ‘count’ gets called.

Figure me surprised.  Here is the recursive walk:

    int
    Family::count ()
    {
      int sum = 1;
      for (int i = 0; i < kids.size (); i++)
        sum += kids[i]->count ();
      return sum;
    }

and here is the starting call in workload():

    cout << "last has " << Family::unsmob (k)->count () << endl;

> Do you know if this is a use-after-free error?

Sure.  Nothing else would clobber the kids[] array to contain bad
pointers.

> If this is the case, Andy had the idea of turning on topological
> finalization in the GC.  This may help for this particular case, but I
> vaguely recall that this breaks other finalizer-related things.

I don't see why.  Topological finalization might help with mark-after-free.
But why would it help if there is not even any mark call involved?  This
is clearly use-after-free.

> (I would check by myself, but ISTR that building LilyPond “on one’s
> own” is not recommended.  What would you suggest?  A Guix recipe would
> be sweet.)

Is there a reason you are not using the test program provided with this
bug report?  There is no real point in experimenting with LilyPond's
complexity when a simple test program using its memory management
classes already crashes.

LilyPond's GUILEv2 branch is currently out of order again since 2.0.11
changed encoding mechanisms _again_ in an incompatible manner (what
GUILE calls "stable" is anything but).  It is becoming harder and harder
to work around GUILE's attempts at wresting encoding control from the
application, while GUILE has no byte-transparent decoding of UTF-8, does
not support strings encoded in UTF-8, and (as of 2.0.11 or 2.0.10)
supports _only_ string ports redecoded to UTF-8.  So dealing with
memory-mapped UTF-8 encoded files which are multiplexed between reading
by GUILE and reading by a UTF-8 decoding parser has again been
thwarted.

While I try figuring out how to repair the damage this time, testing
with LilyPond itself is hard to interpret since a number of problems are
not related to the memory management.  As long as this simple test
program can show the memory management related crashes, I don't see the
point in throwing people at LilyPond: that has not delivered any results
the last several times I tried it.

>> A pointer to a C++ structure does not appear to protect the
>> corresponding SMOB data and free_smob calls the delete operator which
>> calls destructors and clobbers the memory area.
>
> Oh, I was mistaken in my previous message.  GC scans the stack and the
> GC-managed heap (stuff allocated with GC_MALLOC/scm_gc_malloc et al.),
> but it does *not* scan the malloc/new heap.
>
> So indeed, C++ objects that hold references to ‘SCM’ objects, such as
> instances of ‘Smob’, must either have a mark function, or they must
> be allocated with scm_gc_malloc.
>
> Would it be possible to add a ‘new’ operator to ‘Smob’ that uses
> ‘scm_gc_malloc’, and a ‘delete’ operator that uses ‘scm_gc_free’?

It would not help since many of the references are stored in STL
containers (like std::vector) which have their data
allocated/deallocated separately from the memory area of the structure
itself.

Frankly, I don't get the current strategy of GUILE: basically any use of
scm_set_smob_mark will result in a function that can be called with
garbage from a smob that has already been deallocated via the function
registered with scm_set_smob_free.

GUILEv2 developers have resisted fixing this bug for years by trying to
stop people from using scm_set_smob_mark and instead telling people to
have their entire heap scanned by a conservative garbage collector.

For an application like LilyPond which can easily have the heap cover
more than half of the available address space and run for half an hour
(when generating docs) processing independent files with large
individual memory requirements, this strategy will both have a
considerable performance impact and bleed enough randomly retained
memory to run the application into the ground eventually.

In my current work on fixing the encoding stuff again I have patched my
code to deal with the mark-after-free errors in the free and mark
trampolines myself.  I need to find a solution for the encoding mess
before I can actually indulge in more testing of this workaround.
However, due to the opacity of GUILE's implementation and the
multithreaded collector, I have no guarantees that my work on the
respective trampolines will reliably prevent all mark-after-free errors.
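
As a rough illustration of the kind of guard meant here (a sketch under
assumptions, not the actual LilyPond patch): the free trampoline can
unregister an object before deleting it, and the mark trampoline can
refuse to touch anything that is no longer registered.  `Family' below
is a stripped-down stand-in for the test program's class, and
live_objects, make_family, free_trampoline, mark_trampoline and run_demo
are hypothetical names.  No libguile calls are involved, and the set is
not protected against a collector running in another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a smob-backed class; not LilyPond's real code.
struct Family
{
  std::vector<Family *> kids;
  bool marked = false;
};

// Hypothetical registry of objects that have not been freed yet.
static std::unordered_set<const void *> live_objects;

Family *
make_family ()
{
  Family *f = new Family;
  live_objects.insert (f);
  return f;
}

// Stand-in for the free trampoline: unregister first, then delete.
void
free_trampoline (Family *f)
{
  live_objects.erase (f);
  delete f;
}

// Stand-in for the mark trampoline.  Returns false without
// dereferencing when handed a pointer that is not registered as live.
bool
mark_trampoline (Family *f)
{
  if (live_objects.find (f) == live_objects.end ())
    return false;
  if (f->marked)
    return true;                // already visited, avoids endless cycles
  f->marked = true;
  for (std::size_t i = 0; i < f->kids.size (); i++)
    mark_trampoline (f->kids[i]);
  return true;
}

void
run_demo ()
{
  Family *parent = make_family ();
  Family *child = make_family ();
  parent->kids.push_back (child);

  assert (mark_trampoline (parent));
  assert (child->marked);

  parent->kids.clear ();
  free_trampoline (child);
  assert (!mark_trampoline (child));  // stale pointer rejected, not read

  free_trampoline (parent);
  assert (live_objects.empty ());
}
```

The lookup only compares the pointer value, so a stale pointer is never
dereferenced.  This covers only the single-threaded case; it cannot by
itself rule out a mark call racing with a free in another thread.
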
This is something that needs to get fixed in GUILE.  It does not make
sense to provide a mark callback mechanism that can be called with
garbage in GUILE's free store.  When GUILE releases/collects memory, it
does not make sense to leave the SMOB cells in a state indistinguishable
from valid data.  Apart from causing crashes in mark functions, this
makes work much harder for the conservative garbage collector.

-- 
David Kastrup