On 04/06/2014 09:59 AM, Eli Zaretskii wrote:
>> Date: Sun, 06 Apr 2014 09:37:23 -0700
>> From: Daniel Colascione
>> CC: monnier@IRO.UMontreal.CA, dmantipov@yandex.ru, 17168@debbugs.gnu.org
>>
>>> Because Richard has been using that machine for years, and I very much
>>> doubt that he changed his usage patterns lately.
>>
>> Richard's not the only one who has seen this crash. Drew's also reported
>> GC crashes in odd, and different, places.
>
> Which seem unrelated, and started much later than Richard reported
> his.

With a bug like this, unpredictable, usage-pattern-dependent behavior is
expected.

>>>>>>> In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15583#23, Richard
>>>>>>> provided the last good revno (113938) and the first bad one (114268);
>>>>>>> I looked at that range of revisions, and 114156 looks relevant. How
>>>>>>> about if we revert it and see if the problems go away?
>>>>>>
>>>>>> The bug would still be there, and we'd have no way to tell whether your
>>>>>> proposed change actually reduced its occurrence to a tolerable level.
>>>>>> Why would you want to do that instead of just fixing the bug?
>>>>>
>>>>> Because it's simpler,
>>>>
>>>> It's easy to make code that's simple and wrong.
>>>
>>> I didn't suggest any new code.
>>
>> No: you're just suggesting leaving incorrect code in Emacs.
>
> It's not incorrect, AFAIU. It might be less optimal.

The current code isn't just sub-optimal. It's wrong. If you get unlucky
and try to mark a dead symbol, you will crash.

>>>>> and because it just might be that the bug was
>>>>> caused by that other changeset.
>>>>
>>>> How might that changeset in particular have caused the problem reports?
>>>
>>> It is related to calling a function, and is in the same function from
>>> which all the recent crashes started.
>>
>> You haven't identified a causal mechanism. Any recent change could have
>> caused enough of a shift in code generation or stack layout to cause
>> this problem, and because it manifests so seldom, it'd be hard to verify
>> that reverting any particular change "fixed" the problem.
>
> I thought you had a test case. If not, how did you verify that your
> suggested changes do fix the problem?

There is a test. Your proposed change does not cause the test to pass.
Even if it did, I would argue against substituting a real fix with your
change.

>> Also, eval_sub does *everything*. It's no surprise that we saw the
>> crashes there. That's like saying "all crashes are associated with main,
>> this change affects main, and therefore this change is responsible."
>
> The change is related to calling a function whose symbol has certain
> properties. That sounds related to me, not just a random change
> somewhere in eval_sub.

It's a dangling pointer. Changing slightly the way we chase that
dangling pointer won't change the overall result.