* Changes that should go into version 24.4
From: Richard Stallman @ 2014-03-22 1:47 UTC
To: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I committed my changes that were waiting, then realized
that two of them, in subr.el and battery.el, should probably
go in the 24.4 release too.

Would someone please put them in?

--
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.
* Re: Changes that should go into version 24.4
From: Daniel Colascione @ 2014-03-22 1:57 UTC
To: rms, emacs-devel

On 03/21/2014 06:47 PM, Richard Stallman wrote:
> I committed my changes that were waiting, then realized
> that two of them, in subr.el and battery.el, should probably
> go in the 24.4 release too.
>
> Would someone please put them in?

I can't speak to the battery.el change, but the subr.el one should be in
neither branch. It papers over a release-blocking bug. We shouldn't
release 24.4 until we've figured out why the hell the GC randomly
crashes. You can't even be sure that your Lisp hack fixes the problem.

Richard, it would be very helpful if you could provide either a recipe
for reproducing your crash or an actual crash dump (not your
paraphrasing of the stack trace). Specifically, you've mentioned that
the crash happens in mark_memory. *Where* in mark_memory? What
instruction? It doesn't make sense that we'd fault accessing a stack
slot on an active frame: doing so might corrupt something later, sure,
but that stack location is valid and touching it isn't going to cause
an immediate SIGSEGV.
* Re: Changes that should go into version 24.4
From: Eli Zaretskii @ 2014-03-22 8:44 UTC
To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Fri, 21 Mar 2014 18:57:03 -0700
> From: Daniel Colascione <dancol@dancol.org>
>
> the subr.el one should be in neither branch. It papers over a
> release-blocking bug. We shouldn't release 24.4 until we've figured
> out why the hell the GC randomly crashes. You can't even be sure
> that your Lisp hack fixes the problem.

Since (evidently) no one is actively working on fixing that bug, I see
no reason to punish people who run the trunk codebase by imposing on
them random crashes they cannot recover from. As long as the bug
remains open, we haven't forgotten about it and will fix it eventually;
in the meantime, let users have one less reason for crashes.
* Re: Changes that should go into version 24.4
From: Daniel Colascione @ 2014-03-22 8:50 UTC
To: Eli Zaretskii; +Cc: rms, emacs-devel

On 03/22/2014 01:44 AM, Eli Zaretskii wrote:
>> Date: Fri, 21 Mar 2014 18:57:03 -0700
>> From: Daniel Colascione <dancol@dancol.org>
>>
>> the subr.el one should be in neither branch. It papers over a
>> release-blocking bug. We shouldn't release 24.4 until we've figured
>> out why the hell the GC randomly crashes. You can't even be sure
>> that your Lisp hack fixes the problem.
>
> Since (evidently) no one is actively working on fixing that bug,

It's not that nobody's working on it; it's that there's not enough
information to make progress. The crash happens sporadically.

> I see
> no reason to punish people who run the trunk codebase by imposing on
> them random crashes they cannot recover from. As long as the bug
> remains open, we haven't forgotten about it and will fix it eventually;
> in the meantime, let users have one less reason for crashes.

If this were a normal bug, I'd agree completely, but this bug is one
we can't reproduce. (I've tried.) The more people see this problem, the
greater the chance we'll get the information we need to actually fix
its underlying cause.

Without a reliable repro, what alternative would you suggest?
* Re: Changes that should go into version 24.4
From: Eli Zaretskii @ 2014-03-22 9:24 UTC
To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Sat, 22 Mar 2014 01:50:23 -0700
> From: Daniel Colascione <dancol@dancol.org>
> CC: rms@gnu.org, emacs-devel@gnu.org
>
> > Since (evidently) no one is actively working on fixing that bug,
>
> It's not that nobody's working on it; it's that there's not enough
> information to make progress. The crash happens sporadically.

Working on a bug includes adding debugging code that would help
collect the missing information. AFAICS, no one is doing that, either.

> > I see
> > no reason to punish people who run the trunk codebase by imposing on
> > them random crashes they cannot recover from. As long as the bug
> > remains open, we haven't forgotten about it and will fix it eventually;
> > in the meantime, let users have one less reason for crashes.
>
> If this were a normal bug, I'd agree completely, but this bug is one
> we can't reproduce. (I've tried.) The more people see this problem, the
> greater the chance we'll get the information we need to actually fix
> its underlying cause.

It is unreasonable to use others as guinea pigs in such cases, IMO.

Also, we already have several other reports about GC-related crashes;
see the list in bug #16901
(http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16901#32).
Not sure if those are related, but if they aren't, how can we explain
that only Richard experiences these problems? Perhaps searching the
debbugs reports for any crashes in GC will reveal other potential
candidates that are related to the same bug?

> Without a reliable repro, what alternative would you suggest?

Richard reported that the crashes started when he updated his branch
no later than Sep 22, and that his previous update was around Aug 18.
We could start by scrutinizing relevant changes between Aug 18 and
Sep 22.

In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15688#107, Stefan
offered some insight into the problem, and Richard suggested that
someone write debugging code to detect the situation that apparently
is a precursor to the crash. No one wrote such code, AFAIK; perhaps
we should. (I don't understand what Stefan said well enough to do this,
but maybe someone else does.)

I also agree that having a core file from the crash might help,
although we shouldn't set our expectations too high, since such core
files are only good for determining which Lisp object caused the crash,
and Richard has already found that out and described it. The effort
now should go into understanding how that object gets corrupted, which
is not something a core file from GC would normally help with.
* bug#15688: Changes that should go into version 24.4
From: Stefan @ 2014-03-22 14:03 UTC
To: Eli Zaretskii; +Cc: rms, 15688

> Not sure if those are related, but if they aren't, how can we explain
> that only Richard experiences these problems?

My guess is that they only show up on Yeeloong, or maybe MIPS.
Could even be a compiler bug, for all I know.

> In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15688#107, Stefan
> offered some insight into the problem, and Richard suggested that
> someone write debugging code to detect the situation that apparently
> is a precursor to the crash. No one wrote such code, AFAIK; perhaps
> we should. (I don't understand what Stefan said well enough to do this,
> but maybe someone else does.)

I'm not sure what kind of code could catch this, nor when to run it.


        Stefan
* bug#15688: Changes that should go into version 24.4
From: Eli Zaretskii @ 2014-03-22 14:53 UTC
To: Stefan; +Cc: rms, 15688

> From: Stefan <monnier@iro.umontreal.ca>
> Cc: Daniel Colascione <dancol@dancol.org>, rms@gnu.org, 15688@debbugs.gnu.org
> Date: Sat, 22 Mar 2014 10:03:09 -0400
>
> > Not sure if those are related, but if they aren't, how can we explain
> > that only Richard experiences these problems?
>
> My guess is that they only show up on Yeeloong, or maybe MIPS.
> Could even be a compiler bug, for all I know.

Could be. But we have quite a few GC-related bug reports lately, so
it's not inconceivable that they are related.

> > In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15688#107, Stefan
> > offered some insight into the problem, and Richard suggested that
> > someone write debugging code to detect the situation that apparently
> > is a precursor to the crash. No one wrote such code, AFAIK; perhaps
> > we should. (I don't understand what Stefan said well enough to do this,
> > but maybe someone else does.)
>
> I'm not sure what kind of code could catch this, nor when to run it.

Perhaps you could add more detail to what you said there, and then
others could try thinking about what kind of code would help.
* bug#15688: Changes that should go into version 24.4
From: Richard Stallman @ 2014-03-22 23:57 UTC
To: Stefan; +Cc: 15688

    My guess is that they only show up on Yeeloong, or maybe MIPS.
    Could even be a compiler bug, for all I know.

Since I can't get any more data by debugging when it crashes, what
good do you think will be done by leaving out the workaround I
installed?

If you don't want it in the trunk and the release, I won't override
you, but I will install it in my own copy. I'm tired of the hassle
and I see nothing to be gained by suffering it more.

    > In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15688#107, Stefan
    > offered some insight into the problem, and Richard suggested that
    > someone write debugging code to detect the situation that apparently
    > is a precursor to the crash. No one wrote such code, AFAIK; perhaps
    > we should. (I don't understand what Stefan said well enough to do this,
    > but maybe someone else does.)

    I'm not sure what kind of code could catch this, nor when to run it.

Here's one idea: store that vector somewhere where GC can see it, and
arrange to abort on freeing that vector. By debugging at that point
we might learn some more. That seems easy enough that I will try it
one of these days.
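Richard's idea can be sketched as a toy model in Python. The names `sweep`, `watched_address` and `WatchedObjectFreed` are hypothetical and are not Emacs internals. The sweep phase stops the moment it is about to free one particular watched address, which plays the role of calling abort() there so that a debugger can inspect the state.

```python
class WatchedObjectFreed(Exception):
    """Raised when the sweep is about to free the watched object."""


def sweep(heap, marked, watched_address=None):
    """Free every unmarked object in `heap` (a dict: address -> object).

    If the object at `watched_address` is about to be freed, raise instead;
    this stands in for calling abort() at that point.
    """
    for address in list(heap):
        if address in marked:
            continue
        if address == watched_address:
            raise WatchedObjectFreed(f"watched object at {address:#x} is being freed")
        del heap[address]


heap = {0x1000: "symbol", 0x2000: "vector", 0x3000: "garbage"}
# The watched vector is still marked, so this collection proceeds normally.
sweep(heap, marked={0x1000, 0x2000}, watched_address=0x2000)
print([hex(a) for a in sorted(heap)])  # ['0x1000', '0x2000']

# In the next cycle the vector is no longer marked, so the sweep stops there.
try:
    sweep(heap, marked={0x1000}, watched_address=0x2000)
except WatchedObjectFreed as stop:
    print(stop)  # watched object at 0x2000 is being freed
```

The sketch stops at the first free of the watched address, whether or not that free is the erroneous one.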
* Re: Changes that should go into version 24.4
From: Eli Zaretskii @ 2014-03-22 9:08 UTC
To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Fri, 21 Mar 2014 18:57:03 -0700
> From: Daniel Colascione <dancol@dancol.org>
>
> It doesn't make sense that we'd fault accessing a stack slot on an
> active frame: doing so might corrupt something later, sure, but that
> stack location is valid and touching it isn't going to cause an
> immediate SIGSEGV.

Crashes in mark_object usually have nothing to do with accessing a
stack slot per se. mark_object looks at the object type, then extracts
a pointer to a C structure from it, and proceeds to treat that pointer
as a valid pointer to a valid structure of that type. If the pointer it
extracts is invalid, or points to something that is not a C struct of
the type mark_object expects, we will segfault trying to interpret it.
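Eli's description of mark_object can be illustrated with a toy Python model. It assumes a 32-bit layout in which the low 3 bits of a Lisp_Object are the type tag and tag 2 means symbol, which is what the Ffset listing later in this thread compares against. Tag 5 for vectorlike objects is an assumption made for this sketch. Memory is a dict, and `SimulatedSegfault` stands in for SIGSEGV. The model shows that marking trusts the tag completely, so a stale or bogus pointer is only discovered when it is dereferenced.

```python
TAG_MASK = 0x7        # low 3 bits of a Lisp_Object hold the type tag
TAG_SYMBOL = 2        # the value the Ffset listing compares against
TAG_VECTORLIKE = 5    # assumption for this sketch


class SimulatedSegfault(Exception):
    """Stands in for SIGSEGV when a bogus pointer is dereferenced."""


def field(memory, address, name):
    """Read one field of the struct at `address`, like `ptr->name` in C.

    Real C code would read garbage from a wrong struct type and usually
    fault a step later; this toy model faults immediately.
    """
    struct = memory.get(address)
    if struct is None or name not in struct:
        raise SimulatedSegfault(f"bad read of {name!r} at {address:#x}")
    return struct[name]


def mark_object(memory, obj, marked):
    tag, address = obj & TAG_MASK, obj & ~TAG_MASK
    if tag not in (TAG_SYMBOL, TAG_VECTORLIKE) or address in marked:
        return  # other tags need no marking here; marked objects are skipped
    marked.add(address)
    # Trust the tag: treat `address` as a valid struct of the tagged type.
    if tag == TAG_SYMBOL:
        mark_object(memory, field(memory, address, "function"), marked)
    else:
        for slot in field(memory, address, "contents"):
            mark_object(memory, slot, marked)


memory = {
    0x1000: {"function": 0x2000 | TAG_VECTORLIKE},  # a symbol
    0x2000: {"contents": [42 << 2]},                # a vector holding a tag-0 value
}
marked = set()
mark_object(memory, 0x1000 | TAG_SYMBOL, marked)
print(sorted(hex(a) for a in marked))  # ['0x1000', '0x2000']

# Point the symbol's function cell at memory that holds no object.
memory[0x1000]["function"] = 0x3000 | TAG_VECTORLIKE
try:
    mark_object(memory, 0x1000 | TAG_SYMBOL, set())
except SimulatedSegfault as fault:
    print(fault)  # bad read of 'contents' at 0x3000
```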
* Re: Changes that should go into version 24.4
From: Daniel Colascione @ 2014-03-22 9:15 UTC
To: Eli Zaretskii; +Cc: rms, emacs-devel

On 03/22/2014 02:08 AM, Eli Zaretskii wrote:
>> Date: Fri, 21 Mar 2014 18:57:03 -0700
>> From: Daniel Colascione <dancol@dancol.org>
>>
>> It doesn't make sense that we'd fault accessing a stack slot on an
>> active frame: doing so might corrupt something later, sure, but that
>> stack location is valid and touching it isn't going to cause an
>> immediate SIGSEGV.
>
> Crashes in mark_object usually have nothing to do with accessing a
> stack slot per se. mark_object looks at the object type, then extracts
> a pointer to a C structure from it, and proceeds to treat that pointer
> as a valid pointer to a valid structure of that type. If the pointer it
> extracts is invalid, or points to something that is not a C struct of
> the type mark_object expects, we will segfault trying to interpret it.

Ah, yes. I was reading the message about the crash occurring "when
mark_stack calls mark_memory". mark_object makes a lot more sense. (I
read through the rest of the thread, but must have decoded "mark_object"
as "mark_memory" based on the earlier message and the most recent
message.)

Thanks.
* Re: Changes that should go into version 24.4
From: Richard Stallman @ 2014-03-22 23:57 UTC
To: Daniel Colascione; +Cc: emacs-devel

    Richard, it would be very helpful if you could provide either a recipe
    for reproducing your crash

I agree, it would be very helpful if I could. But I can't.

    or an actual crash dump (not your
    paraphrasing of the stack trace).

If someone tells me a GDB command to make one, maybe I can do so.
It would be many megabytes and contain my private email, so I would
hesitate to show it to anyone. And I don't think it would be useful.
I don't think any more information can be extracted at the time
it crashes.
* GC bug investigation
From: Daniel Colascione @ 2014-03-23 1:58 UTC
To: rms; +Cc: emacs-devel

On 03/22/2014 04:57 PM, Richard Stallman wrote:
>     Richard, it would be very helpful if you could provide either a recipe
>     for reproducing your crash
>
> I agree, it would be very helpful if I could. But I can't.
>
>     or an actual crash dump (not your
>     paraphrasing of the stack trace).
>
> If someone tells me a GDB command to make one, maybe I can do so.

As Eli mentioned, you can use the "gcore" gdb command.

> hesitate to show it to anyone. And I don't think it would be useful.

I understand; I'd also be hesitant to share a dump. But being able to
instruct you to examine the dump in various ways would be very useful,
especially if we add debug instrumentation.

> I don't think any more information can be extracted at the time
> it crashes.

Details of the objects on the path might be useful. In prior messages
about this bug, you focus on stack slots. I don't think that's useful,
as a conservative GC ought to operate properly using arbitrary inputs
as temporary roots. I want to know exactly where we crash and in what
manner, as I explained on another thread.

For clarity: you mention "[the crash was in] mark_object called from
mark_vectorlike called from mark_object called from mark_object
(marking that symbol)." I interpret this text as meaning "some
instruction in mark_object faulted", with the top of the execution
stack looking like this:

  mark_object(A)
  mark_vectorlike(B)
  mark_object(B)
  mark_object(clear-transient-map)

B here is clear-transient-map's function cell, right? You're saying you
saw that it's a pseudovector that safe_debug_print reports as
INVALID_LISP_OBJECT, probably because live_vector_p returns 0.

That we're reaching B at all indicates that it shouldn't be dead.
clear-transient-map isn't dead either, although double-checking would
be nice. That's why the symbol_free_list->function = Vdead code did
nothing. B must have been made dead *before* being assigned to
clear-transient-map's function cell. Looking at the bytecode in
set-transient-map, though, I don't see how that's possible.

Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

I don't think that writing code that aborts or breaks when a particular
vector is freed will be very helpful; we'll hit that code in normal
operation too. Instead, it'll probably be more useful to print a
backtrace (using emacs_backtrace) each time we see that vectorlike
freed, then look at the last backtrace before the GC crash.
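The backtrace-on-free approach can be sketched in Python. The standard `traceback` module stands in for emacs_backtrace, and the sweep, `note_free` and `report_crash` are hypothetical names for this sketch. The sketch does not print every backtrace. Instead it keeps only the most recent stack captured when a watched address was freed, so after a crash the last free site of that address is ready for inspection.

```python
import traceback

last_free_site = {}  # address -> stack text captured the last time it was freed


def note_free(address, watched):
    """Hook called by the sweep for every object it frees."""
    if address in watched:
        # Overwrite any earlier record: only the most recent free matters.
        last_free_site[address] = "".join(traceback.format_stack())


def sweep(heap, marked, watched):
    """Toy sweep: free unmarked objects, logging frees of watched addresses."""
    for address in list(heap):
        if address not in marked:
            note_free(address, watched)
            del heap[address]


def report_crash(address):
    """What to look at after a crash involving `address`."""
    return last_free_site.get(address, "no recorded free for this address")


heap = {0x1000: "symbol", 0x2000: "vector"}
sweep(heap, marked={0x1000}, watched={0x2000})
print(report_crash(0x2000))  # the stack at the time 0x2000 was freed
```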
* Re: GC bug investigation
From: Daniel Colascione @ 2014-03-23 2:13 UTC
To: rms; +Cc: emacs-devel

On 03/22/2014 06:58 PM, Daniel Colascione wrote:
> Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

Also, since building a whole cross-compiler is a pain, can you provide
the disassembly of your mips64el Ffset?
* Re: GC bug investigation
From: Richard Stallman @ 2014-03-23 14:56 UTC
To: Daniel Colascione; +Cc: emacs-devel

Here's Ffset.

  0x6fe7cc <Ffset>:      addiu sp,sp,-40
  0x6fe7d0 <Ffset+4>:    sw    ra,36(sp)
  0x6fe7d4 <Ffset+8>:    sw    s8,32(sp)
  0x6fe7d8 <Ffset+12>:   sw    s1,28(sp)
  0x6fe7dc <Ffset+16>:   sw    s0,24(sp)
  0x6fe7e0 <Ffset+20>:   move  s8,sp
  0x6fe7e4 <Ffset+24>:   lui   gp,0xa6
  0x6fe7e8 <Ffset+28>:   addiu gp,gp,-208
  0x6fe7ec <Ffset+32>:   sw    gp,16(sp)
  0x6fe7f0 <Ffset+36>:   move  s0,a0
  0x6fe7f4 <Ffset+40>:   sw    a1,44(s8)
  0x6fe7f8 <Ffset+44>:   move  v0,s0
  0x6fe7fc <Ffset+48>:   andi  v1,v0,0x7
  0x6fe800 <Ffset+52>:   li    v0,2
  0x6fe804 <Ffset+56>:   beq   v1,v0,0x6fe82c <Ffset+96>
  0x6fe808 <Ffset+60>:   move  at,at
  0x6fe80c <Ffset+64>:   lw    v0,-19584(gp)
  0x6fe810 <Ffset+68>:   move  at,at
  0x6fe814 <Ffset+72>:   lw    v0,0(v0)
  0x6fe818 <Ffset+76>:   move  at,at
  0x6fe81c <Ffset+80>:   move  a0,v0
  0x6fe820 <Ffset+84>:   move  a1,s0
  0x6fe824 <Ffset+88>:   jal   0x6fcad4 <wrong_type_argument>
  0x6fe828 <Ffset+92>:   move  at,at
  0x6fe82c <Ffset+96>:   addiu v0,s0,-2
  0x6fe830 <Ffset+100>:  lw    s1,12(v0)
  0x6fe834 <Ffset+104>:  lw    v0,-30324(gp)
  0x6fe838 <Ffset+108>:  move  at,at
  0x6fe83c <Ffset+112>:  lw    v1,0(v0)
  0x6fe840 <Ffset+116>:  lw    v0,-31872(gp)
  0x6fe844 <Ffset+120>:  move  at,at
  0x6fe848 <Ffset+124>:  lw    v0,0(v0)
  0x6fe84c <Ffset+128>:  move  at,at
  0x6fe850 <Ffset+132>:  beq   v1,v0,0x6fe8d0 <Ffset+260>
  0x6fe854 <Ffset+136>:  move  at,at
  0x6fe858 <Ffset+140>:  lw    v0,-31872(gp)
  0x6fe85c <Ffset+144>:  move  at,at
  0x6fe860 <Ffset+148>:  lw    v0,0(v0)
  0x6fe864 <Ffset+152>:  move  at,at
  0x6fe868 <Ffset+156>:  beq   s1,v0,0x6fe8d0 <Ffset+260>
  0x6fe86c <Ffset+160>:  move  at,at
  0x6fe870 <Ffset+164>:  move  a0,s0
  0x6fe874 <Ffset+168>:  move  a1,s1
  0x6fe878 <Ffset+172>:  lw    v0,-32632(gp)
  0x6fe87c <Ffset+176>:  move  at,at
  0x6fe880 <Ffset+180>:  move  t9,v0
  0x6fe884 <Ffset+184>:  jalr  t9
  0x6fe888 <Ffset+188>:  move  at,at
  0x6fe88c <Ffset+192>:  lw    gp,16(s8)
  0x6fe890 <Ffset+196>:  move  v1,v0
  0x6fe894 <Ffset+200>:  lw    v0,-30324(gp)
  0x6fe898 <Ffset+204>:  move  at,at
  0x6fe89c <Ffset+208>:  lw    v0,0(v0)
  0x6fe8a0 <Ffset+212>:  move  a0,v1
  0x6fe8a4 <Ffset+216>:  move  a1,v0
  0x6fe8a8 <Ffset+220>:  lw    v0,-32632(gp)
  0x6fe8ac <Ffset+224>:  move  at,at
  0x6fe8b0 <Ffset+228>:  move  t9,v0
  0x6fe8b4 <Ffset+232>:  jalr  t9
  0x6fe8b8 <Ffset+236>:  move  at,at
  0x6fe8bc <Ffset+240>:  lw    gp,16(s8)
  0x6fe8c0 <Ffset+244>:  move  v1,v0
  0x6fe8c4 <Ffset+248>:  lw    v0,-30324(gp)
  0x6fe8c8 <Ffset+252>:  move  at,at
  0x6fe8cc <Ffset+256>:  sw    v1,0(v0)
  0x6fe8d0 <Ffset+260>:  move  a0,s1
  0x6fe8d4 <Ffset+264>:  lw    v0,-20008(gp)
  0x6fe8d8 <Ffset+268>:  move  at,at
  0x6fe8dc <Ffset+272>:  move  t9,v0
  0x6fe8e0 <Ffset+276>:  jalr  t9
  0x6fe8e4 <Ffset+280>:  move  at,at
  0x6fe8e8 <Ffset+284>:  lw    gp,16(s8)
  0x6fe8ec <Ffset+288>:  beqz  v0,0x6fe92c <Ffset+352>
  0x6fe8f0 <Ffset+292>:  move  at,at
  0x6fe8f4 <Ffset+296>:  lw    v0,-30104(gp)
  0x6fe8f8 <Ffset+300>:  move  at,at
  0x6fe8fc <Ffset+304>:  lw    v1,0(v0)
  0x6fe900 <Ffset+308>:  addiu v0,s1,-6
  0x6fe904 <Ffset+312>:  lw    v0,4(v0)
  0x6fe908 <Ffset+316>:  move  a0,s0
  0x6fe90c <Ffset+320>:  move  a1,v1
  0x6fe910 <Ffset+324>:  move  a2,v0
  0x6fe914 <Ffset+328>:  lw    v0,-32292(gp)
  0x6fe918 <Ffset+332>:  move  at,at
  0x6fe91c <Ffset+336>:  move  t9,v0
  0x6fe920 <Ffset+340>:  jalr  t9
  0x6fe924 <Ffset+344>:  move  at,at
  0x6fe928 <Ffset+348>:  lw    gp,16(s8)
  0x6fe92c <Ffset+352>:  move  a0,s0
  0x6fe930 <Ffset+356>:  lw    a1,44(s8)
  0x6fe934 <Ffset+360>:  lw    v0,-20808(gp)
  0x6fe938 <Ffset+364>:  move  at,at
  0x6fe93c <Ffset+368>:  move  t9,v0
  0x6fe940 <Ffset+372>:  jalr  t9
  0x6fe944 <Ffset+376>:  move  at,at
  0x6fe948 <Ffset+380>:  lw    gp,16(s8)
  0x6fe94c <Ffset+384>:  lw    v0,44(s8)
  0x6fe950 <Ffset+388>:  move  sp,s8
  0x6fe954 <Ffset+392>:  lw    ra,36(sp)
  0x6fe958 <Ffset+396>:  lw    s8,32(sp)
  0x6fe95c <Ffset+400>:  lw    s1,28(sp)
  0x6fe960 <Ffset+404>:  lw    s0,24(sp)
  0x6fe964 <Ffset+408>:  addiu sp,sp,40
  0x6fe968 <Ffset+412>:  jr    ra
  0x6fe96c <Ffset+416>:  move  at,at
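As a reading aid, the type check and the first load in the listing (Ffset+44 through Ffset+100) can be mimicked in Python. The constants come directly from the instructions: `andi v1,v0,0x7`, the comparison with 2, `addiu v0,s0,-2` and `lw s1,12(v0)`. Reading them as CHECK_SYMBOL followed by a load of the symbol's old function cell is an interpretation; the listing itself does not say so. Memory is modelled as a dict from address to 32-bit word, and `ffset_prologue` is a hypothetical name.

```python
class WrongTypeArgument(Exception):
    """Stands in for the call to wrong_type_argument."""


def ffset_prologue(symbol, memory):
    """Mimic Ffset+44 .. Ffset+100 from the listing.

    `symbol` is the raw 32-bit first argument (register a0, copied to s0);
    `memory` maps addresses to 32-bit words.
    """
    if symbol & 0x7 != 2:                     # andi v1,v0,0x7 ; li v0,2 ; beq v1,v0,...
        raise WrongTypeArgument(hex(symbol))  # jal wrong_type_argument
    base = symbol - 2                         # addiu v0,s0,-2: strip the tag
    return memory[base + 12]                  # lw s1,12(v0): word at offset 12


memory = {0x1000 + 12: 0x2005}
print(hex(ffset_prologue(0x1002, memory)))  # 0x2005
```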
* Re: GC bug investigation
From: Richard Stallman @ 2014-03-23 14:57 UTC
To: Daniel Colascione; +Cc: emacs-devel

    Details of the objects on the path might be useful.

I don't understand "on the path".

      mark_object(A)
      mark_vectorlike(B)
      mark_object(B)
      mark_object(clear-transient-map)

Right.

    B here is clear-transient-map's function cell, right? You're saying you
    saw that it's a pseudovector that safe_debug_print reports as
    INVALID_LISP_OBJECT, probably because live_vector_p returns 0.

Yes.

    That we're reaching B at all indicates that it shouldn't be dead.

I guess so. This is the mysterious part.

    B must have been made dead *before* being assigned to
    clear-transient-map's function cell. Looking at the bytecode in
    set-transient-map, though, I don't see how that's possible.

I don't think that's what happened. If it were that, we would
see crashes when that code tries to _use_ the value legitimately.

    clear-transient-map isn't dead either,

It has not been freed, it seems, but it may be garbage.

It is being marked through a spurious pointer randomly hanging around
in a stack slot for something else. We don't know that there is any
real pointer to it.

    I don't think that writing code that aborts or breaks when a particular
    vector is freed will be very helpful; we'll hit that code in normal
    operation too. Instead, it'll probably be more useful to print a
    backtrace (using emacs_backtrace) each time we see that vectorlike
    freed, then look at the last backtrace before the GC crash.

Maybe you are right.

    Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

I can, but it would be a big pain. It takes many hours to recompile
Emacs on this machine.

What would it tell us? It would confirm that the vectorlike was freed,
perhaps, but do we doubt that?

If that hassle is likely to solve the problem, I'll do it,
but I'd rather not go to that hassle just to confirm what we know.
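A liveness check at marking time can be shown with a toy Python marker. This is only a sketch of the idea behind a checking build such as GC_CHECK_MARKED_OBJECTS, not Emacs's code, and all names are hypothetical. Before an object is marked, it must appear in the set of live allocations. If it does not, marking stops and reports the chain from the root that reached it. That chain corresponds to the mark_object call sequence quoted above.

```python
class DeadObjectReached(Exception):
    def __init__(self, path):
        super().__init__(" -> ".join(path))
        self.path = path


def checked_mark(references, live, root):
    """Mark everything reachable from `root`, checking liveness first.

    references: object name -> names of the objects it points to
    live:       names of objects that are currently allocated
    Raises DeadObjectReached with the path from `root` at the first dead
    object, instead of marking it (and faulting later).
    """
    marked = set()
    todo = [[root]]  # each entry is the path from the root to one object
    while todo:
        path = todo.pop()
        obj = path[-1]
        if obj not in live:
            raise DeadObjectReached(path)
        if obj in marked:
            continue
        marked.add(obj)
        for child in references.get(obj, ()):
            todo.append(path + [child])
    return marked


# Shape of the reported crash: the symbol's function cell B is not live.
references = {"clear-transient-map": ["B"], "B": ["A"]}
try:
    checked_mark(references, live={"clear-transient-map", "A"}, root="clear-transient-map")
except DeadObjectReached as err:
    print(err)  # clear-transient-map -> B
```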
* Re: GC bug investigation
From: David Kastrup @ 2014-03-23 15:15 UTC
To: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Details of the objects on the path might be useful.
>
> I don't understand "on the path".
>
>       mark_object(A)
>       mark_vectorlike(B)
>       mark_object(B)
>       mark_object(clear-transient-map)
>
> Right.
>
>     B here is clear-transient-map's function cell, right? You're saying you
>     saw that it's a pseudovector that safe_debug_print reports as
>     INVALID_LISP_OBJECT, probably because live_vector_p returns 0.
>
> Yes.
>
>     That we're reaching B at all indicates that it shouldn't be dead.
>
> I guess so. This is the mysterious part.

I may be missing something here, but I thought that Emacs was using a
_conservative_ garbage collector by default. That means that arbitrary
garbage may mistakenly be considered in use because some integer on the
stack is misinterpreted as a pointer to it.

> I don't think that's what happened. If it were that, we would see
> crashes when that code tries to _use_ the value legitimately.
>
>     clear-transient-map isn't dead either,
>
> It has not been freed, it seems, but it may be garbage.
>
> It is being marked through a spurious pointer randomly hanging around
> in a stack slot for something else. We don't know that there is any
> real pointer to it.

If that is the case, then any code that is supposed to work in
conjunction with a conservative garbage collector has to be able to
deal with it.

--
David Kastrup
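David's point about conservative collection can be made concrete with a toy Python marker. In this model a stack word becomes a root only if it equals the start address of a live object. As a result, an integer that happens to match keeps garbage alive, together with everything that garbage references, while words that match no live object are ignored. Exact-address matching is a simplification of what the thread describes for mem_find and the live_*_p checks. Precise references inside objects are followed without any liveness check, which matches Eli's description of mark_object. The function name is hypothetical.

```python
def conservative_mark(live, references, stack_words):
    """Toy conservative marking.

    live:        set of start addresses of currently allocated objects
    references:  address -> addresses that object points to (precise fields)
    stack_words: raw machine words found on the stack
    A stack word becomes a root only if it equals the address of a live
    object; anything else (small integers, stale values) is ignored.
    """
    marked = set()
    todo = [word for word in stack_words if word in live]
    while todo:
        address = todo.pop()
        if address in marked:
            continue
        marked.add(address)
        todo.extend(references.get(address, ()))  # followed unchecked
    return marked


live = {0x1000, 0x2000, 0x3000}
references = {0x2000: [0x3000]}  # 0x2000 is garbage that points at 0x3000
# The word 0x2000 on the stack is really an integer, yet it retains two objects.
print(sorted(hex(a) for a in conservative_mark(live, references, [0x1000, 7, 0x2000])))
# ['0x1000', '0x2000', '0x3000']
```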
* Re: GC bug investigation
From: Richard Stallman @ 2014-03-24 15:01 UTC
To: David Kastrup; +Cc: emacs-devel

    >     That we're reaching B at all indicates that it shouldn't be dead.
    >
    > I guess so. This is the mysterious part.

    I may be missing something here, but I thought that Emacs was using a
    _conservative_ garbage collector by default. That means that arbitrary
    garbage may mistakenly be considered in use because some integer on the
    stack is misinterpreted as a pointer to it.

That is true, but it's a different question.

    > It is being marked through a spurious pointer randomly hanging around
    > in a stack slot for something else. We don't know that there is any
    > real pointer to it.

    If that is the case, then any code that is supposed to work in
    conjunction with a conservative garbage collector has to be able to
    deal with it.

Right. The point is, if that symbol was never collected, how did the
vector in its function cell get collected?
* Re: GC bug investigation 2014-03-23 14:57 ` Richard Stallman 2014-03-23 15:15 ` David Kastrup @ 2014-03-23 15:22 ` Daniel Colascione 2014-03-23 16:14 ` Andreas Schwab 2014-03-24 15:01 ` Richard Stallman 2014-03-23 16:20 ` Eli Zaretskii 2 siblings, 2 replies; 22+ messages in thread From: Daniel Colascione @ 2014-03-23 15:22 UTC (permalink / raw) To: rms; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 4287 bytes --] On 03/23/2014 07:57 AM, Richard Stallman wrote: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Details of the objects on the path might be useful. > > I don't understand "on the path". > > mark_object(A) > mark_vectorlike(B) > mark_object(B) > mark_object(clear-transient-map) > > Right. > > B here is clear-transient-map's function cell, right? You're saying you > saw that it's a pseudovector that safe_debug_print reports as > INVALID_LISP_OBJECT, probably because live_vector_p returns 0. > > Yes. > > That > we're reaching B at all indicates that it shouldn't be dead. > > I guess so. This is the mysterious part. > > B must have been made dead *before* being assigned to > clear-transient-map's function cell. Looking at the bytecode in > set-transient-map, though, I don't see how that's possible. > > I don't think that's what happened. If it were that, we would > see crashes when that code tries to _use_ the value legitimately. ...unless we're GCing before the value is used. Keep in mind that we'll only try to use the value before the next command runs. It sounds far-fetched, but I don't have a better idea. > > clear-transient-map isn't dead either, > > It has not been freed, it seems, but it may be garbage. > > It is being marked through a spurious pointer randomly hanging around > in a stack slot for something else. We don't know that there is any > real pointer to it. 
Conservative GC is designed to cope with occasional stray pointers into the GC heap. That we're somehow finding a pointer to the symbol is not the problem. mark_maybe_pointer marks an object at an address only if mem_find() and live_XXX_p() indicate that the address holds a live object. Now, it's conceivable that there might be a bug in the liveness detection, but if there were, I'd expect to see it manifest much more frequently and on many more platforms. Collecting garbage is pretty much the main thing Emacs does. :-)

Besides: looking at the commits during the range you gave, I don't see anything that might suggest that we broke the GC itself.

That's why I'm curious about Ffset: if there's a window between the time the function object is created and the time it's assigned to the symbol's function cell during which time the function value isn't reachable from a GC root, then it's possible that we're occasionally GCing during that period, freeing the function object, then assigning it to the symbol's function slot. The only place I can imagine that happening is inside Ffset.

The GC code *should* be spilling all non-volatile registers onto the stack for examination, but I imagine the MIPS version of this code is lightly tested. Maybe unrelated code changes triggered some kind of code rearrangement that made it more likely to encounter this condition.

Anyway, if, when we crash, we're able to see the stack captured at the last time that vector was freed, we should have a much better idea of what's going on. I can work on adding that instrumentation.

>
>     I don't think that writing code that aborts or breaks when a particular
>     vector is freed will be very helpful; we'll hit that code in normal
>     operation too. Instead, it'll probably be more useful to print a
>     backtrace (using emacs_backtrace) each time we see that vectorlike
>     freed, then look at the last backtrace before the GC crash.
>
> Maybe you are right.
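Daniel's description of mark_maybe_pointer can be illustrated with a small conservative scan. In this sketch a fixed array of cells (toy_heap) stands in for the Lisp heap, and toy_live_cell() plays the combined role of mem_find() and the live_XXX_p() checks; both names and the layout are assumptions for illustration, not Emacs's alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: a fixed array of equally sized cells. */
#define TOY_HEAP_CELLS 8

struct toy_cell {
  bool in_use;   /* allocated and not yet freed */
  bool marked;
  long payload;
};

static struct toy_cell toy_heap[TOY_HEAP_CELLS];

/* Analogue of mem_find + live_XXX_p: return the cell only if WORD
   points exactly at the start of a cell that is currently allocated. */
static struct toy_cell *toy_live_cell(uintptr_t word)
{
  uintptr_t base = (uintptr_t) toy_heap;
  uintptr_t end = (uintptr_t) (toy_heap + TOY_HEAP_CELLS);
  if (word < base || word >= end)
    return NULL;                       /* not a heap address at all */
  if ((word - base) % sizeof (struct toy_cell) != 0)
    return NULL;                       /* points into the middle of a cell */
  struct toy_cell *cell = (struct toy_cell *) word;
  return cell->in_use ? cell : NULL;   /* freed cells are never marked */
}

/* Conservatively scan N words (standing in for the C stack): anything
   that looks like a pointer to a live cell is marked, and everything
   else, plain integers included, is ignored.  Returns how many cells
   were newly marked. */
static size_t toy_mark_maybe_pointers(const uintptr_t *words, size_t n)
{
  assert(words != NULL || n == 0);
  size_t newly_marked = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct toy_cell *cell = toy_live_cell(words[i]);
      if (cell != NULL && !cell->marked)
        {
          cell->marked = true;
          newly_marked++;
        }
    }
  return newly_marked;
}
```

A stray integer that happens to look like a pointer to a live cell keeps that cell alive (harmless retention), but a word pointing at a freed cell or into the middle of one is ignored, so in this model a correct liveness test never marks a dead object.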
>
>     Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?
>
> I can, but it would be a big pain. It takes many hours to recompile
> Emacs on this machine.

Cross-compile?

> What would it tell us? It would confirm that the vectorlike was freed,
> perhaps, but do we doubt that?

I doubt everything here.

> If that hassle is likely to solve the problem, I'll do it,
> but I'd rather not go to that hassle just to confirm what we know.

If we can combine that recompilation with some other debugging instrumentation, the hassle will be worthwhile.
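The instrumentation Daniel proposes, printing a backtrace every time one particular vectorlike is freed, might look roughly like the sketch below. It assumes a glibc-style <execinfo.h>: backtrace() and backtrace_symbols_fd() stand in for Emacs's internal emacs_backtrace, and watch_object() and note_free() are hypothetical hooks that would have to be called from the real sweep code.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 64

static void *watched_object;     /* address we want to hear about */
static int watched_free_count;   /* how many times it has been freed */

/* Hypothetical: choose the object whose frees should be logged. */
void watch_object(void *p)
{
  watched_object = p;
  watched_free_count = 0;
}

/* Hypothetical: call this from the free/sweep path with every object
   being released.  Only the watched address produces output. */
void note_free(void *p)
{
  if (p == NULL || p != watched_object)
    return;                      /* ordinary frees stay silent */

  void *frames[MAX_FRAMES];
  int depth = backtrace(frames, MAX_FRAMES);
  assert(depth >= 0 && depth <= MAX_FRAMES);

  watched_free_count++;
  fprintf(stderr, "watched object %p freed (#%d), backtrace:\n",
          p, watched_free_count);
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}
```

Because the same address is freed and reused during normal operation, the output is expected to be noisy; the useful record is the last backtrace printed before the GC crash.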
* Re: GC bug investigation
  2014-03-23 15:22 ` Daniel Colascione
@ 2014-03-23 16:14 ` Andreas Schwab
  2014-03-24 15:01 ` Richard Stallman
  1 sibling, 0 replies; 22+ messages in thread
From: Andreas Schwab @ 2014-03-23 16:14 UTC
To: Daniel Colascione; +Cc: rms, emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> That's why I'm curious about Ffset: if there's a window between the time
> the function object is created and the time it's assigned to the
> symbol's function cell during which time the function value isn't
> reachable from a GC root, then it's possible that we're occasionally
> GCing during that period,

fset cannot GC.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
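The window Daniel hypothesizes, and why Andreas's remark that fset cannot GC narrows it, can be sketched with a hypothetical toy collector (not Emacs code) in which collection happens only where toy_gc() is called explicitly and the only root is a single function cell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

struct obj {
  bool in_use;
  bool marked;
};

static struct obj pool[POOL_SIZE];
static struct obj *function_cell;   /* the only GC root in this toy */

/* Hand out a free pool slot; never triggers a collection. */
struct obj *toy_alloc(void)
{
  for (size_t i = 0; i < POOL_SIZE; i++)
    if (!pool[i].in_use)
      {
        pool[i].in_use = true;
        pool[i].marked = false;
        return &pool[i];
      }
  return NULL;                      /* pool exhausted */
}

/* Mark from the single root, then free every unmarked object. */
void toy_gc(void)
{
  assert(function_cell == NULL
         || (function_cell >= pool && function_cell < pool + POOL_SIZE));
  if (function_cell != NULL)
    function_cell->marked = true;
  for (size_t i = 0; i < POOL_SIZE; i++)
    {
      if (pool[i].in_use && !pool[i].marked)
        pool[i].in_use = false;     /* swept: any remaining pointer dangles */
      pool[i].marked = false;
    }
}

/* Like fset in spirit: a plain store that allocates nothing and never
   calls toy_gc(), so it cannot open a collection window by itself. */
void toy_fset(struct obj *value)
{
  function_cell = value;
}
```

If a collection runs between toy_alloc() and toy_fset(), the root ends up holding a freed object, which is the shape of the failure under investigation. A plain store such as toy_fset() never collects, so in this model the window would have to lie before the store is reached, not inside it.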
* Re: GC bug investigation
  2014-03-23 15:22 ` Daniel Colascione
  2014-03-23 16:14 ` Andreas Schwab
@ 2014-03-24 15:01 ` Richard Stallman
  1 sibling, 0 replies; 22+ messages in thread
From: Richard Stallman @ 2014-03-24 15:01 UTC
To: Daniel Colascione; +Cc: emacs-devel

    > I can, but it would be a big pain. It takes many hours to recompile
    > Emacs on this machine.

    Cross-compile?

Sorry, I have no machine to do that with (and I don't know how anyway).

    > If that hassle is likely to solve the problem, I'll do it,
    > but I'd rather not go to that hassle just to confirm what we know.

    If we can combine that recompilation with some other debugging instrumentation, the hassle will be worthwhile.

I will get to it sooner or later. Right now I am trying to file my income tax and prepare to fly out.
* Re: GC bug investigation
  2014-03-23 14:57 ` Richard Stallman
  2014-03-23 15:15 ` David Kastrup
  2014-03-23 15:22 ` Daniel Colascione
@ 2014-03-23 16:20 ` Eli Zaretskii
  2 siblings, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2014-03-23 16:20 UTC
To: rms; +Cc: dancol, emacs-devel

> Date: Sun, 23 Mar 2014 10:57:34 -0400
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
>
>     Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?
>
> I can, but it would be a big pain. It takes many hours to recompile
> Emacs on this machine.

This macro affects only alloc.c, so you need only recompile that one file to get the effect.
* Re: Changes that should go into version 24.4
  2014-03-22 23:57 ` Richard Stallman
  2014-03-23 1:58 ` GC bug investigation Daniel Colascione
@ 2014-03-23 3:57 ` Eli Zaretskii
  1 sibling, 0 replies; 22+ messages in thread
From: Eli Zaretskii @ 2014-03-23 3:57 UTC
To: rms; +Cc: dancol, emacs-devel

> Date: Sat, 22 Mar 2014 19:57:36 -0400
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
>
>     or an actual crash dump (not your
>     paraphrasing of the stack trace).
>
> If someone tells me a GDB command to make one, maybe I can do so.

The GDB command is "gcore".
Thread overview: 22+ messages
2014-03-22 1:47 Changes that should go into version 24.4 Richard Stallman
2014-03-22 1:57 ` Daniel Colascione
2014-03-22 8:44 ` Eli Zaretskii
2014-03-22 8:50 ` Daniel Colascione
2014-03-22 9:24 ` Eli Zaretskii
2014-03-22 14:03 ` bug#15688: " Stefan
2014-03-22 14:53 ` Eli Zaretskii
2014-03-22 23:57 ` Richard Stallman
2014-03-22 9:08 ` Eli Zaretskii
2014-03-22 9:15 ` Daniel Colascione
2014-03-22 23:57 ` Richard Stallman
2014-03-23 1:58 ` GC bug investigation Daniel Colascione
2014-03-23 2:13 ` Daniel Colascione
2014-03-23 14:56 ` Richard Stallman
2014-03-23 14:57 ` Richard Stallman
2014-03-23 15:15 ` David Kastrup
2014-03-24 15:01 ` Richard Stallman
2014-03-23 15:22 ` Daniel Colascione
2014-03-23 16:14 ` Andreas Schwab
2014-03-24 15:01 ` Richard Stallman
2014-03-23 16:20 ` Eli Zaretskii
2014-03-23 3:57 ` Changes that should go into version 24.4 Eli Zaretskii