unofficial mirror of emacs-devel@gnu.org 
* Changes that should go into version 24.4
@ 2014-03-22  1:47 Richard Stallman
  2014-03-22  1:57 ` Daniel Colascione
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Stallman @ 2014-03-22  1:47 UTC (permalink / raw)
  To: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I committed my changes that were waiting, then realized
that two of them, in subr.el and battery.el, should probably
go in the 24.4 release too.

Would someone please put them in?

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Changes that should go into version 24.4
  2014-03-22  1:47 Changes that should go into version 24.4 Richard Stallman
@ 2014-03-22  1:57 ` Daniel Colascione
  2014-03-22  8:44   ` Eli Zaretskii
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Daniel Colascione @ 2014-03-22  1:57 UTC (permalink / raw)
  To: rms, emacs-devel


On 03/21/2014 06:47 PM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
> I committed my changes that were waiting, then realized
> that two of them, in subr.el and battery.el, should probably
> go in the 24.4 release too.
> 
> Would someone please put them in?

I can't speak to the battery.el change, but the subr.el one should be in
neither branch. It papers over a release-blocking bug. We shouldn't
release 24.4 until we've figured out why the hell the GC randomly
crashes. You can't even be sure that your lisp hack even fixes the problem.

Richard, it would be very helpful if you could provide either a recipe
for reproducing your crash or an actual crash dump (not your
paraphrasing of the stack trace).

Specifically, you've mentioned that the crash happens in mark_memory.
*Where* in mark_memory? What instruction? It doesn't make sense that
we'd fault accessing a stack slot on an active frame: doing so might
corrupt something later, sure, but that stack location is valid and
touching it isn't going to cause an immediate SIGSEGV.




* Re: Changes that should go into version 24.4
  2014-03-22  1:57 ` Daniel Colascione
@ 2014-03-22  8:44   ` Eli Zaretskii
  2014-03-22  8:50     ` Daniel Colascione
  2014-03-22  9:08   ` Eli Zaretskii
  2014-03-22 23:57   ` Richard Stallman
  2 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2014-03-22  8:44 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Fri, 21 Mar 2014 18:57:03 -0700
> From: Daniel Colascione <dancol@dancol.org>
> 
> the subr.el one should be in neither branch. It papers over a
> release-blocking bug. We shouldn't release 24.4 until we've figured
> out why the hell the GC randomly crashes. You can't even be sure
> that your lisp hack even fixes the problem.

Since (evidently) no one is actively working on fixing that bug, I see
no reasons to punish people who run the trunk codebase by imposing on
them random crashes they cannot recover from.  As long as the bug
remains open, we didn't forget about it, and will fix it eventually;
in the meantime, let users have one less reason for crashes.




* Re: Changes that should go into version 24.4
  2014-03-22  8:44   ` Eli Zaretskii
@ 2014-03-22  8:50     ` Daniel Colascione
  2014-03-22  9:24       ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Colascione @ 2014-03-22  8:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, emacs-devel


On 03/22/2014 01:44 AM, Eli Zaretskii wrote:
>> Date: Fri, 21 Mar 2014 18:57:03 -0700
>> From: Daniel Colascione <dancol@dancol.org>
>>
>> the subr.el one should be in neither branch. It papers over a
>> release-blocking bug. We shouldn't release 24.4 until we've figured
>> out why the hell the GC randomly crashes. You can't even be sure
>> that your lisp hack even fixes the problem.
> 
> Since (evidently) no one is actively working on fixing that bug,

It's not that nobody's working on it --- it's that there's not enough
information to make progress. The crash happens sporadically.

> I see
> no reasons to punish people who run the trunk codebase by imposing on
> them random crashes they cannot recover from.  As long as the bug
> remains open, we didn't forget about it, and will fix it eventually;
> in the meantime, let users have one less reason for crashes.

If this were a normal bug, I'd agree completely --- but this bug is one
we can't reproduce. (I've tried.) The more people see this problem, the
greater the chance we'll get the information we need to actually fix its
underlying cause.

Without a reliable repro, what alternative would you suggest?




* Re: Changes that should go into version 24.4
  2014-03-22  1:57 ` Daniel Colascione
  2014-03-22  8:44   ` Eli Zaretskii
@ 2014-03-22  9:08   ` Eli Zaretskii
  2014-03-22  9:15     ` Daniel Colascione
  2014-03-22 23:57   ` Richard Stallman
  2 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2014-03-22  9:08 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Fri, 21 Mar 2014 18:57:03 -0700
> From: Daniel Colascione <dancol@dancol.org>
> 
> It doesn't make sense that we'd fault accessing a stack slot on an
> active frame: doing so might corrupt something later, sure, but that
> stack location is valid and touching it isn't going to cause an
> immediate SIGSEGV.

Crashes in mark_object usually have nothing to do with accessing a
stack slot per se.  mark_object looks at the object type, and then
extracts a pointer to a C structure from it, and proceeds treating
that pointer as a valid pointer to a valid structure of that type.  If
the pointer it extracts is invalid, or points to something that is not a C
struct of the type mark_object expects, we will segfault trying to
interpret those.
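
To make that failure mode concrete, here is a minimal sketch of tag-then-dereference marking.  Everything in it is hypothetical: the toy_ names, the 3-bit tag layout and the tag values are assumptions for illustration, not Emacs's actual Lisp_Object encoding or its mark_object code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: the low 3 bits of a word carry a type tag and
   the remaining bits carry an 8-byte-aligned pointer. */
enum { TOY_TAG_MASK = 7 };
enum toy_tag { TOY_TAG_SYMBOL = 2, TOY_TAG_VECTORLIKE = 5 };

typedef uintptr_t toy_object;

struct toy_vector { uintptr_t size; };

/* Allocate a toy vector header; malloc is assumed to return storage
   aligned to at least 8 bytes, as it does on mainstream platforms. */
static struct toy_vector *toy_vector_alloc(uintptr_t size)
{
  struct toy_vector *v = malloc(sizeof *v);
  assert(v != NULL);
  v->size = size;
  return v;
}

static toy_object toy_make_object(void *p, enum toy_tag tag)
{
  /* The pointer must leave the tag bits free. */
  assert(((uintptr_t)p & TOY_TAG_MASK) == 0);
  return (uintptr_t)p | (uintptr_t)tag;
}

static enum toy_tag toy_object_tag(toy_object o)
{
  return (enum toy_tag)(o & TOY_TAG_MASK);
}

static void *toy_object_pointer(toy_object o)
{
  return (void *)(o & ~(uintptr_t)TOY_TAG_MASK);
}

/* Returns the number of slots a marker would go on to traverse.  The
   function trusts the tag completely: if the memory behind the pointer
   is not really a toy_vector (freed, reused, or never one), the size
   read here is garbage and a real marker would walk into invalid
   memory from this point on. */
static uintptr_t toy_mark_object(toy_object o)
{
  if (toy_object_tag(o) != TOY_TAG_VECTORLIKE)
    return 0;
  struct toy_vector *v = toy_object_pointer(o);
  return v->size;
}
```

Nothing in toy_mark_object can tell a stale pointer from a good one; the fault only shows up when the bogus structure is interpreted.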




* Re: Changes that should go into version 24.4
  2014-03-22  9:08   ` Eli Zaretskii
@ 2014-03-22  9:15     ` Daniel Colascione
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Colascione @ 2014-03-22  9:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, emacs-devel


On 03/22/2014 02:08 AM, Eli Zaretskii wrote:
>> Date: Fri, 21 Mar 2014 18:57:03 -0700
>> From: Daniel Colascione <dancol@dancol.org>
>>
>> It doesn't make sense that we'd fault accessing a stack slot on an
>> active frame: doing so might corrupt something later, sure, but that
>> stack location is valid and touching it isn't going to cause an
>> immediate SIGSEGV.
> 
> Crashes in mark_object usually have nothing to do with accessing a
> stack slot per se.  mark_object looks at the object type, and then
> extracts a pointer to a C structure from it, and proceeds treating
> that pointer as a valid pointer to a valid structure of that type.  If
> the pointer it extracts is invalid, or points to something that is not a C
> struct of the type mark_object expects, we will segfault trying to
> interpret those.
> 

Ah, yes. I was reading the message about the crash occurring "when
mark_stack calls mark_memory". mark_object makes a lot more sense. (I
read through the rest of the thread, but must have decoded "mark_object"
as "mark_memory" based on the earlier message and the most recent
message.) Thanks.




* Re: Changes that should go into version 24.4
  2014-03-22  8:50     ` Daniel Colascione
@ 2014-03-22  9:24       ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-03-22  9:24 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: rms, emacs-devel

> Date: Sat, 22 Mar 2014 01:50:23 -0700
> From: Daniel Colascione <dancol@dancol.org>
> CC: rms@gnu.org, emacs-devel@gnu.org
> 
> > Since (evidently) no one is actively working on fixing that bug,
> 
> It's not that nobody's working on it --- it's that there's not enough
> information to make progress. The crash happens sporadically.

Working on a bug includes adding debugging code that would help
collect the missing information.  AFAICS, no one is doing that,
either.

> > I see
> > no reasons to punish people who run the trunk codebase by imposing on
> > them random crashes they cannot recover from.  As long as the bug
> > remains open, we didn't forget about it, and will fix it eventually;
> > in the meantime, let users have one less reason for crashes.
> 
> If this were a normal bug, I'd agree completely --- but this bug is one
> we can't reproduce. (I've tried.) The more people see this problem, the
> greater the chance we'll get the information we need to actually fix its
> underlying cause.

It is unreasonable to use others as guinea pigs in such cases, IMO.

Also, we already have several other reports about GC-related crashes,
see the list in bug #16901
(http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16901#32).  Not sure if
those are related, but if they aren't, how can we explain that only
Richard experiences these problems?

Perhaps searching the debbugs reports about any crashes in GC will
reveal other potential candidates that are related to the same bug?

> Without a reliable repro, what alternative would you suggest?

Richard reported that the crashes started when he updated his branch
not later than Sep 22, and that his previous update was around Aug 18.
We could start by scrutinizing relevant changes between Aug 18 and Sep
22.

In http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15688#107, Stefan
described some insight on the problem, and Richard suggested that
someone write debugging code to detect the situation that apparently
is a precursor to the crash.  No one has written such code, AFAIK; perhaps
we should.  (I don't understand what Stefan said well enough to do this,
but maybe someone else does.)

I also agree that having a core file from the crash might help,
although we shouldn't set our expectations about that too high, since
such core files are only good for determining which Lisp object caused
it, and Richard already found out and described that.  The effort now
should be to understand how that object gets corrupted, which is
not something a core file from GC would normally help with.




* Re: Changes that should go into version 24.4
  2014-03-22  1:57 ` Daniel Colascione
  2014-03-22  8:44   ` Eli Zaretskii
  2014-03-22  9:08   ` Eli Zaretskii
@ 2014-03-22 23:57   ` Richard Stallman
  2014-03-23  1:58     ` GC bug investigation Daniel Colascione
  2014-03-23  3:57     ` Changes that should go into version 24.4 Eli Zaretskii
  2 siblings, 2 replies; 19+ messages in thread
From: Richard Stallman @ 2014-03-22 23:57 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    Richard, it would be very helpful if you could provide either a recipe
    for reproducing your crash 

I agree, it would be very helpful if I could.  But I can't.

    or an actual crash dump (not your
    paraphrasing of the stack trace).

If someone tells me a GDB command to make one, maybe I can do so.
It would be many megabytes and contain my private email, so I would
hesitate to show it to anyone.  And I don't think it would be useful.
I don't think any more information can be extracted at the time
it crashes.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* GC bug investigation
  2014-03-22 23:57   ` Richard Stallman
@ 2014-03-23  1:58     ` Daniel Colascione
  2014-03-23  2:13       ` Daniel Colascione
  2014-03-23 14:57       ` Richard Stallman
  2014-03-23  3:57     ` Changes that should go into version 24.4 Eli Zaretskii
  1 sibling, 2 replies; 19+ messages in thread
From: Daniel Colascione @ 2014-03-23  1:58 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 03/22/2014 04:57 PM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
>     Richard, it would be very helpful if you could provide either a recipe
>     for reproducing your crash 
> 
> I agree, it would be very helpful if I could.  But I can't.
> 
>     or an actual crash dump (not your
>     paraphrasing of the stack trace).
> 
> If someone tells me a GDB command to make one, maybe I can do so.

As Eli mentioned, you can use the "gcore" gdb command.

> hesitate to show it to anyone.  And I don't think it would be useful.

I understand; I'd also be hesitant to share a dump. But being able to
instruct you to examine the dump in various ways would be very useful,
especially if we add debug instrumentation.

> I don't think any more information can be extracted at the time
> it crashes.

Details of the objects on the path might be useful. In prior messages
about this bug, you focus on stack slots. I don't think that's useful,
as a conservative GC ought to operate properly using arbitrary inputs as
temporary roots.  I want to know exactly where we crash and in what
manner, as I explained on another thread.

For clarity: you mention "[the crash was in] mark_object called from
mark_vectorlike called from mark_object called from mark_object (marking
that symbol)." I interpret this text as meaning "some instruction in
mark_object faulted", with the top of the execution stack looking like this:

mark_object(A)
mark_vectorlike(B)
mark_object(B)
mark_object(clear-transient-map)

B here is clear-transient-map's function cell, right? You're saying you
saw that it's a pseudovector that safe_debug_print reports as
INVALID_LISP_OBJECT, probably because live_vector_p returns 0. That
we're reaching B at all indicates that it shouldn't be dead.
clear-transient-map isn't dead either, although double-checking would be
nice. That's why the symbol_free_list->function = Vdead code did nothing.

B must have been made dead *before* being assigned to
clear-transient-map's function cell. Looking at the bytecode in
set-transient-map, though, I don't see how that's possible.

Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

I don't think that writing code that aborts or breaks when a particular
vector is freed will be very helpful; we'll hit that code in normal
operation too. Instead, it'll probably be more useful to print a
backtrace (using emacs_backtrace) each time we see that vectorlike
freed, then look at the last backtrace before the GC crash.
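
A rough sketch of that kind of instrumentation, under loudly stated assumptions: every toy_ name below is hypothetical, and instead of a real stack trace from emacs_backtrace it records only the function and line of the call site.  The idea it shows is just "log every free of the watched object, keep the most recent record, and read it after the crash".

```c
#include <stddef.h>
#include <stdio.h>

/* What we remember about the most recent free of the watched object. */
struct toy_free_record
{
  const void *object;      /* address that was freed */
  const char *function;    /* call site: function name */
  int line;                /* call site: source line */
  unsigned long sequence;  /* how many frees had happened by then */
};

static const void *toy_watched;               /* object under suspicion */
static struct toy_free_record toy_last_free;  /* inspect this after a crash */
static unsigned long toy_free_count;

/* Called from every place that frees an object.  Frees of other
   objects are counted but otherwise ignored, so normal operation is
   not interrupted; only the watched object leaves a record. */
static void toy_note_free(const void *object, const char *function, int line)
{
  toy_free_count++;
  if (object != toy_watched)
    return;
  toy_last_free.object = object;
  toy_last_free.function = function;
  toy_last_free.line = line;
  toy_last_free.sequence = toy_free_count;
  fprintf(stderr, "watched object %p freed in %s:%d (free #%lu)\n",
          (void *)object, function, line, toy_free_count);
}

/* Convenience wrapper that captures the call site automatically. */
#define TOY_NOTE_FREE(object) toy_note_free((object), __func__, __LINE__)
```

Because nothing aborts, the code can stay enabled through ordinary sweeps; after the GC crash, the last line printed (or toy_last_free examined in the debugger) identifies the free that matters.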




* Re: GC bug investigation
  2014-03-23  1:58     ` GC bug investigation Daniel Colascione
@ 2014-03-23  2:13       ` Daniel Colascione
  2014-03-23 14:56         ` Richard Stallman
  2014-03-23 14:57       ` Richard Stallman
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Colascione @ 2014-03-23  2:13 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 03/22/2014 06:58 PM, Daniel Colascione wrote:
> Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

Also, since building a whole cross-compiler is a pain, can you provide
the disassembly of your mips64el Ffset?




* Re: Changes that should go into version 24.4
  2014-03-22 23:57   ` Richard Stallman
  2014-03-23  1:58     ` GC bug investigation Daniel Colascione
@ 2014-03-23  3:57     ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-03-23  3:57 UTC (permalink / raw)
  To: rms; +Cc: dancol, emacs-devel

> Date: Sat, 22 Mar 2014 19:57:36 -0400
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
> 
>     or an actual crash dump (not your
>     paraphrasing of the stack trace).
> 
> If someone tells me a GDB command to make one, maybe I can do so.

The GDB command is "gcore".




* Re: GC bug investigation
  2014-03-23  2:13       ` Daniel Colascione
@ 2014-03-23 14:56         ` Richard Stallman
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Stallman @ 2014-03-23 14:56 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Here's Ffset.

0x6fe7cc <Ffset>:	addiu	sp,sp,-40
0x6fe7d0 <Ffset+4>:	sw	ra,36(sp)
0x6fe7d4 <Ffset+8>:	sw	s8,32(sp)
0x6fe7d8 <Ffset+12>:	sw	s1,28(sp)
0x6fe7dc <Ffset+16>:	sw	s0,24(sp)
0x6fe7e0 <Ffset+20>:	move	s8,sp
0x6fe7e4 <Ffset+24>:	lui	gp,0xa6
0x6fe7e8 <Ffset+28>:	addiu	gp,gp,-208
0x6fe7ec <Ffset+32>:	sw	gp,16(sp)
0x6fe7f0 <Ffset+36>:	move	s0,a0
0x6fe7f4 <Ffset+40>:	sw	a1,44(s8)
0x6fe7f8 <Ffset+44>:	move	v0,s0
0x6fe7fc <Ffset+48>:	andi	v1,v0,0x7
0x6fe800 <Ffset+52>:	li	v0,2
0x6fe804 <Ffset+56>:	beq	v1,v0,0x6fe82c <Ffset+96>
0x6fe808 <Ffset+60>:	move	at,at
0x6fe80c <Ffset+64>:	lw	v0,-19584(gp)
0x6fe810 <Ffset+68>:	move	at,at
0x6fe814 <Ffset+72>:	lw	v0,0(v0)
0x6fe818 <Ffset+76>:	move	at,at
0x6fe81c <Ffset+80>:	move	a0,v0
0x6fe820 <Ffset+84>:	move	a1,s0
0x6fe824 <Ffset+88>:	jal	0x6fcad4 <wrong_type_argument>
0x6fe828 <Ffset+92>:	move	at,at
0x6fe82c <Ffset+96>:	addiu	v0,s0,-2
0x6fe830 <Ffset+100>:	lw	s1,12(v0)
0x6fe834 <Ffset+104>:	lw	v0,-30324(gp)
0x6fe838 <Ffset+108>:	move	at,at
0x6fe83c <Ffset+112>:	lw	v1,0(v0)
0x6fe840 <Ffset+116>:	lw	v0,-31872(gp)
0x6fe844 <Ffset+120>:	move	at,at
0x6fe848 <Ffset+124>:	lw	v0,0(v0)
0x6fe84c <Ffset+128>:	move	at,at
0x6fe850 <Ffset+132>:	beq	v1,v0,0x6fe8d0 <Ffset+260>
0x6fe854 <Ffset+136>:	move	at,at
0x6fe858 <Ffset+140>:	lw	v0,-31872(gp)
0x6fe85c <Ffset+144>:	move	at,at
0x6fe860 <Ffset+148>:	lw	v0,0(v0)
0x6fe864 <Ffset+152>:	move	at,at
0x6fe868 <Ffset+156>:	beq	s1,v0,0x6fe8d0 <Ffset+260>
0x6fe86c <Ffset+160>:	move	at,at
0x6fe870 <Ffset+164>:	move	a0,s0
0x6fe874 <Ffset+168>:	move	a1,s1
0x6fe878 <Ffset+172>:	lw	v0,-32632(gp)
0x6fe87c <Ffset+176>:	move	at,at
0x6fe880 <Ffset+180>:	move	t9,v0
0x6fe884 <Ffset+184>:	jalr	t9
0x6fe888 <Ffset+188>:	move	at,at
0x6fe88c <Ffset+192>:	lw	gp,16(s8)
0x6fe890 <Ffset+196>:	move	v1,v0
0x6fe894 <Ffset+200>:	lw	v0,-30324(gp)
0x6fe898 <Ffset+204>:	move	at,at
0x6fe89c <Ffset+208>:	lw	v0,0(v0)
0x6fe8a0 <Ffset+212>:	move	a0,v1
0x6fe8a4 <Ffset+216>:	move	a1,v0
0x6fe8a8 <Ffset+220>:	lw	v0,-32632(gp)
0x6fe8ac <Ffset+224>:	move	at,at
0x6fe8b0 <Ffset+228>:	move	t9,v0
0x6fe8b4 <Ffset+232>:	jalr	t9
0x6fe8b8 <Ffset+236>:	move	at,at
0x6fe8bc <Ffset+240>:	lw	gp,16(s8)
0x6fe8c0 <Ffset+244>:	move	v1,v0
0x6fe8c4 <Ffset+248>:	lw	v0,-30324(gp)
0x6fe8c8 <Ffset+252>:	move	at,at
0x6fe8cc <Ffset+256>:	sw	v1,0(v0)
0x6fe8d0 <Ffset+260>:	move	a0,s1
0x6fe8d4 <Ffset+264>:	lw	v0,-20008(gp)
0x6fe8d8 <Ffset+268>:	move	at,at
0x6fe8dc <Ffset+272>:	move	t9,v0
0x6fe8e0 <Ffset+276>:	jalr	t9
0x6fe8e4 <Ffset+280>:	move	at,at
0x6fe8e8 <Ffset+284>:	lw	gp,16(s8)
0x6fe8ec <Ffset+288>:	beqz	v0,0x6fe92c <Ffset+352>
0x6fe8f0 <Ffset+292>:	move	at,at
0x6fe8f4 <Ffset+296>:	lw	v0,-30104(gp)
0x6fe8f8 <Ffset+300>:	move	at,at
0x6fe8fc <Ffset+304>:	lw	v1,0(v0)
0x6fe900 <Ffset+308>:	addiu	v0,s1,-6
0x6fe904 <Ffset+312>:	lw	v0,4(v0)
0x6fe908 <Ffset+316>:	move	a0,s0
0x6fe90c <Ffset+320>:	move	a1,v1
0x6fe910 <Ffset+324>:	move	a2,v0
0x6fe914 <Ffset+328>:	lw	v0,-32292(gp)
0x6fe918 <Ffset+332>:	move	at,at
0x6fe91c <Ffset+336>:	move	t9,v0
0x6fe920 <Ffset+340>:	jalr	t9
0x6fe924 <Ffset+344>:	move	at,at
0x6fe928 <Ffset+348>:	lw	gp,16(s8)
0x6fe92c <Ffset+352>:	move	a0,s0
0x6fe930 <Ffset+356>:	lw	a1,44(s8)
0x6fe934 <Ffset+360>:	lw	v0,-20808(gp)
0x6fe938 <Ffset+364>:	move	at,at
0x6fe93c <Ffset+368>:	move	t9,v0
0x6fe940 <Ffset+372>:	jalr	t9
0x6fe944 <Ffset+376>:	move	at,at
0x6fe948 <Ffset+380>:	lw	gp,16(s8)
0x6fe94c <Ffset+384>:	lw	v0,44(s8)
0x6fe950 <Ffset+388>:	move	sp,s8
0x6fe954 <Ffset+392>:	lw	ra,36(sp)
0x6fe958 <Ffset+396>:	lw	s8,32(sp)
0x6fe95c <Ffset+400>:	lw	s1,28(sp)
0x6fe960 <Ffset+404>:	lw	s0,24(sp)
0x6fe964 <Ffset+408>:	addiu	sp,sp,40
0x6fe968 <Ffset+412>:	jr	ra
0x6fe96c <Ffset+416>:	move	at,at

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: GC bug investigation
  2014-03-23  1:58     ` GC bug investigation Daniel Colascione
  2014-03-23  2:13       ` Daniel Colascione
@ 2014-03-23 14:57       ` Richard Stallman
  2014-03-23 15:15         ` David Kastrup
                           ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Richard Stallman @ 2014-03-23 14:57 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    Details of the objects on the path might be useful.

I don't understand "on the path".

    mark_object(A)
    mark_vectorlike(B)
    mark_object(B)
    mark_object(clear-transient-map)

Right.

    B here is clear-transient-map's function cell, right? You're saying you
    saw that it's a pseudovector that safe_debug_print reports as
    INVALID_LISP_OBJECT, probably because live_vector_p returns 0.

Yes.

     That
    we're reaching B at all indicates that it shouldn't be dead.

I guess so.  This is the mysterious part.

    B must have been made dead *before* being assigned to
    clear-transient-map's function cell. Looking at the bytecode in
    set-transient-map, though, I don't see how that's possible.

I don't think that's what happened.  If it were that, we would
see crashes when that code tries to _use_ the value legitimately.

    clear-transient-map isn't dead either,

It has not been freed, it seems, but it may be garbage.

It is being marked through a spurious pointer randomly hanging around
in a stack slot for something else.  We don't know that there is any
real pointer to it.

    I don't think that writing code that aborts or breaks when a particular
    vector is freed will be very helpful; we'll hit that code in normal
    operation too. Instead, it'll probably be more useful to print a
    backtrace (using emacs_backtrace) each time we see that vectorlike
    freed, then look at the last backtrace before the GC crash.

Maybe you are right.

    Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?

I can, but it would be a big pain.  It takes many hours to recompile
Emacs on this machine.

What would it tell us?  It would confirm that the vectorlike was freed,
perhaps, but do we doubt that?

If that hassle is likely to solve the problem, I'll do it,
but I'd rather not go to that hassle just to confirm what we know.

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call.





* Re: GC bug investigation
  2014-03-23 14:57       ` Richard Stallman
@ 2014-03-23 15:15         ` David Kastrup
  2014-03-24 15:01           ` Richard Stallman
  2014-03-23 15:22         ` Daniel Colascione
  2014-03-23 16:20         ` Eli Zaretskii
  2 siblings, 1 reply; 19+ messages in thread
From: David Kastrup @ 2014-03-23 15:15 UTC (permalink / raw)
  To: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>     Details of the objects on the path might be useful.
>
> I don't understand "on the path".
>
>     mark_object(A)
>     mark_vectorlike(B)
>     mark_object(B)
>     mark_object(clear-transient-map)
>
> Right.
>
>     B here is clear-transient-map's function cell, right? You're saying you
>     saw that it's a pseudovector that safe_debug_print reports as
>     INVALID_LISP_OBJECT, probably because live_vector_p returns 0.
>
> Yes.
>
>      That
>     we're reaching B at all indicates that it shouldn't be dead.
>
> I guess so.  This is the mysterious part.

I may be missing something here, but I thought that Emacs was using a
_conservative_ garbage collector by default.  That means that arbitrary
garbage may mistakenly be considered as being in-use because some
integer on the stack is misinterpreted as a pointer to it.

> I don't think that's what happened.  If it were that, we would see
> crashes when that code tries to _use_ the value legitimately.
>
>     clear-transient-map isn't dead either,
>
> It has not been freed, it seems, but it may be garbage.
>
> It is being marked through a spurious pointer randomly hanging around
> in a stack slot for something else.  We don't know that there is any
> real pointer to it.

If that is the case, then any code supposed to work in conjunction with
a conservative garbage collector has to be able to deal with it.

-- 
David Kastrup





* Re: GC bug investigation
  2014-03-23 14:57       ` Richard Stallman
  2014-03-23 15:15         ` David Kastrup
@ 2014-03-23 15:22         ` Daniel Colascione
  2014-03-23 16:14           ` Andreas Schwab
  2014-03-24 15:01           ` Richard Stallman
  2014-03-23 16:20         ` Eli Zaretskii
  2 siblings, 2 replies; 19+ messages in thread
From: Daniel Colascione @ 2014-03-23 15:22 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel


On 03/23/2014 07:57 AM, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
>     Details of the objects on the path might be useful.
> 
> I don't understand "on the path".
> 
>     mark_object(A)
>     mark_vectorlike(B)
>     mark_object(B)
>     mark_object(clear-transient-map)
> 
> Right.
> 
>     B here is clear-transient-map's function cell, right? You're saying you
>     saw that it's a pseudovector that safe_debug_print reports as
>     INVALID_LISP_OBJECT, probably because live_vector_p returns 0.
> 
> Yes.
> 
>      That
>     we're reaching B at all indicates that it shouldn't be dead.
> 
> I guess so.  This is the mysterious part.
> 
>     B must have been made dead *before* being assigned to
>     clear-transient-map's function cell. Looking at the bytecode in
>     set-transient-map, though, I don't see how that's possible.
> 
> I don't think that's what happened.  If it were that, we would
> see crashes when that code tries to _use_ the value legitimately.

...unless we're GCing before the value is used.  Keep in mind that we'll
only try to use the value before the next command runs. It sounds
far-fetched, but I don't have a better idea.

> 
>     clear-transient-map isn't dead either,
> 
> It has not been freed, it seems, but it may be garbage.
> 
> It is being marked through a spurious pointer randomly hanging around
> in a stack slot for something else.  We don't know that there is any
> real pointer to it.

Conservative GC is designed to cope with occasional stray pointers into
the GC heap. That we're somehow finding a pointer to the symbol is not
the problem. mark_maybe_pointer marks an object at an address only if
mem_find() and live_XXX_p() indicate that the address holds a live object.
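
As a toy illustration of that filtering (not the real alloc.c code; the toy_ functions below are hypothetical stand-ins for mem_find, the live_XXX_p predicates and mark_maybe_pointer, and the fixed-size heap is an assumption made to keep the sketch short):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_BLOCKS = 4 };

/* A fixed pool of equally sized blocks stands in for the GC heap. */
struct toy_block
{
  bool live;            /* currently allocated, i.e. not on a free list */
  bool marked;          /* set by the marker */
  uintptr_t payload[2];
};

static struct toy_block toy_heap[TOY_BLOCKS];

/* Stand-in for mem_find: map an arbitrary word to the heap block that
   contains that address, or NULL if it points outside the toy heap. */
static struct toy_block *toy_mem_find(uintptr_t word)
{
  uintptr_t start = (uintptr_t)&toy_heap[0];
  uintptr_t end = start + sizeof toy_heap;
  if (word < start || word >= end)
    return NULL;
  return &toy_heap[(word - start) / sizeof (struct toy_block)];
}

/* Stand-in for mark_maybe_pointer: mark only when the word lands in a
   known block AND that block is live.  Returns true if it marked. */
static bool toy_mark_maybe_pointer(uintptr_t word)
{
  struct toy_block *b = toy_mem_find(word);
  if (b == NULL || !b->live)
    return false;       /* stray integer or pointer to a dead block */
  b->marked = true;
  return true;
}

/* Treat N arbitrary words (for example a stack range) as potential
   roots and report how many of them caused a block to be marked. */
static size_t toy_mark_memory(const uintptr_t *words, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_mark_maybe_pointer(words[i]))
      hits++;
  return hits;
}
```

In this model a stale stack word that points at a dead block is simply skipped, which is why a stray pointer by itself should not be enough to crash the collector.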

Now, it's conceivable that there might be a bug in the liveness
detection, but if there were, I'd expect to see it manifest much more
frequently and on many more platforms. Collecting garbage is pretty much
the main thing Emacs does. :-)

Besides: looking at the commits during the range you gave, I don't see
anything that might suggest that we broke the GC itself.

That's why I'm curious about Ffset: if there's a window between the time
the function object is created and the time it's assigned to the
symbol's function cell during which time the function value isn't
reachable from a GC root, then it's possible that we're occasionally
GCing during that period, freeing the function object, then assigning it
to the symbol's function slot. The only place I can imagine that
happening is inside Ffset. The GC code *should* be spilling all
non-volatile registers onto the stack for examination, but I imagine the
MIPS version of this code is lightly tested. Maybe unrelated code
changes triggered some kind of code rearrangement that made it more
likely to encounter this condition.
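
The hypothesized window can be modeled with a toy collector.  This sketch illustrates the hypothesis only: all names are hypothetical, the "register" is just a local variable, and it makes no claim that the real Ffset can trigger a collection or that register spilling is actually broken on MIPS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_OBJECTS = 4 };

struct toy_object { bool allocated; bool marked; };
struct toy_symbol { struct toy_object *function; };

static struct toy_object toy_pool[TOY_OBJECTS];

/* Hand out the first free slot of the pool, or NULL if it is full. */
static struct toy_object *toy_alloc(void)
{
  for (size_t i = 0; i < TOY_OBJECTS; i++)
    if (!toy_pool[i].allocated)
      {
        toy_pool[i].allocated = true;
        toy_pool[i].marked = false;
        return &toy_pool[i];
      }
  return NULL;
}

/* Mark everything listed in ROOTS, then sweep whatever stayed
   unmarked.  Returns the number of objects freed. */
static size_t toy_collect(struct toy_object *const *roots, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (roots[i] != NULL)
      roots[i]->marked = true;

  size_t freed = 0;
  for (size_t i = 0; i < TOY_OBJECTS; i++)
    {
      if (toy_pool[i].allocated && !toy_pool[i].marked)
        {
          toy_pool[i].allocated = false;
          freed++;
        }
      toy_pool[i].marked = false;
    }
  return freed;
}

/* A new function object exists only "in a register" while a collection
   runs, and is stored into the symbol afterwards.  SPILL_REGISTERS
   says whether the collector gets to see the register contents.
   Returns true if the symbol ends up pointing at a freed object. */
static bool toy_fset_with_gc(struct toy_symbol *sym, bool spill_registers)
{
  struct toy_object *in_register = toy_alloc();
  assert(in_register != NULL);

  struct toy_object *roots[2];
  roots[0] = sym->function;                        /* old function cell */
  roots[1] = spill_registers ? in_register : NULL; /* spilled register */
  toy_collect(roots, 2);

  sym->function = in_register;
  return !sym->function->allocated;
}
```

With spill_registers true the new object survives the collection; with it false the object is swept first and then stored, which leaves the symbol's function cell pointing at dead memory, the state the crash reports describe.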

Anyway, if, when we crash, we're able to see the stack captured at the
last time that vector was freed, we should have a much better idea of
what's going on. I can work on adding that instrumentation.

> 
>     I don't think that writing code that aborts or breaks when a particular
>     vector is freed will be very helpful; we'll hit that code in normal
>     operation too. Instead, it'll probably be more useful to print a
>     backtrace (using emacs_backtrace) each time we see that vectorlike
>     freed, then look at the last backtrace before the GC crash.
> 
> Maybe you are right.
> 
>     Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?
> 
> I can, but it would be a big pain.  It takes many hours to recompile
> Emacs on this machine.

Cross-compile?

> What would it tell us?  It would confirm that the vectorlike was freed,
> perhaps, but do we doubt that?

I doubt everything here.

> If that hassle is likely to solve the problem, I'll do it,
> but I'd rather not go to that hassle just to confirm what we know.

If we can combine that recompilation with some other debugging
instrumentation, the hassle will be worthwhile.




* Re: GC bug investigation
  2014-03-23 15:22         ` Daniel Colascione
@ 2014-03-23 16:14           ` Andreas Schwab
  2014-03-24 15:01           ` Richard Stallman
  1 sibling, 0 replies; 19+ messages in thread
From: Andreas Schwab @ 2014-03-23 16:14 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: rms, emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> That's why I'm curious about Ffset: if there's a window between the time
> the function object is created and the time it's assigned to the
> symbol's function cell during which time the function value isn't
> reachable from a GC root, then it's possible that we're occasionally
> GCing during that period,

fset cannot GC.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: GC bug investigation
  2014-03-23 14:57       ` Richard Stallman
  2014-03-23 15:15         ` David Kastrup
  2014-03-23 15:22         ` Daniel Colascione
@ 2014-03-23 16:20         ` Eli Zaretskii
  2 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2014-03-23 16:20 UTC (permalink / raw)
  To: rms; +Cc: dancol, emacs-devel

> Date: Sun, 23 Mar 2014 10:57:34 -0400
> From: Richard Stallman <rms@gnu.org>
> Cc: emacs-devel@gnu.org
> 
>     Can you try running with -DGC_CHECK_MARKED_OBJECTS=1 in your CFLAGS?
> 
> I can, but it would be a big pain.  It takes many hours to recompile
> Emacs on this machine.

This macro affects only alloc.c, so you need only recompile that one
file to get the effect.




* Re: GC bug investigation
  2014-03-23 15:15         ` David Kastrup
@ 2014-03-24 15:01           ` Richard Stallman
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Stallman @ 2014-03-24 15:01 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel


    >      That
    >     we're reaching B at all indicates that it shouldn't be dead.
    >
    > I guess so.  This is the mysterious part.

    I may be missing something here, but I thought that Emacs was using a
    _conservative_ garbage collector by default.  That means that arbitrary
    garbage may mistakenly be considered as being in-use because some
    integer on the stack is misinterpreted as a pointer to it.

That is true, but it's a different question.

    > It is being marked through a spurious pointer randomly hanging around
    > in a stack slot for something else.  We don't know that there is any
    > real pointer to it.

    If that is the case, then any code supposed to work in conjunction with
    a conservative garbage collector has to be able to deal with it.

Right.

The point is, if that symbol was never collected, how did 
the vector in its function cell get collected?
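
To spell out why this is puzzling, here is a toy sketch of the marking
rule being assumed. It is not the Emacs collector; toy_symbol,
toy_vector and the functions below are invented for illustration. Any
stack word that happens to equal a symbol's address marks that symbol,
and marking a symbol always marks the vector in its function cell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_COUNT 2

struct toy_vector { bool marked; };
struct toy_symbol { bool marked; struct toy_vector *function; };

static struct toy_symbol toy_symbols[TOY_COUNT];
static struct toy_vector toy_vectors[TOY_COUNT];

/* Return the symbol WORD points at, or NULL if WORD is not the
   address of any symbol.  An integer with the right value qualifies.  */
static struct toy_symbol *
toy_word_to_symbol (uintptr_t word)
{
  for (size_t i = 0; i < TOY_COUNT; i++)
    if (word == (uintptr_t) &toy_symbols[i])
      return &toy_symbols[i];
  return NULL;
}

/* Marking a symbol also marks whatever is in its function cell.  */
static void
toy_mark_symbol (struct toy_symbol *sym)
{
  assert (sym != NULL);
  if (sym->marked)
    return;
  sym->marked = true;
  if (sym->function)
    sym->function->marked = true;
}

/* Conservative scan: treat every word as a possible pointer.  */
static void
toy_mark_stack (const uintptr_t *words, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      struct toy_symbol *sym = toy_word_to_symbol (words[i]);
      if (sym)
        toy_mark_symbol (sym);
    }
}
```

Under this rule a symbol that survives, even by way of a spurious stack
word, keeps its function vector alive too, so a freed vector in the cell
of a surviving symbol should not be possible.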






* Re: GC bug investigation
  2014-03-23 15:22         ` Daniel Colascione
  2014-03-23 16:14           ` Andreas Schwab
@ 2014-03-24 15:01           ` Richard Stallman
  1 sibling, 0 replies; 19+ messages in thread
From: Richard Stallman @ 2014-03-24 15:01 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel


    > I can, but it would be a big pain.  It takes many hours to recompile
    > Emacs on this machine.

    Cross-compile?

Sorry, I have no machine to do that with (and I don't know how anyway).

    > If that hassle is likely to solve the problem, I'll do it,
    > but I'd rather not go to that hassle just to confirm what we know.

    If we can combine that recompilation with some other debugging
    instrumentation, the hassle will be worthwhile.

I will get to it sooner or later.  Right now I am trying to file
my income tax and prepare to fly out.






end of thread, other threads:[~2014-03-24 15:01 UTC | newest]

Thread overview: 19+ messages
2014-03-22  1:47 Changes that should go into version 24.4 Richard Stallman
2014-03-22  1:57 ` Daniel Colascione
2014-03-22  8:44   ` Eli Zaretskii
2014-03-22  8:50     ` Daniel Colascione
2014-03-22  9:24       ` Eli Zaretskii
2014-03-22  9:08   ` Eli Zaretskii
2014-03-22  9:15     ` Daniel Colascione
2014-03-22 23:57   ` Richard Stallman
2014-03-23  1:58     ` GC bug investigation Daniel Colascione
2014-03-23  2:13       ` Daniel Colascione
2014-03-23 14:56         ` Richard Stallman
2014-03-23 14:57       ` Richard Stallman
2014-03-23 15:15         ` David Kastrup
2014-03-24 15:01           ` Richard Stallman
2014-03-23 15:22         ` Daniel Colascione
2014-03-23 16:14           ` Andreas Schwab
2014-03-24 15:01           ` Richard Stallman
2014-03-23 16:20         ` Eli Zaretskii
2014-03-23  3:57     ` Changes that should go into version 24.4 Eli Zaretskii
