* 22.0.99 emacs dumper (?) problem
@ 2007-05-18 13:42 Chip Coldwell
2007-05-18 14:16 ` Neal Becker
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Chip Coldwell @ 2007-05-18 13:42 UTC (permalink / raw)
To: emacs-devel
I would like to bring the bug described here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
to the attention of this mailing list. In a nutshell, what happens is
that an emacs binary built using the Fedora 7 build system will
seg-fault immediately when run on a Fedora 7 install. The stack trace
shows an assert failing in the glibc malloc code (usually it's
setlocale invoking malloc); it appears that something about the way
the dumper preserves the malloc state between the dumping emacs and
the dumped emacs isn't quite working.
Another theory that is being examined is that somewhere there is a
pointer to anonymous mmaped memory. It seems that prepending
MALLOC_MMAP_MAX_=0
to this line
LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
works, although there are some conflicting reports about this, also.
Anyway, if we could get some more eyes on this bug, it would probably
help a bunch.
Chip
--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 22.0.99 emacs dumper (?) problem
2007-05-18 13:42 22.0.99 emacs dumper (?) problem Chip Coldwell
@ 2007-05-18 14:16 ` Neal Becker
2007-05-18 16:19 ` Chip Coldwell
2007-05-19 22:32 ` Richard Stallman
2007-05-21 3:26 ` Chong Yidong
2 siblings, 1 reply; 13+ messages in thread
From: Neal Becker @ 2007-05-18 14:16 UTC (permalink / raw)
To: emacs-devel
Chip Coldwell wrote:
>
> I would like to bring the bug described here:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
>
> to the attention of this mailing list. In a nutshell, what happens is
> that an emacs binary built using the Fedora 7 build system will
> seg-fault immediately when run on a Fedora 7 install. The stack trace
> shows an assert failing in the glibc malloc code (usually it's
> setlocale invoking malloc); it appears that something about the way
> the dumper preserves the malloc state between the dumping emacs and
> the dumped emacs isn't quite working.
>
> Another theory that is being examined is that somewhere there is a
> pointer to anonymous mmaped memory. It seems that prepending
>
> MALLOC_MMAP_MAX_=0
>
> to this line
>
> LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
>
> works, although there are some conflicting reports about this, also.
>
> Anyway, if we could get some more eyes on this bug, it would probably
> help a bunch.
>
> Chip
>
This worked for me with emacs-23.0.0.1 until yesterday's update to glibc-2.6-1.
(No, I haven't rebuilt emacs today after the glibc-2.6-1 update)
* Re: 22.0.99 emacs dumper (?) problem
2007-05-18 14:16 ` Neal Becker
@ 2007-05-18 16:19 ` Chip Coldwell
2007-05-18 23:00 ` Neal Becker
0 siblings, 1 reply; 13+ messages in thread
From: Chip Coldwell @ 2007-05-18 16:19 UTC (permalink / raw)
To: Neal Becker; +Cc: emacs-devel
On Fri, 18 May 2007, Neal Becker wrote:
> Chip Coldwell wrote:
>
> >
> > I would like to bring the bug described here:
> >
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
> >
> > to the attention of this mailing list. In a nutshell, what happens is
> > that an emacs binary built using the Fedora 7 build system will
> > seg-fault immediately when run on a Fedora 7 install. The stack trace
> > shows an assert failing in the glibc malloc code (usually it's
> > setlocale invoking malloc); it appears that something about the way
> > the dumper preserves the malloc state between the dumping emacs and
> > the dumped emacs isn't quite working.
> >
> > Another theory that is being examined is that somewhere there is a
> > pointer to anonymous mmaped memory. It seems that prepending
> >
> > MALLOC_MMAP_MAX_=0
> >
> > to this line
> >
> > LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
> >
> > works, although there are some conflicting reports about this, also.
> >
> > Anyway, if we could get some more eyes on this bug, it would probably
> > help a bunch.
> >
> > Chip
> >
>
> This worked for me with emacs-23.0.0.1 until yesterday's update to glibc-2.6-1.
> (No, I haven't rebuilt emacs today after the glibc-2.6-1 update)
Do you mean that dumping emacs-23.0.0.1 with "MALLOC_MMAP_MAX_=0"
built a binary that worked with glibc-2.5.90-22 but stopped working
after the update to glibc-2.6-1?
emacs-23 is the unicode2 branch, right? I would assume that the
malloc hooks, dumper, lisp allocator, etc are common to the two
branches.
Chip
--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
* Re: 22.0.99 emacs dumper (?) problem
2007-05-18 16:19 ` Chip Coldwell
@ 2007-05-18 23:00 ` Neal Becker
0 siblings, 0 replies; 13+ messages in thread
From: Neal Becker @ 2007-05-18 23:00 UTC (permalink / raw)
To: emacs-devel
Chip Coldwell wrote:
> On Fri, 18 May 2007, Neal Becker wrote:
>
>> Chip Coldwell wrote:
>>
>> >
>> > I would like to bring the bug described here:
>> >
>> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
>> >
>> > to the attention of this mailing list. In a nutshell, what happens is
>> > that an emacs binary built using the Fedora 7 build system will
>> > seg-fault immediately when run on a Fedora 7 install. The stack trace
>> > shows an assert failing in the glibc malloc code (usually it's
>> > setlocale invoking malloc); it appears that something about the way
>> > the dumper preserves the malloc state between the dumping emacs and
>> > the dumped emacs isn't quite working.
>> >
>> > Another theory that is being examined is that somewhere there is a
>> > pointer to anonymous mmaped memory. It seems that prepending
>> >
>> > MALLOC_MMAP_MAX_=0
>> >
>> > to this line
>> >
>> > LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
>> >
>> > works, although there are some conflicting reports about this, also.
>> >
>> > Anyway, if we could get some more eyes on this bug, it would probably
>> > help a bunch.
>> >
>> > Chip
>> >
>>
>> This worked for me with emacs-23.0.0.1 until yesterday's update to
>> glibc-2.6-1. (No, I haven't rebuilt emacs today after the glibc-2.6-1
>> update)
>
> Do you mean that dumping emacs-23.0.0.1 with "MALLOC_MMAP_MAX_=0"
> built a binary that worked with glibc-2.5.90-22 but stopped working
> after the update to glibc-2.6-1?
>
> emacs-23 is the unicode2 branch, right? I would assume that the
> malloc hooks, dumper, lisp allocator, etc are common to the two
> branches.
>
Yes. What's strange is, I did rebuild again with glibc-2.6-1, and now it's
working again.
Very strange.
* Re: 22.0.99 emacs dumper (?) problem
2007-05-18 13:42 22.0.99 emacs dumper (?) problem Chip Coldwell
2007-05-18 14:16 ` Neal Becker
@ 2007-05-19 22:32 ` Richard Stallman
2007-05-21 3:26 ` Chong Yidong
2 siblings, 0 replies; 13+ messages in thread
From: Richard Stallman @ 2007-05-19 22:32 UTC (permalink / raw)
To: Chip Coldwell; +Cc: emacs-devel
There was a report that the same problem happened with the unicode-2
Emacs branch. That seems to have been spurious. But as far as I know
that doesn't mean your problem isn't real.
Can you please try building the previous pretests, and see if
the problem has appeared recently?
        MALLOC_MMAP_MAX_=0

    to this line

        LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump

    works, although there are some conflicting reports about this, also.
It is possible that this is a Glibc bug, perhaps due to some
private change in Fedora's version of Glibc.
Here is an idea. You could try changing emacs_blocked_malloc
so that, when Vpurify_flag is non-nil, it uses mallopt to prevent
malloc mmap'ing. It could save the value of the N_MMAP_MAX
parameter, then set it to 0, then call malloc, then restore
the old value of the parameter. But it should do this
only when Vpurify_flag is non-nil.
Does that fix the problem?
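A minimal sketch of the idea above, for illustration only. It assumes that the "N_MMAP_MAX parameter" corresponds to glibc's mallopt parameter M_MMAP_MAX. The flag `purifying` is a hypothetical stand-in for a non-nil Vpurify_flag, and `malloc_without_mmap` is not the real emacs_blocked_malloc. Because mallopt offers no way to read the current setting, the value to restore (ASSUMED_MMAP_MAX) is also an assumption; 65536 is glibc's documented default.

```c
#include <malloc.h>
#include <stdlib.h>

/* Hypothetical stand-in for "Vpurify_flag is non-nil" (i.e. we are dumping). */
static int purifying = 1;

/* Assumption: mallopt cannot report the current M_MMAP_MAX, so the caller
   must already know which value to restore.  65536 is glibc's default. */
#define ASSUMED_MMAP_MAX 65536

/* Allocate SIZE bytes; while purifying, forbid mmap-backed chunks for the
   duration of this one call, then restore the assumed previous limit. */
static void *malloc_without_mmap(size_t size)
{
    void *p;

    if (!purifying)
        return malloc(size);

#ifdef M_MMAP_MAX
    mallopt(M_MMAP_MAX, 0);                 /* no mmap'ed chunks from now on */
#endif
    p = malloc(size);
#ifdef M_MMAP_MAX
    mallopt(M_MMAP_MAX, ASSUMED_MMAP_MAX);  /* put the limit back */
#endif
    return p;
}
```

Whether toggling the limit on every call is acceptable inside the real allocator hook is a separate question that this sketch does not address.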
* Re: 22.0.99 emacs dumper (?) problem
2007-05-18 13:42 22.0.99 emacs dumper (?) problem Chip Coldwell
2007-05-18 14:16 ` Neal Becker
2007-05-19 22:32 ` Richard Stallman
@ 2007-05-21 3:26 ` Chong Yidong
2007-05-21 7:49 ` Glenn Morris
` (2 more replies)
2 siblings, 3 replies; 13+ messages in thread
From: Chong Yidong @ 2007-05-21 3:26 UTC (permalink / raw)
To: Chip Coldwell; +Cc: emacs-devel
> an emacs binary built using the Fedora 7 build system will seg-fault
> immediately when run on a Fedora 7 install. The stack trace shows
> an assert failing in the glibc malloc code (usually it's setlocale
> invoking malloc); it appears that something about the way the dumper
> preserves the malloc state between the dumping emacs and the dumped
> emacs isn't quite working.
From reading the bugzilla entry, it appears that the problem is a set
of bugs in glibc that are exposed by the way Emacs uses
malloc_set_state. Changing Emacs to avoid this, even if it's
possible, will probably delay the Emacs 22.1 release by another year
or so, which is undesirable.
Therefore, I propose adding the following entry to the Emacs PROBLEMS
file.
** Emacs crashes on startup on Fedora Core 7.
This appears to be due to a change in the internal format of freed
blocks introduced between glibc-2.5.90-21 and glibc-2.5.90-22. As a
result, Emacs binaries built using older versions of glibc may crash
when run using newer versions of glibc. For details, see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
Rebuilding Emacs with newer versions of glibc, with the following
patch applied to the Emacs source tree, should remove the crashes.
--- emacs-22.0.95/src/Makefile.in-save 2007-05-13 11:53:01.000000000 -0700
+++ emacs-22.0.95/src/Makefile.in 2007-05-13 11:53:55.000000000 -0700
@@ -960,7 +960,7 @@
 	ln temacs${EXEEXT} emacs${EXEEXT}
 #else
 #ifdef HAVE_SHM
-	LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
+	MALLOC_MMAP_MAX_=0 LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
 #else /* ! defined (HAVE_SHM) */
 	LC_ALL=C $(RUN_TEMACS) -batch -l loadup dump
 #endif /* ! defined (HAVE_SHM) */
Are there any objections?
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 3:26 ` Chong Yidong
@ 2007-05-21 7:49 ` Glenn Morris
2007-05-21 13:24 ` Richard Stallman
2007-05-21 14:47 ` Chip Coldwell
2 siblings, 0 replies; 13+ messages in thread
From: Glenn Morris @ 2007-05-21 7:49 UTC (permalink / raw)
To: Chong Yidong; +Cc: Chip Coldwell, emacs-devel
Chong Yidong wrote:
> This appears to be due to a change in the internal format of freed
> blocks introduced between glibc-2.5.90-21 and glibc-2.5.90-22. As a
> result, Emacs binaries built using older versions of glibc may crash
> when run using newer versions of glibc.
I agree that adding a PROBLEMS entry, and trying to come up with a
proper fix for 22.2 (not 22.1), is the best course of action at this
stage.
I think rms usually does not read web-based bug reports, so he
probably has not looked at the long bugzilla entry, and may have
missed the important information that you state above. Namely, that
the problem is not that Emacs can't be built on FC7, but rather that
if the glibc version is changed Emacs needs to be rebuilt with the
stated workaround. Whilst it would be good to fix this properly, I hope
this can be treated as a non-showstopper right now.
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 3:26 ` Chong Yidong
2007-05-21 7:49 ` Glenn Morris
@ 2007-05-21 13:24 ` Richard Stallman
2007-05-21 14:45 ` Chip Coldwell
2007-05-21 14:47 ` Chip Coldwell
2 siblings, 1 reply; 13+ messages in thread
From: Richard Stallman @ 2007-05-21 13:24 UTC (permalink / raw)
To: Chong Yidong; +Cc: coldwell, emacs-devel
I cannot reconcile the things that you've said about the problem.
If this is the full extent of the problem
    This appears to be due to a change in the internal format of freed
    blocks introduced between glibc-2.5.90-21 and glibc-2.5.90-22. As a
    result, Emacs binaries built using older versions of glibc may crash
    when run using newer versions of glibc.
then it ought to work just to relink temacs with the newer libc
and then dump Emacs again.
(Mr Coldwell, what happens if you do that?)
However, if this change is needed also
    + MALLOC_MMAP_MAX_=0 LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
to make Emacs work with the newer libc, it implies there is a worse
problem. It implies that Emacs fails to work correctly with the newer
libc, and never mind what the older libc did.
Which is it?
Meanwhile, is this workaround really ok? That depends on facts I
don't know. Maybe it only prevents use of mmap while dumping. That
is not a big price to pay. But maybe it also prevents use of mmap
when the dumped Emacs is subsequently run. That could be a high price
to pay.
Can Mr Coldwell (or anyone) tell us which?
If it only prevents use of mmap when running temacs, then it is ok as
a workaround, but I would like to generate it automatically thru the
makefile mechanism rather than ask users to patch it by hand.
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 13:24 ` Richard Stallman
@ 2007-05-21 14:45 ` Chip Coldwell
0 siblings, 0 replies; 13+ messages in thread
From: Chip Coldwell @ 2007-05-21 14:45 UTC (permalink / raw)
To: Richard Stallman; +Cc: emacs-devel, coldwell, Chong Yidong
On Mon, 21 May 2007, Richard Stallman wrote:
> I cannot reconcile the things that you've said about the problem.
>
> If this is the full extent of the problem
>
> This appears to be due to a change in the internal format of freed
> blocks introduced between glibc-2.5.90-21 and glibc-2.5.90-22. As a
> result, Emacs binaries built using older versions of glibc may crash
> when run using newer versions of glibc.
>
> then it ought to work just to relink temacs with the newer libc
> and then dump Emacs again.
>
> (Mr Coldwell, what happens if you do that?)
Yes, that works; the resulting dumped emacs binary does not seg-fault
on startup.
> However, if this change is needed also
>
> + MALLOC_MMAP_MAX_=0 LC_ALL=C $(RUN_TEMACS) -nl -batch -l loadup dump
>
> to make Emacs work with the newer libc, it implies there is a worse
> problem. It implies that Emacs fails to work correctly with the newer
> libc, and never mind what the older libc did.
>
> Which is it?
My current belief is that it is the former -- changes in the internal
malloc data structures meant that the dumping temacs and dumped emacs
had to have the same glibc version. I currently believe the
MALLOC_MMAP_MAX_=0 is a red herring; I've built several binaries
without it that work fine.
dump_emacs contains this line

    malloc_state_ptr = malloc_get_state();

just before the call to unexec. Then malloc_initialize_hook (bound to
the weak symbol __malloc_initialize_hook) does

    if (initialized)
      {
        [ ... ]
        malloc_set_state (malloc_state_ptr);
    #ifndef XMALLOC_OVERRUN_CHECK
        free (malloc_state_ptr);
    #endif
      }

The "initialized" variable is zero in the dumping emacs and nonzero in
the dumped emacs.
So malloc_get_state returns a pointer to an opaque data structure in
the .bss segment (I believe all calls to malloc in temacs are guarded
such that malloc always uses sbrk not mmap). The .bss gets dumped
into a new .data segment in the emacs binary with the opaque data
structure. The dumped emacs binary then starts up and passes the
pointer to the opaque data structure in the call to malloc_set_state.
Since the temacs binary and the emacs binary could be linked to
different versions of glibc, there can be no incompatible changes to
this opaque data structure between glibc versions. IOW, the data
structure is not quite as opaque as the glibc maintainers believed.
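
The hazard described above can be illustrated with a small self-contained sketch. It does not call glibc's malloc_get_state or malloc_set_state. The struct, the magic number, the version numbers and both function names are hypothetical stand-ins. The point is only that a blob written by one allocator version and read back by another is safe only if the reader can detect a layout it does not understand.

```c
#include <string.h>

/* Hypothetical saved-state blob; not glibc's real layout. */
#define STATE_MAGIC 0x53544154L   /* arbitrary marker chosen for this sketch */

struct saved_state {
    long magic;                   /* identifies the blob as saved state */
    long version;                 /* layout version of the writer */
    unsigned char payload[64];    /* allocator internals would live here */
};

enum restore_result {
    RESTORE_OK = 0,
    RESTORE_BAD_MAGIC = -1,
    RESTORE_BAD_VERSION = -2
};

/* What the dumping process does: snapshot state, stamped with its version. */
static void save_state(struct saved_state *out, long writer_version)
{
    memset(out, 0, sizeof *out);
    out->magic = STATE_MAGIC;
    out->version = writer_version;
}

/* What the dumped process does at startup: accept the blob only if it was
   written by an allocator with the same layout version. */
static int restore_state(const struct saved_state *in, long reader_version)
{
    if (in->magic != STATE_MAGIC)
        return RESTORE_BAD_MAGIC;
    if (in->version != reader_version)
        return RESTORE_BAD_VERSION;
    return RESTORE_OK;
}
```

In this sketch a blob saved by version 1 restores cleanly under version 1 and is rejected under version 2. A reader that skipped the version check would instead interpret the payload using the wrong layout, which is the kind of failure discussed in this thread.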
> Meanwhile, is this workaround really ok?
We are currently building without it; I believe this workaround isn't
even necessary.
> Can Mr Coldwell (or anyone) tell us which?
Once the glibc situation settles down, I will verify that I can dump
emacs on glibc-2.5.90-21 and run the resulting binary when it links to
glibc-2.6-whatever. Then we can put this to rest.
> If it only prevents use of mmap when running temacs, then it is ok as
> a workaround, but I would like to generate it automatically thru the
> makefile mechanism rather than ask users to patch it by hand.
I think that the reason the workaround worked has something to do with
the specifics of the malloc saved state opaque structure. If mmap was
off during dumping, then that saved state structure from previous
glibc versions was compatible with the new glibc version, just by
luck.
Chip
--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 3:26 ` Chong Yidong
2007-05-21 7:49 ` Glenn Morris
2007-05-21 13:24 ` Richard Stallman
@ 2007-05-21 14:47 ` Chip Coldwell
2007-05-21 15:27 ` Chong Yidong
2 siblings, 1 reply; 13+ messages in thread
From: Chip Coldwell @ 2007-05-21 14:47 UTC (permalink / raw)
To: Chong Yidong; +Cc: emacs-devel
On Sun, 20 May 2007, Chong Yidong wrote:
>
> Therefore, I propose adding the following entry to the Emacs PROBLEMS
> file.
>
>
> ** Emacs crashes on startup on Fedora Core 7.
We very genuinely hope to get this fixed in glibc before Fedora 7
(note that they are dropping the "Core") ships, so hopefully the
PROBLEMS entry will not be necessary.
Chip
--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 14:47 ` Chip Coldwell
@ 2007-05-21 15:27 ` Chong Yidong
2007-05-21 15:28 ` Chip Coldwell
2007-05-22 14:51 ` Richard Stallman
0 siblings, 2 replies; 13+ messages in thread
From: Chong Yidong @ 2007-05-21 15:27 UTC (permalink / raw)
To: Chip Coldwell; +Cc: emacs-devel
Chip Coldwell <coldwell@redhat.com> writes:
> On Sun, 20 May 2007, Chong Yidong wrote:
>>
>> Therefore, I propose adding the following entry to the Emacs PROBLEMS
>> file.
>>
>>
>> ** Emacs crashes on startup on Fedora Core 7.
>
> We very genuinely hope to get this fixed in glibc before Fedora 7
> (note that they are dropping the "Core") ships, so hopefully the
> PROBLEMS entry will not be necessary.
I think there is no harm documenting this, in case anyone happens to
use the affected versions of glibc. I've added the following text to
PROBLEMS.
** Emacs crashes on startup after a glibc upgrade.
This is caused by a binary incompatible change to the malloc
implementation in glibc 2.5.90-22. As a result, Emacs binaries built
using prior versions of glibc crash when run under 2.5.90-22.
This problem was first seen in pre-release versions of Fedora 7, and
may be fixed in the final Fedora 7 release. To stop the crash from
happening, first try upgrading to the newest version of glibc; if this
does not work, rebuild Emacs with the same version of glibc that you
will run it under. For details, see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
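
As a rough aid for the "same version of glibc" advice, the following sketch compares the glibc version a program was compiled against with the one it is running under. The function name `glibc_matches_build` is hypothetical. It uses glibc's gnu_get_libc_version() and the __GLIBC__ and __GLIBC_MINOR__ macros, so it only sees the upstream major.minor version. It would not distinguish distribution package releases such as 2.5.90-21 and 2.5.90-22, so it cannot detect the specific incompatibility described here.

```c
#include <stdio.h>
#include <gnu/libc-version.h>

/* Returns 1 if the glibc loaded at run time reports the same major.minor
   version as the headers this file was compiled against, 0 if they differ,
   and -1 if the run-time version string could not be parsed. */
static int glibc_matches_build(void)
{
    int major = 0, minor = 0;

    if (sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) != 2)
        return -1;
    return major == __GLIBC__ && minor == __GLIBC_MINOR__;
}
```

A program could call this at startup and print a warning when the result is 0.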
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 15:27 ` Chong Yidong
@ 2007-05-21 15:28 ` Chip Coldwell
2007-05-22 14:51 ` Richard Stallman
1 sibling, 0 replies; 13+ messages in thread
From: Chip Coldwell @ 2007-05-21 15:28 UTC (permalink / raw)
To: Chong Yidong; +Cc: emacs-devel
On Mon, 21 May 2007, Chong Yidong wrote:
> Chip Coldwell <coldwell@redhat.com> writes:
>
> >
> > We very genuinely hope to get this fixed in glibc before Fedora 7
> > (note that they are dropping the "Core") ships, so hopefully the
> > PROBLEMS entry will not be necessary.
>
> I think there is no harm documenting this, in case anyone happens to
> use the affected versions of glibc. I've added the following text to
> PROBLEMS.
>
>
> ** Emacs crashes on startup after a glibc upgrade.
>
> This is caused by a binary incompatible change to the malloc
> implementation in glibc 2.5.90-22. As a result, Emacs binaries built
> using prior versions of glibc crash when run under 2.5.90-22.
>
> This problem was first seen in pre-release versions of Fedora 7, and
> may be fixed in the final Fedora 7 release. To stop the crash from
> happening, first try upgrading to the newest version of glibc; if this
> does not work, rebuild Emacs with the same version of glibc that you
> will run it under. For details, see
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=239344
OK, sounds reasonable to me.
Chip
--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
* Re: 22.0.99 emacs dumper (?) problem
2007-05-21 15:27 ` Chong Yidong
2007-05-21 15:28 ` Chip Coldwell
@ 2007-05-22 14:51 ` Richard Stallman
1 sibling, 0 replies; 13+ messages in thread
From: Richard Stallman @ 2007-05-22 14:51 UTC (permalink / raw)
To: Chong Yidong; +Cc: coldwell, emacs-devel
    I think there is no harm documenting this, in case anyone happens to
    use the affected versions of glibc. I've added the following text to
    PROBLEMS.

This text is good. Thanks.