From: Rich Felker
Newsgroups: gmane.emacs.devel
Subject: Dumper problems and a possible solutions
Date: Tue, 24 Jun 2014 13:19:55 -0400
Message-ID: <20140624171955.GS179@brightrain.aerifal.cx>
To: emacs-devel@gnu.org
Hi,

I've been trying to get current emacs working on musl-libc-based systems and have been running into trouble with the dumper. After a lot of hacking I got it to work, and I'm hoping that something based on the ideas in my work (not the implementation, which is hideously ugly) could be acceptable upstream.

By far the biggest problem is malloc-related. musl does not support overriding malloc, so I'm building with the system malloc. The emacs dumper assumes that all of the allocations that need to survive dumping end up in the brk segment, which is not a constraint musl's allocator can satisfy: it has no support for huge allocations in brk, and will sometimes opt to use mmap rather than brk to extend the heap for small objects.

The hack I used to solve this is really simple: I added to alloc.c a tiny allocator that serves lisp object allocations made prior to dumping out of a giant static buffer (it has an extern flag that the dumper sets to indicate whether this code is running before or after dumping). With free being a NOP, as it is in my implementation right now, I had to make this buffer 400 MB; that's the main reason the patch would be utterly unacceptable as-is. I believe the problem could be solved by writing a trivial "early malloc" implementation that uses a static buffer but properly recycles freed areas.

There are also, however, at least two other issues, which affect static linking.
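To make the "early malloc" idea above concrete, here is a minimal sketch of a static-buffer allocator that does recycle freed areas. It rests on assumptions not taken from the message: a first-fit, address-ordered free list with coalescing (essentially the classic K&R malloc design), a fixed capacity, and the hypothetical names `early_malloc`, `early_free`, `early_owns` and `EARLY_HEAP_UNITS`, none of which exist in Emacs's alloc.c. The `early_owns` check is included on the assumption that code would need to recognize pointers into the static buffer so they are never handed to the system free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity in header-sized units; the buffer occupies
   EARLY_HEAP_UNITS * sizeof(header) bytes. */
#define EARLY_HEAP_UNITS 65536

/* Every block starts with this header; the max_align_t member makes each
   block, and therefore each returned pointer, maximally aligned. */
typedef union header {
  struct {
    union header *next; /* next free block, kept in address order */
    size_t units;       /* block size in header units, header included */
  } s;
  max_align_t align;
} header;

static header early_heap[EARLY_HEAP_UNITS];
static header *free_list;
static int early_ready;

/* Nonzero if ptr points into the static buffer (i.e. it came from here). */
int early_owns(const void *ptr) {
  uintptr_t p = (uintptr_t)ptr;
  return p >= (uintptr_t)early_heap &&
         p < (uintptr_t)(early_heap + EARLY_HEAP_UNITS);
}

/* First-fit allocation; returns NULL when the buffer is exhausted. */
void *early_malloc(size_t n) {
  if (!early_ready) {
    /* Lazily turn the whole buffer into one free block. */
    free_list = early_heap;
    free_list->s.next = NULL;
    free_list->s.units = EARLY_HEAP_UNITS;
    early_ready = 1;
  }
  if (n == 0)
    n = 1;
  if (n > (size_t)EARLY_HEAP_UNITS * sizeof(header))
    return NULL;
  /* Round up to whole units and add one unit for the header. */
  size_t units = (n + sizeof(header) - 1) / sizeof(header) + 1;

  header **link = &free_list;
  for (header *p = free_list; p; link = &p->s.next, p = p->s.next) {
    if (p->s.units < units)
      continue;
    if (p->s.units == units) {
      *link = p->s.next;   /* exact fit: unlink the whole block */
    } else {
      p->s.units -= units; /* split: front stays free, hand out the tail */
      p += p->s.units;
      p->s.units = units;
    }
    return p + 1;
  }
  return NULL;
}

/* Return a block to the free list, merging it with adjacent free blocks
   so the space can be reused for larger requests later. */
void early_free(void *ptr) {
  if (!ptr)
    return;
  assert(early_owns(ptr));
  header *b = (header *)ptr - 1;

  /* Find the free blocks immediately before and after b. */
  header *prev = NULL, *cur = free_list;
  while (cur && cur < b) {
    prev = cur;
    cur = cur->s.next;
  }

  if (cur && b + b->s.units == cur) { /* merge with the following block */
    b->s.units += cur->s.units;
    b->s.next = cur->s.next;
  } else {
    b->s.next = cur;
  }

  if (prev && prev + prev->s.units == b) { /* merge with the preceding block */
    prev->s.units += b->s.units;
    prev->s.next = b->s.next;
  } else if (prev) {
    prev->s.next = b;
  } else {
    free_list = b;
  }
}
```

Because each freed block is merged with its free neighbors, an allocate/free churn during temacs startup would not consume the buffer the way a NOP free does; the capacity would still have to cover the peak live heap before dumping.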
Dynamic linking does not seem to be affected by them, because both involve global state in libc and the dumper doesn't save libc's globals in the dynamically linked case.

One is that, on modern Linux kernels that randomize brk as part of ASLR, dumping saves malloc's idea of the current brk; when that value doesn't match the actual brk at runtime, malloc will either crash or "allocate" memory that isn't even mapped. Trying to work around this in musl is not acceptable because it would penalize all programs with extra syscalls whenever malloc has to expand the brk.

The other issue is that musl's clock_gettime and related functions cache a pointer to the vdso version of clock_gettime. Since the kernel maps the vdso at a random address, the value stored before dumping will not be valid when the dumped executable is run.

To solve ALL of the problems with the dumper (which seems to be a recurring theme), I have a proposed design to make it fully portable, even more so than the XEmacs "portable dumper", which is still an ugly hack. The idea is simple: after loading all of the lisp objects that need dumping, walk the lisp heap and output a representation of each object as a giant static array in C source form, then compile this new translation unit and link it with the rest of the emacs .o files to produce the final emacs binary. No hacks with binary formats would be involved; everything would happen at the C source level.

As part of the lisp heap dumping, references to other objects would have to be relocated to point to the referenced object's position in the static array rather than the address at which it resided when it was created in temacs. That's some non-trivial work, but definitely not prohibitive, and as a bonus, the generated address-constant references in the static array would become load-address-relative relocations for the linker, allowing emacs to be built as a position-independent executable (PIE) if desired.

Does this sound like a viable direction?
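A toy sketch of the relocation step in the proposed C-source dumper, under heavy simplifying assumptions: the heap is modeled by a hypothetical `struct node` with a payload and two references (NULL standing in for nil), objects are found by a depth-first walk from a single root, and each pointer is rewritten as `&dumped_heap[i]`, where `i` is the referenced object's slot in the generated array. The names `dump_heap_as_c`, `dumped_heap` and `MAX_OBJS` are made up for illustration. Real `Lisp_Object` values are tagged words spanning many object types, so this only shows the idea, not Emacs's data layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy heap object (hypothetical): a payload plus two references that
   may point to other objects or be NULL (standing in for nil). */
struct node {
  long payload;
  struct node *left, *right;
};

#define MAX_OBJS 1024 /* hypothetical limit for this sketch */

static const struct node *seen[MAX_OBJS]; /* slot i holds object i */
static size_t nseen;

/* Return the array slot assigned to obj, or -1 if it was not collected. */
static long index_of(const struct node *obj) {
  for (size_t i = 0; i < nseen; i++)
    if (seen[i] == obj)
      return (long)i;
  return -1;
}

/* Assign a slot to every object reachable from obj, depth first.
   Shared objects get exactly one slot. Returns -1 if MAX_OBJS overflows. */
static int collect(const struct node *obj) {
  if (!obj || index_of(obj) >= 0)
    return 0;
  if (nseen == MAX_OBJS)
    return -1;
  seen[nseen++] = obj;
  if (collect(obj->left) < 0)
    return -1;
  return collect(obj->right);
}

/* Emit a reference as C source: a null pointer, or the address of the
   referenced object's slot (an address constant the linker relocates). */
static void emit_ref(FILE *out, const struct node *ref) {
  if (!ref) {
    fputs("0", out);
    return;
  }
  long idx = index_of(ref);
  assert(idx >= 0); /* collect() must have visited every reachable object */
  fprintf(out, "&dumped_heap[%ld]", idx);
}

/* Write a C translation unit holding every object reachable from root. */
int dump_heap_as_c(FILE *out, const struct node *root) {
  nseen = 0;
  if (!root || collect(root) < 0)
    return -1;
  fputs("struct node { long payload; struct node *left, *right; };\n", out);
  fprintf(out, "struct node dumped_heap[%zu] = {\n", nseen);
  for (size_t i = 0; i < nseen; i++) {
    fprintf(out, "  { %ld, ", seen[i]->payload);
    emit_ref(out, seen[i]->left);
    fputs(", ", out);
    emit_ref(out, seen[i]->right);
    fputs(" },\n", out);
  }
  fputs("};\n", out);
  return 0;
}
```

For a root with two children where the second child also points at the first, the emitted initializer names the first child's slot twice instead of duplicating it, so sharing survives the dump. The linear `index_of` lookup and the recursive walk keep the sketch short; a heap of Emacs's size would call for a hash table and an explicit work stack.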
I'm not an emacs hacker by any means and don't think I'm qualified to do the lisp heap dumping implementation, but I could certainly help with design or any issues that arise during implementation if others are interested in working on it. If not, or if that's going to be a very long-term project, would a cleaned-up version of my current solution be acceptable upstream?

Rich