From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Skipping unexec via a big .elc file
Date: Tue, 25 Oct 2016 09:14:55 -0700
Message-ID: <02b7c01e-9df1-bde6-9199-1ced140c143e@dancol.org>
References: <838ttyhhzu.fsf@gnu.org> <871szqwu51.fsf@users.sourceforge.net>
 <831szqhbc2.fsf@gnu.org> <87d1itt79z.fsf_-_@users.sourceforge.net>
 <7baa18d4-2b09-caa8-005e-29008a383ad1@cs.ucla.edu> <83mvhwrgd5.fsf@gnu.org>
 <8539f38f-9a11-44c3-4de7-bb974c96206c@cs.ucla.edu> <8360ojpndr.fsf@gnu.org>
 <83shrnm0k1.fsf@gnu.org> <83insi5jy9.fsf@gnu.org> <83mvht50qb.fsf@gnu.org>
 <8c085c3e-361d-7d10-6f34-07c387eb3b43@dancol.org> <83a8dt4u3a.fsf@gnu.org>
 <83twc0whav.fsf@gnu.org>
In-Reply-To: <83twc0whav.fsf@gnu.org>
To: Eli Zaretskii
Cc: p.stephani2@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org

On 10/25/2016 08:59 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione
>> Cc: Eli Zaretskii, monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Mon, 24 Oct 2016 12:47:56 -0700
>>
>>> I'd argue that we are already in this situation. For example, nobody
>>> knows how to make unexec work with ASLR or PIE; when I tried fuzzing
>>> Emacs with AFL, the dumped binary would simply crash; the dumped
>>> binary is not reproducible (i.e. bit-by-bit identical after every
>>> build); and I think dumping also doesn't work with ASan. The fraction
>>> of situation where unexec doesn't work any more gets larger and
>>> larger. If we had people who could solve these problems, it should get
>>> smaller instead.
>>
>> Everyone who's seriously thought about the unexec problem _understands_
>> the issue.
>
> The important point is that the number of people here who can claim
> such understanding, enough so to fix the issues, is diminishingly
> small, and gets smaller every year.

There's no demand for more yet. There isn't a catastrophe, just low
demand for core-change expertise. Historical societies had a lot more
stonemasons (at least per capita) than today's society does. That
doesn't mean we've forgotten how to cut stones, and if there were a
sudden need to do it, more stonemasons would magically appear.

>> My preferred approach is the portable dumper one: basically what we're
>> doing today, except that instead of just blindly copying the data
>> segment and heap to a new emacs binary, we'll write this information to
>> a separate file, stored in a portable format, a file that we'll keep
>> alongside the Emacs binary. We'll store in this file metadata about
>> where the pointers are.
>> (There are two kinds of pointers in this file:
>> pointers to other parts of the file and pointers to the Emacs binary.)
>>
>> At startup, we'll load the dump file and walk the relocations, fixing up
>> all the embedded addresses to account for the new process's different
>> address space.
>
> Why do you think this will have better performance that reading a
> single .elc file at startup? It's still mainly file I/O and
> processing of the file's contents, just like with byte-compiled files.

Because a portable dumper can do less of both: less file I/O and less
processing of the file's contents. There's no Lisp evaluation and no
slurping of a whole file into memory. Having to read all of Emacs into
memory on startup is a burden even on a fast, modern machine like mine.
With a cold page cache, just reading the 48.6 MiB binary takes about
0.12 seconds:

~/edev/trunk/src $ sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
~/edev/trunk/src $ time pv < emacs >/dev/null
48.6MiB 0:00:00 [ 455MiB/s] [=========================================================>] 100%

real 0m0.116s
user 0m0.000s
sys 0m0.016s

That's pretty fast, but it's not free. Not having to do this much I/O on
startup in the first place would be even better.

> If we have no reason to believe this portable dumper will be
> significantly faster, we should IMO investigate the .elc method first,
> because it's so much simpler, both in its implementation and in future
> maintenance. E.g., adding a new kind of Lisp object to Emacs would
> require corresponding changes in the dumper.

Adding a new kind of Lisp object requires changes throughout core
anyway. At the very least, you need to teach GC where your new object
keeps its pointers, and that's exactly the knowledge that the dumper
would need.

>> We can't save all of the Emacs data segment this way, but we can
>> relocate and restore anything that's marked with staticpro. The overall
>> experience should be very similar to what we have today.
>> [...]
>> Speaking of COW faults: a refinement of this scheme is to do the
>> relocations lazily, in a SIGSEGV handler. (Map the dump file PROT_NONE
>> so any access traps.) In the SIGSEGV handler, we can relocate just the
>> page we faulted, then continue. This way, we don't need to slurp in the
>> entire dump file from disk just to start emacs -Q -batch: we can
>> demand-page!
>
> Demand paging in an application, and an application such as Emacs on
> top of that, makes little sense to me.

Why? It's conceptually no different from autoload. There is no technique
in computer science so rarefied that it's only good in ring zero.

> This is the OS business, not
> ours. Using mmap as a fast way to read a file, yes, that's done in
> many applications. But please lets leave demand paging out of our
> scope.

Emacs isn't just an application. It's a Lisp virtual machine, and
employing the optimization techniques used in other virtual machines can
yield important wins.

(FWIW, mmap isn't a particularly fast way of doing bulk file reads.
That's why GNU grep removed its mmap support.)

> IMO the less we mess with low-level techniques that no other
> applications use the better, both because we have very few people who
> can do that and because doing so runs higher risk of becoming broken
> by future developments in the platforms we deem important. The
> long-term tendency in Emacs development should be to move away from
> such techniques, not to acquire more of them.

I'm for anything that delivers meaningful performance advantages.
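
For concreteness, here is a minimal sketch of the load-time fix-up step
from the portable dumper proposal quoted above. It is a toy in Python,
not Emacs code. The 64-bit little-endian slot layout, the relocation
table format, the example base addresses, and the names
apply_relocations, RELOC_DUMP, and RELOC_EMACS are all assumptions made
up for this illustration. The only idea taken from the proposal is that
the dump records where its pointers are, and that there are two kinds:
pointers into the dump itself and pointers into the Emacs binary.

```python
import struct

WORD = struct.Struct("<Q")  # assumed layout: one 64-bit little-endian slot

# Hypothetical relocation kinds, one per pointer kind in the proposal.
RELOC_DUMP = 0   # slot holds an offset into the dump file itself
RELOC_EMACS = 1  # slot holds an offset into the Emacs binary


def apply_relocations(image, relocs, dump_base, emacs_base):
    """Turn each recorded offset into an absolute address, in place."""
    for offset, kind in relocs:
        (stored,) = WORD.unpack_from(image, offset)
        base = dump_base if kind == RELOC_DUMP else emacs_base
        WORD.pack_into(image, offset, (base + stored) & 0xFFFFFFFFFFFFFFFF)
    return image


# A toy three-word dump: two pointer slots and one plain integer.
image = bytearray(24)
WORD.pack_into(image, 0, 16)      # dump-relative: refers to the word at 16
WORD.pack_into(image, 8, 0x40)    # binary-relative: offset 0x40 in Emacs
WORD.pack_into(image, 16, 12345)  # not a pointer, so no relocation entry
relocs = [(0, RELOC_DUMP), (8, RELOC_EMACS)]

# Made-up load addresses standing in for wherever the new process maps things.
apply_relocations(image, relocs,
                  dump_base=0x7F0000000000, emacs_base=0x555555554000)
fixed = [WORD.unpack_from(image, off)[0] for off in (0, 8, 16)]
print([hex(v) for v in fixed])
```

The loop touches only the slots listed in the relocation table, so the
work at startup scales with the number of pointers rather than with
evaluating Lisp. Non-pointer data, like the third word here, is left
untouched.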