From: Ken Raeburn
Newsgroups: gmane.emacs.devel
Subject: Re: compiled lisp file format (Re: Skipping unexec via a big .elc file)
Date: Mon, 29 May 2017 05:33:50 -0400
To: Philipp Stephani
Cc: Paul Eggert, Emacs developers

On May 28, 2017, at 08:43, Philipp Stephani <p.stephani2@gmail.com> wrote:
>
> Ken Raeburn <raeburn@raeburn.org> wrote on Sun, May 28, 2017 at 13:07:
>
> On May 21, 2017, at 04:53, Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> > Ken Raeburn wrote:
> >> The Guile project has taken this idea pretty far; they’re generating ELF object files with a few special sections for Guile objects, using the standard DWARF sections for debug information, etc. While it has a certain appeal (making C modules and Lisp files look much more similar, maybe being able to link Lisp and C together into one executable image, letting GDB understand some of your data), switching to a machine-specific format would be a pretty drastic change, when we can currently share the files across machines.
> >
> > Although it does indeed sound like a big change, I don't see why it would prevent us from sharing the files across machines. Emacs can use standard ELF and DWARF format on any platform if Emacs is doing the loading. And there should be some software-engineering benefit in using the same format that Guile uses.
>
> Sorry for the delay in responding.
>
> The ELF format has header fields indicating the word size, endianness, machine architecture (though there’s a value for “none”), and OS ABI. Some fields vary in size or order depending on whether the 32-bit or 64-bit format is in use. Some other format details (e.g., relocation types, interpretation of certain ranges of values in some fields) are architecture- or OS-dependent; we might not care about many of those details, but relocations are likely needed if we want to play linking games or use DWARF.
>
> I think Guile is using whatever the native word size and architecture are. If we do that for Emacs, they’re not portable between platforms. Currently it works for me to put my Lisp files, both source and compiled, into ~/elisp and use them from different kinds of machines if my home directory is NFS-mounted.
>
> We could instead pick fixed values (say, architecture “none”, little-endian, 32-bit), but then there’s no guarantee that we could use any of the usual GNU tools on them without a bunch of work, or that we’d ever be able to use non-GNU tools to treat them as object files. Then again, we couldn’t expect to do the latter portably anyway, since some of the platforms don’t even use ELF.
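
For reference, the header fields described above (word size, byte order, OS ABI, machine) sit in the first 20 bytes of an ELF file. The following is a minimal Python sketch of reading them, not anything taken from Emacs or Guile: the function name `describe_elf_ident` and the synthetic sample header are invented for illustration, while the offsets and constant values follow the ELF specification.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "32-bit", 2: "64-bit"}                # e_ident[4]
EI_DATA = {1: "little-endian", 2: "big-endian"}      # e_ident[5]


def describe_elf_ident(header: bytes) -> dict:
    """Decode the platform-describing fields at the start of an ELF header."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    ei_class, ei_data, _ei_version, ei_osabi = header[4:8]
    if ei_class not in EI_CLASS or ei_data not in EI_DATA:
        raise ValueError("unknown word size or byte order")
    order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are 16-bit fields at offsets 16 and 18 in both the
    # 32-bit and 64-bit layouts; the fields after them differ in size.
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    return {
        "word_size": EI_CLASS[ei_class],
        "byte_order": EI_DATA[ei_data],
        "os_abi": ei_osabi,
        "type": e_type,
        "machine": e_machine,
    }


# Synthetic header using the "fixed values" idea: 32-bit, little-endian,
# machine "none" (EM_NONE is 0). The 8 zero bytes are the ABI version and padding.
sample = ELF_MAGIC + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 1, 0)
info = describe_elf_ident(sample)
```

A loader that accepted only one fixed combination of these values would stay portable across machines, at the cost of the tool-compatibility concerns raised above.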


> Is there any significant advantage of using ELF, or could this just use one of the standard binary serialization formats (protobuf, flatbuffer, ...)?

That’s an interesting idea. If one of the popular serialization libraries is compatibly licensed, easy to use, and performs well, it may be better than rolling our own. It’ll need to handle data structures with circular or cross-linked references. And we have the doc string delayed-loading optimization (that currently uses #$ and #@ syntaxes); presumably we’d like to keep that optimization in some form. It would be good not to have to build all our data structures on ones generated by the tool with its own bookkeeping fields; having anything in a cons cell besides the “car” and “cdr” slots would mean a significant increase in memory use.
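
To make the circular and cross-linked requirement concrete, here is a small Python sketch (an illustration only, not a proposal for the actual format). It flattens a graph of two-slot cells into numbered records and rebuilds it by allocating every cell first and patching the slots afterwards. The `Cons`, `dump`, and `load` names and the ("ref", n) / ("atom", v) record encoding are hypothetical.

```python
class Cons:
    """A cell with only the two slots; bookkeeping lives outside the cell."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


def dump(root):
    """Flatten the cells reachable from root into a list of records."""
    index = {}    # id(cell) -> record number, kept only while dumping
    cells = []
    stack = [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, Cons) and id(obj) not in index:
            index[id(obj)] = len(cells)
            cells.append(obj)
            stack.extend((obj.cdr, obj.car))

    def encode(obj):
        if isinstance(obj, Cons):
            return ("ref", index[id(obj)])
        return ("atom", obj)

    return encode(root), [(encode(c.car), encode(c.cdr)) for c in cells]


def load(root_ref, records):
    """Rebuild the graph: allocate all cells, then patch their slots."""
    cells = [Cons(None, None) for _ in records]

    def decode(item):
        kind, value = item
        return cells[value] if kind == "ref" else value

    for cell, (car, cdr) in zip(cells, records):
        cell.car = decode(car)
        cell.cdr = decode(cdr)
    return decode(root_ref)


# A two-element list whose elements are the same shared cell, and whose
# tail points back to the head (a cycle).
shared = Cons("x", None)
ring = Cons(shared, Cons(shared, None))
ring.cdr.cdr = ring

root_ref, records = dump(ring)
clone = load(root_ref, records)
```

After the round trip, `clone.car is clone.cdr.car` (sharing is preserved) and `clone.cdr.cdr is clone` (the cycle is preserved), and the rebuilt cells carry nothing beyond their two slots.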


I initially said, “follow the model of flat object file formats”, not “use ELF”; ELF is just one way of organizing the data of an object file, with years of experience behind it, which we could use wholesale or borrow some lessons from. One of the typical advantages of object file formats is that the data is grouped for efficient memory usage; some sections of a file will be mapped into the address space read-only (shared between processes), other sections read-write (possibly shared until copied on write), and others not mapped at all. For example, we might put symbol names (normally never modified but it can be done), doc strings (to be loaded later, only if needed), byte code, and other strings into their own sections, and create Lisp_String objects and such pointing to those bytes as needed. We don’t keep much in the way of source location information for Lisp code around, but if we ever change that, arguably it could go in a file section that’s not mapped or read until the debugger wants the information.
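
As a rough sketch of the grouping idea, the Python below writes a toy file with a section table and then maps it read-only, copying a section out only when it is requested. The container layout, the `write_sections` and `SectionFile` names, and the section contents are all made up for illustration. Python's `mmap` applies one protection to the whole mapping, so this shows only the "read lazily" half of the idea, not per-section permissions.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a 4-byte section count, then one (name, offset, length)
# entry per section, then the section bytes back to back.
ENTRY = struct.Struct("<16sII")


def write_sections(path, sections):
    """Write the toy container described above."""
    offset = 4 + ENTRY.size * len(sections)
    table, body = b"", b""
    for name, data in sections.items():
        table += ENTRY.pack(name.encode("ascii"), offset, len(data))
        body += data
        offset += len(data)
    with open(path, "wb") as out:
        out.write(struct.pack("<I", len(sections)) + table + body)


class SectionFile:
    """Map the file read-only and copy a section out only when asked."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = struct.unpack_from("<I", self._map, 0)
        self.table = {}
        for i in range(count):
            name, offset, length = ENTRY.unpack_from(self._map, 4 + i * ENTRY.size)
            self.table[name.rstrip(b"\0").decode("ascii")] = (offset, length)

    def section(self, name):
        offset, length = self.table[name]
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    write_sections(path, {
        "symbols": b"frob\0",
        "docstrings": b"Return the frobnication of X.",
        "bytecode": bytes([0x08, 0x54, 0x87]),  # placeholder bytes, not real byte code
    })
    image = SectionFile(path)
    names = list(image.table)
    doc = image.section("docstrings")  # only now are the doc string bytes copied
    image.close()
```

Opening the file reads just the small table; the doc string bytes are touched only by the `section("docstrings")` call, which is the same effect the current delayed-loading optimization aims for.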


The Guile project’s documentation says their use of ELF is intended to build on existing work to invent a good object file format with several desired characteristics (https://www.gnu.org/software/guile/manual/html_node/Object-File-Format.html):

• Above all else, it should be very cheap to load a compiled file.
• It should be possible to statically allocate constants in the file. For example, a bytevector literal in source code can be emitted directly into the object file.
• The compiled file should enable maximum code and data sharing between different processes.
• The compiled file should contain debugging information, such as line numbers, but that information should be separated from the code itself. It should be possible to strip debugging information if space is tight.

They’re generating byte code currently, but are looking forward towards generating native code as well (instead?).

Their write-up implicitly assumes that, as with “normal” object files, the idea is to mmap the data into the address space, some of it read-only and some of it automatically getting some patching up, and then using those in-memory objects directly. There’s no explicit discussion of the tradeoffs of loading a file all at once versus reading one object tree (S-expression) at a time from an input stream, but especially when mapping and using much of the data unmodified is feasible, I suspect the all-at-once approach is likely to be more efficient. Whether that would be true in a case like Emacs, I don’t know.
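
The “patching up” step can be pictured as a relocation pass: pointers are stored as offsets relative to the start of the file, and once the load address is known, the loader adds it to each word named in a relocation list. The Python sketch below is a generic illustration of that technique under invented assumptions (32-bit little-endian words, a plain list of offsets to patch, a hypothetical `relocate` function); it is not Guile's actual loader.

```python
import struct


def relocate(raw: bytes, reloc_offsets, base: int) -> bytes:
    """Return a copy of raw with base added to each listed 32-bit word."""
    patched = bytearray(raw)
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + base) & 0xFFFFFFFF)
    return bytes(patched)


# Toy image of three words: word 0 is the plain integer 7; words 1 and 2 are
# file-relative pointers to byte offsets 0 and 4 and therefore need patching.
raw = struct.pack("<III", 7, 0, 4)
loaded = relocate(raw, [4, 8], base=0x1000)
words = struct.unpack("<III", loaded)
```

Only the pages holding patched words need to become private to the process; data without relocations can stay shared and unmodified, which is part of why the all-at-once approach can be cheap.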

They use DWARF for carrying some debug information, but so far I’m unsure what information is actually stored there.

Ken