From: Ken Raeburn
Newsgroups: gmane.emacs.devel
Subject: Re: compiled lisp file format (Re: Skipping unexec via a big .elc file)
Date: Sun, 2 Jul 2017 21:44:08 -0400
To: Philipp Stephani
Cc: Paul Eggert, Emacs developers

On Jul 2, 2017, at 11:46, Philipp Stephani <p.stephani2@gmail.com> wrote:

> Ken Raeburn <raeburn@raeburn.org> wrote on Mon., May 29, 2017 at 11:33:
>
> On May 28, 2017, at 08:43, Philipp Stephani <p.stephani2@gmail.com> wrote:
>
>>
>> Ken Raeburn <raeburn@raeburn.org> wrote on Sun., May 28, 2017 at 13:07:
>>
>> On May 21, 2017, at 04:53, Paul Eggert <eggert@cs.ucla.edu> wrote:
>>
>> > Ken Raeburn wrote:
>> >> The Guile project has taken this idea pretty far; they're generating ELF object files with a few special sections for Guile objects, using the standard DWARF sections for debug information, etc. While it has a certain appeal (making C modules and Lisp files look much more similar, maybe being able to link Lisp and C together into one executable image, letting GDB understand some of your data), switching to a machine-specific format would be a pretty drastic change, when we can currently share the files across machines.
>> >
>> > Although it does indeed sound like a big change, I don't see why it would prevent us from sharing the files across machines. Emacs can use standard ELF and DWARF format on any platform if Emacs is doing the loading. And there should be some software-engineering benefit in using the same format that Guile uses.
>>
>> Sorry for the delay in responding.
>>
>> The ELF format has header fields indicating the word size, endianness, machine architecture (though there's a value for "none"), and OS ABI. Some fields vary in size or order depending on whether the 32-bit or 64-bit format is in use. Some other format details (e.g., relocation types, interpretation of certain ranges of values in some fields) are architecture- or OS-dependent; we might not care about many of those details, but relocations are likely needed if we want to play linking games or use DWARF.
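
The header fields listed above can be seen directly in code. The following is a minimal Python sketch (not Emacs or Guile code) that decodes the identification part of an ELF header. The class byte `e_ident[4]` gives the word size, `e_ident[5]` the byte order, `e_ident[7]` the OS ABI, and the two-byte `e_machine` field at offset 18 (the same offset in ELF32 and ELF64) the architecture, where 0 means "none". The offsets and constant values follow the ELF specification. The function name `describe_elf_header` and the abbreviated machine table are illustrative assumptions.

```python
import struct

# Byte offsets into e_ident and constant values from the ELF specification.
EI_CLASS, EI_DATA, EI_OSABI = 4, 5, 7
CLASSES = {1: "32-bit", 2: "64-bit"}                 # ELFCLASS32, ELFCLASS64
BYTE_ORDERS = {1: "little-endian", 2: "big-endian"}  # ELFDATA2LSB, ELFDATA2MSB
MACHINES = {0: "none", 3: "x86", 62: "x86-64", 183: "AArch64"}  # small subset


def describe_elf_header(header: bytes) -> dict:
    """Decode the host-dependent identification fields of an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    data = header[EI_DATA]
    if data not in BYTE_ORDERS:
        raise ValueError("unknown byte order")
    # e_machine is a 2-byte field at offset 18 in both ELF32 and ELF64,
    # stored in the byte order named by e_ident[EI_DATA].
    fmt = "<H" if data == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)
    return {
        "word_size": CLASSES.get(header[EI_CLASS], "invalid"),
        "byte_order": BYTE_ORDERS[data],
        "os_abi": header[EI_OSABI],
        "machine": MACHINES.get(machine, machine),
    }


# A hand-built header prefix for a little-endian, 64-bit x86-64 executable:
# magic, class, data, version, OS ABI, 8 bytes of padding, e_type, e_machine.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf_header(sample))
# {'word_size': '64-bit', 'byte_order': 'little-endian', 'os_abi': 0, 'machine': 'x86-64'}
```

A file written natively on a big-endian or 32-bit host would decode to different values. That is the sense in which a native-format object file is tied to the kind of machine that produced it.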
>>
>> I think Guile is using whatever the native word size and architecture are. If we do that for Emacs, the compiled files won't be portable between platforms. Currently it works for me to put my Lisp files, both source and compiled, into ~/elisp and use them from different kinds of machines if my home directory is NFS-mounted.
>>
>> We could instead pick fixed values (say, architecture "none", little-endian, 32-bit), but then there's no guarantee that we could use any of the usual GNU tools on them without a bunch of work, or that we'd ever be able to use non-GNU tools to treat them as object files. Then again, we couldn't expect to do the latter portably anyway, since some of the platforms don't even use ELF.
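
The fixed-values alternative could look like the following minimal Python sketch. It assumes one particular choice (ELF32, little-endian, `e_machine` set to `EM_NONE`, `e_type` set to `ET_REL`) and emits only the 52-byte ELF32 file header, with no sections. The function name `portable_elf32_header` is hypothetical. The field layout and constant values come from the ELF specification. Every multi-byte field is packed explicitly as little-endian, so the output is byte-for-byte identical on any host.

```python
import struct

# Constant values from the ELF specification.
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1      # 32-bit objects
ELFDATA2LSB = 1     # little-endian
EV_CURRENT = 1      # current ELF version
ELFOSABI_NONE = 0   # no OS-specific extensions (System V)
ET_REL = 1          # relocatable object file
EM_NONE = 0         # "no machine"
ELF32_EHDR_SIZE = 52


def portable_elf32_header() -> bytes:
    """Build an ELF32 file header whose contents never depend on the host.

    Every multi-byte field is packed explicitly as little-endian ("<"), so
    the same bytes are produced on any machine. The header declares no
    program headers and no section headers.
    """
    e_ident = ELF_MAGIC + bytes([ELFCLASS32, ELFDATA2LSB, EV_CURRENT, ELFOSABI_NONE])
    e_ident += bytes(16 - len(e_ident))  # EI_ABIVERSION and padding
    rest = struct.pack(
        "<HHIIIIIHHHHHH",
        ET_REL,           # e_type
        EM_NONE,          # e_machine
        EV_CURRENT,       # e_version
        0, 0, 0,          # e_entry, e_phoff, e_shoff
        0,                # e_flags
        ELF32_EHDR_SIZE,  # e_ehsize
        0, 0,             # e_phentsize, e_phnum
        0, 0, 0,          # e_shentsize, e_shnum, e_shstrndx
    )
    return e_ident + rest


header = portable_elf32_header()
print(len(header), header[:8].hex())
# 52 7f454c4601010100
```

This sketch only shows that the header itself can be made host-independent. Whether the usual GNU tools would do anything useful with files built this way remains the open question raised above.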
>>
>> Is there any significant advantage to using ELF, or could this just use one of the standard binary serialization formats (protobuf, flatbuffer, ...)?
>
> That's an interesting idea. If one of the popular serialization libraries is compatibly licensed, easy to use, and performs well, it may be better than rolling our own.
>
> I've tried this out (with flatbuffers), but I haven't seen significant speed improvements. It might very well be the case that during loading the reader is already fast enough (e.g., for .elc files it doesn't do any decoding), and it's the evaluator that's too slow.

What's your test case, and how are you measuring the performance?

In my tests with the one big .elc file, using the Linux "perf" tool, it seems that readchar, read1, encode_char, and ungetc are where a good chunk of CPU time is still spent: about a quarter of the total in my testing with the "big elc file" code. My experiment in May cut down a chunk of the overall run time (start in batch mode, print a message, and exit) with some ugly reader syntax hacks. Tests with smaller files may have different characteristics, though…
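
Functions like readchar and ungetc can rank high in a profile when a reader works one character at a time. The toy Python reader below illustrates this. It is hypothetical: it is not Emacs's lread.c and does not reproduce its design. It fetches one character per call and pushes back the delimiter that ends each symbol, so every input character costs at least one call and each such delimiter is read twice. The class and function names are made up for this illustration.

```python
class CountingReader:
    """Toy character-at-a-time reader with one-character pushback.

    Hypothetical illustration only; it counts calls so the per-character
    overhead of this reading style is visible.
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0
        self.reads = 0    # calls to readchar()
        self.unreads = 0  # calls to unreadchar()

    def readchar(self):
        """Return the next character, or None at end of input."""
        self.reads += 1
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def unreadchar(self):
        """Push back the character most recently returned by readchar()."""
        self.unreads += 1
        self.pos -= 1


def read_tokens(reader: CountingReader) -> list:
    """Split input into parentheses and whitespace-delimited symbols."""
    tokens = []
    while (ch := reader.readchar()) is not None:
        if ch.isspace():
            continue
        if ch in "()":
            tokens.append(ch)
            continue
        name = ch
        while (ch := reader.readchar()) is not None:
            if ch.isspace() or ch in "()":
                reader.unreadchar()  # hand the delimiter back to the outer loop
                break
            name += ch
        tokens.append(name)
    return tokens


r = CountingReader("(foo bar)")
print(read_tokens(r), r.reads, r.unreads)
# ['(', 'foo', 'bar', ')'] 12 2
```

For the 9-character input, the toy makes 12 readchar calls and 2 pushbacks. The 12 calls break down as 9 characters, 2 re-reads after pushback, and 1 end-of-input check. A real reader does more work per call than this toy, so these numbers only indicate how call counts scale with input size.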

Ken