From: Ken Raeburn
Newsgroups: gmane.emacs.devel
Subject: Re: compiled lisp file format (Re: Skipping unexec via a big .elc file)
Date: Wed, 27 Sep 2017 04:31:09 -0400
Message-ID: <633E485D-B414-427B-8257-699EE53C84F4@raeburn.org>
To: Philipp Stephani
Cc: Paul Eggert, Emacs developers

On Sep 24, 2017, at 09:57, Philipp Stephani <p.stephani2@gmail.com> wrote:
> Ken Raeburn <raeburn@raeburn.org> wrote on Mon, Jul 3, 2017 at 03:44:

> On Jul 2, 2017, at 11:46, Philipp Stephani <p.stephani2@gmail.com> wrote:

>> Ken Raeburn <raeburn@raeburn.org> wrote on Mon, May 29, 2017 at 11:33:

>> On May 28, 2017, at 08:43, Philipp Stephani <p.stephani2@gmail.com> wrote:



>>> Ken Raeburn <raeburn@raeburn.org> wrote on Sun, May 28, 2017 at 13:07:

>>> On May 21, 2017, at 04:53, Paul Eggert <eggert@cs.ucla.edu> wrote:

>>> > Ken Raeburn wrote:
>>> >> The Guile project has taken this idea pretty far; they’re generating ELF object files with a few special sections for Guile objects, using the standard DWARF sections for debug information, etc. While it has a certain appeal (making C modules and Lisp files look much more similar, maybe being able to link Lisp and C together into one executable image, letting GDB understand some of your data), switching to a machine-specific format would be a pretty drastic change, when we can currently share the files across machines.
>>> >
>>> > Although it does indeed sound like a big change, I don't see why it would prevent us from sharing the files across machines. Emacs can use standard ELF and DWARF format on any platform if Emacs is doing the loading. And there should be some software-engineering benefit in using the same format that Guile uses.

>>> Sorry for the delay in responding.

>>> The ELF format has header fields indicating the word size, endianness, machine architecture (though there’s a value for “none”), and OS ABI. Some fields vary in size or order depending on whether the 32-bit or 64-bit format is in use. Some other format details (e.g., relocation types, interpretation of certain ranges of values in some fields) are architecture- or OS-dependent; we might not care about many of those details, but relocations are likely needed if we want to play linking games or use DWARF.

>>> I think Guile is using whatever the native word size and architecture are. If we do that for Emacs, they’re not portable between platforms. Currently it works for me to put my Lisp files, both source and compiled, into ~/elisp and use them from different kinds of machines if my home directory is NFS-mounted.

>>> We could instead pick fixed values (say, architecture “none”, little-endian, 32-bit), but then there’s no guarantee that we could use any of the usual GNU tools on them without a bunch of work, or that we’d ever be able to use non-GNU tools to treat them as object files. Then again, we couldn’t expect to do the latter portably anyway, since some of the platforms don’t even use ELF.
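The ELF header fields listed above sit at fixed offsets at the start of the file, so a loader can check word size, byte order, OS ABI, and machine type before interpreting anything else. Below is a minimal C sketch of that check, assuming the standard ELF identification layout (the 0x7f 'E' 'L' 'F' magic, class 1/2 for 32/64-bit, data encoding 1/2 for little/big-endian, OS ABI at byte 7, and the 16-bit e_machine field at byte 18 in both header classes, where 0 is EM_NONE). The names `struct elf_ident_info` and `parse_elf_ident` are hypothetical; this is not code from Emacs or Guile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Identification data from the first 20 bytes of an ELF header.
   Hypothetical helper type; offsets follow the ELF specification.  */
struct elf_ident_info
{
  int bits;          /* 32 (ELFCLASS32) or 64 (ELFCLASS64) */
  int little_endian; /* 1 for ELFDATA2LSB, 0 for ELFDATA2MSB */
  unsigned osabi;    /* e_ident[EI_OSABI], byte 7 */
  unsigned machine;  /* e_machine; 0 is EM_NONE ("no machine") */
};

/* Returns 0 and fills *OUT if BUF starts with a plausible ELF header,
   or -1 otherwise.  Only byte-level reads are used, so the result does
   not depend on the host's own word size or byte order.  */
static int
parse_elf_ident (const unsigned char *buf, size_t len,
                 struct elf_ident_info *out)
{
  static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

  assert (buf != NULL && out != NULL);
  if (len < 20 || memcmp (buf, magic, sizeof magic) != 0)
    return -1;

  switch (buf[4])               /* e_ident[EI_CLASS] */
    {
    case 1: out->bits = 32; break;
    case 2: out->bits = 64; break;
    default: return -1;
    }
  switch (buf[5])               /* e_ident[EI_DATA] */
    {
    case 1: out->little_endian = 1; break;
    case 2: out->little_endian = 0; break;
    default: return -1;
    }
  out->osabi = buf[7];

  /* e_machine is a 16-bit field at byte 18 in both the 32- and 64-bit
     layouts, stored in the file's declared byte order.  */
  if (out->little_endian)
    out->machine = (unsigned) buf[18] | ((unsigned) buf[19] << 8);
  else
    out->machine = ((unsigned) buf[18] << 8) | (unsigned) buf[19];
  return 0;
}
```

A file written with the fixed values suggested above (architecture “none”, little-endian, 32-bit) would come back from this check as bits == 32, little_endian == 1, machine == 0.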


>>> Is there any significant advantage of using ELF, or could this just use one of the standard binary serialization formats (protobuf, flatbuffer, ...)?

>> That’s an interesting idea. If one of the popular serialization libraries is compatibly licensed, easy to use, and performs well, it may be better than rolling our own.

>> I've tried this out (with flatbuffers), but I haven't seen significant speed improvements. It might very well be the case that during loading the reader is already fast enough (e.g. for ELC files it doesn't do any decoding), and it's the evaluator that's too slow.

> What’s your test case, and how are you measuring the performance?

> IIRC I've repeatedly loaded one of the biggest .elc files shipped with Emacs and measured the total loading time. I haven't done any detailed profiling, since I was hoping for a significant speed increase that would justify the work.

It’ll depend on what the code in that file is doing.

In the raeburn-startup branch, the last bit of profiling I did showed that nearly 1/3 of the CPU time of a simple run of Emacs in batch mode was spent reading and parsing the saved Lisp environment. You can see a graph at http://www.mit.edu/~raeburn/emacs.svg; if you haven’t read up on flame graphs (http://www.brendangregg.com/flamegraphs.html), they provide a nice visualization of CPU time consumption broken down by what the call stack looks like at each point. Most of the rest of the CPU time was spent executing the loaded code (lots of fset and setplist calls), but the biggest chunk of that was executing a nested load of international/characters.elc; during that nested load, most of the time was spent in execution (mostly char-table processing) and very little in parsing.

So… for the saved Lisp environment file, excluding the nested load, reading and parsing account for about 2/3 of the CPU time used; for characters.elc, reading and parsing are a minuscule portion of the CPU time.
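A figure like “nearly 1/3 of the CPU time” is read off a flame graph as width: the boxes for a function, summed across every place it appears, cover the fraction of stack samples in which that function is on the stack. The C sketch below computes that inclusive share from the “folded” text format consumed by Brendan Gregg’s flamegraph.pl (one line per distinct stack, frames joined by semicolons, then a space and a sample count). The function names are hypothetical, and the frame names and counts in the test are toy values, not data from the profile described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if FRAME is one whole ';'-separated element of the first
   STACK_LEN bytes of STACK, so "load" does not match "loadhist".  */
static int
stack_contains (const char *stack, size_t stack_len, const char *frame)
{
  size_t flen = strlen (frame);
  const char *p = stack;
  const char *end = stack + stack_len;

  while (p < end)
    {
      const char *sep = memchr (p, ';', (size_t) (end - p));
      const char *elem_end = sep ? sep : end;
      if ((size_t) (elem_end - p) == flen && memcmp (p, frame, flen) == 0)
        return 1;
      p = sep ? sep + 1 : end;
    }
  return 0;
}

/* Inclusive share of samples whose stack contains FRAME, given N lines
   in folded format ("main;load;read 30").  Each line counts at most
   once, even if FRAME appears several times in it (recursion).  */
static double
inclusive_share (const char *const *lines, size_t n, const char *frame)
{
  unsigned long total = 0, hits = 0;

  assert (lines != NULL && frame != NULL);
  for (size_t i = 0; i < n; i++)
    {
      const char *space = strrchr (lines[i], ' ');
      if (space == NULL)
        continue;               /* malformed line: no sample count */
      unsigned long count = strtoul (space + 1, NULL, 10);
      total += count;
      if (stack_contains (lines[i], (size_t) (space - lines[i]), frame))
        hits += count;
    }
  return total ? (double) hits / (double) total : 0.0;
}
```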

Loading a Lisp file internally uses the Lisp “read” routine, which requires an input stream of character values (not byte values); we examine the stream object and dispatch to various bits of code depending on its type (buffer, marker, function, certain special symbols) *for each character*. Each byte is examined to see if it’s part of a multibyte character. Each character is checked to see whether it’s allowed in a symbol name, a string, or whatever we’re in the middle of parsing, or whether it’s a backslash quoting some other character, and so on.
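To make that per-character cost concrete, here is a minimal C sketch of a reader loop of the kind described: the switch on the stream’s type runs for every byte, every lead byte is checked for a multibyte sequence, and every character of a string literal is compared against the quote and backslash characters. It is a hypothetical simplification, not Emacs’s actual reader; it assumes plain UTF-8 input, and the names `struct stream`, `read_byte`, `read_char`, and `read_string_body` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream kinds, loosely echoing the buffer/marker/function
   distinction described above; this is not Emacs's lread.c.  */
enum stream_kind { STREAM_BYTES, STREAM_CALLBACK };

struct stream
{
  enum stream_kind kind;
  const unsigned char *buf;   /* STREAM_BYTES: data, length, position */
  size_t len, pos;
  int (*next_byte) (void *);  /* STREAM_CALLBACK: returns -1 at EOF */
  void *arg;
};

/* Fetches one byte.  The switch on the stream's type runs on every call.  */
static int
read_byte (struct stream *s)
{
  switch (s->kind)
    {
    case STREAM_BYTES:
      return s->pos < s->len ? s->buf[s->pos++] : -1;
    case STREAM_CALLBACK:
      return s->next_byte (s->arg);
    }
  return -1;
}

/* Fetches one character, checking each lead byte for a UTF-8 multibyte
   sequence.  Returns -1 at EOF, -2 on a malformed sequence (overlong
   forms are not rejected in this sketch).  */
static long
read_char (struct stream *s)
{
  int b = read_byte (s);
  int extra;

  if (b < 0x80)
    return b;                   /* ASCII, or -1 at EOF */
  if (b >= 0xC0 && b < 0xE0)
    extra = 1;
  else if (b >= 0xE0 && b < 0xF0)
    extra = 2;
  else if (b >= 0xF0 && b < 0xF8)
    extra = 3;
  else
    return -2;                  /* stray continuation or invalid byte */

  long c = b & (0x3F >> extra);
  for (int i = 0; i < extra; i++)
    {
      int cont = read_byte (s);
      if (cont < 0 || (cont & 0xC0) != 0x80)
        return -2;
      c = (c << 6) | (cont & 0x3F);
    }
  return c;
}

/* Reads a string literal's body after its opening '"', storing up to CAP
   characters in OUT.  Every character is compared against '"' and '\\'.
   Returns the character count, or -1 on EOF, bad input, or overflow.
   Only the \n escape is translated; a real Lisp reader handles more.  */
static long
read_string_body (struct stream *s, long *out, size_t cap)
{
  size_t n = 0;

  assert (s != NULL && out != NULL);
  for (;;)
    {
      long c = read_char (s);
      if (c < 0)
        return -1;
      if (c == '"')
        return (long) n;        /* closing quote */
      if (c == '\\')
        {
          c = read_char (s);    /* the escaped character, taken literally */
          if (c < 0)
            return -1;
          if (c == 'n')
            c = '\n';
        }
      if (n == cap)
        return -1;
      out[n++] = c;
    }
}
```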

Hence my hopes for a non-text-based format, designed to streamline reading data from files, where we can do things like specify a vector length or string length up front instead of having to consider each character and process quoting sequences. For example, the file could say “here’s a unibyte string of 47 bytes,” and the reader would just copy the bytes without considering each one separately. No human-readable printed form, no escape sequences needed.
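A length-prefixed encoding of that kind can be sketched in a few lines of C. The sketch assumes a hypothetical tag byte followed by a 32-bit little-endian length: a unibyte string’s payload is then copied with a single memcpy, and a vector header announces its element count up front. The tags and function names are invented for illustration and do not correspond to any existing Emacs file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags; not an existing Emacs file format.  */
enum { TAG_UNIBYTE_STRING = 1, TAG_VECTOR = 2 };

/* Lengths are always stored as 32-bit little-endian values, whatever the
   host's byte order, so files stay shareable between machines.  */
static size_t
put_u32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xFF);
  p[1] = (unsigned char) ((v >> 8) & 0xFF);
  p[2] = (unsigned char) ((v >> 16) & 0xFF);
  p[3] = (unsigned char) ((v >> 24) & 0xFF);
  return 4;
}

static uint32_t
get_u32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Writes TAG, LENGTH, then the raw bytes; returns the bytes written.  */
static size_t
encode_unibyte_string (unsigned char *out, const unsigned char *bytes,
                       uint32_t len)
{
  assert (out != NULL && bytes != NULL);
  out[0] = TAG_UNIBYTE_STRING;
  size_t n = 1 + put_u32 (out + 1, len);
  memcpy (out + n, bytes, len);
  return n + len;
}

/* Reads a string encoded as above.  Because the length comes first, the
   payload is copied in one memcpy: no per-byte multibyte, quoting, or
   escape checks.  Returns the length, or -1 if IN is malformed or
   truncated, or if DST (CAP bytes) is too small.  */
static long
decode_unibyte_string (const unsigned char *in, size_t avail,
                       unsigned char *dst, size_t cap)
{
  if (avail < 5 || in[0] != TAG_UNIBYTE_STRING)
    return -1;
  uint32_t len = get_u32 (in + 1);
  if (len > avail - 5 || len > cap)
    return -1;
  memcpy (dst, in + 5, len);
  return (long) len;
}

/* A vector header carries its element count, so a reader can allocate
   the vector once and then decode exactly that many elements.  */
static size_t
encode_vector_header (unsigned char *out, uint32_t count)
{
  out[0] = TAG_VECTOR;
  return 1 + put_u32 (out + 1, count);
}
```

Fixing the byte order of the length field, rather than using the host’s, keeps the files usable from an NFS-mounted ~/elisp shared by different machines. A real format would also need tags for multibyte strings, symbols, conses, and so on; those are omitted here.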

Another thing that might help is a faster way to load the character data. I’ve got the branch loading characters.elc at startup because saving and parsing the generated tables was even slower than evaluating the Lisp code that generates them. Perhaps we can do some processing of them during the build and convert them into some other form that lets us start up faster.

> If people are generally interested in pursuing this further, I'd be happy to put my code into a scratch branch.

I’d be curious to take a look…

Ken