unofficial mirror of emacs-devel@gnu.org 
From: Lynn Winebarger <owinebar@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel <emacs-devel@gnu.org>
Subject: Re: native compilation units
Date: Sun, 26 Jun 2022 10:14:52 -0400
Message-ID: <CAM=F=bAhtN98hrLgfD3z1DjJKncBDHqqUOdZCi763aLQASo2ug@mail.gmail.com>
In-Reply-To: <CAM=F=bDtmx18oDwgRjipU=Q+0og+Tnd5Rf1DdnQzMm7Fqt7wHw@mail.gmail.com>

On Sat, Jun 25, 2022, 2:12 PM Lynn Winebarger <owinebar@gmail.com> wrote:

>> The part that's incompatible with current semantics of symbols is
>> importing that symbol as an immutable symbolic reference.  Not really a
>> "variable" reference, but as a binding of a symbol to a value in the
>> run-time namespace (or package in CL terminology, although CL did not
>> allow any way to specify what I'm suggesting either, as far as I know).
>>
>> However, that would capture the semantics of ELF shared objects with the
>> text and ro_data segments loaded into memory that is in fact immutable
>> for a userspace program.
>
> It looks to me like the portable dump code/format could be adapted to
> serve the purpose I have in mind here.  What needs to be added is a way to
> limit the scope of the dump so only the appropriate set of objects are
> captured.
>

I'm going to start with a copy of pdumper.c and pdumper.h renamed to
ndumper (n for namespace).  The pdmp format conceptually organizes the
Emacs executable's memory space into a graph with three nodes: an "Emacs
executable" node (the temacs text and ro sections), an "Emacs static" node
(sections of the executable loaded into writable memory), and a "dump"
node, corresponding to heap-allocated objects that were live at the time of
the dump.  The dump node has relocations that can point into itself or to
the Emacs executable, and "discardable" relocations for values instantiated
into the "Emacs static" node.  While the data structure doesn't require it,
the only values saved from the Emacs static data are symbols, primitive
subrs (not native compiled), and the thread structure for the main thread.

There can be cycles between these nodes in the memory graph, but cutting
the edge[s] between the Emacs executable and the Emacs static nodes yields
a DAG.  Note that pdumper does not make the partition I'm describing
explicit; I'm inferring that there must be such a partition.  The
discardable relocations should be the ones that instantiate into static
data of the temacs executable.

My plan is to refine the structure of the Emacs process introduced by
pdumper to yield a namespace graph structure with the same property -
cutting the edge from executable to runtime state yields a DAG whose only
root is the emacs executable.
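
The DAG property can be illustrated with a toy model in Python (this is
not Emacs code).  The three node names come from the description above,
but the edge list, and the choice to draw each edge from a referenced node
to the node that refers to it, are assumptions made purely for
illustration; self-references inside the dump node are ignored.  The
sketch checks for cycles with Kahn's algorithm before and after cutting
the executable/static edge.

```python
from collections import defaultdict


def is_dag(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Every node is visited exactly when no cycle blocks the traversal.
    return visited == len(nodes)


def roots(nodes, edges):
    """Nodes with no incoming edge."""
    targets = {dst for _, dst in edges}
    return [n for n in nodes if n not in targets]


NODES = ["executable", "dump", "static"]
# Hypothetical edges (referenced node -> referring node).
EDGES = [
    ("executable", "dump"),    # dump relocations point into the executable
    ("dump", "static"),        # static data is filled in from the dump
    ("static", "executable"),  # executable code refers to its static data
]
CUT = ("static", "executable")
REMAINING = [edge for edge in EDGES if edge != CUT]
```

With these assumed edges, `is_dag(NODES, EDGES)` is false, while
`is_dag(NODES, REMAINING)` is true and `roots(NODES, REMAINING)` contains
only the executable node.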

Each ndmp namespace (or module or cl-package) would have its own symbol
table and a unique namespace identifier, with a runtime mapping to the file
backing it (if loaded from a file).

Interned symbols will be extended with three additional properties: static
value, constant value and constant function.  For variables, scope
resolution will be done at compile time:
* Value if not void (undefined), else
* Static value
A constant symbol is referenced by importing a constant symbol, either from
another namespace or a variable in the current namespace's compile-time
environment.  The attempt at run-time to rebind a symbol bound by an import
form will signal an error.  Multiple imports binding a particular symbol at
run-time will effectively cause the shadowing of an earlier binding by the
later binding.  Any sequence of imports and other forms that would result
in the ambiguity of the resolution of a particular variable at compile time
will signal an error.  That is, a given symbol will have only one
associated binding in the namespace scope during a particular evaluation
time (eval, compile, compile-compile, etc.).
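
A minimal Python sketch of the run-time import rules described above,
under stated assumptions: the class and method names (`Namespace`,
`import_symbol`, `set`) are hypothetical, and the compile-time ambiguity
check is not modeled.  It shows only that a later import shadows an
earlier one and that rebinding an import-bound symbol signals an error.

```python
class ImportedBindingError(Exception):
    """Raised when code tries to rebind a symbol bound by an import."""


class Namespace:
    """Toy namespace with its own symbol table (all names hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.bindings = {}     # symbol name -> value
        self.imported = set()  # symbol names currently bound by an import

    def import_symbol(self, symbol, value):
        # A later import of the same symbol shadows the earlier binding.
        self.bindings[symbol] = value
        self.imported.add(symbol)

    def set(self, symbol, value):
        # Rebinding an import-bound symbol at run time signals an error.
        if symbol in self.imported:
            raise ImportedBindingError(symbol)
        self.bindings[symbol] = value

    def lookup(self, symbol):
        return self.bindings[symbol]


def demo():
    ns = Namespace("user")
    ns.import_symbol("width", 80)
    ns.import_symbol("width", 120)  # shadows the first import
    ns.set("height", 24)            # ordinary binding, allowed
    try:
        ns.set("width", 100)        # import-bound, rejected
        rejected = False
    except ImportedBindingError:
        rejected = True
    return ns.lookup("width"), ns.lookup("height"), rejected
```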

A static value binding will be global but not dynamic.  A constant value
binding will result from an export form in an eval-when-compile form
encountered while compiling the source of the ndmp module.  Since static
bindings capture the "global" aspect of the current semantics of special
variable bindings, dynamic scope can be safely restricted to provide
thread-local semantics.  Instantiation of a compiled ndmp object will
initialize the bindings to be consistent with the current semantics of
defvar and setq in global scope, as well as the separation of compile-time
and eval-time variable bindings.  [I am not yet certain what the exact
approach to ensuring that will be.]  Note constant bindings are only created
by "importing" from the compile-time environment through eval-when-compile
under the current semantics model.  This approach simply avoids the beta
substitution of compile-time variable references performed in the current
implementation of eval-when-compile semantics.  Macro expansion is still
available to insert such values directly in forms from the compile-time
environment.

A function symbol will resolve to the function property if not void, and
the constant function property otherwise.
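
The two lookup orders (value, then static value; function, then constant
function) can be sketched as follows.  This is a toy Python model, not
Emacs code: the `Symbol` fields and the `VOID` sentinel are hypothetical
names, the lookup happens at call time rather than at compile time, and
the constant value property is left out because the text above does not
place it in the variable lookup order.

```python
from dataclasses import dataclass

VOID = object()  # sentinel standing in for an unbound (void) cell


@dataclass
class Symbol:
    name: str
    value: object = VOID              # ordinary value cell
    static_value: object = VOID       # proposed: global but not dynamic
    function: object = VOID           # ordinary function cell
    constant_function: object = VOID  # proposed: constant function cell


def resolve_variable(sym):
    """Value if not void, else static value."""
    if sym.value is not VOID:
        return sym.value
    if sym.static_value is not VOID:
        return sym.static_value
    raise LookupError(f"void variable: {sym.name}")


def resolve_function(sym):
    """Function cell if not void, else constant function cell."""
    if sym.function is not VOID:
        return sym.function
    if sym.constant_function is not VOID:
        return sym.constant_function
    raise LookupError(f"void function: {sym.name}")
```

For example, a symbol created with only `static_value=70` resolves to 70
until its ordinary value cell is set, after which the ordinary value wins.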

Each ndmp module will explicitly identify the symbols it exports, and those
it imports.  The storage of variable bindings for unexported symbols will
not be directly referenceable from any other namespace.  Constant bindings
may be enforced by loading them into a read-only page of memory or by a
write barrier implemented by the system, or they may be left unenforced.
In other words, attempting to set a constant binding is an error with
unspecified effect.  Additional
declarations may be provided to require the signaling of an error, the
enforcement of constancy (without an error), both, or neither.  The storage
of static and constant variables may or may not be incorporated directly in
the symbol object.  For example, such storage may be allocated using
separate hash tables for static and constant symbol tables to reduce the
allocation of space for variables without a static or constant binding.
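
The side-table option can be sketched in Python as below.  The class and
method names are hypothetical, and the sketch models storage only, not
any enforcement policy.  The point is that a symbol without a static or
constant binding occupies no entry in either table.

```python
class SymbolTable:
    """Toy symbol table keeping static/constant bindings in side tables."""

    def __init__(self):
        self.symbols = set()       # every interned symbol name
        self.static_values = {}    # only symbols with a static binding
        self.constant_values = {}  # only symbols with a constant binding

    def intern(self, name):
        self.symbols.add(name)
        return name

    def bind_static(self, name, value):
        self.static_values[self.intern(name)] = value

    def bind_constant(self, name, value):
        self.constant_values[self.intern(name)] = value

    def extra_cells(self):
        """Number of static/constant cells actually allocated."""
        return len(self.static_values) + len(self.constant_values)


def demo():
    table = SymbolTable()
    for i in range(1000):
        table.intern(f"sym-{i}")  # plain symbols: no side-table entries
    table.bind_static("fill-column", 70)
    table.bind_constant("pi-approx", 3.14159)
    return len(table.symbols), table.extra_cells()
```

The trade-off against storing the cells in the symbol object itself is an
extra hash lookup on access in exchange for not growing every symbol.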

When compiling a form that imports a symbol from an ndmp module, importing
in an eval-when-compile context will resolve to the constant value binding
of the symbol, as though the source forms were concatenated during
compilation to have a single compile time environment. Otherwise, the
resolution will proceed as described above.

There will be a distinguished ndmp object that contains relocations
instantiated into the Emacs static nodes, serving the baseline function of
pdmp.  There will also be a distinguished ndmp object "ELISP" that exports
all the primitives of Emacs lisp.  The symbols of this namespace will be
implicitly imported into every ndmp unless overridden by a special form to
be specified.  In this way, a namespace may use an alternative lisp
semantic model, e.g. CL.  Additional forms for importing symbols from other
namespaces remain to be specified.

Ideally the byte code vm would be able to treat an ndmp object as an
extended byte code vector, but the restriction of the byte-codes to 16-bit
addressing is problematic.
For 64-bit machines, the ndmp format will restrict the (stored) addresses
to 32 bits, and use the remaining bits of relocs not already used for
administrative purposes as an index into a vector of imported namespaces in
the ndmp file itself, where the 0 value corresponds to an "un-interned"
namespace that is not backed by a (permanent) file.  I don't know what the
split should be in 32-bit systems (without the wide-int option).  The
interpretation of the bits is specific to file-backed compiled namespaces,
so it may restrict the number of namespace imports in a compiled object
without restricting the number of namespaces imported in the runtime
namespace.
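
One way to picture the 64-bit split is the Python sketch below.  Only the
32-bit stored address and the meaning of index 0 come from the text above.
The 8-bit administrative field, the resulting 24-bit namespace index, the
field order, and all function names are assumptions for illustration, not
the actual pdumper reloc layout.

```python
ADDR_BITS = 32                          # stored address, per the text
ADMIN_BITS = 8                          # hypothetical administrative field
NS_BITS = 64 - ADDR_BITS - ADMIN_BITS   # bits left for the namespace index

UNINTERNED = 0  # index 0: namespace not backed by a (permanent) file


def _check(value, bits, what):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{what} {value} does not fit in {bits} bits")


def pack_reloc(admin, ns_index, address):
    """Pack the three fields into one 64-bit word."""
    _check(admin, ADMIN_BITS, "admin field")
    _check(ns_index, NS_BITS, "namespace index")
    _check(address, ADDR_BITS, "address")
    return (admin << (NS_BITS + ADDR_BITS)) | (ns_index << ADDR_BITS) | address


def unpack_reloc(word):
    """Inverse of pack_reloc: return (admin, ns_index, address)."""
    address = word & ((1 << ADDR_BITS) - 1)
    ns_index = (word >> ADDR_BITS) & ((1 << NS_BITS) - 1)
    admin = word >> (NS_BITS + ADDR_BITS)
    return admin, ns_index, address


def namespace_of(word, imports):
    """Look up the reloc's namespace in the file's import vector."""
    _, ns_index, _ = unpack_reloc(word)
    return None if ns_index == UNINTERNED else imports[ns_index]
```

Under these assumed widths a single compiled object could name at most
2**24 - 1 file-backed imports, which limits the file format without
limiting how many namespaces the running process may import.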

Once implemented, this functionality should significantly reduce the need
for a monolithic dump or "redumping" functionality.  Or rather, "dumping"
will be done incrementally.

My ultimate goal is to introduce a clean way to express a compiled object
that has multiple code labels, and a mechanism to call or jump to them
directly, so that the expressible control-flow structure of native and byte
compiled code will be equivalent (I believe the technical term is that
there will be a bisimulation between their operational semantics, but it's
been a while).  An initial version might move in this direction by encoding
the namespaces using a byte-code vector to trampoline
to the code-entry points, but this would not provide a bisimulation.
Eventually, the byte-code VM and compiler will have to be modified to make
full use of ndmp objects as primary semantic objects without intermediation
through byte-code vectors as currently implemented.

If there's an error in my interpretation of the current implementation
(particularly pdumper), I'd be happy to find out about it now.

As a practical matter, I've been working with the 28.1 source.  Am I better
off continuing with that, or starting from a more recent commit to the main
branch?

Lynn
