unofficial mirror of emacs-devel@gnu.org 
From: Lynn Winebarger <owinebar@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
Subject: Re: native compilation units
Date: Sun, 12 Jun 2022 13:38:40 -0400
Message-ID: <CAM=F=bD_iXDWpFi_RRBHWtGLyWV8xVhNgQx=AyVUzM2QOtVkwQ@mail.gmail.com>
In-Reply-To: <jwvedzvc59m.fsf-monnier+emacs@gnu.org>


On Sat, Jun 11, 2022 at 4:34 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> >> In which sense would it be different from:
> >>
> >>     (cl-flet
> >>         ...
> >>       (defun ...)
> >>       (defun ...)
> >>       ...)
> >>
> >>
> > Good point - it's my Scheme background confusing me.  I was thinking defun
> > would operate with similar scoping rules as defvar and establish a local
> > binding, where fset (like setq) would not create any new bindings.
>
> I was not talking about performance but about semantics (under the
> assumption that if the semantics is the same then it should be possible
> to get the same performance somehow).
>

I'm trying to determine whether there is a set of expressions for which it
is semantically sound to perform the intraprocedural optimizations done at
-O3, that is, where it is correct to treat functions in operator position
as constants rather than as references through a symbol's function cell.
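A quick way to see the current semantics: in ordinary Emacs Lisp, even
byte-compiled code calls a global function through the symbol's function
cell at call time, so redefining the callee changes what an already
compiled caller does.  A minimal sketch (the names are illustrative, not
from any library):

```elisp
;; Illustrative names only.
(defun my-callee () 1)

;; Compile a caller that refers to `my-callee' in operator position.
(defalias 'my-caller (byte-compile (lambda () (my-callee))))

(my-caller)              ; => 1

;; Redefine the callee after the caller has been compiled.
(defun my-callee () 2)

;; The compiled caller looked `my-callee' up again through its symbol.
(my-caller)              ; => 2
```

Any optimization that treats my-callee as a constant inside my-caller would
have to rule out the second definition, which is the soundness question
above.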


>
> > (1) I don't know how much performance difference (if any) there is between
> >      (fset 'exported-fxn #'internal-implementation)
> > and
> >      (defun exported-fxn (x y ...) (internal-implementation x y ...))
>
> If you don't want the indirection, then use `defalias` (which is like
> `fset` but registers the action as one that *defines* the function, for
> the purpose of `C-h f` and the likes, and they also have slightly
> different semantics w.r.t advice).
>

What I'm looking for is a function as a first-class value, whether as a
byte-code vector, a symbolic reference to a position in the .text section
(or equivalent) of a shared object that may or may not have been loaded,
or a pointer to a region that is allowed to be executed.
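For reference, the options discussed here differ in what ends up in the
function cell, which can be checked with symbol-function.  A small sketch
with illustrative names:

```elisp
;; Illustrative names only.
(defun my-internal-implementation (x y)
  "Add X and Y."
  (+ x y))

;; 1. Wrapper: a separate function that calls through the symbol.
(defun my-exported-wrapper (x y)
  (my-internal-implementation x y))

;; 2. Alias to the symbol: the function cell holds a symbol, which is
;;    resolved again on every call.
(defalias 'my-exported-alias 'my-internal-implementation)

;; 3. Alias to the function object: the function cell holds the same
;;    object as `my-internal-implementation', with no symbol indirection.
(defalias 'my-exported-direct (symbol-function 'my-internal-implementation))

(symbol-function 'my-exported-alias)    ; => my-internal-implementation
(eq (symbol-function 'my-exported-direct)
    (symbol-function 'my-internal-implementation))  ; => t
```

Note that the third form captures the function object as it is when the
defalias runs; a later redefinition of my-internal-implementation is not
seen by my-exported-direct.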


> > (2) I'm also thinking of more aggressively forcing const-ness at run-time
> > with something like:
> > (eval-when-compile
> >   (cl-flet ((internal-implementation (x y ...) body ...))
> >      (fset 'exported-fxn #'internal-implementation)))
> > (fset 'exported-fxn (eval-when-compile #'exported-fxn))
> >
> > If that makes sense, is there a way to do the same thing with defun?
>
> I don't know what the above code snippet is intended to show/do, sorry :-(
>

I'm trying to capture a function as a first-class value.
A better example: I put the following in ~/test1.el and byte-compiled it
(with Emacs 28.1 running on Cygwin).

-------------
(require 'cl-lib)

(eval-when-compile
  (cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
              (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
    (defun my-global-evenp (n) (my-evenp n))
    (defun my-global-oddp (n) (my-oddp n))))
-----------------

I get the following (expected) error when running in batch mode (or
interactively, if only the compiled file is loaded):

$ emacs -batch --eval '(load "~/test1.elc")' --eval '(message "%s" (my-global-evenp 5))'
Loading ~/test1.elc...
Debugger entered--Lisp error: (void-function my-global-evenp)
  (my-global-evenp 5)
  (message "%s" (my-global-evenp 5))
  eval((message "%s" (my-global-evenp 5)) t)
  command-line-1(("--eval" "(load \"~/test1.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
  command-line()
  normal-top-level()

The function symbol is only defined at compile time by the defun, so it is
undefined when the byte-compiled file is loaded in a clean environment.
When I tried using (fset 'my-global-evenp (eval-when-compile
#'my-ct-global-evenp)), it just produced a symbol indirection, which was
disappointing.
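For comparison, a minimal sketch of a variant that does leave working
definitions in the compiled file: cl-labels is kept outside
eval-when-compile so the defalias forms run at load time, and lexical
binding is assumed to be enabled (here via the first-line cookie) so that
the trampolines close over the local functions.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Not wrapped in `eval-when-compile', so these forms are compiled into
;; the .elc and executed when it is loaded.
(cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
            (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
  ;; Each trampoline is a closure over the local functions; there is no
  ;; global `my-evenp' or `my-oddp' function being looked up.
  (defalias 'my-global-evenp (lambda (n) (my-evenp n)))
  (defalias 'my-global-oddp (lambda (n) (my-oddp n))))
```

This gives working load-time definitions, but the closures are created
when the .elc is loaded rather than being compile-time constants, so by
itself it does not provide the statically allocated closure that the
following attempts are after.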
So in the next version, global compile-time variables are assigned
trampolines to the local functions as their values, at compile time:

-------------------------------
(require 'cl-lib)
(eval-when-compile
  (defvar my-ct-global-evenp nil)
  (defvar my-ct-global-oddp nil)
  (cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
              (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
    (setq my-ct-global-evenp (lambda (n) (my-evenp n)))
    (setq my-ct-global-oddp (lambda (n) (my-oddp n)))))
(fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
(fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))
-------------------------------

Then I get:

$ emacs -batch --eval '(load "~/test2.elc")' --eval '(message "%s" (my-global-evenp 5))'
Loading ~/test2.elc...
Debugger entered--Lisp error: (void-variable --cl-my-evenp--)
  my-global-evenp(5)
  (message "%s" (my-global-evenp 5))
  eval((message "%s" (my-global-evenp 5)) t)
  command-line-1(("--eval" "(load \"~/test2.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
  command-line()
  normal-top-level()

I did not expect this.  Maybe the variable name is just an artifact of the
way cl-labels is implemented rather than a fundamental limitation.

A third attempt, this time to express a statically allocated closure with
constant code (which is one way of viewing an ELF shared object):
--------------------------------
(require 'cl-lib)
(eval-when-compile
  (defvar my-ct-global-evenp nil)
  (defvar my-ct-global-oddp nil)
  (let (my-evenp my-oddp)
    (setq my-evenp (lambda (n) (if (= n 0) t (funcall my-oddp (1- n)))))
    (setq my-oddp (lambda (n) (if (= n 0) nil (funcall my-evenp (1- n)))))
    (setq my-ct-global-evenp (lambda (n) (funcall my-evenp n)))
    (setq my-ct-global-oddp (lambda (n) (funcall my-oddp n)))))

(fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
(fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))
--------------------------------
And the result is worse:

$ emacs -batch --eval '(load "~/test3.elc")' --eval '(message "%s" (my-global-evenp 5))'
Loading ~/test3.elc...
Debugger entered--Lisp error: (void-variable my-evenp)
  my-global-evenp(5)
  (message "%s" (my-global-evenp 5))
  eval((message "%s" (my-global-evenp 5)) t)
  command-line-1(("--eval" "(load \"~/test3.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
  command-line()
  normal-top-level()

This was not expected with lexical scope.

$ emacs -batch --eval '(load "~/test3.elc")' --eval "(message \"%s\" (symbol-function 'my-global-evenp))"
Loading ~/test3.elc...
#[(n)   !\207 [my-evenp n] 2]

At least my-global-evenp has byte-code as its value rather than a symbol,
which was the intent.  I get the same result if I wrap the two lambdas
stored in the my-ct-* variables with "byte-compile", which is what I
intended (for the original to be equivalent to explicitly compiling the
form).
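One thing worth ruling out first: as far as I can tell, the (n) argument
list at the head of the #[...] object above is the form the byte compiler
uses for dynamically bound code (lexically bound byte-code has an integer
argument descriptor there instead), and the test files as shown have no
"lexical-binding: t" cookie on their first line, which Emacs 28 needs
before it compiles a file with lexical binding.  The difference can be
reproduced without any files through the LEXICAL argument of eval, using a
form with the same shape as the third attempt: a closure that outlives the
let that bound its free variable.

```elisp
;; A `let'-bound variable captured by a lambda that is called after the
;; `let' has exited, the same situation as `my-evenp' in test3.el.
(defvar my-probe-form
  '(let ((my-local 1))
     (lambda () my-local)))

;; Evaluated with lexical binding: the lambda closes over `my-local'.
(funcall (eval my-probe-form t))        ; => 1

;; Evaluated with dynamic binding: `my-local' is looked up only when the
;; lambda runs, after the `let' has exited, so it is void.
(condition-case err
    (funcall (eval my-probe-form nil))
  (void-variable (car err)))            ; => void-variable
```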

However, what I expected was the byte-code equivalent of an ELF object
with two symbols defined for relocation.  So why is the compiler producing
code that corresponds to the "let"-bound my-evenp and my-oddp being
dynamically scoped?
That made me curious, so I found https://rocky.github.io/elisp-bytecode.pdf
and reviewed it.

I believe I see the issue now.  With the current byte codes, there is
simply no way to express a call to an offset in the current byte-vector.
There is not even a way to reference the address of the current
byte-vector to use as an argument to funcall.  There is no way at all to
reference symbols that were resolved at compile time; that would require
the equivalent of dl symbols embedded in a code vector and patched at load
time.  That forces the compiler to emit a call through a symbol.  And when
the manual talks about lexical scope, it covers only "variables", not
function symbols.

That explains a lot, for example why Andrea had to use LAP as the starting
point for optimizations.  I can't find a spec for Emacs's version of LAP,
but I'm guessing it can still express symbolic names for local function
expressions in a way byte-code simply cannot.
I don't see how the language progresses without resolving the
inconsistency between what is expressible in ELF and what is expressible
in a byte-code object.
Here is one possible set of changes to make the two compatible.  I'd
reuse the relative goto byte codes if Emacs hasn't produced them since
v19.  I'd also add a few special registers; there's already one used to
implement GOTO (i.e. the program counter).
   - Byte codes for calls/returns directly into/from byte-code objects
      - CALL-RELATIVE: execute a function call to the current byte-vector
        object with the PC set to PC + operand0 (basically PIC code).
        If a return is required, the byte compiler should arrange for the
        return address to be pushed before the other operands of the
        function being called.  No additional manipulation of the stack is
        required, since funcall would just pop the arguments and then
        immediately push them again.  Alternatively, there could be a byte
        code that explicitly allocates a stack frame (if needed), pushes
        the return offset, and then does a goto.
      - CALL-ABSOLUTE: execute a function call to a specified byte-vector
        object + PC given as the first two operands.  This is useless until
        the byte-code object supports a notion of relocation symbols, i.e.
        named compile-time constants that get patched on load in one way or
        another: either directly, by modifying the byte string with the
        value at run time (assuming eager loading), or indirectly, by adding
        a "linkage table" of external symbols that is filled in at load time
        and specifying an index into that table.
      - RETURN-RELATIVE: the operand is the number of items that have to be
        popped from the stack to get the return address, which is an offset
        in the current byte-vector object.  Alternatively, this could be
        implemented as "discardN <n>; goto".
      - RETURN-ABSOLUTE: same as RETURN-RELATIVE, but the return address is
        given by two operands, a byte-vector and an offset in that
        byte-vector.
      - Alternate formulation:
         - RESERVE-STACK: the operand is a byte-vector object (reference)
           that is used to determine how much total stack space will be
           required, for safety, and to ensure enough space is allocated.
         - GOTO-ABSOLUTE: the operand is a byte-vector object and an
           offset.  Immediate control transfer to the specified context.
         - These two are adequate to implement the above.
   - Additional registers and related instructions
      - PC: this register already exists
         - PUSH-PC: push the PC onto the stack, the opposite of a goto that
           pops the stack into the PC register
      - GOT: a table of byte-vectors + offsets corresponding to a PLT
        section of the byte-vector, specifying the compile-time symbols that
        have to be resolved
         - The byte-vector references + offsets in the "absolute"
           instructions above would be specified as indices into this
           table.  Otherwise the byte-vector could not be saved and
           directly loaded for later execution.
      - STATIC: a table for the lexical variables allocated and accessible
        to the closures at compile time.  The compiler should treat all
        sexps as occurring at the top level with regard to the run-time
        lexical environment.  A form like
          (let ((x 5)) (byte-compile (lambda (n) (+ n (eval-when-compile x)))))
        should produce byte-code with the constant 5, while
          (let ((x 5)) (byte-compile (lambda (n) (+ n x))))
        should produce byte-code adding the argument n to the value of the
        global variable x at run time.
         - PUSH-STATIC
         - POP-STATIC
      - ENV: the environment register
         - ENV-PUSH-FRAME: the operand is the number of stack items to
           capture as a (freshly allocated) frame, which is then added as a
           rib to a new environment pointed to by the ENV register
         - PUSH-ENV: push the value of ENV onto the stack
         - POP-ENV: pop the top of the stack into ENV, discarding any value
           there
   - Changes to the byte-code object
      - IMPORTS: a table of symbols defined at compile time that require
        resolution to constants at load time, particularly for references
        to compilation units (byte-vector or native code) and exported
        symbols bound to constants (really immutable).
        Note: the "relative" versions of call and return above could be
        eliminated if IMPORTS includes self-references into the byte-vector
        object itself.
      - EXPORTS: a table of symbols available to be called or referenced
        externally
      - A static table with values initialized from the values in the
        closure at compile time
      - The constant table and byte string remain
   - Changes to the byte-code loader
      - Read the new format
      - Resolve symbols.  This should link to specific compilation units
        rather than "features", as compilation units will define specific
        exported symbols, while features do not support that detail.
        Source could still use "require", but the symbols referenced from
        the compile-time environment would have to be traced back to the
        compilation unit supplying them, unless they are recorded as
        constants by an expression like
          (eval-when-compile (setq v (eval-when-compile some-imported-symbol)))
      - Allocate and initialize the static segment
      - Create a "static closure" for the compilation unit (loaded object +
        GOT + static frame) and record it as a singleton entry mapping
        compilation units to closures (hence "static")
   - Changes to funcall
      - Invoking a function from a compilation unit would require setting
        the GOT and STATIC registers, and setting the ENV register to point
        to STATIC as the first rib (directly or indirectly)
      - Invoking a closure whose "code" element points to an "exported"
        symbol from a compilation unit, plus an environment pointer:
         - Set GOT and STATIC according to the byte-vector's static closure
      - Dispatch according to whether the compilation unit is native or
        byte-compiled, but both have the above elements
   - Changes to the byte-compiler
      - Correct the issues described above with compile-time evaluation and
        the lexical scope of function names
      - Emit the additional sections in the byte-code
      - It should then be possible to implement the output of the
        native-compiler pass (pre-libgccjit) with "-O3" flags correctly in
        byte-code
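To make the CALL-RELATIVE / RETURN-RELATIVE part of this concrete, here is
a toy stack machine written in Emacs Lisp.  It is only a sketch under
stated assumptions: instructions are lists rather than bytes, the opcode
names (push-const, dup, sub1, goto-if-zero, call-rel, return-rel, halt)
are hypothetical and are not real Emacs byte codes, and for simplicity
call-rel itself slides the return address underneath its arguments, which
leaves the stack in the same shape as the compiler pushing the return
address first.  The mutually recursive even/odd pair from the test files
lives in one code vector and calls itself by offset, with no symbol lookup.

```elisp
;;; -*- lexical-binding: t -*-
;; Toy VM sketch; every opcode name here is hypothetical.

(defun toy-vm-run (code)
  "Run CODE, a vector of (OP ARG NARGS) instructions, from index 0.
Return the value on top of the stack when `halt' is reached."
  (let ((pc 0) (stack nil))
    (catch 'halt
      (while t
        (let* ((here pc)
               (insn (aref code here))
               (op (nth 0 insn))
               (arg (nth 1 insn)))
          (setq pc (1+ here))           ; default: fall through
          (pcase op
            ('push-const (push arg stack))
            ('dup (push (car stack) stack))
            ('sub1 (push (1- (pop stack)) stack))
            ;; Pop; if zero, jump to HERE + ARG.
            ('goto-if-zero (when (= (pop stack) 0) (setq pc (+ here arg))))
            ;; CALL-RELATIVE: slide the return address (HERE + 1) under
            ;; the NARGS arguments, then jump to HERE + ARG.
            ('call-rel
             (let ((args nil))
               (dotimes (_ (nth 2 insn)) (push (pop stack) args))
               (push pc stack)
               (dolist (a args) (push a stack))
               (setq pc (+ here arg))))
            ;; RETURN-RELATIVE: pop the result, discard ARG more items,
            ;; pop the return address into PC, push the result back.
            ('return-rel
             (let ((result (pop stack)))
               (setq stack (nthcdr arg stack))
               (setq pc (pop stack))
               (push result stack)))
            ('halt (throw 'halt (car stack)))
            (_ (error "Unknown op: %S" op))))))))

(defun toy-evenp-program (n)
  "Return a code vector that computes whether N is even."
  (vector
   (list 'push-const n)    ;  0
   '(call-rel 2 1)         ;  1  call even? at 3
   '(halt)                 ;  2
   ;; even?: stack on entry is (N RET ...)
   '(dup)                  ;  3
   '(goto-if-zero 4)       ;  4  N = 0: jump to 8
   '(sub1)                 ;  5
   '(call-rel 4 1)         ;  6  call odd? at 10 with N - 1
   '(return-rel 0)         ;  7  return odd?'s result
   '(push-const t)         ;  8
   '(return-rel 1)         ;  9  discard N, return t
   ;; odd?: same shape
   '(dup)                  ; 10
   '(goto-if-zero 4)       ; 11  N = 0: jump to 15
   '(sub1)                 ; 12
   '(call-rel -10 1)       ; 13  call even? at 3 with N - 1
   '(return-rel 0)         ; 14
   '(push-const nil)       ; 15
   '(return-rel 1)))       ; 16
```

For example, (toy-vm-run (toy-evenp-program 4)) returns t.  Because every
jump and call target is HERE + operand, the vector could be moved or copied
without patching, which is the PIC-like property CALL-RELATIVE is meant to
give.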

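The IMPORTS/EXPORTS/GOT part of the proposal can likewise be modeled at
the Lisp level.  This is a hypothetical sketch, not an existing Emacs
facility: a "compilation unit" is a struct with an export alist and a
vector of imports, and loading a unit resolves each import against units
already loaded (specific units, not features) and stores the resulting
function objects in the unit's GOT.  All names (toy-unit, toy-load-unit,
toy-loaded-units) are made up for illustration.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical model of a compilation unit; none of these names exist
;; in Emacs.
(cl-defstruct toy-unit
  name      ; symbol naming the unit
  exports   ; alist (NAME . FUNCTION) of constants this unit provides
  imports   ; vector of (UNIT-NAME . NAME) pairs this unit needs
  got)      ; vector filled in at load time, parallel to IMPORTS

(defvar toy-loaded-units nil
  "Alist (UNIT-NAME . UNIT) of units that have already been loaded.")

(defun toy-load-unit (unit)
  "Resolve UNIT's imports against already-loaded units, then register it.
Each import becomes a constant in UNIT's GOT, so code in UNIT can index
the table instead of going through a symbol's function cell."
  (setf (toy-unit-got unit)
        (vconcat
         (mapcar
          (lambda (import)
            (let* ((provider (or (alist-get (car import) toy-loaded-units)
                                 (error "Unit not loaded: %S" (car import))))
                   (fn (alist-get (cdr import) (toy-unit-exports provider))))
              (or fn (error "%S does not export %S"
                            (car import) (cdr import)))))
          (toy-unit-imports unit))))
  (push (cons (toy-unit-name unit) unit) toy-loaded-units)
  unit)

;; A provider unit exporting one function...
(toy-load-unit
 (make-toy-unit :name 'toy-math
                :exports (list (cons 'square (lambda (x) (* x x))))
                :imports []))

;; ...and a client that imports it by unit and name.
(defvar toy-client
  (toy-load-unit (make-toy-unit :name 'toy-client
                                :imports [(toy-math . square)])))

;; The client calls slot 0 of its GOT directly.
(funcall (aref (toy-unit-got toy-client) 0) 7)   ; => 49
```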

Lynn



Thread overview: 46+ messages
2022-05-31  1:02 native compilation units Lynn Winebarger
2022-06-01 13:50 ` Andrea Corallo
2022-06-03 14:17   ` Lynn Winebarger
2022-06-03 16:05     ` Eli Zaretskii
     [not found]       ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com>
2022-06-04  5:57         ` Eli Zaretskii
2022-06-05 13:53           ` Lynn Winebarger
2022-06-03 18:15     ` Stefan Monnier
2022-06-04  2:43       ` Lynn Winebarger
2022-06-04 14:32         ` Stefan Monnier
2022-06-05 12:16           ` Lynn Winebarger
2022-06-05 14:08             ` Lynn Winebarger
2022-06-05 14:46               ` Stefan Monnier
2022-06-05 14:20             ` Stefan Monnier
2022-06-06  4:12               ` Lynn Winebarger
2022-06-06  6:12                 ` Stefan Monnier
2022-06-06 10:39                   ` Eli Zaretskii
2022-06-06 16:23                     ` Lynn Winebarger
2022-06-06 16:58                       ` Eli Zaretskii
2022-06-07  2:14                         ` Lynn Winebarger
2022-06-07 10:53                           ` Eli Zaretskii
2022-06-06 16:13                   ` Lynn Winebarger
2022-06-07  2:39                     ` Lynn Winebarger
2022-06-07 11:50                       ` Stefan Monnier
2022-06-07 13:11                         ` Eli Zaretskii
2022-06-14  4:19               ` Lynn Winebarger
2022-06-14 12:23                 ` Stefan Monnier
2022-06-14 14:55                   ` Lynn Winebarger
2022-06-08  6:56           ` Andrea Corallo
2022-06-11 16:13             ` Lynn Winebarger
2022-06-11 16:37               ` Stefan Monnier
2022-06-11 17:49                 ` Lynn Winebarger
2022-06-11 20:34                   ` Stefan Monnier
2022-06-12 17:38                     ` Lynn Winebarger [this message]
2022-06-12 18:47                       ` Stefan Monnier
2022-06-13 16:33                         ` Lynn Winebarger
2022-06-13 17:15                           ` Stefan Monnier
2022-06-15  3:03                             ` Lynn Winebarger
2022-06-15 12:23                               ` Stefan Monnier
2022-06-19 17:52                                 ` Lynn Winebarger
2022-06-19 23:02                                   ` Stefan Monnier
2022-06-20  1:39                                     ` Lynn Winebarger
2022-06-20 12:14                                       ` Lynn Winebarger
2022-06-20 12:34                                       ` Lynn Winebarger
2022-06-25 18:12                                       ` Lynn Winebarger
2022-06-26 14:14                                         ` Lynn Winebarger
2022-06-08  6:46         ` Andrea Corallo
