From: Lynn Winebarger <owinebar@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
Subject: Re: native compilation units
Date: Tue, 14 Jun 2022 23:03:29 -0400
Message-ID: <CAM=F=bAuqB1tUH5czXw7hBpqoTNa1=Ht4cjOKmBs+My39bdVEA@mail.gmail.com>
In-Reply-To: <jwv7d5ky08j.fsf-monnier+emacs@gnu.org>

On Mon, Jun 13, 2022 at 1:15 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > To be clear, I'm trying to first understand what Andrea means by "safe".
> > I'm assuming it means the result agrees with whatever the byte
> > compiler and VM would produce for the same code.
>
> Not directly.  It means that it agrees with the intended semantics.
> That semantics is sometimes accidentally defined by the actual
> implementation in the Lisp interpreter or the bytecode compiler, but
> that's secondary.
>

What I mean is that there's not really a spec defining the semantics to
judge against.  But every Emacs has a working byte compiler, and only
some have a native compiler.  If users with only the byte compiler get a
different result than users with the native compiler, my guess is that
the code would be expected to be rewritten so that it produces the result
expected from the byte compiler (at least until the byte compiler is
revised).  To the extent the byte compiler is judged to produce an
incorrect result, it's probably in an area of the language that was not
considered well-defined enough (or useful enough) to have been used
previously.  Or it was known that the byte compiler's semantics weren't
very useful for a particular family of expressions.


> The semantic issue is that if you call
>
>     (foo bar baz)
>
> it normally (when `foo` is a global function) means you're calling the
> function contained in the `symbol-function` of the `foo` symbol *at the
> time of the function call*.  So compiling this to jump directly to the
> code that happens to be contained there during compilation (or the code
> which the compiler expects to be there at that point) is unsafe in
> the sense that you don't know whether that symbol's `symbol-function`
> will really have that value when we get to executing that function call.
>
> The use of `cl-flet` (or `cl-labels`) circumvents this problem since the
> call to `foo` is now to a lexically-scoped function `foo`, so the
> compiler knows that the code that is called is always that same one
> (there is no way to modify it between the compilation time and the
> runtime).
>
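
A minimal sketch of the distinction described above.  The my-demo-*
names are hypothetical, and the values in the comments assume the code
is run by the interpreter or the byte compiler, where a global call goes
through the symbol's function cell at call time:

```elisp
(require 'cl-lib)

;; Hypothetical global function, used only for this illustration.
(defun my-demo-foo (x) (* x 2))

;; Calls whatever is in (symbol-function 'my-demo-foo) at call time.
(defun my-demo-global-caller (x)
  (my-demo-foo x))

;; Calls a function bound by cl-flet; redefining a global symbol
;; cannot change what this call refers to.
(defun my-demo-local-caller (x)
  (cl-flet ((foo (y) (* y 2)))
    (foo x)))

(defvar my-demo-before (my-demo-global-caller 5))   ; 10

;; Redefine the global function after the callers were defined.
(fset 'my-demo-foo (lambda (x) (* x 100)))

(defvar my-demo-after (my-demo-global-caller 5))    ; 500
(defvar my-demo-local (my-demo-local-caller 5))     ; 10
```

The global caller picks up the redefinition, while the cl-flet caller
cannot, which is why only the latter call is safe to bind directly.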

The fact that cl-flet (and cl-labels) are defined to provide immutable
bindings is really a surprise to me.
However, what I was originally trying to do was figure out whether there
is any situation where Andrea's statement (in another reply) does not hold:

> the compiler can't take advantage of interprocedural optimizations (such
> as inline etc) as every function in Lisp can be redefined in every
> moment.


Remember, I was asking whether concatenating a bunch of files
together as a library would have the same meaning as compiling and linking
the object files.

There is one kind of expression where Andrea isn't quite correct, and that
is with respect to (eval-when-compile ...).  Those *can* be treated as
constants, even without actually compiling them first.  If I understand
the CL-Hyperspec/Emacs Lisp manual, the following expression:
------------------------------------
(let ()
  (eval-when-compile (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (eval-when-compile (defvar b (lambda (y) (* y 3))))
  (let ((f (eval-when-compile (a b))))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------

can be rewritten (using a new form "define-eval-time-constant") as

------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1
    (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  ct-r1
  ct-r2
  (let ((f ct-r3))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------
Now the optimizer can treat ct-r1, ct-r2, and ct-r3 as constants for the
purpose of propagation, *without actually determining their value*.
So this could be rewritten as
-------------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1
    (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  (lambda (z)
    (pow z (ct-r3 6))))
------------------------------------------------
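
For anyone who wants to evaluate the first expression in an actual
Emacs, here is a hedged, runnable variant.  It is only a sketch: Emacs
Lisp keeps functions and variables in separate namespaces, so calls
through variables need funcall, and expt stands in for pow.  The my-ct-*
names are hypothetical, and the code assumes lexical-binding is in
effect (the cookie on the first line):

```elisp
;;; -*- lexical-binding: t -*-

;; eval-and-compile makes the definitions available both while
;; compiling the file and when it is loaded.
(eval-and-compile
  (defvar my-ct-a (lambda (f) (lambda (x) (funcall f (+ x 5)))))
  (defvar my-ct-b (lambda (y) (* y 3))))

;; The eval-when-compile form is evaluated once at compile time and
;; its value is embedded as a constant; when interpreted it simply
;; behaves like progn.
(defvar my-ct-fn
  (let ((f (eval-when-compile (funcall my-ct-a my-ct-b))))
    (lambda (z)
      (expt z (funcall f 6)))))

;; (funcall f 6) is (* (+ 6 5) 3) = 33, so this computes 2^33.
(funcall my-ct-fn 2)   ; 8589934592
```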

If I wanted to "link" files A, B, and C together, with A exporting symbols
a1,... and B exporting symbols b1,..., I could do the following:
(eval-when-compile
  (eval-when-compile
    <text of A>
    )
  <text of B with a1,... replaced by (eval-when-compile a1),...>
  )
<text of C with a1,... replaced by (eval-when-compile (eval-when-compile
a1)),... and b1,... replaced by (eval-when-compile b1),...>

And now the (eval-when-compile) expressions can be freely propagated
within the code of each file, as they are constant expressions.

I don't know how the native compiler is handling "eval-when-compile"
expressions now, but this should give that optimizer pass a class of
expressions where "-O3" is in fact safe to apply.
Then it's just a matter of creating the macros to make producing those
expressions in appropriate contexts convenient to do in practice.

> > I doubt I'm bringing up topics or ideas that are new to you.  But if
> > I do make use of semantic/wisent, I'd like to know the result can be
> > fast (modulo garbage collection, anyway).
>
> It's also "modulo enough work on the compiler (and potentially some
> primitive functions) to make the code fast".
>

Absolutely, it just doesn't look to me like a very big lift compared to,
say, what Andrea did.


> > I've been operating under the assumption that
> >
> >    - Compiled code objects should be first class in the sense that
> >    they can be serialized just by using print and read.  That seems to
> >    have been important historically, and was true for byte-code
> >    vectors for dynamically scoped functions.  It's still true for
> >    byte-code vectors of top-level functions, but is not true for
> >    byte-code vectors for closures (and hasn't been for at least
> >    a decade, apparently).
>
> It's also true for byte-compiled closures, although, inevitably, this
> holds only for closures that capture only serializable values.
>


> > But I see that closures are being implemented by calling an ordinary
> > function that side-effects the "constants" vector.
>
> I don't think that's the case.  Where do you see that?
>
My misreading, unfortunately.
That does seem like a lot of copying for anyone relying on efficient
closures.
Does this mean that natively compiled code can only produce closures in
byte-code form, assuming dlopen loads the shared object into read-only
memory for execution?


> > Wedging closures into the byte-code format that works for dynamic scoping
> > could be made to work with shared structures, but you'd need to modify
> > print to always capture shared structure (at least for byte-code vectors),
> > not just when there's a cycle.
>
> It already does.
>

Ok, I must be missing it.  I know eval_byte_code *creates* the result shown
below with shared structure (the (5)), but I don't see anything in the
printed text to indicate it if read back in.

(defvar z
  (byte-compile-sexp
   '(let ((lx 5))
      (let ((f (lambda () lx))
            (g (lambda (ly) (setq lx ly))))
        `(,f ,g)))))
(ppcb z)
(byte-code "\300C\301\302 \"\301\303 \" D\207"
  [5 make-closure
     #[0 "\300\242\207"
 [V0]
 1]
     #[257 "\300 \240\207"
   [V0]
   3 "\n\n(fn LY)"]]
  5)
(defvar zv (eval z))
(ppcb zv)
(#[0 "\300\242\207"
     [(5)]
     1]
 #[257 "\300 \240\207"
       [(5)]
       3 "\n\n(fn LY)"])


(defvar zvs (prin1-to-string zv))
(ppcb zvs)
"(#[0 \"\\300\\242\\207\" [(5)] 1] #[257 \"\\300 \\240\\207\" [(5)] 3 \"\n\n(fn LY)\"])"

(defvar zz (car (read-from-string zvs)))
(ppcb zz)
(#[0 "\300\242\207"
     [(5)]
     1]
 #[257 "\300 \240\207"
       [(5)]
       3 "\n\n(fn LY)"])
(let ((f (car zz)) (g (cadr zz)))
  (print (eq (aref (aref f 2) 0) (aref (aref g 2) 0)) (current-buffer)))

nil

Of course, those last bindings of f and g were just vectors, not byte-code
vectors, but
the (5) is no longer shared state.
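
For comparison, a small sketch of how the printer behaves when
print-circle is non-nil.  It uses plain vectors rather than byte-code
objects, and my-demo-shared-roundtrip is a hypothetical helper; it says
nothing about what the byte compiler itself binds when writing .elc
files:

```elisp
;; Print a structure in which two vectors share one cons cell, read it
;; back, and report whether the two slots are still `eq'.
(defun my-demo-shared-roundtrip (circle)
  (let* ((cell (list 5))
         (pair (list (vector cell) (vector cell)))
         ;; With print-circle non-nil, shared substructure is printed
         ;; with #N= / #N# labels, not only true cycles.
         (text (let ((print-circle circle))
                 (prin1-to-string pair)))
         (back (car (read-from-string text))))
    (eq (aref (car back) 0) (aref (cadr back) 0))))

(my-demo-shared-roundtrip nil)   ; nil, printed as ([(5)] [(5)])
(my-demo-shared-roundtrip t)     ; t, printed as ([#1=(5)] [#1#])
```

So the sharing can survive a print/read round trip, but only if the
printing side asks for it.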


> > Then I think the current approach is suboptimal.  The current
> > byte-code representation is analogous to the a.out format.
> > Because the .elc files run code on load you can put an arbitrary
> > amount of infrastructure in there to support an implementation of
> > compilation units with exported compile-time symbols, but it puts
> > a lot more burden on the compiler and linker/loader writers than just
> > being explicit would.
>
> I think the practical performance issues with ELisp code are very far
> removed from these problems.  Maybe some day we'll have to face them,
> but we still have a long way to go.
>

I'm sure you're correct in terms of the current code base.  But isn't the
history of these kinds of improvements in compilers for functional
languages that coding styles that had been avoided in the past can be
adopted and produce faster code than the original?  In this case, it would
enable the pervasive use of recursion and less reliance on side effects.
Improvements in the GC wouldn't hurt, either.


> >> You explicitly write `(require 'cl-lib)` but I don't see any
> >>
> >>     -*- lexical-binding:t -*-
> >>
> >> anywhere, so I suspect you forgot to add those cookies that are needed
> >> to get proper lexical scoping.
> >
> > Ok, wow, I really misread the NEWS for 28.1 where it said
> > The 'lexical-binding' local variable is always enabled.
>
> Are you sure?  How do you do that?
> Some of the errors you showed seem to point very squarely towards the
> code being compiled as dyn-bound ELisp.
>
My quoting wasn't very effective.  That last line was actually line 2902
of NEWS.28:

    "** The 'lexical-binding' local variable is always enabled.

    Previously, if 'enable-local-variables' was nil, a 'lexical-binding'
    local variable would not be heeded.  This has now changed, and a file
    with a 'lexical-binding' cookie is always heeded.  To revert to the
    old behavior, set 'permanently-enabled-local-variables' to nil."

I feel a little less silly about my optimistic misreading of the first
line, at least.
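
As a rough illustration of why the cookie matters for closure-heavy
code, the sketch below evaluates the same form both ways by passing the
LEXICAL argument of eval explicitly.  my-demo-adder-result is a
hypothetical helper, and the form assumes no global binding for n:

```elisp
;; Build an "adder" closure and call it, evaluating the form with or
;; without lexical binding depending on LEXICAL.
(defun my-demo-adder-result (lexical)
  (condition-case err
      (funcall (eval '(funcall (lambda (n) (lambda (x) (+ x n))) 3)
                     lexical)
               4)
    ;; Under dynamic binding the inner lambda does not capture n, so
    ;; the later call finds n unbound.
    (void-variable err)))

(my-demo-adder-result t)     ; 7
(my-demo-adder-result nil)   ; (void-variable n)
```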

Lynn
