On Mon, Jun 13, 2022 at 1:15 PM Stefan Monnier wrote:

> > To be clear, I'm trying to first understand what Andrea means by "safe".
> > I'm assuming it means the result agrees with whatever the byte
> > compiler and VM would produce for the same code.
>
> Not directly.  It means that it agrees with the intended semantics.
> That semantics is sometimes accidentally defined by the actual
> implementation in the Lisp interpreter or the bytecode compiler, but
> that's secondary.
>

What I mean is, there's not really a spec defining the semantics to
judge against.  But every Emacs has a working byte compiler, and only
some have a native compiler.  If the users with the byte compiler get a
different result than the users that have the native compiler, my guess
is that the code would be expected to be rewritten so that it produces
the expected result from the byte compiler (at least until the byte
compiler is revised).  To the extent the byte compiler is judged to
produce an incorrect result, it's probably an area of the language that
was not considered well-defined enough (or useful enough) to have been
used previously.  Or it was known that the byte compiler's semantics
weren't very useful for a particular family of expressions.

> The semantic issue is that if you call
>
>     (foo bar baz)
>
> it normally (when `foo` is a global function) means you're calling the
> function contained in the `symbol-function` of the `foo` symbol *at the
> time of the function call*.  So compiling this to jump directly to the
> code that happens to be contained there during compilation (or the code
> which the compiler expects to be there at that point) is unsafe in
> the sense that you don't know whether that symbol's `symbol-function`
> will really have that value when we get to executing that function call.
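
A minimal sketch of the late binding described above, as I understand it
for the interpreter and the byte compiler.  The names `my-foo` and
`my-call-foo` are hypothetical and only for illustration; whether an
optimizing compiler may assume otherwise is exactly the question here.

```elisp
;; Hypothetical names, for illustration only.
(defun my-foo () 'old)

;; The call to `my-foo' goes through the symbol's function cell, which
;; is looked up when `my-call-foo' runs, not when it is defined.
(defun my-call-foo () (my-foo))

(my-call-foo)                     ; returns `old'

;; Replace the global definition after `my-call-foo' already exists.
(fset 'my-foo (lambda () 'new))

(my-call-foo)                     ; returns `new'
```

A compiler that jumped straight to the original body of `my-foo` would
keep returning `old` from the second call.
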
>
> The use of `cl-flet` (or `cl-labels`) circumvents this problem since the
> call to `foo` is now to a lexically-scoped function `foo`, so the
> compiler knows that the code that is called is always that same one
> (there is no way to modify it between the compilation time and the
> runtime).
>

The fact that cl-flet (and cl-labels) are defined to provide immutable
bindings is really a surprise to me.  However, what I was trying to do
originally was figure out if there was any situation where Andrea's
statement (in another reply) does not hold:

> the compiler can't take advantage of interprocedural optimizations (such
> as inline etc) as every function in Lisp can be redefined in every
> moment.

Remember, I was asking whether concatenating a bunch of files together
as a library would have the same meaning as compiling and linking the
object files.

There is one kind of expression where Andrea isn't quite correct, and
that is with respect to (eval-when-compile ...).  Those *can* be treated
as constants, even without actually compiling them first.  If I
understand the CL-Hyperspec/Emacs Lisp manual, the following expression:
------------------------------------
(let ()
  (eval-when-compile (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (eval-when-compile (defvar b (lambda (y) (* y 3))))
  (let ((f (eval-when-compile (a b))))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------
can be rewritten (using a new form "define-eval-time-constant") as
------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1
    (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2
    (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  ct-r1
  ct-r2
  (let ((f ct-r3))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------
Now the optimizer can treat ct-r1, ct-r2, and ct-r3 as constants for the
purpose of propagation, *without actually determining their value*.
So this could be rewritten as
------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1
    (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2
    (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  (lambda (z)
    (pow z (ct-r3 6))))
------------------------------------
If I wanted to "link" files A, B, and C together, with A exporting
symbols a1,..., and B exporting symbols b1,..., I could do the
following:

(eval-when-compile
  (eval-when-compile
    )
  )

> > I doubt I'm bringing up topics or ideas that are new to you.  But if
> > I do make use of semantic/wisent, I'd like to know the result can be
> > fast (modulo garbage collection, anyway).
>
> It's also "modulo enough work on the compiler (and potentially some
> primitive functions) to make the code fast".
>

Absolutely, it just doesn't look to me like a very big lift compared to,
say, what Andrea did.

> > I've been operating under the assumption that
> >
> >   - Compiled code objects should be first class in the sense that
> >     they can be serialized just by using print and read.  That seems to
> >     have been important historically, and was true for byte-code
> >     vectors for dynamically scoped functions.  It's still true for
> >     byte-code vectors of top-level functions, but is not true for
> >     byte-code vectors for closures (and hasn't been for at least
> >     a decade, apparently).
>
> It's also true for byte-compiled closures, although, inevitably, this
> holds only for closures that capture only serializable values.
>
> > But I see that closures are being implemented by calling an ordinary
> > function that side-effects the "constants" vector.
>
> I don't think that's the case.  Where do you see that?
>

My misreading, unfortunately.  That does seem like a lot of copying for
anyone relying on efficient closures.  Does this mean the native
compiled code can only produce closures in byte-code form?
Assuming dlopen loads the shared object into read-only memory for
execution.

> > Wedging closures into the byte-code format that works for dynamic
> > scoping could be made to work with shared structures, but you'd need
> > to modify print to always capture shared structure (at least for
> > byte-code vectors), not just when there's a cycle.
>
> It already does.
>

Ok, I must be missing it.  I know eval_byte_code *creates* the result
shown below with shared structure (the '(5)), but I don't see anything
in the printed text to indicate it if read back in.

(defvar z
  (byte-compile-sexp
   '(let ((lx 5))
      (let ((f (lambda () lx))
            (g (lambda (ly) (setq lx ly))))
        `(,f ,g)))))
(ppcb z)
(byte-code "\300C\301\302 \"\301\303 \" D\207"
  [5 make-closure #[0 "\300\242\207" [V0] 1]
   #[257 "\300 \240\207" [V0] 3 "\n\n(fn LY)"]]
  5)
(defvar zv (eval z))
(ppcb zv)
(#[0 "\300\242\207" [(5)] 1]
 #[257 "\300 \240\207" [(5)] 3 "\n\n(fn LY)"])
(defvar zvs (prin1-to-string zv))
(ppcb zvs)
"(#[0 \"\\300\\242\\207\" [(5)] 1] #[257 \"\\300 \\240\\207\" [(5)] 3 \"\n\n(fn LY)\"])"
(defvar zz (car (read-from-string zvs)))
(ppcb zz)
(#[0 "\300\242\207" [(5)] 1]
 #[257 "\300 \240\207" [(5)] 3 "\n\n(fn LY)"])
(let ((f (car zz)) (g (cadr zz)))
  (print (eq (aref (aref f 2) 0) (aref (aref g 2) 0)) (current-buffer)))

nil

Of course, those last bindings of f and g were just vectors, not
byte-code vectors, but the (5) is no longer shared state.

> > Then I think the current approach is suboptimal.  The current
> > byte-code representation is analogous to the a.out format.
> > Because the .elc files run code on load you can put an arbitrary
> > amount of infrastructure in there to support an implementation of
> > compilation units with exported compile-time symbols, but it puts
> > a lot more burden on the compiler and linker/loader writers than just
> > being explicit would.
>
> I think the practical performance issues with ELisp code are very far
> removed from these problems.
> Maybe some day we'll have to face them,
> but we still have a long way to go.
>

I'm sure you're correct in terms of the current code base.  But isn't
the history of these kinds of improvements in compilers for functional
languages that coding styles that had been avoided in the past can be
adopted and produce faster code than the original?  In this case, it
would be enabling the pervasive use of recursion and less reliance on
side-effects.  Improvements in the gc wouldn't hurt, either.

> >> You explicitly write `(require 'cl-lib)` but I don't see any
> >>
> >>     -*- lexical-binding:t -*-
> >>
> >> anywhere, so I suspect you forgot to add those cookies that are needed
> >> to get proper lexical scoping.
> >>
> > Ok, wow, I really misread the NEWS for 28.1 where it said
> >     The 'lexical-binding' local variable is always enabled.
>
> Are you sure?  How do you do that?
> Some of the errors you showed seem to point very squarely towards the
> code being compiled as dyn-bound ELisp.
>

My quoting wasn't very effective.  That last line was actually line 2902
of NEWS.28:

"** The 'lexical-binding' local variable is always enabled.

Previously, if 'enable-local-variables' was nil, a 'lexical-binding'
local variable would not be heeded.  This has now changed, and a file
with a 'lexical-binding' cookie is always heeded.  To revert to the
old behavior, set 'permanently-enabled-local-variables' to nil."

I feel a little less silly about my optimistic misreading of the first
line, at least.

Lynn