On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier wrote:

> > Currently compiling a top-level expression wrapped in
> > eval-when-compile by itself leaves no residue in the compiled output,
>
> `eval-when-compile` has 2 effects:
>
> 1- Run the code within the compiler's process.
>    E.g. (eval-when-compile (require 'cl-lib)).
>    This is somewhat comparable to loading a gcc plugin during
>    a compilation: it affects the GCC process itself, rather than the
>    code it emits.
>
> 2- It replaces the (eval-when-compile ...) thingy with the value
>    returned by the evaluation of this code.  So you can do (defvar
>    my-str (eval-when-compile (concat "foo" "bar"))) and you know that
>    the concatenation will be done during compilation.
>
> > but I would want to make the above evaluate to an object at run-time
> > where the exported symbols in the obstack are immutable.
>
> Then it wouldn't be called `eval-when-compile` because it would do
> something quite different from what `eval-when-compile` does :-)

The informal semantics of `eval-when-compile` from the elisp info file
are:

    This form marks BODY to be evaluated at compile time but not when
    the compiled program is loaded.  The result of evaluation by the
    compiler becomes a constant which appears in the compiled program.
    If you load the source file, rather than compiling it, BODY is
    evaluated normally.

I'm not sure what I have proposed that would be inconsistent with "the
result of evaluation by the compiler becomes a constant which appears
in the compiled program".  The exact form of that appearance in the
compiled program is not specified.  For example, byte-compiling

    (eval-when-compile (cl-labels ((f ...) (g ...)) ...))

currently produces a byte-code vector in which f and g are byte-code
vectors with shared structure.  However, that representation is only
one choice.  It is inconsistent with the semantics of *symbols* as
they currently stand, as I have already admitted.
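To restate Stefan's two effects side by side in source form (this just
spells out his own examples; the comments describe the intended
compile-time behaviour, not anything new):

```elisp
;; Effect 1: run code in the compiler's process, leaving nothing
;; behind at run time.  Here `cl-lib' is loaded so that its macros
;; (e.g. `cl-labels') can expand during compilation; the compiled
;; file does not load `cl-lib' itself.
(eval-when-compile (require 'cl-lib))

;; Effect 2: the form is replaced by the value computed at compile
;; time.  The .elc file contains only the constant string "foobar";
;; no call to `concat' survives compilation.
(defvar my-str (eval-when-compile (concat "foo" "bar")))
```

When the same file is loaded uncompiled, both forms are simply
evaluated in order, as the info manual describes.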
Even there, you could advance a model in which it is not inconsistent.
For example, view the binding of a symbol to a value as having two
components: the binding itself, and the cell holding the mutable value
during the extent of the symbol as a global/dynamically scoped
variable.  In that model, binding the symbol to the final value of the
cell before the dynamic extent of the variable terminates would be
consistent.  That's not how it's currently implemented, because with
the current semantics there is no way to express the final
compile-time environment as a value after compilation has completed.

The part that is incompatible with the current semantics of symbols is
importing such a symbol as an immutable symbolic reference: not really
a "variable" reference, but a binding of a symbol to a value in the
run-time namespace (or package, in CL terminology, although as far as
I know CL did not provide any way to specify what I'm suggesting
either).  It would, however, capture the semantics of ELF shared
objects, whose text and ro_data segments are loaded into memory that
is in fact immutable for a userspace program.

> > byte-code (or native-code) instruction arrays.  This would in turn
> > enable implementing proper tail recursion as "goto with arguments".
>
> Proper tail recursion elimination would require changing the *normal*
> function call protocol.  I suspect you're thinking of a smaller-scale
> version of it specifically tailored to self-recursion, kind of like
> what `named-let` provides.  Note that such ad-hoc TCO tends to hit
> the same semantic issues as the -O3 optimization of the native
> compiler.  E.g. in code like the following:
>
>     (defun vc-foo-register (file)
>       (when (some-hint-is-true)
>         (load "vc-foo")
>         (vc-foo-register file)))
>
> the final call to `vc-foo-register` is in tail position but is not
> a self call because loading `vc-foo` is expected to redefine
> `vc-foo-register` with the real implementation.
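For reference, the self-recursion form Stefan mentions looks like this
(a minimal sketch, assuming Emacs 28 or later, where `named-let` was
added):

```elisp
;; `named-let' gives Scheme-style named-let syntax.  Self calls in
;; tail position are rewritten into iteration during expansion, so
;; when byte-compiled this runs in constant stack space rather than
;; recursing 100000 deep.
(named-let sum ((n 100000) (acc 0))
  (if (zerop n)
      acc
    (sum (1- n) (+ acc n))))
;; => 5000050000
```

Note that only literal self calls such as `(sum ...)` above are
eligible; a tail call that goes through the function cell, as in the
`vc-foo-register` example, cannot be treated this way.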
I'm only talking about the steps that are required to allow the
compiler to produce code that implements proper tail recursion.  With
the abstract machine currently implemented by the byte-code VM, the
"call[n]" instructions will always be needed to call out according to
the C calling conventions.  The call[-absolute/relative] or
[goto-absolute] instructions I suggested *would be* used in the
"normal" function-call protocol in place of the current funcall
dispatch, at least for functions defined in lisp.

This is necessary but not sufficient for proper tail recursion.  To
actually get proper tail recursion, the compiler must use those
instructions to implement the appropriate function-call protocol,
especially if "goto-absolute" is the instruction provided for changing
the PC register.  Other instructions would have to be issued to manage
the stack frame explicitly if that were the route taken.  Or, a more
CISCish call-absolute type of instruction could be used that would
perform that stack-frame management implicitly.  Either way, it is the
compiler that has to determine whether a return instruction following
a control transfer can be safely eliminated.  If the "goto-absolute"
instruction were used, the compiler would have to decide whether the
address following the "goto-absolute" should be pushed in a new frame,
or whether it can be "pre-emptively garbage collected" at compile time
because it is a tail call.

> > I'm not familiar with emacs's profiling facilities.  Is it
> > possible to tell how much of the allocated space/time spent in gc
> > is due to the constant vectors of lexical closures?  In
> > particular, how much of the constant vectors are copied elements
> > independent of the lexical environment?  That would provide some
> > measure of any gc-related benefit that *might* be gained from
> > using an explicit environment register for closures, instead of
> > embedding it in the byte-code vector.
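To make the compiler's choice concrete, here is a purely hypothetical
sketch: neither "goto-absolute" nor "push-frame" exists in the current
byte-code VM, and the instruction names and operands are mine, not an
actual encoding.

```elisp
;; For a function whose recursive call is in tail position:
;;
;;   (defun my-fact (n acc)
;;     (if (= n 1) acc (my-fact (1- n) (* n acc))))
;;
;; A non-tail call site would have to preserve a continuation:
;;
;;     push-frame <return-pc>       ; save where to resume
;;     goto-absolute <my-fact-entry>
;;
;; At the tail call, the return address following the transfer is
;; dead, so it can be "pre-emptively garbage collected" at compile
;; time: the current frame is reused and no frame is pushed:
;;
;;     goto-absolute <my-fact-entry>
;;     ;; no `return' follows; the callee returns directly to our
;;     ;; caller via the frame already on the stack
```

A CISCish "call-absolute" would fold the push-frame/goto pair into one
instruction, moving the same frame management from explicit emitted
code into the instruction's semantics.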
> No, I can't think of any profiling tool we currently have that can
> help with that, sorry :-(
>
> Note that when support for native closures is added to the native
> compiler, it will hopefully not be using this clunky representation
> where capture vars are mixed in with the vector of constants, so that
> might be a more promising direction (may be able to skip the step
> where we need to change the bytecode).

The trick is to make the implementations of the abstract machine by
each of the compilers have enough in common to support calling one
from the other.  The extensions I've suggested for the byte-code VM
and lisp semantics are intended to support that interoperation, so
that the semantics of the byte-code implementation won't unnecessarily
constrain the semantics of the native-code implementation.

Lynn