From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lynn Winebarger
Newsgroups: gmane.emacs.devel
Subject: Re: native compilation units
Date: Mon, 13 Jun 2022 12:33:19 -0400
To: Stefan Monnier
Cc: Andrea Corallo, emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000044279605e156d9bd"
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:291151

--00000000000044279605e156d9bd
Content-Type: text/plain; charset="UTF-8"

On Sun, Jun 12, 2022 at 2:47 PM Stefan Monnier wrote:

> >> >> In which sense would it be different from:
> >> >>
> >> >>     (cl-flet
> >> >>         ...
> >> >>       (defun ...)
> >> >>       (defun ...)
> >> >>       ...)
> >> >>
> > I'm trying to determine if there's a set of expressions for which it
> > is semantically sound to perform the intraprocedural optimizations
>
> The cl-flet above is such an example, AFAIK.  Or maybe I don't
> understand what you mean.
>

To be clear, I'm trying to first understand what Andrea means by "safe".
I'm assuming it means the result agrees with whatever the byte compiler
and VM would produce for the same code.  I doubt I'm bringing up topics
or ideas that are new to you.  But if I do make use of semantic/wisent,
I'd like to know the result can be fast (modulo garbage collection,
anyway).

I've been operating under the assumption that

   - Compiled code objects should be first class in the sense that they
     can be serialized just by using print and read.  That seems to have
     been important historically, and was true for byte-code vectors for
     dynamically scoped functions.  It's still true for byte-code vectors
     of top-level functions, but is not true for byte-code vectors for
     closures (and hasn't been for at least a decade, apparently).
   - It's still worthwhile to have a class of code objects that are
     immutable in the VM semantics, but now because there are compiler
     passes implemented that can make use of that as an invariant.
   - cl-flet doesn't allow mutual recursion, and there is no shared state
     above, so there's nothing to optimize intraprocedurally.
   - cl-labels is implemented with closures, so (as I understand it) the
     native compiler would not be able to produce code if you asked it to
     compile the closure returned by a form like
     (cl-labels ((f ..) (g ...) ...) f)

I also mistakenly thought byte-code vectors of the sort saved in ".elc"
files would not be able to represent closures without being consed, as
the components (at least the first 4) are nominally constant.  But I see
that closures are being implemented by calling an ordinary function that
side-effects the "constants" vector.  That's unfortunate because it means
the optimizer cannot assume byte-vectors are constants that can be freely
propagated.  OTOH, prior to commit
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1
it looks like the closures were constructed at compile time rather than
by side-effect, which would mean the VM would be expected to treat them
as immutable, at least.

Wedging closures into the byte-code format that works for dynamic scoping
could be made to work with shared structures, but you'd need to modify
print to always capture shared structure (at least for byte-code
vectors), not just when there's a cycle.  The approach that's been
implemented only works at run-time when there's shared state between
closures, at least as far as I can tell.

However, it's a hack that will never really correspond closely to the
semantics of shared objects with explicit tracking and load-time linking
of compile-time symbols, because the relocations are already performed
and there's no way to back out where they occurred from the value itself.
If a goal is to have a semantics in which you can

   1. unambiguously specify that at load/run time a function or variable
      name is resolved in the compile-time environment provided by a
      separate compilation unit as an immutable constant at run-time
   2. serialize compiled closures as compilation units that provide a
      well-defined compile-time environment for linking
   3. reduce the headaches of the compiler writer by making it easy to
      produce code that is eligible for their optimizations

then I think the current approach is suboptimal.  The current byte-code
representation is analogous to the a.out format.  Because the .elc files
run code on load you can put an arbitrary amount of infrastructure in
there to support an implementation of compilation units with exported
compile-time symbols, but it puts a lot more burden on the compiler and
linker/loader writers than just being explicit would.

And I'm not sure what the payoff is.  When there wasn't a native compiler
(and associated optimization passes), I suppose there was no pressing
reason to upend backward compatibility.  Then again, I've never been
responsible for maintaining a 3-4 decade old application with an
installed user base of a size I can't even guess at, on machines ranging
from chips running "smart" electric switches to (I assume) the biggest of
"big iron", whatever that means these days.

> > I'm trying to capture a function as a first class value.
>
> Functions are first class values and they can be trivially captured via
> things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and
> a lot more, so I there's some additional constraint you're expecting but
> I don't know what that is.
>

Yes, I thought byte-code would be treated as constant.  I still think it
makes a lot of sense to make it so.

> > This was not expected with lexical scope.
>
> You explicitly write `(require 'cl-lib)` but I don't see any
>
>     -*- lexical-binding:t -*-
>
> anywhere, so I suspect you forgot to add those cookies that are needed
> to get proper lexical scoping.
>

Ok, wow, I really misread the NEWS for 28.1 where it said

    The 'lexical-binding' local variable is always enabled.

as meaning "always set".  My fault.

> > With the current byte-codes, there's just no way to express a call to
> > an offset in the current byte-vector.
>
> Indeed, but you can call a byte-code object instead.

Creating the byte code with shared structure was what I meant by one of
the solutions being to "patch compile-time constants" at load, i.e.
perform the relocations directly.  The current implementation effectively
inlines copies of the constants (byte-code objects), which is fine for
shared code but not for shared variables.  That is, the values that are
assigned to my-global-oddp and my-global-evenp (for test2 after
correcting the lexical-binding setting) do not reference each other.
Each is created with an independent copy of the other.
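
A minimal sketch of the sharing-versus-copying distinction drawn above.
It uses plain lists and vectors as stand-ins for byte-code objects and
their constants vectors, which is an assumption made to keep the example
independent of how any particular Emacs version prints closures.  The
names my-round-trip, my-cell-shared-p and my-sharing-report are
hypothetical and not part of the test code discussed in this thread;
print-circle, prin1-to-string and read-from-string are standard.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-round-trip (object circle)
  "Print OBJECT with `print-circle' bound to CIRCLE, then read it back."
  (let ((print-circle circle))
    ;; `read-from-string' returns (OBJECT . END-POSITION).
    (car (read-from-string (prin1-to-string object)))))

(defun my-cell-shared-p (pair)
  "Return non-nil if both vectors in PAIR hold the very same cell."
  (eq (aref (car pair) 1) (aref (cadr pair) 1)))

(defun my-sharing-report ()
  "Compare sharing in memory, after a plain round trip, and with labels."
  (let* ((cell (list 'counter 0))
         ;; Two stand-in "constants vectors" pointing at one shared cell.
         (pair (list (vector 'oddp cell) (vector 'evenp cell))))
    (list
     ;; In memory, both vectors reference the same cell.
     (my-cell-shared-p pair)
     ;; Printed without labels, each vector reads back with its own copy.
     (my-cell-shared-p (my-round-trip pair nil))
     ;; Printed with #N= / #N# labels, the sharing survives the round trip.
     (my-cell-shared-p (my-round-trip pair t)))))
```

Evaluating (my-sharing-report) should return (t nil t): the shared cell
survives print and read only when the printer emits #N= / #N# labels,
which is the behaviour a serialized form of mutually referencing
closures would have to rely on.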
--00000000000044279605e156d9bd--