From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Rocky Bernstein <rocky@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Bytecode interoperability: the good and bad
Date: Fri, 22 Dec 2017 12:41:30 -0500
Message-ID: <CANCp2gaOUtmgivNkxiFTNfoWk_1vpZtfOeYjeasFCuDQocAHxw@mail.gmail.com>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a11466198b7cfcb0560f15309"
X-Trace: blaine.gmane.org 1513964427 16607 195.159.176.226 (22 Dec 2017 17:40:27 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 22 Dec 2017 17:40:27 +0000 (UTC)
To: emacs-devel@gnu.org
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:221340
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/221340>

--001a11466198b7cfcb0560f15309
Content-Type: text/plain; charset="UTF-8"

On Fri, 22 Dec 2017 09:08:33 -0500 Stefan Monnier advised:

> > In other kinds of bytecode such as the one for C Python, a bytecode
> > version number is stored in the bytecode file.  When there is a change
> > to the bytecode, that number is changed.
>
> So far, the only changes that have been made to the byte-code language
> is to add new (previously unused) byte codes.  So from this perspective
> we have always maintained backward compatibility (you can run a .elc
> compiled with an older Emacs).
>

While this is a nice intention, it isn't always true. And it is not without
downsides.

In the "not true" department, there are the instructions 0153 (scan_buffer)
and 0163 (set_mark), which aren't handled in the current interpreter sources
in bytecode.c.
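
To illustrate what "not handled" means in practice, here is a toy switch-based dispatcher. It is a sketch, not a reduction of bytecode.c: the opcode names and values are hypothetical (only 0153 and 0163 come from the paragraph above), and the status enum is made up for the example. Because 0153 and 0163 have no case label, old bytecode containing them lands in the catch-all branch, exactly like a byte that was never assigned.

```c
#include <stddef.h>

/* Hypothetical opcodes for this toy only; they are NOT Emacs's real
   byte-code numbers.  0153 and 0163 deliberately have no case label. */
enum { OP_PUSH1 = 01, OP_ADD = 02, OP_RETURN = 03 };

enum exec_status { EXEC_OK, EXEC_INVALID_OPCODE, EXEC_BAD_STACK, EXEC_NO_RETURN };

/* Run CODE (LEN bytes) on a small integer stack, storing the returned
   value in *RESULT on success.  Any byte without a case label reaches
   the default branch and is reported rather than executed. */
static enum exec_status
run (const unsigned char *code, size_t len, int *result)
{
  int stack[16];
  size_t sp = 0;

  for (size_t pc = 0; pc < len; pc++)
    switch (code[pc])
      {
      case OP_PUSH1:              /* push the constant 1 */
        if (sp == sizeof stack / sizeof stack[0])
          return EXEC_BAD_STACK;
        stack[sp++] = 1;
        break;
      case OP_ADD:                /* pop two values, push their sum */
        if (sp < 2)
          return EXEC_BAD_STACK;
        sp--;
        stack[sp - 1] += stack[sp];
        break;
      case OP_RETURN:             /* return the top of the stack */
        if (sp == 0)
          return EXEC_BAD_STACK;
        *result = stack[sp - 1];
        return EXEC_OK;
      default:                    /* unassigned or retired opcode */
        return EXEC_INVALID_OPCODE;
      }
  return EXEC_NO_RETURN;          /* ran off the end without OP_RETURN */
}
```

With this shape, `{01, 01, 02, 03}` returns 2, while `{01, 0153, 03}` stops with `EXEC_INVALID_OPCODE`. Whether a real interpreter reports such a byte cleanly or misbehaves depends entirely on what its catch-all branch does.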

And as pipcet points out, there is this in lread.c:


  if (! version || version >= 22)
    readevalloop (Qget_file_char, &input, hist_file_name,
		  0, Qnil, Qnil, Qnil, Qnil);
  else
    {
      /* We can't handle a file which was compiled with
	 byte-compile-dynamic by older version of Emacs.  */
      specbind (Qload_force_doc_strings, Qt);
      readevalloop (Qget_emacs_mule_file_char, &input, hist_file_name,
		    0, Qnil, Qnil, Qnil, Qnil);
    }
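
For readers who haven't looked at lread.c: as far as I can tell, `version` here is the byte immediately after the four-byte `;ELC` magic at the start of the file (`safe_to_load_version` reads it at offset 4). Below is a simplified sketch of that header check plus the branch quoted above. The function and enum names are hypothetical, and unlike the real code it does not also match the header against `bytecomp-version-regexp`.

```c
#include <stddef.h>
#include <string.h>

/* Which of the two readevalloop calls above would be used. */
enum elc_reader { READER_FILE_CHAR, READER_EMACS_MULE };

/* Return the version byte of an .elc header, or 0 if BUF does not start
   with the ";ELC" magic.  Simplified: assumes the version byte sits at
   offset 4, right after the magic, and ignores the rest of the header. */
static int
elc_header_version (const unsigned char *buf, size_t len)
{
  if (len < 5 || memcmp (buf, ";ELC", 4) != 0)
    return 0;
  return buf[4];
}

/* Mirror of the quoted branch: version 0 or 22 and later use the normal
   reader; older compiled files are read as emacs-mule encoded. */
static enum elc_reader
choose_reader (int version)
{
  return (version == 0 || version >= 22) ? READER_FILE_CHAR
                                          : READER_EMACS_MULE;
}
```

So a header of `;ELC` followed by the byte 19 would select the emacs-mule reader, while one followed by 23 would not.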

In the "not without downsides" department, this means that anyone who
looks at the bytecode interpreter finds it filled with garbage and bloat.
That inevitably carries technical debt.

> We do not aim to maintain forward compatibility (so whether a .elc file
> compiled with a more recent Emacs will work is not guaranteed), although
> it sometimes does work.  When encountering an unknown byte-code, Emacs
> signals an error, so it shouldn't cause a crash nor "something unintended".
>

It is likely that the code that purports to handle obsolete (or no longer
emitted) instructions is broken, since I doubt any of this behavior is
tested. Subtle changes in the semantics of instructions can cause
unintended effects.


> Compatibility problems with .elc files compiled with other Emacs
> versions can also come from macros, and those tend to be more frequent
> than the problems introduced by changes to the byte-code.  So detecting
> a different byte-code version is not sufficient to catch the most common
> problems anyway.
>

My understanding of how this would work in a more rational way is that
there shouldn't be incompatible changes within a major release. So I would
hope that incompatible macro changes happen only between major releases,
never within one, the same as I would hope for bytecode changes.
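
Under such a policy, a loader or lint tool would mostly need to compare major version numbers. A minimal sketch, assuming the header text contains a line of the form `;;; in Emacs version 26.1`; the function names are hypothetical. A real .elc header has NUL bytes after `;ELC`, so an actual tool would have to skip that first line before treating the rest as a C string.

```c
#include <stdio.h>
#include <string.h>

/* Extract the major version from a ";;; in Emacs version X.Y" line in
   HEADER.  Returns -1 if the line is missing or cannot be parsed. */
static int
compiled_major_version (const char *header)
{
  const char *tag = ";;; in Emacs version ";
  const char *p = strstr (header, tag);
  int major;

  if (p == NULL || sscanf (p + strlen (tag), "%d", &major) != 1)
    return -1;
  return major;
}

/* The policy described above: files compiled by the same major release
   are assumed compatible; anything else deserves a warning. */
static int
same_major_release (const char *header, int running_major)
{
  int major = compiled_major_version (header);
  return major >= 0 && major == running_major;
}
```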

If someone is up for it, an interesting program to write might be a
bytecode lint-and-report tool. It would show the metadata comment in the
.elc file that records which version of Emacs the bytecode was compiled
under (compared with the currently running version) and what level of
optimization was used. It could also scan the instructions for
incompatibilities in both the forward and backward direction. Optionally,
it might know about specific incompatibilities between versions, say
because of macro changes.
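
The instruction-scanning half of such a tool is the fiddly part, because operand bytes must be stepped over rather than decoded as opcodes; otherwise an operand that happens to equal 0153 would be misreported as a retired instruction. A minimal sketch, assuming a caller-supplied table of operand lengths for the target Emacs version (the table layout and function name are hypothetical, and nothing here encodes the real instruction set):

```c
#include <stddef.h>

/* For each opcode byte: how many operand bytes follow it, or -1 if the
   running interpreter does not implement that opcode.  The caller fills
   this in for the Emacs version being checked. */
typedef struct { signed char operand_len[256]; } opcode_table;

/* Scan CODE for opcodes that T marks as unimplemented.  Stores up to MAX
   offsets in FOUND and returns the total number found, or -1 if an
   instruction's operands run past the end of CODE. */
static long
scan_unsupported (const opcode_table *t, const unsigned char *code,
                  size_t len, size_t *found, size_t max)
{
  long count = 0;
  size_t pc = 0;

  while (pc < len)
    {
      int n = t->operand_len[code[pc]];
      if (n < 0)
        {
          if ((size_t) count < max)
            found[count] = pc;
          count++;
          pc++;   /* operand length unknown: resume at the next byte */
          continue;
        }
      if (pc + 1 + (size_t) n > len)
        return -1;
      pc += 1 + (size_t) n;   /* skip the opcode and its operands */
    }
  return count;
}
```

Resuming at the next byte after an unknown opcode is a guess; a real tool would probably stop and report, since every later offset may be misaligned.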

Maybe this could be incorporated into a "safe-load-file" function.


> FWIW, I think Emacs deserves a new Elisp compilation system (either
> a new kind of bytecode (maybe using something like vmgen), or a JIT or
> something): the bytecode we use is basically identical to the one we had
> 20 years ago, yet the tradeoffs have changed substantially in the
> mean time.
>

I would be interested in hearing more about which specific tradeoffs you
mean.

From what I've seen of Emacs Lisp bytecode, I think it would be difficult
to use something like vmgen without a lot of effort. In the interpreters
that vmgen generates, the objects are basically C-level values, not Lisp
objects. Perhaps that could be worked around, but it would not be trivial.

As for JITing bytecode, haven't there been a couple of efforts in that
direction already? Again, this is probably hard.

I'm not saying it shouldn't be done, just that these are very serious
projects requiring a lot of effort over a long time, and they might cause
instability in the interim, all while Emacs keeps moving forward on its
own.

But in any event, a prerequisite for considering any of this is to
understand what we have right now. That's why I'm trying to document it,
so that more people at least understand what we are talking about when we
discuss replacing or modifying the existing system.

Right now I feel that only a handful of people understand the bytecode,
and even they may not understand all of it.


--001a11466198b7cfcb0560f15309--