Bytecode interoperability: the good and bad

unofficial mirror of emacs-devel@gnu.org 

* Bytecode interoperability: the good and bad
From: Rocky Bernstein @ 2017-12-22 10:51 UTC
  To: emacs-devel

In documenting the Elisp bytecode format and the compilation process, I
have a question.

In what I am calling a "Bytecode Function Literal" (the vector of 4-6
objects that contains the parameter list, the bytecode instructions, the
constants vector, the maximum stack depth, and optionally the docstring and
the interactive specification), there is no notion of which bytecode
version is in effect.
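For reference, the slot layout can be written down as a C enum. The order below (argument list, bytecode string, constants vector, stack depth, then the optional docstring and interactive spec) follows the `COMPILED_*` indices in Emacs's lisp.h as far as I know; the `bfl_*` names and the `bfl_length_ok` helper are made up for this sketch and only encode the 4-6 slot count described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Slot indices of a "Bytecode Function Literal" (a #[...] vector).
   The order is modeled on the COMPILED_* indices in Emacs's lisp.h;
   the names themselves are illustrative.  */
enum bfl_slot
{
  BFL_ARGLIST = 0,     /* parameter list, or an arity integer for lexical code */
  BFL_BYTECODE = 1,    /* unibyte string holding the instructions */
  BFL_CONSTANTS = 2,   /* constants vector */
  BFL_STACK_DEPTH = 3, /* maximum stack depth */
  BFL_DOC_STRING = 4,  /* optional */
  BFL_INTERACTIVE = 5  /* optional */
};

/* Hypothetical check: the four leading slots are mandatory and the
   two trailing ones are optional, giving the 4-6 slot count.  */
bool
bfl_length_ok (size_t nslots)
{
  return nslots >= (size_t) BFL_DOC_STRING
         && nslots <= (size_t) BFL_INTERACTIVE + 1;
}
```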

(Is there a better term for Bytecode Function Literal?)

The next larger grouping of code is a bytecode file. A bytecode file does
contain a comment recording the version of Emacs that was used for
compilation. However, I am not sure that this information is used, for
example to decide whether a bytecode file can be run.
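As an illustration of where that version lives, here is a sketch that extracts it from the header. It assumes the comment has the form `;;; in Emacs version 25.3.1`, which is what Emacs 25-era byte compilers appear to write, and it scans the raw bytes instead of using `strstr` because the `;ELC` magic line that precedes the comment contains NUL bytes. `elc_comment_version` is a hypothetical helper, not something Emacs provides.

```c
#include <stddef.h>
#include <string.h>

/* Parse a run of decimal digits in [p, end); store it in *out and
   return the position just past it, or NULL if there are no digits.  */
static const char *
parse_digits (const char *p, const char *end, int *out)
{
  const char *start = p;
  int n = 0;
  while (p < end && *p >= '0' && *p <= '9')
    n = n * 10 + (*p++ - '0');
  if (p == start)
    return NULL;
  *out = n;
  return p;
}

/* Hypothetical: find "in Emacs version MAJOR.MINOR" in the first LEN
   bytes of an .elc file.  Returns 1 on success, 0 otherwise.  The
   buffer may contain NULs and need not be NUL-terminated.  */
int
elc_comment_version (const char *buf, size_t len, int *major, int *minor)
{
  static const char tag[] = ";;; in Emacs version ";
  size_t taglen = sizeof tag - 1;
  const char *end = buf + len;

  for (size_t i = 0; i + taglen <= len; i++)
    {
      if (memcmp (buf + i, tag, taglen) != 0)
        continue;
      const char *p = parse_digits (buf + i + taglen, end, major);
      return p && p < end && *p == '.' && parse_digits (p + 1, end, minor);
    }
  return 0;
}
```

A lint tool could compare the extracted major version with the running Emacs before loading anything.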

I have been able to run bytecode compiled in Emacs 25 on Emacs 24 and vice
versa, I think using load-file.

Is a determination made in advance of whether a bytecode file is
compatible with the version of Emacs currently running? If so, how is that
done?

This is probably obvious, but I'll mention it anyway. The good side of
allowing bytecode across different releases of Emacs is that it allows one
to run bytecode from older and newer versions interoperably - when it
works. And that's the rub. Unless some check is made, you don't know
whether it will work. The program could crash or, worse, do something
unintended.

In other kinds of bytecode such as the one for C Python, a bytecode version
number is stored in the bytecode file. When there is a change to the
bytecode, that number is changed.
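The scheme described for CPython amounts to a single equality test before any instruction runs. A generic sketch, not CPython's actual code: the format number `BYTECODE_FORMAT` and the 4-byte little-endian stamp at the start of the file are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format number, bumped whenever the instruction set
   changes incompatibly.  */
#define BYTECODE_FORMAT 3u

/* Read a 4-byte little-endian stamp from the start of BUF.  */
static uint32_t
read_stamp (const unsigned char *buf)
{
  return (uint32_t) buf[0] | (uint32_t) buf[1] << 8
         | (uint32_t) buf[2] << 16 | (uint32_t) buf[3] << 24;
}

/* Accept the file only if its stamp matches exactly; anything else is
   rejected before a single instruction is executed.  */
bool
bytecode_loadable (const unsigned char *buf, size_t len)
{
  return len >= 4 && read_stamp (buf) == BYTECODE_FORMAT;
}
```

On a mismatch the loader can refuse the file or, as CPython does with stale .pyc files when the source is available, recompile from source.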

How does this work in Emacs Lisp?

Thanks.

* Re: Bytecode interoperability: the good and bad
From: Stefan Monnier @ 2017-12-22 14:08 UTC
  To: emacs-devel

> In other kinds of bytecode such as the one for C Python, a bytecode version
> number is stored in the bytecode file.  When there is a change to the
> bytecode, that number is changed.

So far, the only changes made to the byte-code language have been
additions of new (previously unused) byte codes.  So from this perspective
we have always maintained backward compatibility (you can run a .elc
compiled with an older Emacs).

We do not aim to maintain forward compatibility (so whether a .elc file
compiled with a more recent Emacs will work is not guaranteed), although
it sometimes does work.  When encountering an unknown byte-code, Emacs
signals an error, so it shouldn't cause a crash nor "something unintended".
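The "error, not crash" behavior comes from the dispatch loop having a catch-all branch for opcodes it does not recognize; as far as I can tell, bytecode.c signals an "Invalid byte opcode" error there. A toy stack machine shows the pattern; its three opcodes and the `vm_status` codes are invented for this sketch.

```c
#include <stddef.h>

/* Invented opcodes for a toy stack machine.  */
enum { OP_PUSH1 = 1, OP_ADD = 2, OP_RETURN = 3 };
enum { STACK_MAX = 16 };

typedef enum { VM_OK, VM_BAD_OPCODE, VM_STACK_ERROR } vm_status;

/* Execute CODE; on OP_RETURN store the top of stack in *RESULT.  */
vm_status
vm_run (const unsigned char *code, size_t len, long *result)
{
  long stack[STACK_MAX];
  size_t sp = 0;  /* number of values on the stack */

  for (size_t pc = 0; pc < len; pc++)
    switch (code[pc])
      {
      case OP_PUSH1:
        if (sp == STACK_MAX)
          return VM_STACK_ERROR;
        stack[sp++] = 1;
        break;
      case OP_ADD:
        if (sp < 2)
          return VM_STACK_ERROR;
        sp--;
        stack[sp - 1] += stack[sp];
        break;
      case OP_RETURN:
        if (sp == 0)
          return VM_STACK_ERROR;
        *result = stack[sp - 1];
        return VM_OK;
      default:
        /* An opcode this interpreter does not know (added by a newer
           compiler, or removed here): stop with an error instead of
           executing something arbitrary.  */
        return VM_BAD_OPCODE;
      }
  return VM_STACK_ERROR;  /* fell off the end without OP_RETURN */
}
```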

Compatibility problems with .elc files compiled with other Emacs
versions can also come from macros, and those tend to be more frequent
than the problems introduced by changes to the byte-code.  So detecting
a different byte-code version is not sufficient to catch the most common
problems anyway.

FWIW, I think Emacs deserves a new Elisp compilation system (either
a new kind of bytecode (maybe using something like vmgen), or a JIT or
something): the bytecode we use is basically identical to the one we had
20 years ago, yet the tradeoffs have changed substantially in the
mean time.

        Stefan

* Re: Bytecode interoperability: the good and bad
From: Rocky Bernstein @ 2017-12-22 17:41 UTC
  To: emacs-devel

On Fri, 22 Dec 2017 09:08:33 -0500 Stefan Monnier advised:

> > In other kinds of bytecode such as the one for C Python, a bytecode
> > version number is stored in the bytecode file.  When there is a change
> > to the bytecode, that number is changed.
>
> So far, the only changes that have been made to the byte-code language
> is to add new (previously unused) byte codes.  So from this perspective
> we have always maintained backward compatibility (you can run a .elc
> compiled with an older Emacs).
>

While this is a nice intention, it isn't always true. And it is not
without downsides.

In the "not true" department, there are instructions 0153 (scan_buffer)
and 0163 (set_mark) which aren't handled in the current interpreter
sources in bytecode.c.
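Both numbers are octal, as they are written in bytecode.c: 0153 is 107 and 0163 is 115 in decimal. A checker could keep a table of such removed opcodes; the sketch below is hypothetical and lists only the two named here, relying on C's leading-zero octal notation to match the source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opcodes mentioned above as no longer handled by the interpreter.
   A leading 0 makes C read these as octal, like bytecode.c does.  */
static const unsigned char removed_opcodes[] = {
  0153,  /* scan_buffer, decimal 107 */
  0163,  /* set_mark, decimal 115 */
};

/* Return true if OP is one of the removed opcodes listed above.  */
bool
opcode_removed (unsigned char op)
{
  for (size_t i = 0; i < sizeof removed_opcodes; i++)
    if (removed_opcodes[i] == op)
      return true;
  return false;
}
```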

And as pipcet points out, there is this in lread.c:

  if (! version || version >= 22)
    readevalloop (Qget_file_char, &input, hist_file_name,
		  0, Qnil, Qnil, Qnil, Qnil);
  else
    {
      /* We can't handle a file which was compiled with
	 byte-compile-dynamic by older version of Emacs.  */
      specbind (Qload_force_doc_strings, Qt);
      readevalloop (Qget_emacs_mule_file_char, &input, hist_file_name,
		    0, Qnil, Qnil, Qnil, Qnil);
    }
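In this excerpt `version` is the byte-compiler version that lread.c reads from the start of the file, the byte just after the `;ELC` magic. A simplified sketch of the decision, assuming that header layout; `elc_magic_version`, `choose_reader`, and the `reader_kind` values are hypothetical, and the real code does more validation than this.

```c
#include <stddef.h>
#include <string.h>

typedef enum { READ_CURRENT, READ_EMACS_MULE } reader_kind;

/* Return the version byte following ";ELC", or 0 if BUF does not
   start with that magic (a simplification of what lread.c does).  */
int
elc_magic_version (const unsigned char *buf, size_t len)
{
  if (len < 5 || memcmp (buf, ";ELC", 4) != 0)
    return 0;
  return buf[4];
}

/* Mirror of the branch quoted above: no version, or version >= 22,
   uses the ordinary reader; older files are read as emacs-mule.  */
reader_kind
choose_reader (int version)
{
  return (!version || version >= 22) ? READ_CURRENT : READ_EMACS_MULE;
}
```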

In the "not without downsides" department, this means that when someone
looks at the bytecode interpreter, it is filled with garbage and bloat.
This has to have a technology debt associated with it.

> We do not aim to maintain forward compatibility (so whether a .elc file
> compiled with a more recent Emacs will work is not guaranteed), although
> it sometimes does work.  When encountering an unknown byte-code, Emacs
> signals an error, so it shouldn't cause a crash nor "something unintended".
>

It is likely that the code that purports to handle obsolete (or no longer
emitted) instructions is broken, since I doubt any of this behavior is
tested. Subtle changes in the semantics of instructions can cause
unintended effects.

> Compatibility problems with .elc files compiled with other Emacs
> versions can also come from macros, and those tend to be more frequent
> than the problems introduced by changes to the byte-code.  So detecting
> a different byte-code version is not sufficient to catch the most common
> problems anyway.
>

My understanding of how this would work in a more rational way is that
there shouldn't be incompatible changes except between major releases. So
I would hope that incompatible macro changes wouldn't happen within a
major release but only between major releases, the same as I hope would
be the case for bytecode changes.

If someone is up for it, a possibly interesting program to write might be a
bytecode lint-and-report tool. It would use the header comment in a bytecode
file to report which version of Emacs the bytecode was compiled under
(compared with the currently running version) and what optimization level
was used. It could also scan the instructions for incompatibilities in both
the forward and backward directions, and it might optionally know about
specific version incompatibilities, say because of macro changes between
versions.
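A minimal sketch of the instruction scan such a tool might do. It walks the opcode stream using a per-opcode table that records the operand length, the major version that introduced the opcode, and the one that removed it. The table describes a made-up toy instruction set; a real tool would need Emacs's actual opcode history.

```c
#include <stddef.h>

/* Hypothetical per-opcode metadata for a toy instruction set.  */
struct op_info
{
  unsigned char operand_bytes; /* bytes following the opcode */
  int introduced_in;           /* first major version with it; 0 = unknown op */
  int removed_in;              /* first major version without it; 0 = never */
};

static const struct op_info toy_ops[256] = {
  [0x01] = { 0, 18, 0 },   /* ancient op, still supported */
  [0x02] = { 1, 24, 0 },   /* added in version 24, one operand byte */
  [0x03] = { 0, 18, 19 },  /* dropped in version 19 */
};

typedef enum
{
  LINT_OK, LINT_TOO_NEW, LINT_REMOVED, LINT_UNKNOWN, LINT_TRUNCATED
} lint_result;

/* Report the first instruction in CODE that major version TARGET could
   not run; *BAD_PC receives its offset.  */
lint_result
lint_bytecode (const unsigned char *code, size_t len, int target,
               size_t *bad_pc)
{
  size_t pc = 0;
  while (pc < len)
    {
      const struct op_info *info = &toy_ops[code[pc]];
      *bad_pc = pc;
      if (info->introduced_in == 0)
        return LINT_UNKNOWN;
      if (target < info->introduced_in)
        return LINT_TOO_NEW;   /* forward-compatibility problem */
      if (info->removed_in && target >= info->removed_in)
        return LINT_REMOVED;   /* backward-compatibility problem */
      if (pc + 1 + info->operand_bytes > len)
        return LINT_TRUNCATED;
      pc += 1 + info->operand_bytes;
    }
  return LINT_OK;
}
```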

Maybe this could be incorporated into a "safe-load-file" function.

> FWIW, I think Emacs deserves a new Elisp compilation system (either
> a new kind of bytecode (maybe using something like vmgen), or a JIT or
> something): the bytecode we use is basically identical to the one we had
> 20 years ago, yet the tradeoffs have changed substantially in the
> mean time.
>

I would be interested in an elaboration of which specific tradeoffs
you mean.

From what I've seen of Emacs Lisp bytecode, I think it would be a bit
difficult to use something like vmgen without a lot of effort.  In the
interpreter for vmgen the objects are basically C kinds of objects, not
Lisp Objects. Perhaps that could be negotiated, but it would not be trivial.

As for JITing bytecode, haven't there been a couple of efforts in that
direction already? Again, this is probably hard.

I'm not saying it shouldn't be done. Just that these are very serious
projects requiring a lot of effort that would take a bit of time, and might
cause instability in the interim. All while Emacs is moving forward on its
own.

But in any event, a prerequisite for considering this is to understand
what we have right now. That's why I'm trying to document it, so that more
people at least understand what we are talking about when it comes to
replacing or modifying the existing system.

Right now I feel that there are only a handful of people who understand
bytecode, and even there maybe not in entirety.

* Re: Bytecode interoperability: the good and bad
From: Stefan Monnier @ 2017-12-22 20:05 UTC
  To: emacs-devel

> In the "not true" department, there are instructions 0153 scan_buffer and
> 0163 set_mark which aren't handled in the current interpreter sources in
> bytecode.c

Right, there are a few exceptions where we did remove old instructions.
I haven't heard of anyone using a new enough Emacs with an old enough
.elc file to bump into this problem, so I'm not worried.

> In the "not without downsides" department, this means that when someone
> looks at the bytecode interpreter, it is filled with garbage and bloat.
> This has to have a technology debt associated with it.

Of course, backward compatibility has its costs.

> It is likely that the code that purports to handle obsolete (or no longer
> emitted) instructions is broken,

It's possible, indeed.  Not sure about "likely", tho.

> since I doubt any of this behavior is tested.  Subtle changes in the
> semantics of instructions can cause unintended effects.

In any case, Emacs has so many real, confirmed bugs affecting real users
that I don't worry too much about such hypotheticals.

I think Emacs should evolve (and is evolving) towards a model where .elc
files are handled completely automatically, so there's no need to
preserve backward compatibility at all, because we can just recompile
the source file.
[ Modulo supporting enough backward compatibility for bootstrapping
  purposes, since I also think we should get rid of the interpreter.  ]
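One way to make .elc handling automatic is to decide, before each load, whether the compiled file is stale and recompile from source if it is. A sketch under assumed inputs (modification times plus the major version recorded when the .elc was built); `needs_recompile` is hypothetical, not an existing Emacs function.

```c
#include <stdbool.h>
#include <time.h>

/* Recompile when the source is newer than the .elc, or when the .elc
   was produced by a different major version than the running Emacs.  */
bool
needs_recompile (time_t src_mtime, time_t elc_mtime,
                 int elc_major, int running_major)
{
  if (difftime (src_mtime, elc_mtime) > 0)
    return true;
  return elc_major != running_major;
}
```

Emacs's existing `load-prefer-newer` variable covers only the timestamp half of this: it makes `load` pick the newer of the .el and .elc files, but it does not recompile anything.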

> My understanding of how this would work in a more rational way is that
> there shouldn't be incompatible changes except between major releases.  So
> I would hope that incompatible macro changes wouldn't happen within a
> major release but only between major releases, the same as I hope would
> be the case for bytecode changes.

In theory, that's what we aim for, yes.

> Maybe this could be incorporated into a "safe-load-file" function.

Define "safe".

>> FWIW, I think Emacs deserves a new Elisp compilation system (either
>> a new kind of bytecode (maybe using something like vmgen), or a JIT or
>> something): the bytecode we use is basically identical to the one we had
>> 20 years ago, yet the tradeoffs have changed substantially in the
>> mean time.
> I would  be interested in elaboration here about what specific trade offs
> you mean.

Obviously, the performance characteristics of computers have changed
drastically, e.g. in terms of memory available, in terms of the relative
costs of ALU instructions vs memory accesses, etc.

But more importantly, the kind of Elisp code run is quite different from
when the bytecode was introduced.  E.g. it's odd to have a byte-code for
`skip_chars_forward` but not for `apply`.  This said, I haven't done any
real bytecode profiling to say how much deserves to change.
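Such profiling could start with an opcode histogram collected in the interpreter's dispatch loop. A minimal sketch; `profile_opcode` is a hypothetical hook, and judging a case like `apply` would additionally require counting which functions are reached through the generic call instruction.

```c
#include <stddef.h>

/* One counter per possible opcode byte.  */
static unsigned long op_counts[256];

/* Hypothetical hook: call once per dispatched instruction.  */
void
profile_opcode (unsigned char op)
{
  op_counts[op]++;
}

/* Return the most frequently executed opcode so far (lowest wins ties).  */
unsigned char
hottest_opcode (void)
{
  size_t best = 0;
  for (size_t op = 1; op < 256; op++)
    if (op_counts[op] > op_counts[best])
      best = op;
  return (unsigned char) best;
}
```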

> From what I've seen of Emacs Lisp bytecode, I think it would be a bit
> difficult to use something like vmgen without a lot of effort.  In the
> interpreter for vmgen the objects are basically C kinds of objects,
> not Lisp Objects.  Perhaps that could be negotiated, but it would not
> be trivial.

I haven't looked closely enough to be sure, but I didn't see anything
problematic: Lisp_Object in the C source code is very much a C object,
and that's what the current bytecode manipulates.

> As for JITing bytecode, haven't there been a couple of efforts in that
> direction already?  Again, this is probably hard.

It's a significant effort, yes, but the speedup could be significant
(the JITing attempts so far haven't tried to optimize the code at all,
so they only remove some of the bytecode interpreter overhead, whereas
there is a lot more opportunity if you try to eliminate the type checks
included in each operation).
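To illustrate the kind of type checks meant here: a generic interpreter operation checks the operand tags on every execution, while code specialized by a JIT (after proving both operands are fixnums) can drop the checks and even fold the tag arithmetic into the add. The tagging scheme below (low bit 1 for fixnums) is an assumption in the spirit of, but not identical to, Emacs's Lisp_Object representation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged word: a fixnum N is stored as 2N+1 (low bit set).  */
typedef intptr_t value;

static bool
is_fixnum (value v)
{
  return (v & 1) != 0;
}

/* Assumes N is small enough that 2N+1 does not overflow.  */
static value
make_fixnum (intptr_t n)
{
  return n * 2 + 1;
}

static intptr_t
fixnum_val (value v)
{
  return (v - 1) / 2;  /* exact, since v - 1 is even */
}

/* Generic interpreter op: re-checks both tags every time it runs.
   Returns false where a real VM would take a slow path (floats,
   bignums, or a wrong-type error).  Overflow is ignored here.  */
bool
generic_add (value a, value b, value *out)
{
  if (!is_fixnum (a) || !is_fixnum (b))
    return false;
  *out = make_fixnum (fixnum_val (a) + fixnum_val (b));
  return true;
}

/* What a specializing JIT could emit once both operands are known to
   be fixnums: no checks, tags folded in, since
   (2x+1) + (2y+1) - 1 == 2(x+y) + 1.  Overflow is ignored here.  */
value
specialized_add (value a, value b)
{
  return a + b - 1;
}
```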

There are many fairly good experimental JITs for JavaScript, so it's not
*that* hard.  It'd probably take an MSc thesis to get a prototype working.

> I'm not saying it shouldn't be done. Just that these are very serious
> projects requiring a lot of effort that would take a bit of time, and might
> cause instability in the interim. All while  Emacs is moving forward on its
> own.

Indeed.  Note that Emacs's bytecode hasn't been moving very much, so the
"parallel" development shouldn't be a problem.

> But in any event, a prerequisite for considering doing this is to
> understand what we got right now. That's why I'm trying to document that
> more people at least have an understanding of what we are talking about in
> the replacing or modifying the existing system.

I agree that documenting the current bytecode is a very good idea, and
I thank you for undertaking such an effort.

        Stefan
