Re: Emacs-devel Digest, Vol 166, Issue 137

unofficial mirror of emacs-devel@gnu.org 

* Re: Emacs-devel Digest, Vol 166, Issue 137
       [not found] <mailman.13940.1513973159.27992.emacs-devel@gnu.org>
@ 2017-12-22 23:46 ` Rocky Bernstein
  2017-12-23  1:30   ` Stefan Monnier
  0 siblings, 1 reply; 4+ messages in thread
From: Rocky Bernstein @ 2017-12-22 23:46 UTC
  To: emacs-devel; +Cc: Stefan Monnier

On Fri, 22 Dec 2017 15:05:39 -0500 Stefan Monnier informs:

>
> I think Emacs should evolve (and is evolving) towards a model where .elc
> files are handled completely automatically, so there's no need to
> preserve backward compatibility at all, because we can just recompile
> the source file.
>

If you mean always keep the source code around in the bytecode file, I'm
all for that!

If not, we're back to that discussion on how to find the source text for a
given bytecode file and failing that (or in addition to that) having decent
decompilers for bytecode.

> [ Modulo supporting enough backward compatibility for bootstrapping
>   purposes, since I also think we should get rid of the interpreter.  ]
>
> > My understanding of how this work in a more rational way would be that
> > there shouldn't be incompatible changes between major releases.  So I
> > would hope that incompatible macro changes wouldn't happen within a major
> > release but between major releases, the same as I hope would be the case
> > for bytecode changes.
>
> In theory, that's what we aim for, yes.
>

Good. If that's the case, then most of the cases you report, such as where
the macro expansion is incompatible, could be detected just by checking
whether the compiler used in compilation has the same major version number
as the bytecode interpreter.

> > Maybe this could be incorporated into a "safe-load-file" function.
>
> Define "safe"
>

Okay. Let me call it "safer" then. And I will define that: detecting
problems that can reasonably be detected in advance of hitting them, instead
of giving a ¯\_(ツ)_/¯ traceback.
Recently I have come to learn it can be worse, because checks are not done on
bytecode...

Want to crash emacs immediately without a traceback? Run

emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'
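
For contrast, here is a well-formed counterpart as a minimal sketch. It
rests on my reading of the byte string (an assumption, not something checked
against every Emacs version): \300 is the "constant 0" opcode and \207 is
"return", so the one-liner above asks for slot 0 of an empty constants
vector while declaring a stack depth of 0. The variable name below is
hypothetical.

```elisp
;; Same two opcodes as the crashing example, but with a one-element
;; constants vector and a declared stack depth of 1, so the constant
;; lookup stays in bounds.
(defvar my-tiny-bytecode
  (make-byte-code 0 "\300\207" [42] 1)
  "Hypothetical example object; the name is illustrative only.")

;; Calling it pushes constant slot 0 and returns it.
(funcall my-tiny-bytecode)
```

The only differences from the crashing form are the contents of the
constants vector and the depth, which is the kind of thing a pre-load check
could in principle look at.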

How many times this year have I run into the problem, also seen by
others judging by reports on the Internet, of Emacs blithely running
what is probably an incompatible version of cl-lib?

The bytecode file for cl-lib no doubt had in it "Hey, I'm emacs 24."
and I probably ran that on Emacs 25 where there was an incompatibility
that can happen between major releases.

If that were the case (although it is probably not the *only*
scenario), how much nicer would it have been if a safer-load-file
had warned me about running version 24 bytecode?

And if such a safer-load-file package were in ELPA or something where
packages are updated much more frequently than Emacs, when such
conditions arise, the safer-load-file could add a check for this
particular cl-lib incompatibility between the particular major
releases.
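
To make the idea concrete, here is a rough sketch of the major-version check
only. Everything named my-... is hypothetical, and it assumes the .elc header
carries a ";;; in Emacs version N..." comment, which is what the files I have
looked at contain; a real tool would need to handle files without it.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-elc-compiler-major-version (elc-file)
  "Return the major version of the Emacs that compiled ELC-FILE, or nil.
Assumes the header has a \";;; in Emacs version\" comment line."
  (with-temp-buffer
    ;; The header comments sit at the very start, so read only a prefix.
    (insert-file-contents-literally elc-file nil 0 1024)
    (goto-char (point-min))
    (when (re-search-forward "^;;; in Emacs version \\([0-9]+\\)" nil t)
      (string-to-number (match-string 1)))))

(defun my-safer-load-file (elc-file)
  "Load ELC-FILE, first reporting a compiler/runtime major-version mismatch.
This only warns; it never refuses to load."
  (let ((compiled-major (my-elc-compiler-major-version elc-file)))
    (cond
     ((null compiled-major)
      (message "safer-load: no compiler version found in %s" elc-file))
     ((/= compiled-major emacs-major-version)
      (message "safer-load: %s was compiled by Emacs %d; this is Emacs %d"
               elc-file compiled-major emacs-major-version)))
    ;; NOERROR nil, NOMESSAGE t, NOSUFFIX t: load exactly this file.
    (load elc-file nil t t)))
```

Per-package rules, such as a known cl-lib problem between two particular
releases, would be extra branches in that cond.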

> >> FWIW, I think Emacs deserves a new Elisp compilation system (either
> >> a new kind of bytecode (maybe using something like vmgen), or a JIT or
> >> something): the bytecode we use is basically identical to the one we had
> >> 20 years ago, yet the tradeoffs have changed substantially in the
> >> mean time.
> > I would  be interested in elaboration here about what specific trade offs
> > you mean.
>
> Obviously, the performance characteristics of computers has changed
> drastically, e.g. in terms of memory available, in terms of relative
> costs of ALU instructions vs memory accesses, etc...
>
> But more importantly, the kind of Elisp code run is quite different from
> when the bytecode was introduced.  E.g. it's odd to have a byte-code for
> `skip_chars_forward` but not for `apply`.  This said, I haven't done any
> real bytecode profiling to say how much deserves to change.
>

There is free opcode space available. "apply" could be added if someone
chooses to add it.

> > From what I've seen of Emacs Lisp bytecode, I think it would be a bit
> > difficult to use something like vmgen without a lot of effort.  In the
> > interpreter for vmgen the objects are basically C kinds of objects,
> > not Lisp Objects.  Perhaps that could be negotiated, but it would not
> > be trivial.
>
> I haven't looked closely enough to be sure, but I didn't see anything
> problematic: Lisp_Object in the C source code is very much a C object,
> and that's what the current bytecode manipulates.
>

There may be some glibness here. The benefit of using a lower-level
general-purpose intermediate language like LLVM IR or vmgen is that it sits
at a lower level: it works with registers and pointers, understands some
structure layouts, and is more statically typed, so efficiency can be
gained by specialization.  But if one doesn't break down Lisp_Object and
uses that in the same way the C interpreter currently does, then I don't
see why vmgen will be any faster than the current interpreter. (Other than
the benefit that would also be had by rewriting the interpreter without the
bloat and compatibility overhead.)

> > As for JITing bytecode, haven't there been a couple of efforts in that
> > direction already?  Again, this is probably hard.
>
> It's a significant effort, yes, but the speed up could be significant
> (the kind of JITing attempts so far haven't tried to optimize the code
> at all, so it just removes some of the bytecode interpreter overhead,
> whereas there is a lot more opportunity if you try to eliminate the type
> checks included in each operation).
>
> There are many fairly good experimental JITs for Javascript, so it's not
> *that* hard.  It'd probably take an MSc thesis to get a prototype working.
>
> > I'm not saying it shouldn't be done. Just that these are very serious
> > projects requiring a lot of effort that would take a bit of time, and
> > might cause instability in the interim. All while  Emacs is moving
> > forward on its own.
>
> Indeed.  Note that Emacs's bytecode hasn't been moving very much, so the
> "parallel" development shouldn't be a problem.
>
> > But in any event, a prerequisite for considering doing this is to
> > understand what we got right now. That's why I'm trying to document that
> > more people at least have an understanding of what we are talking about
> > in the replacing or modifying the existing system.
>
> I agree that documenting the current bytecode is a very good idea, and
> I thank you for undertaking such an effort.
>

Thanks for the kind words. It's not something I feel all that knowledgeable
or qualified to do.


* Re: Emacs-devel Digest, Vol 166, Issue 137
  2017-12-22 23:46 ` Emacs-devel Digest, Vol 166, Issue 137 Rocky Bernstein
@ 2017-12-23  1:30   ` Stefan Monnier
  2017-12-23  2:42     ` Rocky Bernstein
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Monnier @ 2017-12-23  1:30 UTC
  To: Rocky Bernstein; +Cc: emacs-devel

> If you mean always keep the source code around in the bytecode file, I'm
> all for that!

No, I'm thinking rather of not keeping the bytecode file around at all,
or to store it in an "internal cache" (which could be kept in the
file-system but could be erased at any point and without any command to
"load a bytecode file").

>> > Maybe this could be incorporated into a "safe-load-file" function.
>> Define "safe"
> Okay. Let me call it then "safer" then. And I will define that: detecting
> problems that can be reasonably detected in advance of hitting them instead
> of giving a ¯\_(ツ)_/¯ traceback.

As long as we want to support the case of using a .elc file without the
corresponding .el file, some users would be very annoyed if Emacs-27
prevents them from using a file compiled with Emacs-23 just because
there's a risk that you might hit an error later.

> Recently have come to learn it can be worse because checks are not done on
> bytecode...
> Want to crash emacs immediately without a traceback?  Run
> emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'

Yes, BYTE_CODE_SAFE is not safe enough, agreed.  Patches welcome.

> The bytecode file for cl-lib no doubt had in it "Hey, I'm emacs 24."
> and I probably ran that on Emacs 25 where there was an incompatibility
> that can happen between major releases.

I doubt that's the problem [ cl-lib has rather unusual problems in
this respect, and I suspect that you hit one of those, which don't have
anything to do with the bytecode but with the fact that the cl-lib on
GNU ELPA only works on Emacsen that don't come with cl-lib.
It includes hacks to try and avoid the problem, but maybe you've seen
people using odd configurations or older versions of GNU ELPA's cl-lib
where those hacks weren't sufficiently refined.  ]

> If that were the case (and although probably it is not the *only*
> scenario case)  how much nicer would it have been if a safer-load-file
> warned me about running version 24 bytecode.

Do you have actual byte-code generated by Emacs-24's bytecomp.el which
causes a serious error when used on more recent Emacsen?

For cl-lib, the problem is not that the cl-lib compiled with Emacs-24.1
contains invalid bytecode for Emacs-27, but that for some reason that
cl-lib ended up earlier in the load-path than the cl-lib that comes with
Emacs-27 and it's the bundled cl-lib which should be loaded.
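
A quick way to see that kind of shadowing is to list every copy of a library
along load-path, in search order. This is only a sketch; the function name is
hypothetical, and it ignores details such as `load-prefer-newer'.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-library-copies (library)
  "Return every file along `load-path' that could satisfy loading LIBRARY.
Files are listed in search order, so earlier entries shadow later ones."
  (let (found)
    (dolist (dir load-path)
      (dolist (suffix (get-load-suffixes))
        ;; A nil entry in `load-path' stands for `default-directory'.
        (let ((file (expand-file-name (concat library suffix)
                                      (or dir default-directory))))
          (when (file-exists-p file)
            (push file found)))))
    (nreverse found)))

;; Example: (my-library-copies "cl-lib")
```

If that returns files from more than one directory and the first is not the
copy bundled with the running Emacs, you are likely in the situation
described above. The built-in M-x list-load-path-shadows reports the same
kind of problem for all libraries at once.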

> There are free opcode space available. "apply" could be added is someone
> chooses to add it.

I'm just pointing out differences that illustrate the fact that
tradeoffs were quite different.

> There may be some glibness here. The benefits of using a lower-level
> general-purpose intermediate language like LLVM IR or vmgen is that because
> it a lower level, working with registers and pointers, understands some
> structure layouts, and is more statically typed. So efficiency can be
> gained by specialization.  But if one doesn't break down Lisp_Object and
> uses that in the same way the C interpreter currently does, then I don't
> see why vmgen will be any faster than the current interpreter. (Other than
> the benefit that would also be had by rewriting the interpreter without the
> bloat and compatibility overhead)

The idea is not to rewrite the same thing with vmgen, indeed.  It's to
design a new bytecode (which would hopefully streamline some use cases,
e.g. allow processing function calls more efficiently, maybe also
`apply` more efficiently, which would allow allocating closures with
a single malloc of an object of size proportional to the number of free
variables, ...), and while we're at it, use something like vmgen, so
that we benefit from other people's efforts at improving the efficiency
of the dispatch.

> Thanks for the kind words. It's not something I feel all that knowledgeable
> or qualified to do.

That makes you a good person to write it.
Feel free to ask questions about parts you don't understand or aren't sure.

        Stefan


* Re: Emacs-devel Digest, Vol 166, Issue 137
  2017-12-23  1:30   ` Stefan Monnier
@ 2017-12-23  2:42     ` Rocky Bernstein
  2017-12-23  3:06       ` Stefan Monnier
  0 siblings, 1 reply; 4+ messages in thread
From: Rocky Bernstein @ 2017-12-23  2:42 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On Fri, Dec 22, 2017 at 8:30 PM, Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> > If you mean always keep the source code around in the bytecode file, I'm
> > all for that!
>
> No, I'm thinking rather of not keeping the bytecode file around at all,
> or to store it in an "internal cache" (which could be kept in the
> file-system but could be erased at any point and without any command to
> "load a bytecode file").
>

Sounds like a JIT. Sure, that would address some of the run-time
deficiencies I feel are there.

> >> > Maybe this could be incorporated into a "safe-load-file" function.
> >> Define "safe"
> > Okay. Let me call it then "safer" then. And I will define that: detecting
> > problems that can be reasonably detected in advance of hitting them
> > instead of giving a ¯\_(ツ)_/¯ traceback.
>
> As long as we want to support the case of using a .elc file without the
> corresponding .el file, some users would be very annoyed if Emacs-27
> prevents them from using a file compiled with Emacs-23 just because
> there's a risk that you might hit an error later.
>

Is there *any* testing done running Emacs code on older bytecode, either in
the normal course of testing or before a release? If not, given the
expectations that you ascribe to users, there probably should be.

safer-load-file should have the knowledge of what is safe; and this can be
as fine grained as one is willing to put into safer-load-file. (Some of
this could be determined by the old package and bytecode testing).

If this seems cumbersome, I feel it is part of the overhead that comes with
the responsibility of keeping compatibility. Other languages just store a
version number in the object file and only allow you to run things that have
been compiled by the current major release version of the compiler.

If you are saying that *all* Emacs-23 packages and bytecode runs on
Emacs-27 then, sure, safer-load-file doesn't need to do anything.

> > Recently have come to learn it can be worse because checks are not done
> > on bytecode...
> > Want to crash emacs immediately without a traceback?  Run
> > emacs -batch -Q --eval '(print (#[0 "\300\207" [] 0]))'
>
> Yes, BYTE_CODE_SAFE is not safe enough, agreed.  Patches welcome.
>

Actually, here, I think the way to go is in the Emacs-in-Rust project. So
if I address this, it will probably be via Rust. And I will probably
include better run-time support for tracebacks, which I think is lacking.

> > The bytecode file for cl-lib no doubt had in it "Hey, I'm emacs 24."
> > and I probably ran that on Emacs 25 where there was an incompatibility
> > that can happen between major releases.
>
> I doubt that's the problem [ cl-lib has rather unusual problems in
> this respect, and I suspect that you hit one of those, which don't have
> anything to do with the bytecode but with the fact that the cl-lib on
> GNU ELPA only works on Emacsen that don't come with cl-lib.
> It includes hacks to try and avoid the problem, but maybe you've seen
> people using odd configurations or older versions of GNU ELPA's cl-lib
> where those hacks weren't sufficiently refined.  ]
>
> > If that were the case (and although probably it is not the *only*
> > scenario case)  how much nicer would it have been if a safer-load-file
> > warned me about running version 24 bytecode.
>
> Do you have actual byte-code generated by Emacs-24's bytecomp.el which
> causes a serious error when used on more recent Emacsen?
>

I think you are missing the point. You said that incompatibilities in
internal packages are confined to the boundaries between major releases. So
the compiler version that is stored in comments in the bytecode file is a
proxy for a set of versions of the standard library packages. It's not just
about bytecode itself.

A tool like safer-load-file could be told which sets of *packages* are
incompatible and warn when it sees a problem.

> For cl-lib, the problem is not that the cl-lib compiled with Emacs-24.1
> contains invalid bytecode for Emacs-27, but that for some reason that
> cl-lib ended up earlier in the load-path than the cl-lib that comes with
> Emacs-27 and it's the bundled cl-lib which should be loaded.
>

> > There are free opcode space available. "apply" could be added is someone
> > chooses to add it.
>
> I'm just pointing out differences that illustrate the fact that
> tradeoffs were quite different.
>

Personally I think you are ascribing too much intentionality; this could
just as easily be explained by oversight.

> > There may be some glibness here. The benefits of using a lower-level
> > general-purpose intermediate language like LLVM IR or vmgen is that
> > because it a lower level, working with registers and pointers,
> > understands some structure layouts, and is more statically typed. So
> > efficiency can be gained by specialization.  But if one doesn't break
> > down Lisp_Object and uses that in the same way the C interpreter
> > currently does, then I don't see why vmgen will be any faster than the
> > current interpreter. (Other than the benefit that would also be had by
> > rewriting the interpreter without the bloat and compatibility overhead)
>
> The idea is not to rewrite the same thing with vmgen, indeed.  It's to
> design a new bytecode (which would hopefully streamline some use cases,
> e.g. allow processing function calls more efficiently, maybe also
> `apply` more efficiently, which would allow allocating closures with
> a single malloc of an object of size proportional to the number of free
> variables, ...), and while we're at it, use something like vmgen, so
> that we benefit from other people's efforts at improving the efficiency
> of the dispatch.
>

This still all sounds a little loose, and not fully formed.  But if people
want to work on this, far be it from me to suggest what others work on. I'm
sure there is benefit; and above all, I hope I am not spoiling your fun.

> > Thanks for the kind words. It's not something I feel all that
> > knowledgeable or qualified to do.
>
> That makes you a good person to write it.
> Feel free to ask questions about parts you don't understand or aren't sure.
>

Sure. When this is more fully finished I'll send out notice.

>
>         Stefan
>


* Re: Emacs-devel Digest, Vol 166, Issue 137
  2017-12-23  2:42     ` Rocky Bernstein
@ 2017-12-23  3:06       ` Stefan Monnier
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Monnier @ 2017-12-23  3:06 UTC
  To: Rocky Bernstein; +Cc: emacs-devel

>> No, I'm thinking rather of not keeping the bytecode file around at all,
>> or to store it in an "internal cache" (which could be kept in the
>> file-system but could be erased at any point and without any command to
>> "load a bytecode file").
> Sounds like a JIT.

Yes and no.  JIT usually applies to the case of generating machine code,
whereas the above scheme doesn't necessarily imply anything else than
the current scheme of compiling to a .elc file.

> Is there *any* testing done running emacs code on older bytecode, either in
> the normal course of testing or before a release?

Testing, as in a test-suite?  No.

Only testing in the form of people out there trying out Emacs from Git,
or from the pretests and reporting problems.

Note that this kind of testing leaves a lot of false negatives: often
they don't report the problems, because they try to work around the
problems themselves (they don't behave as testers and don't feel
a responsibility to file bug reports when running prerelease versions;
instead they'll ask for help in various fora where other users give them
hints for how to work around problems).

> If not, given the expectations that you ascribe to users, there
> probably should be.

Probably.  Don't count on me to do that work, tho.

> If you are saying that *all* Emacs-23 packages and bytecode runs on
> Emacs-27 then, sure, safer-load-file doesn't need to do anything.

They don't all run, but they "should" all run.  We know of some
exceptions, and there are probably some more exceptions we don't know
about, but by and large this has worked well enough (largely controlled
by the amount of bug reports we receive).

> A tool like safer-load-file could be told which sets of *packages* are
> incompatible and warn when it sees a problem.

We've had some such warnings in the past for some particular known
problems, but nothing "systematic".  I think the problem is a lack of
motivation because in most cases the problems will be apparent soon
enough that there's not much benefit to extra checks.

>> > There are free opcode space available. "apply" could be added is someone
>> > chooses to add it.
>> I'm just pointing out differences that illustrate the fact that
>> tradeoffs were quite different.
> Personally I think you are ascribing too much intentionality; this could
> just as easily be explained by oversight.

Could be.

> This still all sounds a little loose, and not fully formed.

It's not fully formed at all.  Otherwise I'd probably have a patch for it.

        Stefan

