From: Rocky Bernstein
Newsgroups: gmane.emacs.devel
Subject: Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
Date: Fri, 20 Mar 2020 17:23:22 -0400
References: <20200319203449.GA4180@ACM> <20200320201005.GC5255@ACM>
In-Reply-To: <20200320201005.GC5255@ACM>
To: Alan Mackenzie
Cc: Stefan Monnier, emacs-devel

Before I begin, as has been pointed out, let us be clear that the discussion has changed. Originally I was interested in better call stack and traceback information, a *run-time* thing, which I was proposing as a Summer of Code project. The discussion now is about compiler locations at *compile* time.

So be it.

The problems, however, do have one thing in common: how to represent a location.

Let me also correct one earlier correction, in which the "better" location was construed to be a single line and column number. A better way, I believe, to think of a location is as

  1. a *container*, where *container* is defined to be something,
  2. an offset into that container, in some kind of "units" that are defined to be something, and
  3. an optional length in those units. When the length isn't given, it is assumed to be one.

For example, if you are interested only in representing a line and column number, then one value, an offset, would do it.

Note that this abstraction works equally well for other kinds of things, like bytecode, where the offset would be the bytecode offset. Many times a contiguous sequence of bytecode maps to a contiguous sequence in the source code. Of course that's not necessarily *always* the case, but describing how to deal with that would wander too far from the proposal that follows. But let me say again: if you just care about a single bytecode instruction, set the length to 1 or leave out the length field.

I know this might not be satisfying to some, but here is an extremely simple but accurate proposal that doesn't incur a lot of overhead and can deal with a lot of generality.

A unit of compilation, I think, is a *function*. That is the container part. Attach to the function its location information in some other way (e.g. its container might be a file name if that is appropriate, or it might be defined inside a macro...).

A function, before bytecompile compiles it, is a kind of lambda, which is a kind of S-expression. A location inside that could simply be a tree node's preorder number. Or the preorder number and a number of successor nodes in the preorder traversal. As with the simple-minded run-time error location proposal (when we have a bytecode offset, mark that position in a disassembly), the same thing can be done here: show the position or range of nodes in the S-expression that you've got.
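
To make the container/offset/length idea concrete, here is a minimal sketch. It is written in Python only so that it is easy to run, with nested lists standing in for S-expressions; the names `Location`, `preorder`, `nodes_at` and `render` are made up for illustration and are not part of the byte compiler. It assumes that node 0 is the whole form and that the length counts consecutive preorder nodes starting at the offset.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Any, Iterator, List


@dataclass(frozen=True)
class Location:
    """Hypothetical location record: container, offset, optional length."""
    container: str   # e.g. the name of the function being compiled
    offset: int      # preorder number of a node inside that function
    length: int = 1  # number of consecutive preorder nodes; defaults to one


def preorder(sexp: Any) -> Iterator[Any]:
    """Yield every node of a nested-list S-expression in preorder."""
    yield sexp
    if isinstance(sexp, list):
        for child in sexp:
            yield from preorder(child)


def nodes_at(sexp: Any, loc: Location) -> List[Any]:
    """Return the nodes covered by loc, in preorder."""
    return list(islice(preorder(sexp), loc.offset, loc.offset + loc.length))


def render(sexp: Any, loc: Location) -> str:
    """Print the S-expression, bracketing the node at loc.offset with [[ ]].

    Only the starting node is marked; marking a whole range is left out
    to keep the sketch short.
    """
    numbers = count()

    def walk(node: Any) -> str:
        n = next(numbers)  # number the node before visiting its children
        if isinstance(node, list):
            text = "(" + " ".join(walk(child) for child in node) + ")"
        else:
            text = str(node)
        return f"[[{text}]]" if n == loc.offset else text

    return walk(sexp)


# (defun f (x) (cons (car x) (cdr x))) as nested lists
form = ["defun", "f", ["x"], ["cons", ["car", "x"], ["cdr", "x"]]]
print(render(form, Location("f", 7)))
```

With this numbering the form has 13 nodes, and node 7 is the `(car x)` subform, so that is what gets bracketed in the printed S-expression.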

What if the bytecode compiler has done some wild and weird optimization changes? Just show what S-exp you were working on and mark where you were.

I know for some or many it may not be satisfying, but it is the honest truth and I'd rather have that than nothing or the wrong guess.

Having done this first step, the problem is divided a little bit, so carry on: discuss and conquer. A separate tool outside of the compiler proper can be written to take this and, given pointers to where the source might be located, figure out where in the source code that might be. Maybe pattern matching would work, dunno, but let me not try to speculate too much.

Finally, I am not suggesting in this proposal that the current behavior change: by default, the additional precise geeky information would be shown only in some sort of "super hacker" verbose compilation mode.

On Fri, Mar 20, 2020 at 4:10 PM Alan Mackenzie wrote:

> Hello, Stefan.
>
> On Thu, Mar 19, 2020 at 17:41:30 -0400, Stefan Monnier wrote:
> > > things like cconv.el here).  More to the point, users' macros chew up and
> > > spit out cons cells, and we have no control over them.  So whilst we
> > > could, with a lot of tedious effort, clean up our own software to
> > > preserve cons cells (believe me, I've tried), this would fail in users'
> > > macros.
>
> > I think fat-cons cells are cheap to implement (with (hopefully) no
> > performance impact when not used .....
>
> They may be cheap to implement in themselves, but adapting the entire
> byte compiler and all our macros to the heavily restricted semantics
> they would impose would be an enormous job.  I've tried something
> similar, and gave up in exhaustion.
>
> > or weird semantic artifacts like the fat-symbol approach you tried),
>
> Er, not "tried" but "implemented", please.  The implementation was
> complete, and was capable of bootstrapping Emacs with correct positions
> for all the (then plentiful) warning messages.
>
> > and can work 99.9% right in the long term with an incremental way to
> > get there.
>
> Where does this 99.9% come from?  How is this cons tracking you're
> proposing supposed to work, when there are an infinite number of
> occurrences of the likes of
>
>     (cons (car form) (cdr form))
>
> in our code?
>
> > Furthermore it matches the "usual" way to deal with this problem, so
> > there's very little doubt about whether it can work or not.
>
> Are you saying that this is how other Lisp compilers deal with source
> code positions?  How do they deal with the difficult problem of user
> macros?  Could you give me an example of a free Lisp system which works
> this way?  I'd be interested in having a look at it.
>
> I think there's quite a bit of doubt as to whether this could work
> effectively in Emacs.  The way to dispel this doubt is for Somebody (tm)
> to implement it.
>
> > > Since then I've worked a fair bit on creating a "double" Emacs core,
> > > one core being for normal use, the other for byte compiling.
> > > There's a fair amount of work still to do on this, but I know how to
> > > do it.  The problem is that I have been discouraged by the prospect
> > > of having this solution vetoed too, since it will make Emacs quite a
> > > bit bigger.
>
> > I'd probably try to veto it, indeed.  It might be a good solution in
> > the short-term but it'd just slow down our progress in the long term.
>
> Fixing bugs slows down our progress?
>
> To which the answer is to install the working solution pending the
> implementation of something better, after which it can be superseded.
> Somehow, even that strategy tends to get vetoed.
>
> >         Stefan
>
> --
> Alan Mackenzie (Nuremberg, Germany).