Thoughts on the buffer positions in the byte compiler's warning messages.


* Thoughts on the buffer positions in the byte compiler's warning messages.
@ 2016-09-18 15:23 Alan Mackenzie
  2016-09-18 17:38 ` Alan Mackenzie
  2016-09-18 18:21 ` Helmut Eller
  0 siblings, 2 replies; 4+ messages in thread
From: Alan Mackenzie @ 2016-09-18 15:23 UTC (permalink / raw)
  To: emacs-devel

Hello, Emacs.

The byte compiler reporting wrong positions in its warning messages is a
long-standing problem.  See bugs #2681, #8774, #9109, #22288, #24128,
and #24449.  Of these, #24449 and #2681 have recently been fixed.

The compiler's difficulty comes from how it reads the source code.  It
actually _reads_ it (in the lisp sense) then gets to work on the lisp
form produced, rather than reading (in the file access sense) one line
at a time and processing that, the way typical compilers do.

So, how does the byte compiler produce any position information at all?
It does so because the reader, in addition to producing the lisp form,
also produces a linear alist of the positions each symbol it encountered
was found at.  So, if the form were:

    (defun foo (bar)
      (baz))

, the alist (called read-symbol-positions-list) would look something
like:

    ((defun . 1) (foo . 7) (bar . 12) (baz . 20))

This alist is the sole source of information the compiler has to link
symbols in the form being compiled with source positions.  It does this
(in function byte-compile-set-symbol-position, which takes a single
argument, a symbol) by searching this alist for the NEXT occurrence of
the desired symbol.  So that, for example, if there were a warning
concerning "(baz)", that function would search forward from the "current
position", find (baz . 20) in read-symbol-positions-list, and from 20 it
calculates the pertinent line and column positions.
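
The forward-only search described above can be sketched roughly as
follows.  This is a simplified illustration, not the code in
bytecomp.el: the names my-positions, my-remaining and my-next-position
are hypothetical, and the alist is assumed to be in reading order.

```elisp
;; Hypothetical sketch of a forward-only symbol position lookup.
;; It does not reproduce `byte-compile-set-symbol-position'.

(defvar my-positions '((defun . 1) (foo . 7) (bar . 12) (baz . 20))
  "Stand-in for `read-symbol-positions-list', assumed in reading order.")

(defvar my-remaining my-positions
  "Entries of `my-positions' not yet passed by the current position.")

(defun my-next-position (sym)
  "Return the position of the next occurrence of SYM, or nil if none.
Entries up to and including the match are discarded, so the search
can only ever move forward."
  (let ((tail my-remaining))
    ;; Skip entries until one for SYM is found or the list runs out.
    (while (and tail (not (eq (car (car tail)) sym)))
      (setq tail (cdr tail)))
    (when tail
      (setq my-remaining (cdr tail))
      (cdr (car tail)))))
```

Asking for baz first returns 20, and a later request for foo returns
nil, because the entry for foo has already been passed.  That is the
left-to-right limitation in miniature.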

Not surprisingly, it often gets things wrong.  For example, if a warning
message is output before byte-compile-set-symbol-position has been
called for the pertinent symbol, the line and column output will be that
of the previous symbol.  This happens in bug #8774, where in:

 1  (defun fix-page-breaks ()
 2    "Fix page breaks in SAS 6 print files."
 3    (interactive)
 4    (save-excursion
 5      (goto-char (point-min))
 6      (if (looking-at "\f") (delete-char 1))
 7      (replace-regexp "^\\(.+\\)\f" "\\1\n\f\n")
 8      (goto-char (point-min))
 9      (replace-regexp "^\f\\(.+\\)" "\f\n\\1")
10      (goto-char (point-min))))

, the output messages are:

    ~/eglen.el:6:28:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.
    ~/eglen.el:7:6:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.

Note the positions - 6:28 points at "delete-char", and 7:6, apparently
correct, points at "replace-regexp".  Trouble is, both are wrong: the
first message should point at 7:6, and the second at 9:6.  This would
actually be fairly easy to fix, by centralising the point where
byte-compile-set-symbol-position is called, into byte-compile-form, at
the same time removing it from direct error-checking functions.

The problem with this whole mechanism is that it is strictly
left-to-right.  Once the "current position" has passed a symbol, there
is no going back to it.  This works, more or less, with straight code.
Where a form is first transformed (whether by the byte code optimiser,
macro expansion, closure conversion, or whatever) and then compiled,
the "current position" becomes foggy indeed.  The macro expander has
its own routines for outputting messages (which I don't understand at
the moment), but even so, it sometimes gets the position wrong.

######################################################################### 

I've been trying to come up with a general solution to these problems.
What I have at the moment, which is rather vague, amounts to this:

After the reader has produced the form to be compiled and
read-symbol-positions-list, we combine these to produce a @dfn{shadow
form} with the same shape as the form, but where there's a symbol in the
form, there is a corresponding list in the shadow form, noting the
corresponding "position" in the form, and onto which warning/error
messages can be pushed.  These can then be output at the end of the
compilation.

The info in the shadow form will allow the correct node corresponding to
one in the form to be found, thus correct line/column numbers in
messages are assured for normal code.  Possibly a hash table will serve
somehow to speed up searches.

For transformed code (macro invocations, optimised forms, etc.), things
become more difficult.  However, these transformations mostly leave most
of the cons cells in the form unchanged, just rearranging them somewhat.
So the "pointers" in the shadow form will continue to be associated with
them, enabling accurate warning messages even here.

Obviously, this mechanism would cause the byte compiler to run more
slowly.  Whether the slowdown is significant would have to be
determined by experience.

Comments?

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on the buffer positions in the byte compiler's warning messages.
  2016-09-18 15:23 Thoughts on the buffer positions in the byte compiler's warning messages Alan Mackenzie
@ 2016-09-18 17:38 ` Alan Mackenzie
  2016-10-13 22:58   ` Andreas Politz
  2016-09-18 18:21 ` Helmut Eller
  1 sibling, 1 reply; 4+ messages in thread
From: Alan Mackenzie @ 2016-09-18 17:38 UTC (permalink / raw)
  To: emacs-devel

Hello, Emacs.

On Sun, Sep 18, 2016 at 03:23:03PM +0000, Alan Mackenzie wrote:
> ######################################################################### 

> I've been trying to come up with a general solution to these problems.
> What I have at the moment, which is rather vague, amounts to this:

> After the reader has produced the form to be compiled and
> read-symbol-positions-list, we combine these to produce a @dfn{shadow
> form} with the same shape as the form, but where there's a symbol in the
> form, there is a corresponding list in the shadow form, noting the
> corresponding "position" in the form, and onto which warning/error
> messages can be pushed.  These can then be output at the end of the
> compilation.

> The info in the shadow form will allow the correct node corresponding to
> one in the form to be found, thus correct line/column numbers in
> messages are assured for normal code.  Possibly a hash table will serve
> somehow to speed up searches.

> For transformed code (macro invocations, optimised forms, etc.), things
> become more difficult.  However, these transformations mostly leave most
> of the cons cells in the form unchanged, just rearranging them somewhat.
> So the "pointers" in the shadow form will continue to be associated with
> them, enabling accurate warning messages even here.

> Obviously, this mechanism would cause the byte compiler to run more
> slowly.  Whether or not this is significant or not would be down to
> experience.

> Comments?

Actually, with a bit more thought, the above is totally over the top.

What's needed is to construct a hash table whose key is a cons cell in
the form which the reader has just built, and whose value is the
position of the symbol in the car of that cons cell.  OK, something is
needed for vectors, too, and maybe one or two other things.

This hash table can easily be built from the available information (the
form and read-symbol-positions-list), and once the mechanism is seen to
be working we could get the reader to produce this hash table directly.

Then when we want to output a diagnostic, in addition to passing the
string to byte-compile-warn, we also pass a cons cell representing the
position we want output.

This should work.
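
A minimal sketch of that hash table, built from the form and a
positions alist, might look like the following.  The names my--walk,
my-build-position-table and my-format-warning are hypothetical, the
alist is assumed to be in reading order, and vectors and symbols in a
dotted tail are not handled.

```elisp
;; Hypothetical sketch: map each cons cell whose car is a symbol read
;; from the source to that symbol's position.

(defun my--walk (x table state)
  "Record symbol positions for the cons cells of X in TABLE.
STATE is a one-element list holding the unconsumed part of the
positions alist, so that recursive calls share it."
  (while (consp x)
    (let ((head (car x)))
      (cond
       ;; A nested list: descend into it first, in reading order.
       ((consp head) (my--walk head table state))
       ;; A symbol matching the next recorded entry: key on the cons.
       ((and (car state)
             (symbolp head)
             (eq head (car (car (car state)))))
        (puthash x (cdr (car (car state))) table)
        (setcar state (cdr (car state))))))
    (setq x (cdr x))))

(defun my-build-position-table (form positions)
  "Return an `eq' hash table from cons cells of FORM to positions.
POSITIONS is an alist of (SYMBOL . POSITION) in reading order."
  (let ((table (make-hash-table :test #'eq)))
    (my--walk form table (list positions))
    table))

(defun my-format-warning (cell table message)
  "Return MESSAGE prefixed with the position recorded for CELL in TABLE."
  (format "%s: %s" (gethash cell table) message))
```

With the earlier example form, the cons cell (baz) maps to 20, so a
diagnostic that is handed that cell can report its position without
any notion of a "current position".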

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Thoughts on the buffer positions in the byte compiler's warning messages.
  2016-09-18 15:23 Thoughts on the buffer positions in the byte compiler's warning messages Alan Mackenzie
  2016-09-18 17:38 ` Alan Mackenzie
@ 2016-09-18 18:21 ` Helmut Eller
  1 sibling, 0 replies; 4+ messages in thread
From: Helmut Eller @ 2016-09-18 18:21 UTC (permalink / raw)
  To: emacs-devel

On Sun, Sep 18 2016, Alan Mackenzie wrote:

> After the reader has produced the form to be compiled and
> read-symbol-positions-list, we combine these to produce a @dfn{shadow
> form} with the same shape as the form, but where there's a symbol in the
> form, there is a corresponding list in the shadow form, noting the
> corresponding "position" in the form, and onto which warning/error
> messages can be pushed.  These can then be output at the end of the
> compilation.

It would be problematic to pass such shadow-forms to macros because
macros would be confused if they see artificial lists where they expect
symbols.  After macro expansion that's no longer a problem and the
compiler can use any representation that is convenient.

So, instead of replacing symbols with lists, I would attach the position
to cons cells (stored in an auxiliary hash table with the cons as key).

(Remember that in Lisp code almost every symbol is stored in the car of
a cons.)

> Comments?

First, I would look at CMUCL or Clozure CL for inspiration.  Those
compilers produce fairly accurate source location information and also
have the problem that they need to pass "vanilla" forms to macros.
(Scheme compilers use "syntax-objects" instead of vanilla forms, so
there's probably not much to learn there.)

Second, instead of (or in addition to) recording the position of symbols
in read-symbol-positions-list I would record the start and end position
of lists, i.e. the READ function should record the position of "(" and
the corresponding ")" for each list.

Third, MACROEXPAND should record source forms so that the compiler can
list all forms (and positions) that eventually generated the fully
expanded form.

Fourth, for very sophisticated macros there should probably be some API
beyond MACROEXPAND so that macros can help the compiler to track source
positions.
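
To illustrate the second point, here is a rough sketch that collects
the start and end positions of every list in a piece of source text.
It works on a buffer after the fact rather than inside READ, the name
my-list-extents is hypothetical, and character literals such as ?\(
are not treated specially.

```elisp
;; Hypothetical sketch: find the extent of each list in TEXT by
;; scanning a temporary buffer, not by instrumenting the reader.

(defun my-list-extents (text)
  "Return a list of (START . END) for each list found in TEXT.
START is the buffer position of an opening parenthesis and END is
the position just after the matching closing parenthesis.
Parentheses inside strings and comments are ignored."
  (with-temp-buffer
    (insert text)
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let ((extents '()))
        (goto-char (point-min))
        (while (re-search-forward "(" nil t)
          (let ((start (match-beginning 0)))
            ;; Element 8 of the parse state is non-nil inside a
            ;; string or comment.
            (unless (nth 8 (save-excursion
                             (parse-partial-sexp (point-min) start)))
              (push (cons start (scan-lists start 1 0)) extents))))
        (nreverse extents)))))
```

For the text "(defun foo (bar)\n  (baz))" this gives one entry for the
whole defun and one each for (bar) and (baz), using 1-based buffer
positions.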

Helmut


* Re: Thoughts on the buffer positions in the byte compiler's warning messages.
  2016-09-18 17:38 ` Alan Mackenzie
@ 2016-10-13 22:58   ` Andreas Politz
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Politz @ 2016-10-13 22:58 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> What's needed is to construct a hash table whose key is a cons cell in
> the form which the reader has just built[...]

Edebug already does something similar.

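(require 'edebug)

(defun read-with-offsets (&optional buffer position)
  "Read one form from BUFFER starting at POSITION.
Return (FORM . OFFSETS), where OFFSETS is what Edebug stored in
`edebug-offsets' while reading.  BUFFER defaults to the current
buffer and POSITION to point in BUFFER."
  (unless buffer (setq buffer (current-buffer)))
  (setq buffer (get-buffer buffer))
  (unless position
    (setq position (with-current-buffer buffer (point))))

  ;; Bind Edebug's bookkeeping variables locally so that the read
  ;; does not disturb any Edebug session in progress.
  (let (edebug-offsets
        edebug-offsets-stack
        edebug-current-offset)
    (with-current-buffer buffer
      (save-excursion
        (goto-char position)
        (cons
         (edebug-read-storing-offsets (current-buffer))
         edebug-offsets)))))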

-ap



