From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Thoughts on the buffer positions in the byte compiler's warning messages.
Date: Sun, 18 Sep 2016 15:23:03 +0000
Message-ID: <20160918152303.GA3576@acm.fritz.box>
To: emacs-devel@gnu.org

Hello, Emacs.

The byte compiler reporting wrong positions in its warning messages is a
long-standing problem. See bugs #2681, #8774, #9109, #22288, #24128,
#24449. #24449 and #2681 have recently been fixed.

The compiler's difficulty comes from how it reads the source code. It
actually _reads_ it (in the lisp sense) then gets to work on the lisp
form produced, rather than reading (in the file access sense) one line
at a time and processing that, the way typical compilers do.

So, how does the byte compiler produce any position information at all?
It does so because the reader, in addition to producing the lisp form,
also produces a linear alist of the positions at which it found each
symbol. So, if the form were:

    (defun foo (bar)
      (baz))

, the alist (called read-symbol-positions-list) would look something
like:

    ((defun . 1) (foo . 7) (bar . 12) (baz . 20))

This alist is the sole source of information the compiler has to link
symbols in the form being compiled with source positions. It does this
(in function byte-compile-set-symbol-position, which takes a single
argument, a symbol) by searching this alist for the NEXT occurrence of
the desired symbol. So, for example, if there were a warning concerning
"(baz)", that function would search forward from the "current position",
find (baz . 20) in read-symbol-positions-list, and from 20 calculate the
pertinent line and column positions.

Not surprisingly, it often gets things wrong. For example, if a warning
message is output before byte-compile-set-symbol-position has been
called for the pertinent symbol, the line and column output will be that
of the previous symbol. This happens in bug #8774, where in:

     1 (defun fix-page-breaks ()
     2   "Fix page breaks in SAS 6 print files."
     3   (interactive)
     4   (save-excursion
     5     (goto-char (point-min))
     6     (if (looking-at "\f") (delete-char 1))
     7     (replace-regexp "^\\(.+\\)\f" "\\1\n\f\n")
     8     (goto-char (point-min))
     9     (replace-regexp "^\f\\(.+\\)" "\f\n\\1")
    10     (goto-char (point-min))))

, the output messages are:

    ~/eglen.el:6:28:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.
    ~/eglen.el:7:6:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.

Note the positions - 6:28 points at "delete-char", and 7:6, apparently
correct, points at "replace-regexp". Trouble is, both are wrong: the
first message should point at 7:6, and the second at 9:6.

This would actually be fairly easy to fix, by centralising the point
where byte-compile-set-symbol-position is called into byte-compile-form,
and at the same time removing the calls from the individual
error-checking functions.

The problem with this whole mechanism is that it is strictly
left-to-right. Once the "current position" has passed a symbol, there is
no going back to it. This works, more or less, with straight code.
Where a form is first transformed (whether by the byte code optimiser,
macro expansion, the closure conversion, or whatever) and then compiled,
the "current position" becomes foggy indeed. The macro expander has its
own routines for outputting messages (which I don't understand at the
moment), but even so, it sometimes gets things wrong.
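
To make the left-to-right limitation concrete, here is a minimal sketch
in Emacs Lisp. It is a simplified model, not the compiler's actual code:
the names demo-positions, demo-cursor and demo-set-symbol-position are
hypothetical stand-ins for read-symbol-positions-list, the "current
position" and byte-compile-set-symbol-position, and the real function
may well differ in detail.

```elisp
;; Illustrative model only; every `demo-' name is hypothetical.

(defvar demo-positions
  '((defun . 1) (foo . 7) (bar . 12) (baz . 20))
  "Alist of (SYMBOL . POSITION) in reading order, as in the example above.")

(defvar demo-cursor 0
  "Position of the last symbol matched, i.e. the \"current position\".")

(defun demo-set-symbol-position (sym)
  "Advance `demo-cursor' to the next occurrence of SYM beyond it.
Return the new position.  If SYM does not occur beyond the cursor,
return nil and leave `demo-cursor' where it was."
  (catch 'found
    (dolist (entry demo-positions)
      (when (and (eq (car entry) sym)
                 (> (cdr entry) demo-cursor))
        (throw 'found (setq demo-cursor (cdr entry)))))
    nil))

;; A warning about `baz' moves the cursor forward to 20.
(demo-set-symbol-position 'baz)   ; => 20

;; A later warning about `foo' (position 7) cannot be located any more:
;; the search only looks beyond the cursor, so it fails ...
(demo-set-symbol-position 'foo)   ; => nil

;; ... and a message issued now would carry the stale position of `baz'.
demo-cursor                       ; => 20
```

In this model a warning that arrives "out of order" simply inherits
whatever position the cursor last reached, which is the kind of
misattribution described above.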

#########################################################################

I've been trying to come up with a general solution to these problems.
What I have at the moment, which is rather vague, amounts to this:

After the reader has produced the form to be compiled and
read-symbol-positions-list, we combine these to produce a @dfn{shadow
form} with the same shape as the form, but where there's a symbol in the
form, there is a corresponding list in the shadow form, noting the
corresponding "position" in the form, and onto which warning/error
messages can be pushed. These can then be output at the end of the
compilation.

The info in the shadow form will allow the node corresponding to any
given node in the form to be found, so correct line/column numbers in
messages are assured for normal code. Possibly a hash table will serve
somehow to speed up searches.

For transformed code (macro invocations, optimised forms, etc.), things
become more difficult. However, these transformations leave most of the
cons cells in the form unchanged, just rearranging them somewhat. So
the "pointers" in the shadow form will continue to be associated with
them, enabling accurate warning messages even here.

Obviously, this mechanism would cause the byte compiler to run more
slowly. Whether the slowdown is significant would be down to experience.

Comments?

-- 
Alan Mackenzie (Nuremberg, Germany).