* Thoughts on getting correct line numbers in the byte compiler's warning messages
@ 2018-11-01 17:59 Alan Mackenzie
  2018-11-01 22:45 ` Stefan Monnier
  2018-11-08  4:47 ` Michael Heerdegen
  0 siblings, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-01 17:59 UTC
  To: emacs-devel

Hello, Emacs.

Most of the time, the byte compiler identifies the correct place of
error in its warning messages. This is remarkable, given the crude hack
which it uses. However, it sometimes fails, and this has given rise to
a number of bug reports, e.g., 22288, and several others which have been
merged with it.

In bug #22288:

    (defun test ()
      (let (a))
      a)

, the byte compiler correctly reports "reference to free variable 'a'",
but wrongly gives the source position as L2 C9 rather than L3 C3.

The problem is that the Emacs Lisp source code being compiled is first
read, and this discards line/column numbers of the constructs created.

I believe that, somehow, accurate source position information must be
preserved. But how? It is not easy. The forms created by the reader
go through several (?many) transformative phases where they get replaced
by successor forms. This makes things more difficult.

My first idea to track position information was for the reader to create
a hash table of conses (the key) and positions (the value), so that the
position could be found simply by accessing the entry corresponding with
the current form. This doesn't work so easily, because of the previous
paragraph. Then I tried duplicating a hash table entry when a
transformation was effected. This was just too tedious and error prone,
and was also slow.

Second idea was still to maintain this hash table, but on each
transformation to write the result back to the same cons cell as the
original. I actually put quite a lot of work into this approach, but in
the end didn't get very far. It was just too much detailed work, too
fiddly.
The third idea is to amend the reader so that whereas it now produces a
form, in a byte compiler special mode, it would produce the cons (form .
offset). So, for example, the text "(not a)" currently gets read into
the form (not . (a . nil)). The amended reader would produce

    (((not . 1) . ((a . 5) . (nil . 6))) . 0)

(where 0, 1, 5, and 6 are the textual offsets of the elements coded).

Such forms would require special versions of `cons', `car', `cdr',
`cond', ...., `mapcar', .... to be easily manipulable. These versions
would be macros to begin with, but probably primitives ultimately.
Assuming appropriate design, it should be possible to substitute these
new macros/primitives for the existing cons/car/cdr/...s in the byte
compiler without too much related change.

I'm still exploring this scheme. I feel that this bug is not
intractable, though it will take quite a lot of work to fix.

Comments?

--
Alan Mackenzie (Nuremberg, Germany).
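To make the shape of that annotated form concrete, here is a minimal
sketch of how such a structure could be turned back into an ordinary
form. It assumes the layout implied by the "(not a)" example: every
atom and every whole list is wrapped as (PAYLOAD . OFFSET), the cells of
a list's spine are plain conses, and the terminating nil is itself
annotated. The names `annotated-strip', `annotated--strip-spine' and
`annotated-offset' are hypothetical; nothing like them exists in Emacs.

```elisp
;; Sketch only: hypothetical helpers for the (FORM . OFFSET) scheme.
(defun annotated-offset (ann)
  "Return the textual offset stored in the annotated form ANN."
  (cdr ann))

(defun annotated-strip (ann)
  "Return the plain form for ANN, an annotated form (PAYLOAD . OFFSET)."
  (let ((payload (car ann)))
    (if (consp payload)
        (annotated--strip-spine payload)
      payload)))

(defun annotated--strip-spine (cell)
  "Rebuild a plain list from CELL, a spine cell (ANNOTATED-ELT . REST).
REST is either another spine cell or the annotated list terminator."
  (cons (annotated-strip (car cell))
        (let ((rest (cdr cell)))
          (if (integerp (cdr rest))
              ;; REST is an annotated terminator such as (nil . 6).
              (annotated-strip rest)
            (annotated--strip-spine rest)))))
```

With these definitions, stripping the example structure gives back
(not a), and `annotated-offset' on the same structure gives 0. Every
consumer of such forms would need this kind of unwrapping, which is why
special versions of `car', `cdr' and friends come up in the message.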
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-01 22:45 UTC
  To: emacs-devel

> The third idea is to amend the reader so that whereas it now produces a
> form, in a byte compiler special mode, it would produce the cons (form .
> offset). So, for example, the text "(not a)" currently gets read into

Sounds good. I have the vague feeling that I mentioned it already, but
in case I haven't: please make sure the positions are character-precise
rather than line-precise, so that we can (eventually) ditch Edebug's
Elisp-reimplementation-of-the-reader which returns the same kind of info
(and needs character-precise location info).

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-05 10:53 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Nov 01, 2018 at 18:45:00 -0400, Stefan Monnier wrote:
> > The third idea is to amend the reader so that whereas it now produces a
> > form, in a byte compiler special mode, it would produce the cons (form .
> > offset). So, for example, the text "(not a)" currently gets read into

> Sounds good. I have the vague feeling that I mentioned it already, but
> in case I haven't: please make sure the positions are character-precise
> rather than line-precise, so that we can (eventually) ditch Edebug's
> Elisp-reimplementation-of-the-reader which returns the same kind of info
> (and needs character-precise location info).

Actually this idea was not good; macros could not handle such a form
without severe changes in the way macros work. (A research project,
perhaps).

I have come up with an improved scheme, which may well work.

The reader would produce, in place of the Lisp_Objects it currently
does, an object with Lisp_Type 1 (which is currently unused). The rest
of the object would be an address pointing at two Lisp_Objects, one
being the "real" read object, the other being a source position.

The low level routines, like CONSP, and a million others in lisp.h would
need amendment. But the Lisp system would continue with 8-byte objects,
and the higher level bits (nearly all of it) would not need changes.
The beauty of this scheme is that, outside of byte compilation, nothing
else would change.
One or two extra functions would be needed, such as `big-object' which
would create a new-type object out of a source offset and "ordinary"
object, `big-object-p', `big-offset' to get the source offset from a big
object, and possibly one or two others. These would naturally be
available to byte-compile-warn and friends, supplying the source
position.

To cope with the times when no source position would be available (e.g.
in forms expanded from macros), the new variable
`byte-compile-containing-form' would be bound at strategic places in the
byte compiler. This would provide a fallback source position.

The extra indirection involved in these "big objects" would naturally
slow down byte compilation somewhat. I've no idea how much, but it
might not be much at all.

And yes, the source positions used would be character-precise.

What do you think?

> Stefan

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Eli Zaretskii @ 2018-11-05 15:57 UTC
  To: Alan Mackenzie; +Cc: monnier, emacs-devel

> Date: Mon, 5 Nov 2018 10:53:02 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: emacs-devel@gnu.org
>
> The reader would produce, in place of the Lisp_Objects it currently
> does, an object with Lisp_Type 1 (which is currently unused). The rest
> of the object would be an address pointing at two Lisp_Objects, one
> being the "real" read object, the other being a source position.

Sounds gross to me.

Did you consider using mint_ptr objects instead? That'd still be
gross, but at least we won't introduce another type of Lisp_Object.

Also, what about keeping the source position in some other way, like a
property of some symbol?
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-05 16:51 UTC
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Hello, Eli.

On Mon, Nov 05, 2018 at 17:57:35 +0200, Eli Zaretskii wrote:
> > Date: Mon, 5 Nov 2018 10:53:02 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: emacs-devel@gnu.org

> > The reader would produce, in place of the Lisp_Objects it currently
> > does, an object with Lisp_Type 1 (which is currently unused). The rest
> > of the object would be an address pointing at two Lisp_Objects, one
> > being the "real" read object, the other being a source position.

> Sounds gross to me.

What is done at the moment is no less gross. Just to clarify, the above
action of read would only be done when in byte compilation, a bit like
how the current list of source symbols is also only for when in
compilation.

I've spent many hours at my PC, trying to figure out a neat way of
solving this problem. The above is the best I've been able to come up
with, so far.

Why do you think the idea is gross, given the difficulty of the
underlying problem? The idea should work with only moderate amendment
of the byte-compiler/macro routines, and virtually no change outside of
that, bar amending the reader and the lowest level functions like `cons'
and `car'.

> Did you consider using mint_ptr objects instead? That'd still be
> gross, but at least we won't introduce another type of Lisp_Object.

The using up of the last available object type is a severe disadvantage,
yes. I wasn't aware of mint_ptrs until you just mentioned them. I'll
need to read up on them to get the hang of what they're about.

> Also, what about keeping the source position in some other way, like a
> property of some symbol?

Difficult.
Essentially, these source positions are properties of
Lisp_Objects, such as conses, not of symbols. A typical symbol is used
several or many times in a compilation unit. Some means has to be found
of attaching properties (in this case, source positions), to arbitrary
Lisp_Objects.

It's gradually become clear to me that what I proposed this morning is a
special case of attaching a property list to an arbitrary object. Maybe
an actual property list, being more general, would be a better idea.

Alternatively, it may be possible to use a vector or pseudovector type
rather than using Lisp_Type 1 to implement basically the same idea.
This would be slower at run time, however, possibly not significantly.

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Herring, Davis @ 2018-11-06 4:34 UTC
  To: Alan Mackenzie
  Cc: Eli Zaretskii, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> I've spent many hours at my PC, trying to figure out a neat way of
> solving this problem. The above is the best I've been able to come up
> with, so far.

Considering patterns like AoS vs. SoA, could the reader produce (on
demand) a pair: the expression read and a parallel structure of
position information? For example,

    '(foo
    bar [baz])
    =>
    ((quote (foo bar)) .
     (0 (2 6 [11])))

where the numbers are character offsets from the beginning of the
read? This loses information on the opening delimiter for each
list/cons/vector; it could be added with certain obvious alterations
to the location structure if that's a problem.

Davis
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-06 8:53 UTC
  To: Herring, Davis
  Cc: Eli Zaretskii, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Hello, Davis.

On Tue, Nov 06, 2018 at 04:34:30 +0000, Herring, Davis wrote:
> > I've spent many hours at my PC, trying to figure out a neat way of
> > solving this problem. The above is the best I've been able to come up
> > with, so far.

> Considering patterns like AoS vs. SoA, could the reader produce (on
> demand) a pair: the expression read and a parallel structure of
> position information? For example,

> '(foo
> bar [baz])
> =>
> ((quote (foo bar)) .
> (0 (2 6 [11])))

> where the numbers are character offsets from the beginning of the
> read? This loses information on the opening delimiter for each
> list/cons/vector; it could be added with certain obvious alterations
> to the location structure if that's a problem.

Such a structure could be generated easily. But how are we going to use
it? The problem is how do we associate a particular piece of the main
structure with the pertinent bit of the auxiliary structure? For
example, at the time we're compiling baz, the byte compiler has just baz
itself. How do we get to the 11 in the offsets structure?

This is the essence of the problem - associating data with the elements
of an arbitrary structure of lisp objects. My proposal from yesterday
does this rigorously.

> Davis

--
Alan Mackenzie (Nuremberg, Germany).
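The objection above can be illustrated with a small sketch. Looking up
an offset in the parallel structure is easy if one knows the path of
indices leading to the element, and that path is exactly what the byte
compiler no longer has when it is handed baz on its own. The function
name `parallel-offset-at' and the idea of passing a path are assumptions
made for illustration; neither appears in the thread.

```elisp
;; Sketch only: `parallel-offset-at' is a hypothetical helper.
(defun parallel-offset-at (positions path)
  "Follow PATH, a list of indices, through the POSITIONS structure.
POSITIONS mirrors the shape of the form that was read; lists and
vectors are both indexed with `elt'.  Return what is found at the end."
  (dolist (index path positions)
    (setq positions (elt positions index))))
```

For the positions structure in the example, (0 (2 6 [11])), the path
(1 2 0) selects the second top-level element, then its third element,
then the first slot of the vector, which is the 11 belonging to baz.
Without such a path, or some per-object association, there is no way to
get from baz itself to that number.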
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-06 13:56 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Actually this idea was not good;

[ I'll assume you're not talking about the idea of using such a reader in
  edebug, but about using such a reader for your use case. ]

> macros could not handle such a form without severe changes in the way
> macros work. (A research project, perhaps).

Right. The way I was thinking about it was that when calling
macros we'd do something like:

    (plain-to-annotated
     (macroexpand (annotated-to-plain sexp)))

not a research project by any stretch, but its impact on performance
could be a problem, indeed.

> The reader would produce, in place of the Lisp_Objects it currently
> does, an object with Lisp_Type 1 (which is currently unused). The rest
> of the object would be an address pointing at two Lisp_Objects, one
> being the "real" read object, the other being a source position.

More generally, you're suggesting here to add a new object type (could
just as well be a new pseudo-vector or any such thing: these are just
low-level concerns that don't really affect the overall design).

> The low level routines, like CONSP, and a million others in lisp.h would
> need amendment.

So you're suggesting to change the low-level routines accessing
virtually all object types to also accept those "annotated objects"?
That means all processing of all objects would be slowed down.
I think that's a serious problem (I'd rather pay a significant slow
down in byte-compilation than a smaller slowdown on everything else).

> But the Lisp system would continue with 8-byte objects,
> and the higher level bits (nearly all of it) would not need changes.
> The beauty of this scheme is that, outside of byte compilation, nothing
> else would change.

Also, I wonder how this (or any other of the techniques discussed) solve
the original problem you describe:

    The forms created by the reader go through several (?many)
    transformative phases where they get replaced by successor forms.
    This makes things more difficult.

E.g. we could implement big-object as

    (defun big-object (object location)
      (cons object location))

or

    (defun big-object (object location)
      (puthash object location location-hash-table)
      object)

or

    (defun big-object (object location)
      (make-new-special-object object location))

but the problem remains of how to put it at all the places where we
need it.

> The extra indirection involved in these "big objects" would naturally
> slow down byte compilation somewhat. I've no idea how much, but it
> might not be much at all.

Indeed, I don't think that's a significant issue.

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-06 15:11 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Nov 06, 2018 at 08:56:48 -0500, Stefan Monnier wrote:
> > Actually this idea was not good;

> [ I'll assume you're not talking about the idea of using such a reader in
>   edebug, but about using such a reader for your use case. ]

In particular, in the byte compiler.

> > macros could not handle such a form without severe changes in the way
> > macros work. (A research project, perhaps).

> Right. The way I was thinking about it was that when calling
> macros we'd do something like:

>     (plain-to-annotated
>      (macroexpand (annotated-to-plain sexp)))

That would lose too much of the wanted source position data.

> not a research project by any stretch, but its impact on performance
> could be a problem, indeed.

> > The reader would produce, in place of the Lisp_Objects it currently
> > does, an object with Lisp_Type 1 (which is currently unused). The rest
> > of the object would be an address pointing at two Lisp_Objects, one
> > being the "real" read object, the other being a source position.

> More generally, you're suggesting here to add a new object type (could
> just as well be a new pseudo-vector or any such thing: these are just
> low-level concerns that don't really affect the overall design).

There's nothing just about hurting performance.

> > The low level routines, like CONSP, and a million others in lisp.h would
> > need amendment.

> So you're suggesting to change the low-level routines accessing
> virtually all object types to also accept those "annotated objects"?

Yes.

> That means all processing of all objects would be slowed down.
> I think that's a serious problem (I'd rather pay a significant slow
> down in byte-compilation than a smaller slowdown on everything else).

The slow down would not be great. For example, XCONS first checks the
3-bit tag, and if all's OK, removes it, otherwise it handles the error.
I'm proposing enhancing the "otherwise" to check for a tag of 1 together
with a proper cons at the far end of a pointer. With care, there should
be no loss in the usual case, here.

I timed a bootstrap, unoptimised GCC, with an extra tag check and
storage to a global variable inserted into XFIXNUM. (Currently there is
no such check there). The slowdown was around 1.3%

> > But the Lisp system would continue with 8-byte objects,
> > and the higher level bits (nearly all of it) would not need changes.
> > The beauty of this scheme is that, outside of byte compilation, nothing
> > else would change.

> Also, I wonder how this (or any other of the techniques discussed) solve
> the original problem you describe:

>     The forms created by the reader go through several (?many)
>     transformative phases where they get replaced by successor forms.
>     This makes things more difficult.

Many of the original forms produced by the reader survive these
transformations. For those that do not, we could bind
byte-compile-containing-position (or whatever) to a sensible position
each time the compiler enters a "major" form (whatever that might mean).

> E.g. we could implement big-object as

> 1.  (defun big-object (object location)
>       (cons object location))

> or

> 2.  (defun big-object (object location)
>       (puthash object location location-hash-table)
>       object)

> or

> 3.  (defun big-object (object location)
>       (make-new-special-object object location))

1. wouldn't work, as such. E.g. evaluating `car' must get the car of
the original OBJECT, not the car of (cons OBJECT LOCATION).
I've tried 2., and given up on it: everywhere in the compiler where FORM
is transformed to NEWFORM, a copy of a hash has to be created for
NEWFORM. Also, there's no convenient key for recording the hash of an
occurrence of a symbol (such as `if').

3. is what I'm proposing, I think. The motivating thing here is that
the rest of the system can handle NEW-SPECIAL-OBJECT and get the same
result it would have from OBJECT. Hence the use of Lisp_Type 1, or
possibly a new pseudovector type.

> but the problem remains of how to put it at all the places where we
> need it.

Every object produced by the reader during byte compilation would have
its source position attached to the object, in essence. Objects
produced by macro expansion would not have this, but we could arrange to
copy the info much of the time. (E.g. the result of a `mapcar'
operating on a list of FORMs would be given the position information of
the list.) Other non-reader forms would have to depend on the variable
byte-compile-containing-position mentioned above.

Incidentally, I'm coming round to the idea of calling the new object an
_extended_ object. In place of the fixnum source position proposed, we
could use, for example, a property list. There are surely many
applications for having a property list on a cons form. :-)

> > The extra indirection involved in these "big objects" would naturally
> > slow down byte compilation somewhat. I've no idea how much, but it
> > might not be much at all.

> Indeed, I don't think that's a significant issue.

> Stefan

--
Alan Mackenzie (Nuremberg, Germany).
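The remark about there being no convenient key for an occurrence of a
symbol can be seen in a few lines. The sketch below reads a form with
two `if's and records an offset for each in an `eq' hash table, in the
spirit of option 2. The function name `demo-symbol-key-collision' and
the table are illustrative assumptions, not code from the byte compiler.

```elisp
;; Sketch only: a hypothetical demonstration, not byte compiler code.
(defun demo-symbol-key-collision ()
  "Record offsets for two `if' symbols and two conses in an `eq' table.
Return (OUTER-IF INNER-IF OUTER-CONS INNER-CONS) as looked up afterwards."
  (let* ((table (make-hash-table :test #'eq))
         (form (car (read-from-string "(if a (if b c))")))
         (inner (nth 2 form)))
    (puthash form 0 table)          ; outer list starts at offset 0
    (puthash inner 6 table)         ; inner list starts at offset 6
    (puthash (car form) 1 table)    ; first `if' is at offset 1
    (puthash (car inner) 7 table)   ; second `if' is at offset 7
    (list (gethash (car form) table)
          (gethash (car inner) table)
          (gethash form table)
          (gethash inner table))))
```

Both occurrences of `if' are the same interned symbol, so the second
`puthash' overwrites the first and both lookups give 7. The two conses
are distinct objects and keep their own offsets, 0 and 6.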
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-06 16:29 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> I timed a bootstrap, unoptimised GCC, with an extra tag check and
> storage to a global variable inserted into XFIXNUM. (Currently there is
> no such check there). The slowdown was around 1.3%

That accumulates for every data type, and it increases code size,
reduces cache hit rate...

You may find it acceptable, but I don't, mostly because I know
fundamentally it's not needed: it's only introduced for short/medium
term convenience (to avoid having to rewrite a lot of code).
And I can't see how we'll be able to get rid of it in the long run
(gradually or not).

So in the long run it's a bad option.

> Many of the original forms produced by the reader survive these
> transformations.

Yeah, that's why I thought of using a hash-table.

> I've tried 2., and given up on it: everywhere in the compiler where FORM
> is transformed to NEWFORM, a copy of a hash has to be created for
> NEWFORM.

Same with your new scheme: everywhere where a "big cons-cell" is
transformed by a macro, you'll get a "small cons-cell".
That's a constant of all options, AFAICT.

> Also, there's no convenient key for recording the hash of an
> occurrence of a symbol (such as `if').

Ah, right, I keep forgetting this detail. Yes, that's a major downer.

> 3. is what I'm proposing, I think.

Yes [ sorry, you had to guess; I thought it was clear enough].

> The motivating thing here is that the rest of the system can handle
> NEW-SPECIAL-OBJECT and get the same result it would have from OBJECT.
> Hence the use of Lisp_Type 1, or possibly a new pseudovector type.
How 'bout we don't try to add location to all objects, but only to some
specific objects? E.g. only cons-cells?

We could add a new "big cons-cell" type which shares the same tag, and
just adds additional info after the end of the normal cons-cell
(cons-cell would either be allocated from small_cons_blocks or
big_cons_blocks, so you'd have to look at the enclosing cons_block to
determine which kind of cons-cell you have). So normal code is not
slowed down at all (except I guess for the GC which will be marginally
slower).

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-06 19:15 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello again, Stefan.

Now for something completely different.

On Tue, Nov 06, 2018 at 11:29:41 -0500, Stefan Monnier wrote:

[ .... ]

> So in the long run it [Alan's idea for extended Lisp Objects] is a bad
> option.

I feel that intuitively, hence agree with you. It would be nice to have
robust warning line numbers, though. In the rest of this post, I will
no longer be discussing this scheme.

> > Many of the original forms produced by the reader survive these
> > transformations.

> Yeah, that's why I thought of using a hash-table.

What I tried before (about two years ago) was having each
reader-produced form as a key, and the source position as a value. Each
time the source was transformed, the new form became a new key, and the
value stayed the same.

I vaguely remember this being slow.

Maybe it would be better the other way around. The source position
would be the key, and the value would be a list of (equivalent) forms.
Building this table would be faster. Finding a form in that table for a
warning message would be much slower, but that shouldn't matter.

[ .... ]

> > Also, there's no convenient key for recording the hash of an
> > occurrence of a symbol (such as `if').

> Ah, right, I keep forgetting this detail. Yes, that's a major downer.

Here's my latest idea: we maintain byte-compile-containing-forms as a
stack of containing forms. Each time we're manipulating a list of
forms, we increment a counter N with each form. That form is often a
symbol.
In byte-compile-warn, if we can't find the current form in the above
table, we search for the containing form, get its source offset, put
point there and read the next N forms, moving forward in the source text
to the position we need. That this might be slow (I don't really think
it would be) is again unimportant.

[ .... ]

> How 'bout we don't try to add location to all objects, but only to some
> specific objects? E.g. only cons-cells?

Yes, and vectors too. Integers, symbols, strings, and floats, no.

[ .... ]

> Stefan

--
Alan Mackenzie (Nuremberg, Germany).
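The fallback described in that message, going to the containing form's
offset and moving forward over N sub-forms in the source text, could
look roughly like the sketch below. It is written under assumptions:
the source is passed in as a string, offsets are 0-based, the containing
form is a list whose opening paren sits at the given offset, and
`demo-subform-offset' is a made-up name, not part of the byte compiler.

```elisp
;; Sketch only: `demo-subform-offset' is a hypothetical helper.
(defun demo-subform-offset (source container-offset n)
  "Return the 0-based offset in SOURCE of sub-form N of a list.
CONTAINER-OFFSET is the 0-based offset of the list's opening paren;
sub-forms are counted from 0."
  (with-temp-buffer
    (insert source)
    (set-syntax-table emacs-lisp-mode-syntax-table)
    ;; Buffer positions are 1-based; step just past the opening paren.
    (goto-char (+ container-offset 2))
    ;; Move over the N sub-forms that precede the one we want.
    (when (> n 0)
      (forward-sexp n))
    ;; Skip whitespace and comments up to the start of sub-form N.
    (forward-comment (point-max))
    (1- (point))))
```

Applied to the three-line `test' function from bug #22288, with the
defun at offset 0, sub-form 4 is the free variable a on the third line,
at offset 29, and sub-form 3 is the `let' form at offset 17. Turning
such an offset into a line and column for the warning is then a matter
of ordinary buffer arithmetic.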
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-06 20:04 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

>> > Many of the original forms produced by the reader survive these
>> > transformations.
>> Yeah, that's why I thought of using a hash-table.
> What I tried before (about two years ago) was having each
> reader-produced form as a key, and the source position as a value. Each
> time the source was transformed, the new form became a new key, and the
> value stayed the same.
>
> I vaguely remember this being slow.

Which part do you remember being slow (e.g. just performing a `read`
that returns a sexp and fills that table along the way)?

> Maybe it would be better the other way around. The source position
> would be the key, and the value would be a list of (equivalent) forms.
> Building this table would be faster.

I don't follow you: why would this be faster?

> Finding a form in that table for a warning message would be much
> slower, but that shouldn't matter.

It could matter, but yeah, let's not worry about that for now.

> In byte-compile-warn, if we can't find the current form in the above
> table, we search for the containing form, get its source offset, put
> point there and read the next N forms, moving forward in the source text
> to the position we need. That this might be slow (I don't really think
> it would be) is again unimportant.

I lost you here as well: how is the location data propagated from the
reader to the byte-compiler's phase that ends up running
byte-compile-warn?

I mean, how is the location info preserved while going through
macro-expansion, closure-conversion, and byte-optimize-form? Or are
most objects left untouched in practice?

I guess we could limit the info (e.g.
stored in a hash-table) to map "first cons-cell in a list" to its
location info, and then change macroexp.el, cconv.el, and friends to
preserve this info as much as possible (we may even come up with
a `with-location-data` macro that encapsulates most of the work so the
changes are easy to apply).

Is that what you're thinking of?

        Stefan
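As a rough illustration of that last suggestion, the sketch below keeps
an `eq' hash table from the first cons cell of a list to its source
offset, and wraps a transformation so that the result inherits the
offset of the form it replaces. Everything here is an assumption made
for the example: the table `demo-form-locations', the macro name
`demo-with-location-data' (modelled on the name floated above) and its
exact semantics are not taken from macroexp.el, cconv.el or any other
existing code.

```elisp
;; Sketch only: hypothetical table and macro, not existing Emacs code.
(defvar demo-form-locations (make-hash-table :test #'eq :weakness 'key)
  "Map from the first cons cell of a list to its source offset.")

(defmacro demo-with-location-data (form &rest body)
  "Evaluate BODY and give its result the location recorded for FORM.
The location is copied only if the result is a cons cell that has no
location of its own yet.  Return the value of BODY."
  (declare (indent 1))
  (let ((old (make-symbol "old"))
        (new (make-symbol "new")))
    `(let* ((,old ,form)
            (,new (progn ,@body)))
       (let ((loc (gethash ,old demo-form-locations)))
         (when (and loc
                    (consp ,new)
                    (not (gethash ,new demo-form-locations)))
           (puthash ,new loc demo-form-locations)))
       ,new)))
```

A transformation that rewrites (not a) into (null a) would wrap the
construction of the new list in the macro, after which looking up the
new list in the table yields the offset that was recorded for the old
one. Symbols and other atoms are deliberately left out, in line with
the "first cons-cell in a list" restriction.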
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-07 12:35 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Nov 06, 2018 at 15:04:51 -0500, Stefan Monnier wrote:
> >> > Many of the original forms produced by the reader survive these
> >> > transformations.
> >> Yeah, that's why I thought of using a hash-table.
> > What I tried before (about two years ago) was having each
> > reader-produced form as a key, and the source position as a value. Each
> > time the source was transformed, the new form became a new key, and the
> > value stayed the same.
> >
> > I vaguely remember this being slow.

> Which part do you remember being slow (e.g. just performing a `read`
> that returns a sexp and fills that table along the way)?

Looking at notes I made at the time, I amended a small portion of e.g.
byte-optimize-body to make a new hash entry with the same value when a
form was transformed. The slowdown on just the byte optimiser was
around a factor of three. I think the comparison was with the
byte-optimiser in the released version (without any hash tables).

> > Maybe it would be better the other way around. The source position
> > would be the key, and the value would be a list of (equivalent) forms.
> > Building this table would be faster.

> I don't follow you: why would this be faster?

I don't think I follow myself here. I was thinking that accessing a
hash table element was slow, therefore keeping a table value current and
pushing transformed forms onto it would be faster than creating a new
hash table entry for these new forms. Looking at the code for hash
tables, the access time can not be all that long.

> > Finding a form in that table for a warning message would be much
> > slower, but that shouldn't matter.
> It could matter, but yeah, let's not worry about that for now.

> > In byte-compile-warn, if we can't find the current form in the above
> > table, we search for the containing form, get its source offset, put
> > point there and read the next N forms, moving forward in the source text
> > to the position we need. That this might be slow (I don't really think
> > it would be) is again unimportant.

> I lost you here as well: how is the location data propagated from the
> reader to the byte-compiler's phase that ends up running
> byte-compile-warn?

For objects created by the reader, they can be looked up in the hash
table. But your real question ....

> I mean, how is the location info preserved while going through
> macro-expansion, closure-conversion, and byte-optimize-form? Or are
> most objects left untouched in practice?

Either by making new entries in the table for transformed forms, or by
noting byte-compile-containing-form and "sub-form number 2" and using
read (or forward-sexp, even) on the source text to move forward to
sub-form 2.

> I guess we could limit the info (e.g. stored in a hash-table) to map
> "first cons-cell in a list" to its location info, and then change
> macroexp.el, cconv.el, and friends to preserve this info as much as
> possible (we may even come up with a `with-location-data` macro that
> encapsulates most of the work so the changes are easy to apply).

> Is that what you're thinking of?

That's the sort of thing, yes.

> Stefan

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-07 12:35 ` Alan Mackenzie @ 2018-11-07 17:11 ` Stefan Monnier 0 siblings, 0 replies; 44+ messages in thread From: Stefan Monnier @ 2018-11-07 17:11 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel > Looking at notes I made at the time, I amended a small portion of e.g. > byte-optimize-body to make a new hash entry with the same value when a > form was transformed. The slowdown on just the byte optimiser was > around a factor of three. Ouch! > I don't think I follow myself here. I was thinking that accessing a > hash table element was slow, therefore keeping a table value current and > pushing transformed forms onto it would be faster than creating a new > hash table entry for these new forms. Ah, so you'd keep a pointer to the list somehow and add to it by side-effects. Yes, I guess it would indeed be noticeably faster for the case of copying the location info from the source code to the transformed code. > Looking at the code for hash tables, the access time can not be all > that long. Hash-table accesses are pretty costly, in my experience. Stefan ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-06 16:29 ` Stefan Monnier 2018-11-06 19:15 ` Alan Mackenzie @ 2018-11-07 17:00 ` Alan Mackenzie 2018-11-07 17:25 ` Stefan Monnier 1 sibling, 1 reply; 44+ messages in thread From: Alan Mackenzie @ 2018-11-07 17:00 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Hello, Stefan. On Tue, Nov 06, 2018 at 11:29:41 -0500, Stefan Monnier wrote: > > I timed a bootstrap, unoptimised GCC, with an extra tag check and > > storage to a global variable inserted into XFIXNUM. (Currently there is > > no such check there). The slowdown was around 1.3% > That accumulates for every data type, and it increases code size, > reduces cache hit rate... No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check the Lisp_Type. Other object types already perform this check, so while it would increase the code size (by how much?) it would have a lesser run time penalty. There would be a slow down in predicates like symbolp, when the result is false. This probably wouldn't amount to much in practice. Part of that 1.3% (I don't know how big a part) was GCC outputting warning messages. Anyhow, do we really need to worry about code size anymore? temacs is only 7.3 Mb, and the machines people will be running it on will have several, or more usually many, Gb of RAM. So what if it became 7.5 Mb, or even 8.0 Mb? > You may find it acceptable, but I don't, mostly because I know > fundamentally it's not needed: it's only introduced for short/medium > term convenience (to avoid having to rewrite a lot of code). > And I can't see how we'll be able to get rid of it in the long run > (gradually or not). > So in the long run it's a bad option. Yes, it may be a bad option, but possibly less bad than the other bad options we have. > > Many of the original forms produced by the reader survive these > > transformations. This, as it happens, is not true. 
Many of the symbols produced by the reader survive, none of the cons forms do. cconv, we love you. ;-( > Yeah, that's why I thought of using a hash-table. > > I've tried 2., and given up on it: everywhere in the compiler where FORM > > is transformed to NEWFORM, a copy of a hash has to be created for > > NEWFORM. I've rediscovered why I gave up on the hash table approach 2. That's because cconv-convert chews up EVERY list it is presented with and spits out one which is not EQ to the original, though it is usually EQUAL. I'm not saying it was written with the object of frustrating the current exercise (I'm sure it wasn't), but I will say that if that had been the objective, the end result wouldn't be different from what we now have. cconv.el would need to be entirely rewritten if we stick to the hash table approach. It wouldn't survive anything like unscathed even in an "extended Lisp Object" solution. Maybe it would be possible to defer cconv.el processing till after macro expansion and byte-opt.el stuff. Would this do any good? The only vague idea I have for saving this, and I don't like it one bit, is somehow to redefine \` (and possibly \,) in such a way that it would somehow copy the source position from the original list to the result. > Same with your new scheme: everywhere where a "big cons-cell" is > transformed, by a macro you'll get a "small cons-cell". > That's a constant of all options, AFAICT. The "extended" symbols would survive. That is a big plus. > > Also, there's no convenient key for recording the hash of an > > occurrence of a symbol (such as `if'). > Ah, right, I keep forgetting this detail. Yes, that's a major downer. > > 3. is what I'm proposing, I think. > Yes [ sorry, you had to guess; I thought it was clear enough]. > > The motivating thing here is that the rest of the system can handle > > NEW-SPECIAL-OBJECT and get the same result it would have from OBJECT. > > Hence the use of Lisp_Type 1, or possibly a new pseudovector type.
> How 'bout we don't try to add location to all objects, but only to some > specific objects? E.g. only cons-cells? This could work, together with byte-compile-enclosing-form and a subform number N to get at the non-cons objects (symbols, strings, ..) in a cons or vector form. > We could add a new "big cons-cell" type which shares the same tag, and > just adds additional info after the end of the normal cons-cell > (cons-cell would either be allocated from small_cons_blocks or > big_cons_blocks, so you'd have to look at the enclosing cons_block to > determine which kind of cons-cell you have). I've been through these sort of thoughts. That idea would be less effective than the "extended object", since it would only work with conses, but might be less disruptive. But why should it only work with conses? Why not with symbols, too? > So normal code is not slowed down at all (except I guess for the GC > which will be marginally slower). Hmmm. Maybe there's something in this idea. :-) Somehow we'd need to determine the enclosing cons block, given the address of a cons, and that could be slow. > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
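Editorial sketch of why a walker that rebuilds every list defeats an `eq`-keyed table, as the message above reports for cconv-convert. The rebuild below imitates a pcase/backquote walker in general; it is not taken from cconv.el, and `my-rebuild-demo` is a hypothetical name.

```elisp
;; Hypothetical demo; `my-rebuild-demo' is not part of Emacs.
(defun my-rebuild-demo ()
  "Return (EQUAL-P EQ-P POS-OF-ORIGINAL POS-OF-REBUILT)."
  (let* ((table (make-hash-table :test 'eq))
         (form (list 'progn (list 'foo 1) (list 'bar 2)))
         ;; Take the list apart and put it back together again.
         (rebuilt (pcase form
                    (`(,func . ,forms)
                     `(,func . ,(mapcar #'identity forms))))))
    (puthash form 42 table)
    (list (equal form rebuilt)      ; same structure
          (eq form rebuilt)         ; but a freshly allocated cons
          (gethash form table)      ; the original is still found
          (gethash rebuilt table)))) ; the rebuilt list is not

(my-rebuild-demo)   ; evaluates to (t nil 42 nil)
```

Switching the table to `:test 'equal` would make the lookup succeed, but then two textually identical forms at different source positions would share one entry.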
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-07 17:00 ` Alan Mackenzie @ 2018-11-07 17:25 ` Stefan Monnier 2018-11-07 18:47 ` Alan Mackenzie 0 siblings, 1 reply; 44+ messages in thread From: Stefan Monnier @ 2018-11-07 17:25 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel >> > I timed a bootstrap, unoptimised GCC, with an extra tag check and >> > storage to a global variable inserted into XFIXNUM. (Currently there is >> > no such check there). The slowdown was around 1.3% > >> That accumulates for every data type, and it increases code size, >> reduces cache hit rate... > > No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check > the Lisp_Type. Other object types already perform this check, so while I'm not sure why you say that. XCONS/XSYMBOL don't perform the check either (unless you compile with debug-checks, of course, but that's not the important case). > Yes, it may be a bad option, but possibly less bad than the other bad > options we have. There's indeed a pretty good set of bad options at hand. Not sure which one will suck less. > cconv.el would need to be entirely rewritten if we stick to the hash > table approach. It wouldn't survive anything like unscathed even in an > "extended Lisp Object" solution. It's "only" the cconv-convert part of cconv.el that will need changes, but yes, one way or another it will need to be changed to preserve the location info. > Maybe it would be possible to defer cconv.el processing till after macro > expansion and byte-opt.el stuff. Would this do any good? It's already done after macro expansion (but before byte-opt). I don't think moving it would help. > The only vague idea I have for saving this, and I don't like it one bit, > is somehow to redefine \` (and possibly \,) in such a way that it would > somehow copy the source position from the original list to the result.
Define "original list" ;-) >> Same with your new scheme: everywhere where a "big cons-cell" is >> transformed, by a macro you'll get a "small cons-cell". >> That's a constant of all options, AFAICT. > The "extended" symbols would survive. That is a big plus. Indeed symbols are usually preserved un-touched. > I've been through these sort of thoughts. That idea would be less > effective than the "extended object", since it would only work with > conses, but might be less disruptive. But why should it only work with > conses? No particular reason at first. > Why not with symbols, too? Reproducing this idea for other types is not always that easy or useful: - for pseudo-vectors the variable size aspect makes it harder to handle (tho not impossible). OTOH we could probably use a bit in the header and thus avoid the need to place those extended objects in their own blocks. - for symbols the extra info is "per symbol occurrence" rather than "per symbol", so we can't add this info directly to the symbol (i.e. the same reason the hash-table approach doesn't work for symbols). So we'd really want a completely separate object which then points to the underlying symbol object. But yes, we could introduce a new symbol-occurrence object, along the lines you originally suggested but only for symbols (thus reducing the performance cost). -- Stefan ^ permalink raw reply [flat|nested] 44+ messages in thread
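Editorial sketch of the "per symbol occurrence" point made above: the reader interns symbols, so two occurrences of `if` are the same object and cannot carry two positions in a table keyed on the symbol. `my-symbol-key-demo` is a hypothetical name; the offsets 1 and 7 are the zero-based positions of the two `if`s in the sample string.

```elisp
;; Hypothetical demo; `my-symbol-key-demo' is not part of Emacs.
(defun my-symbol-key-demo ()
  "Return (SAME-OBJECT-P TABLE-SIZE STORED-POSITION)."
  (let* ((form (car (read-from-string "(if a (if b c))")))
         (outer (car form))              ; the first `if'
         (inner (car (nth 2 form)))      ; the second `if'
         (table (make-hash-table :test 'eq)))
    (puthash outer 1 table)   ; offset of the first occurrence
    (puthash inner 7 table)   ; same key, so this overwrites it
    (list (eq outer inner)
          (hash-table-count table)
          (gethash outer table))))

(my-symbol-key-demo)   ; evaluates to (t 1 7)
```

A separate object per occurrence, pointing at the underlying symbol, avoids the collision, which is the direction the thread takes from here.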
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-07 17:25 ` Stefan Monnier @ 2018-11-07 18:47 ` Alan Mackenzie 2018-11-07 19:12 ` Stefan Monnier 0 siblings, 1 reply; 44+ messages in thread From: Alan Mackenzie @ 2018-11-07 18:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Hello again, Stefan. On Wed, Nov 07, 2018 at 12:25:15 -0500, Stefan Monnier wrote: > >> That accumulates for every data type, and it increases code size, > >> reduces cache hit rate... > > No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check > > the Lisp_Type. Other object types already perform this check, so while > I'm not sure why you say that. XCONS/XSYMBOL don't perform the check > either (unless you compile with debug-checks, of course, but that's not > the important case). Ah, really? OK, I'd need to repeat the exercise with the checks in XCONS and XSYMBOL, too. I suspect the slowdown would be significant, though perhaps not critical (say, around 5%). For these #defines, there must be a check on Lisp_Type somewhere, so we should be able to incorporate that "somewhere" into the check for Lisp_Type 1. Maybe. [ .... ] > There's indeed a pretty good set of bad options at hand. Not sure which > one will suck less. Yes. Things aren't looking good. [ .... ] > It's "only" the cconv-convert part of cconv.el that will need changes, > but yes, one way or another it will need to be changed to preserve the > location info. OK. But it's still a challenging job. > > Maybe it would be possible to defer cconv.el processing till after macro > > expansion and byte-opt.el stuff. Would this do any good? > It's already done after macro expansion (but before byte-opt). > I don't think moving it would help. Maybe not. I was thinking that if it was deferred until after byte-opt, "all" the warning messages would have the right position info. But cconv.el calls byte-compile-warn, too.
> > The only vague idea I have for saving this, and I don't like it one bit, > > is somehow to redefine \` (and possibly \,) in such a way that it would > > somehow copy the source position from the original list to the result. > Define "original list" ;-) The one that has been transformed into the result. For example, in this fragment from the end of cconv-convert: (`(,func . ,forms) ;; First element is function or whatever function-like forms are: or, and, ;; if, catch, progn, prog1, prog2, while, until `(,func . ,(mapcar (lambda (form) (cconv-convert form env extend)) forms))) , the original list would be the whole FORM. My idea would be to rewrite the resulting form as something like: `(form ,func . ,(bc-mapcar (lambda (form) (cconv-convert form env extend)) forms)) , where the first argument in the modified \` supplies the position information for the result list, but isn't included in the list itself. bc-mapcar would be a version of mapcar which preserves the internal position info in the resulting form, copying it from the original list parameter. As I say, I don't like the idea, but it might be the best we can come up with, and still have a readable and maintainable cconv.el. [ .... ] > > I've been through these sort of thoughts. That idea would be less > > effective than the "extended object", since it would only work with > > conses, but might be less disruptive. But why should it only work > > with conses? > No particular reason at first. > > Why not with symbols, too? > Reproducing this idea for other types is not always that easy or useful: > - for pseudo-vectors the variable size aspect makes it harder to handle > (tho not impossible). OTOH we could probably use a bit in the header > and thus avoid the need to place those extended objects in their > own blocks. Yes. > - for symbols the extra info is "per symbol occurrence" rather than "per > symbol", so we can't add this info directly to the symbol (i.e. 
the > same reason the hash-table approach doesn't work for symbols). D'oh! Of course! > So we'd really want a completely separate object which then points to > the underlying symbol object. But yes, we could introduce a new > symbol-occurrence object, along the lines you originally suggested but > only for symbols (thus reducing the performance cost). :-) This could be a pseudovector, leaving Lisp_Type 1 free for more worthy uses. You're suggesting a mix of approaches. This might be more complicated, but possibly the least pessimal. > -- Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
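Editorial sketch of what a position-preserving `mapcar`, like the `bc-mapcar` proposed above, might look like. The message does not say where the position information would live; this sketch assumes an `eq` side table keyed on cons cells, which is only one possible reading. The names `my-cell-positions`, `my-mapcar-keeping-pos` and `my-mapcar-demo` are hypothetical.

```elisp
;; Hypothetical sketch: these `my-' names do not exist in Emacs.
(defvar my-cell-positions (make-hash-table :test 'eq)
  "Side table mapping individual cons cells to source offsets.")

(defun my-mapcar-keeping-pos (function list)
  "Like `mapcar', but copy recorded offsets from LIST to the result.
The offset of each cons cell of LIST is given to the cons cell at
the same index in the new list."
  (let ((result (mapcar function list))
        (old list))
    (let ((new result))
      (while (and (consp old) (consp new))
        (let ((pos (gethash old my-cell-positions)))
          (when pos
            (puthash new pos my-cell-positions)))
        (setq old (cdr old)
              new (cdr new))))
    result))

(defun my-mapcar-demo ()
  "Return (EQ-P POS-OF-FIRST-CELL POS-OF-SECOND-CELL) for a mapped list."
  (let ((forms (list 'a 'b 'c)))
    (puthash forms 10 my-cell-positions)
    (puthash (cdr forms) 12 my-cell-positions)
    (let ((new (my-mapcar-keeping-pos #'identity forms)))
      (list (eq new forms)
            (gethash new my-cell-positions)
            (gethash (cdr new) my-cell-positions)))))

(my-mapcar-demo)   ; evaluates to (nil 10 12)
```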
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-07 18:47 ` Alan Mackenzie @ 2018-11-07 19:12 ` Stefan Monnier 2018-11-08 14:08 ` Alan Mackenzie 0 siblings, 1 reply; 44+ messages in thread From: Stefan Monnier @ 2018-11-07 19:12 UTC (permalink / raw) To: emacs-devel >> It's "only" the cconv-convert part of cconv.el that will need changes, >> but yes, one way or another it will need to be changed to preserve the >> location info. > OK. But it's still a challenging job. I wouldn't call it challenging: the changes are orthogonal to the actual working of cconv, so it will likely make the code messier but conceptually there's no significant difficulty. I'm familiar with the code and will be happy to help. > Maybe not. I was thinking that if it was deferred until after byte-opt, > "all" the warning messages would have the right position info. But > cconv.el calls byte-compile-warn, too. Some/many(most?) of the warnings come from bytecomp itself which inevitably happens after all of the above anyway. > As I say, I don't like the idea, but it might be the best we can come up > with, and still have a readable and maintainable cconv.el. Yes, we'd probably use a hack along these lines to try and limit the impact of the change. >> So we'd really want a completely separate object which then points to >> the underlying symbol object. But yes, we could introduce a new >> symbol-occurrence object, along the lines you originally suggested but >> only for symbols (thus reducing the performance cost). > :-) This could be a pseudovector, leaving Lisp_Type 1 free for more > worthy uses. You're suggesting a mix of approaches. This might be more > complicated, but possibly the least pessimal. One possible approach is to introduce such a symbol-occurrence hack [if this word sounds like a criticism, it's because it is] and nothing else (i.e. not a "mix" of approaches). 
To the extent that symbols aren't touched during the various phases, the corresponding info should trivially be preserved. The current hack we use is also limited to tracking symbol locations, so it should never be worse than what we already have. Stefan ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-07 19:12 ` Stefan Monnier @ 2018-11-08 14:08 ` Alan Mackenzie 2018-11-08 17:02 ` Stefan Monnier 2018-11-12 15:44 ` Alan Mackenzie 0 siblings, 2 replies; 44+ messages in thread From: Alan Mackenzie @ 2018-11-08 14:08 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Hello, Stefan. On Wed, Nov 07, 2018 at 14:12:41 -0500, Stefan Monnier wrote: > >> It's "only" the cconv-convert part of cconv.el that will need changes, > >> but yes, one way or another it will need to be changed to preserve the > >> location info. > > OK. But it's still a challenging job. > I wouldn't call it challenging: the changes are orthogonal to the actual > working of cconv, so it will likely make the code messier but > conceptually there's no significant difficulty. I'm familiar with the > code and will be happy to help. Thanks! By the way, am I right in thinking that pcase does its comparisons using equal? [ .... ] > >> So we'd really want a completely separate object which then points to > >> the underlying symbol object. But yes, we could introduce a new > >> symbol-occurrence object, along the lines you originally suggested but > >> only for symbols (thus reducing the performance cost). > > :-) This could be a pseudovector, leaving Lisp_Type 1 free for more > > worthy uses. You're suggesting a mix of approaches. This might be more > > complicated, but possibly the least pessimal. > One possible approach is to introduce such a symbol-occurrence hack > [if this word sounds like a criticism, it's because it is] and nothing > else (i.e. not a "mix" of approaches). This sounds like a good idea. > To the extent that symbols aren't touched during the various phases, the > corresponding info should trivially be preserved. The current hack we > use is also limited to tracking symbol locations, so it should never be > worse than what we already have. 
One thing we'd need to watch out for is using equal, not eq, when we compare symbols. (eq 'foo #<symbol foo with position 73>) will surely be nil, but (equal ....) would be t. Same with member and memq. We'd also need to make sure that the reader's enabling flag for creating these extended symbols is bound to nil whenever we suspend the byte compiler to do something else (edebug, for example). > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
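Editorial sketch of the `eq` versus `equal` concern raised above. The proposed objects would be implemented in C; here a located symbol is merely modelled as a struct wrapping a symbol and an offset, and comparison goes through a helper that strips the wrapper. All `my-` names are hypothetical, and the built-in `equal` is deliberately left alone.

```elisp
(require 'cl-lib)

;; Hypothetical model of a "symbol with position"; not the real object.
(cl-defstruct my-located-symbol sym pos)

(defun my-bare (object)
  "Return the underlying symbol if OBJECT is located, else OBJECT."
  (if (my-located-symbol-p object)
      (my-located-symbol-sym object)
    object))

(defun my-sym-equal (a b)
  "Non-nil if A and B denote the same symbol, ignoring positions."
  (eq (my-bare a) (my-bare b)))

(defun my-located-demo ()
  "Return results of eq, my-sym-equal, memq and a position-blind member."
  (let ((located (make-my-located-symbol :sym 'foo :pos 73)))
    (list (eq 'foo located)
          (my-sym-equal 'foo located)
          (memq located '(foo bar))
          (and (cl-member located '(foo bar) :test #'my-sym-equal) t))))

(my-located-demo)   ; evaluates to (nil t nil t)
```

Any compiler code that tests symbols with `eq` or `memq` would silently stop matching once the reader hands it wrapped symbols, which is why the message singles those calls out.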
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-08 14:08 ` Alan Mackenzie @ 2018-11-08 17:02 ` Stefan Monnier 2018-11-08 22:13 ` Alan Mackenzie 2018-11-12 15:44 ` Alan Mackenzie 1 sibling, 1 reply; 44+ messages in thread From: Stefan Monnier @ 2018-11-08 17:02 UTC (permalink / raw) To: Alan Mackenzie; +Cc: emacs-devel >> >> It's "only" the cconv-convert part of cconv.el that will need changes, >> >> but yes, one way or another it will need to be changed to preserve the >> >> location info. >> > OK. But it's still a challenging job. >> I wouldn't call it challenging: the changes are orthogonal to the actual >> working of cconv, so it will likely make the code messier but >> conceptually there's no significant difficulty. I'm familiar with the >> code and will be happy to help. > Thanks! By the way, am I right in thinking that pcase does its > comparisons using equal? "as if by `equal`", so when comparing against symbols we actually use `eq`. > One thing we'd need to watch out for is using equal, not eq, when we > compare symbols. (eq 'foo #<symbol foo with position 73>) will surely > be nil, but (equal ....) would be t. Same with member and memq. Indeed. > We'd also need to make sure that the reader's enabling flag for creating > these extended symbols is bound to nil whenever we suspend the byte > compiler to do something else (edebug, for example). Rather than a dynamically-scoped var, it might be a better option to either use a new function `read-with-positions`, or else use an additional argument to `read`. Stefan ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-08 17:02 ` Stefan Monnier @ 2018-11-08 22:13 ` Alan Mackenzie 2018-11-11 12:59 ` Alan Mackenzie 0 siblings, 1 reply; 44+ messages in thread From: Alan Mackenzie @ 2018-11-08 22:13 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Hello, Stefan. On Thu, Nov 08, 2018 at 12:02:01 -0500, Stefan Monnier wrote: > >> >> It's "only" the cconv-convert part of cconv.el that will need changes, > >> >> but yes, one way or another it will need to be changed to preserve the > >> >> location info. > >> > OK. But it's still a challenging job. > >> I wouldn't call it challenging: the changes are orthogonal to the actual > >> working of cconv, so it will likely make the code messier but > >> conceptually there's no significant difficulty. I'm familiar with the > >> code and will be happy to help. > > Thanks! By the way, am I right in thinking that pcase does its > > comparisons using equal? > "as if by `equal`", so when comparing against symbols we actually use `eq`. ... at the moment ... ;-) equal actually tests EQ right near its start anyway, so it shouldn't be a big deal for pcase actually to use equal. Or am I missing something? > > One thing we'd need to watch out for is using equal, not eq, when we > > compare symbols. (eq 'foo #<symbol foo with position 73>) will surely > > be nil, but (equal ....) would be t. Same with member and memq. > Indeed. > > We'd also need to make sure that the reader's enabling flag for creating > > these extended symbols is bound to nil whenever we suspend the byte > > compiler to do something else (edebug, for example). > Rather than a dynamically-scoped var, it might be a better option to > either use a new function `read-with-positions`, or else use an > additional argument to `read`. OK. I've hacked together some basic infrastructure in alloc.c, lread.c, print.c, and lisp.h. 
I can now read a small test file and get back the form with "located symbols". I've called the new function which does this read-locating-symbols, but that might want to change. As soon as I've sorted out SYMBOLP and XSYMBOL, I'll create a new branch under /scratch, commit what I've got, and then we can play with it. > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-08 22:13 ` Alan Mackenzie @ 2018-11-11 12:59 ` Alan Mackenzie 2018-11-11 15:53 ` Eli Zaretskii 0 siblings, 1 reply; 44+ messages in thread From: Alan Mackenzie @ 2018-11-11 12:59 UTC (permalink / raw) To: Stefan Monnier; +Cc: Michael Heerdegen, emacs-devel Hello, Stefan. On Thu, Nov 08, 2018 at 22:13:11 +0000, Alan Mackenzie wrote: > On Thu, Nov 08, 2018 at 12:02:01 -0500, Stefan Monnier wrote: [ .... ] > OK. I've hacked together some basic infrastructure in alloc.c, lread.c, > print.c, and lisp.h. I can now read a small test file and get back the > form with "located symbols". I've called the new function which does > this read-locating-symbols, but that might want to change. > As soon as I've sorted out SYMBOLP and XSYMBOL, I'll create a new branch > under /scratch, commit what I've got, and then we can play with it. I've now got this working, and created the new, optimistically named, branch /scratch/accurate-warning-pos. To use this, do something like: M-: (setq bar (read-locating-symbols (current-buffer))) with point at the beginning of a (smallish) buffer. The following form, from Roland Winkler's bug #9109, works well: (unwind-protect (let ((foo "foo")) (insert foo)) (setq foo "bar")) . (car bar), for example, is now a "located symbol". Direct symbol functions are "protected" by an enabling flag located-symbols-enabled. This is needed, partly to minimise the run time taken when the facility is not being used, but more pertinently to enable Emacs to build without a segfault. Currently this flag guards only SYMBOLP and XSYMBOL. So, try M-: (symbolp (car bar)). This is nil. But M-: (let ((located-symbols-enabled t)) (symbolp (car bar))) is t. Similarly, set, symbol-value, symbol-function, symbol-plist need that flag to be non-nil. > > > One thing we'd need to watch out for is using equal, not eq, when we > > > compare symbols. 
(eq 'foo #<symbol foo with position 73>) will surely > > > be nil, but (equal ....) would be t. Same with member and memq. > > Indeed. `equal' has been enhanced so that M-: (equal (car bar) 'unwind-protect) is t. Additionally, there are defuns only-symbol-p, located-symbol-p, located-symbol-sym, located-symbol-loc, which do the obvious. > > > We'd also need to make sure that the reader's enabling flag for creating > > > these extended symbols is bound to nil whenever we suspend the byte > > > compiler to do something else (edebug, for example). > > Rather than a dynamically-scoped var, it might be a better option to > > either use a new function `read-with-positions`, or else use an > > additional argument to `read`. As noted above I've currently got a rather untidy mixture of these two approaches. There's a lot left to do, but this is a start. Incidentally, I timed a make bootstrap in this branch, comparing it with master. The branch was ~0.5% slower. This might be real, it might just be random noise. Comments and criticism welcome! > > Stefan -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-11 12:59 ` Alan Mackenzie @ 2018-11-11 15:53 ` Eli Zaretskii 2018-11-11 20:12 ` Alan Mackenzie 2018-11-12 14:16 ` Alan Mackenzie 0 siblings, 2 replies; 44+ messages in thread From: Eli Zaretskii @ 2018-11-11 15:53 UTC (permalink / raw) To: Alan Mackenzie; +Cc: michael_heerdegen, monnier, emacs-devel > Date: Sun, 11 Nov 2018 12:59:45 +0000 > From: Alan Mackenzie <acm@muc.de> > Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org > > I've now got this working, and created the new, optimistically named, > branch /scratch/accurate-warning-pos. Thanks. +/* Return a new located symbol with the specified SYMBOL and LOCATION. */ +Lisp_Object +build_located_symbol (Lisp_Object symbol, Lisp_Object location) +{ I'd prefer something like symbol_with_pos instead, and accordingly in other related symbol names. +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0, + doc: /* Return t if OBJECT is a symbol, but not a located symbol. */ + attributes: const) + (Lisp_Object object) symbol-bare-p? + DEFVAR_LISP ("located-symbols-enabled", Vlocated_symbols_enabled, + doc: /* Non-nil when "located symbols" can be used in place of symbols. What is the rationale for this variable? diff --git a/src/lisp.h b/src/lisp.h index eb67626..b4fc6f2 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word; typedef EMACS_INT Lisp_Word; #endif +/* A Lisp_Object is a tagged pointer or integer. Ordinarily it is a + Lisp_Word. However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper + around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'. + + LISP_INITIALLY (W) initializes a Lisp object with a tagged value + that is a Lisp_Word W. It can be used in a static initializer. */ Looks like you moved a large chunk of lisp.h to a different place in the file. Any reasons for that? +/* FIXME!!! 2018-11-09. Consider using lisp_h_PSEUDOVECTOR here. 
*/ What is this FIXME about? This needs support in src/.gdbinit and documentation. Thanks again for working on this. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages 2018-11-11 15:53 ` Eli Zaretskii @ 2018-11-11 20:12 ` Alan Mackenzie 2018-11-11 20:47 ` Stefan Monnier 2018-11-12 16:19 ` Eli Zaretskii 2018-11-12 14:16 ` Alan Mackenzie 1 sibling, 2 replies; 44+ messages in thread From: Alan Mackenzie @ 2018-11-11 20:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: michael_heerdegen, monnier, emacs-devel Hello, Eli. Thanks for the reply and comments. On Sun, Nov 11, 2018 at 17:53:13 +0200, Eli Zaretskii wrote: > > Date: Sun, 11 Nov 2018 12:59:45 +0000 > > From: Alan Mackenzie <acm@muc.de> > > Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org > > > > I've now got this working, and created the new, optimistically named, > > branch /scratch/accurate-warning-pos. > Thanks. > +/* Return a new located symbol with the specified SYMBOL and LOCATION. */ > +Lisp_Object > +build_located_symbol (Lisp_Object symbol, Lisp_Object location) > +{ > I'd prefer something like symbol_with_pos instead, and accordingly in > other related symbol names. Yes, I'll do that. "Located Symbol" is too much of a mouthful. Thinking up names for new things isn't my strong point. > +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0, > + doc: /* Return t if OBJECT is a symbol, but not a located symbol. */ > + attributes: const) > + (Lisp_Object object) > symbol-bare-p? How about bare-symbol-p? symbol-bare-p has the connotations "we have a symbol; is it bare?" rather than "have we a bare symbol?". > + DEFVAR_LISP ("located-symbols-enabled", Vlocated_symbols_enabled, > + doc: /* Non-nil when "located symbols" can be used in place of symbols. > What is the rationale for this variable?
In the new lisp_h_SYMBOLP, we have #define lisp_h_SYMBOLP(x) ((lisp_h_ONLY_SYMBOL_P (x) || \ (Vlocated_symbols_enabled && (lisp_h_LOCATED_SYMBOL_P (x))))) The Vlocated_symbols_enabled should efficiently prevent a potentially slow lisp_h_LOCATED_SYMBOL_P from being executed in the overwhelmingly normal case that we don't have "symbols with pos". It is a simple test against binary zero, and the word should be permanently in cache. Another, slightly more honest, answer is that when it wasn't there, my Emacs build crashed with a segfault whilst loading .el files. I didn't get a core dump for this segfault. Could you please tell me (or point me in the right direction of documentation) how I configure my GNU/Linux to generate core dumps. I think my kernel's set up correctly, but I don't see the dumps. > diff --git a/src/lisp.h b/src/lisp.h > index eb67626..b4fc6f2 100644 > --- a/src/lisp.h > +++ b/src/lisp.h > @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word; > typedef EMACS_INT Lisp_Word; > #endif > +/* A Lisp_Object is a tagged pointer or integer. Ordinarily it is a > + Lisp_Word. However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper > + around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'. > + > + LISP_INITIALLY (W) initializes a Lisp object with a tagged value > + that is a Lisp_Word W. It can be used in a static initializer. */ > Looks like you moved a large chunk of lisp.h to a different place in > the file. Any reasons for that? I did this to get things to compile. lisp.h is intricate and complicated. But it turned out I'd moved far more than I needed. With the benefit of a night's sleep, I've restored most of the damage. All that's been moved now is some inline functions (SYMBOLP, XSYMBOL, ...., CHECK_SYMBOL) from before More_Lisp_Bits to after it, since they now depend on More_Lisp_Bits. > +/* FIXME!!! 2018-11-09. Consider using lisp_h_PSEUDOVECTOR here. */ > What is this FIXME about? 
It was a note to self about whether just to invoke the (new) macro lisp_h_PSEUDOVECTOR, rather than repeating the logic in the inline function. Sorry it escaped into the wild. The answer is, I MUST invoke the macro, to avoid duplication of functionality. > This needs support in src/.gdbinit and documentation. Yes! I think .gdbinit will be relatively straightforward. How much to put into the docs (the elisp manual?) is more difficult to decide. Although primarily for the byte compiler, Michael Heerdegen has already said he's got other uses for it. > Thanks again for working on this. -- Alan Mackenzie (Nuremberg, Germany). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-11 20:47 UTC
To: Alan Mackenzie; +Cc: michael_heerdegen, Eli Zaretskii, emacs-devel

> Another, slightly more honest, answer is that when it wasn't there, my
> Emacs build crashed with a segfault whilst loading .el files.  I didn't
> get a core dump for this segfault.  Could you please tell me (or point
> me in the right direction of documentation) how I configure my GNU/Linux
> to generate core dumps?  I think my kernel's set up correctly, but I
> don't see the dumps.

I can't remember how to do that, but I recommend you just run emacs (or
temacs as the case may be) within GDB directly rather than go through
a core dump.

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Eli Zaretskii @ 2018-11-12 3:30 UTC
To: Stefan Monnier; +Cc: michael_heerdegen, acm, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Eli Zaretskii <eliz@gnu.org>, michael_heerdegen@web.de,
>   emacs-devel@gnu.org
> Date: Sun, 11 Nov 2018 15:47:19 -0500
>
> > Another, slightly more honest, answer is that when it wasn't there, my
> > Emacs build crashed with a segfault whilst loading .el files.  I didn't
> > get a core dump for this segfault.  Could you please tell me (or point
> > me in the right direction of documentation) how I configure my GNU/Linux
> > to generate core dumps?  I think my kernel's set up correctly, but I
> > don't see the dumps.
>
> I can't remember how to do that

"ulimit -H -c unlimited", I think.

> but I recommend you just run emacs (or temacs as the case may be)
> within GDB directly rather than go through a core dump.

Right.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Eli Zaretskii @ 2018-11-12 16:19 UTC
To: Alan Mackenzie; +Cc: michael_heerdegen, monnier, emacs-devel

> Date: Sun, 11 Nov 2018 20:12:14 +0000
> Cc: michael_heerdegen@web.de, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> > This needs support in src/.gdbinit and documentation.
>
> Yes!  I think .gdbinit will be relatively straightforward.  How much to
> put into the docs (the elisp manual?) is more difficult to decide.
> Although primarily for the byte compiler, Michael Heerdegen has already
> said he's got other uses for it.

The object and its predicate(s) should be documented, as well as the
new primitive which uses it, read-positioning-symbols.  The printed
representation of the new Lisp object should also be documented (we do
that for every other Lisp object).  And there should be a short
announcement in NEWS.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-12 14:16 UTC
To: Eli Zaretskii; +Cc: michael_heerdegen, monnier, emacs-devel

Hello, Eli.

On Sun, Nov 11, 2018 at 17:53:13 +0200, Eli Zaretskii wrote:
> > Date: Sun, 11 Nov 2018 12:59:45 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org
> >
> > I've now got this working, and created the new, optimistically named,
> > branch /scratch/accurate-warning-pos.

> Thanks.

> +/* Return a new located symbol with the specified SYMBOL and LOCATION. */
> +Lisp_Object
> +build_located_symbol (Lisp_Object symbol, Lisp_Object location)
> +{

> I'd prefer something like symbol_with_pos instead, and accordingly in
> other related symbol names.

DONE.

> +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0,
> +       doc: /* Return t if OBJECT is a symbol, but not a located symbol.  */
> +       attributes: const)
> +  (Lisp_Object object)

> symbol-bare-p?

DONE.  (bare-symbol-p)

[ .... ]

> diff --git a/src/lisp.h b/src/lisp.h
> index eb67626..b4fc6f2 100644
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word;
>  typedef EMACS_INT Lisp_Word;
>  #endif

> +/* A Lisp_Object is a tagged pointer or integer.  Ordinarily it is a
> +   Lisp_Word.  However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper
> +   around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'.
> +
> +   LISP_INITIALLY (W) initializes a Lisp object with a tagged value
> +   that is a Lisp_Word W.  It can be used in a static initializer.  */

> Looks like you moved a large chunk of lisp.h to a different place in
> the file.  Any reasons for that?

I've now moved all but a few inline functions back again.

> +/* FIXME!!! 2018-11-09.  Consider using lisp_h_PSEUDOVECTOR here. */

> What is this FIXME about?

It's gone, the issue having been resolved.

> This needs support in src/.gdbinit and documentation.

Not yet done.

> Thanks again for working on this.

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-12 15:44 UTC
To: Stefan Monnier, Eli Zaretskii; +Cc: Michael Heerdegen, emacs-devel

Hello, Stefan and Eli.

A snag.....

On Thu, Nov 08, 2018 at 14:08:43 +0000, Alan Mackenzie wrote:

[ .... ]

> One thing we'd need to watch out for is using equal, not eq, when we
> compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
> be nil, but (equal ....) would be t.  Same with member and memq.

Unfortunately, this isn't going to work.  There will be macros which do
things like:

    (cond ((eq (car form) 'bar) ....) .....)

Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
to return nil.

The only way out of this I can see at the moment is to amend eq (and
memq, assq, delq, ....) so that it recognises a symbol with position as
being eq to the bare symbol.  At least when the flag variable
symbols-with-pos-enabled is currently non-nil.

At the implementation level, when that variable is nil (i.e. for normal
running), there would be a cost of one comparison of an in-cache
variable with zero on each eq operation which returns nil.

This isn't pretty.  If this modification of eq, memq, .... is too much
to take, then I think the current approach is doomed to failure.

What do you think?

[ .... ]

--
Alan Mackenzie (Nuremberg, Germany).
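The amended eq proposed in the message above can be sketched in
standalone C as follows.  This is an illustration under stated
assumptions, not the code on the branch: struct object, strip_pos and
the field names are hypothetical, and it is an assumption of the sketch
that two positioned occurrences of the same symbol also compare eq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object model; real Emacs objects are tagged words.  */
enum obj_kind { KIND_BARE_SYMBOL, KIND_SYMBOL_WITH_POS, KIND_OTHER };

struct object
{
  enum obj_kind kind;
  const struct object *sym;   /* KIND_SYMBOL_WITH_POS: the bare symbol.  */
  long pos;                   /* KIND_SYMBOL_WITH_POS: source offset.  */
};

/* Plays the role of symbols-with-pos-enabled: false in normal running.  */
static bool symbols_with_pos_enabled = false;

/* Return the bare symbol behind X, or X itself if it carries no position.  */
static const struct object *
strip_pos (const struct object *x)
{
  assert (x != NULL);
  return x->kind == KIND_SYMBOL_WITH_POS ? x->sym : x;
}

/* Identity comparison that, when the flag is set, also treats a symbol
   with position as eq to its bare symbol.  When the flag is clear, the
   only extra cost over plain identity is one flag test on the path
   that returns false.  */
static bool
eq (const struct object *a, const struct object *b)
{
  if (a == b)
    return true;
  if (!symbols_with_pos_enabled)
    return false;
  return strip_pos (a) == strip_pos (b);
}
```

In this model the macro test (eq (car form) 'bar) succeeds while the
flag is set and keeps its ordinary identity semantics otherwise.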
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-12 20:36 UTC
To: Alan Mackenzie; +Cc: Michael Heerdegen, Eli Zaretskii, emacs-devel

> Unfortunately, this isn't going to work.  There will be macros which do
> things like:
>
>     (cond ((eq (car form) 'bar) ....) .....)
>
> Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
> to return nil.
[...]
> This isn't pretty.  If this modification of eq, memq, .... is too much
> to take, then I think the current approach is doomed to failure.

It's indeed a serious concern.  Maybe we can circumvent by changing
those pieces of code to use `eql` (and make sure `eql` considers
a symbol and its symbol-with-pos as equal, obviously).

Changing `eq` would better be avoided,

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-12 21:35 UTC
To: Stefan Monnier; +Cc: Michael Heerdegen, Eli Zaretskii, emacs-devel

Hello, Stefan.

On Mon, Nov 12, 2018 at 15:36:14 -0500, Stefan Monnier wrote:
> > Unfortunately, this isn't going to work.  There will be macros which do
> > things like:
> >
> >     (cond ((eq (car form) 'bar) ....) .....)
> >
> > Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
> > to return nil.
> [...]
> > This isn't pretty.  If this modification of eq, memq, .... is too much
> > to take, then I think the current approach is doomed to failure.

> It's indeed a serious concern.  Maybe we can circumvent by changing
> those pieces of code to use `eql` (and make sure `eql` considers
> a symbol and its symbol-with-pos as equal, obviously).

We can't change those bits of code - they're in macros that we don't
necessarily control.  Or are you suggesting that we somehow compile
macros such that `eq' gets replaced by `eql' in the critical places?

> Changing `eq` would better be avoided,

I agree, but don't see how we can avoid it.  Apologies for my earlier
insistence that the approach would have little impact outside the byte
compiler.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-14 13:34 UTC
To: emacs-devel

>> Changing `eq` would better be avoided,
> I agree, but don't see how we can avoid it.

Oh... you mean when someone else's macro does for example

    (defmacro ...
      (if (eq x 'foo)
          `(...)
        `(...)))

...hmm... yes, this is getting really ugly.

Maybe the "big cons-cells" approach is not that bad after all, since it
doesn't try to introduce new objects which are "equal but not": it just
introduces a subtype of cons-cells and that's that, so it's semantically
much simpler/cleaner.

It will require special code in alloc.c to keep the special
representation of normal cons-cells, and special extra code to propagate
the location information in macroexp.el, cconv.el, byte-opt.el,
bytecomp.el but the impact should be much more localized (and at places
where normal compilers also have to do this kind of work).

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-15 16:32 UTC
To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Wed, Nov 14, 2018 at 08:34:28 -0500, Stefan Monnier wrote:
> >> Changing `eq` would better be avoided,
> > I agree, but don't see how we can avoid it.

> Oh... you mean when someone else's macro does for example

>     (defmacro ...
>       (if (eq x 'foo)
>           `(...)
>         `(...)))

Yes.

> ...hmm... yes, this is getting really ugly.

> Maybe the "big cons-cells" approach is not that bad after all, since it
> doesn't try to introduce new objects which are "equal but not": it just
> introduces a subtype of cons-cells and that's that, so it's semantically
> much simpler/cleaner.

I'm not sure about that.  We'd still have to modify EQ to cope with the
new structure no matter how we do it.

> It will require special code in alloc.c to keep the special
> representation of normal cons-cells, and special extra code to propagate
> the location information in macroexp.el, cconv.el, byte-opt.el,
> bytecomp.el but the impact should be much more localized (and at places
> where normal compilers also have to do this kind of work).

In branch scratch/accurate-warning-pos I have hacked up (but not
committed) an EQ which works with the (new as of a few days ago) PVEC
structure for symbols with position.  I am now able to byte-compile a
.el file with symbols-with-pos-enabled bound to non-nil, having sorted
out the problem that was earlier causing segfaults (probably).

This version of Emacs is slower by ~8%, but this is tempered by the EQ
implementation being extremely naive without any optimisation.  Also
some existing optimisation (e.g. #define EQ) has been commented out to
enable the files to compile.  I don't understand the relationship
between "#define EQ" and the inline function EQ at all well.
Optimisation will surely be possible.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-15 18:01 UTC
To: Alan Mackenzie; +Cc: emacs-devel

>> Maybe the "big cons-cells" approach is not that bad after all, since it
>> doesn't try to introduce new objects which are "equal but not": it just
>> introduces a subtype of cons-cells and that's that, so it's semantically
>> much simpler/cleaner.
>
> I'm not sure about that.  We'd still have to modify EQ to cope with the
> new structure no matter how we do it.

No need to modify EQ for the big-cons cells: a big-cons-cell would be
a normal cons-cell just with more fields added at its end.  It's not
a "location + pointer to the real object" like we need to do for
symbols, so EQ will do the expected thing on it.

        Stefan
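A minimal sketch of the "big cons-cell" layout described in the message
above: an ordinary cons with extra fields appended, so that plain
pointer identity keeps working.  The struct and function names are
hypothetical, and the is_big marker is a simplification made for this
sketch; the thread only says that the two kinds of cell would have to be
told apart somehow, with support in alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ordinary cons cell.  The is_big marker is an assumption
   of this sketch, not something the thread specifies.  */
struct cons
{
  const void *car;
  const void *cdr;
  bool is_big;
};

/* A "big" cons: a normal cons with location fields added at its end.
   Because BASE is the first member, a pointer to the big cons and a
   pointer to its embedded cons refer to the same address.  */
struct big_cons
{
  struct cons base;
  long line;
  long column;
};

/* Unmodified identity test: a plain pointer comparison.  */
static bool
eq (const void *a, const void *b)
{
  return a == b;
}

/* View a big cons as an ordinary cons.  */
static const struct cons *
as_cons (const struct big_cons *b)
{
  assert (b != NULL);
  return &b->base;
}

/* Fetch the location if C is a big cons; return false otherwise.  */
static bool
cons_location (const struct cons *c, long *line, long *column)
{
  assert (c != NULL && line != NULL && column != NULL);
  if (!c->is_big)
    return false;
  const struct big_cons *b = (const struct big_cons *) c;
  *line = b->line;
  *column = b->column;
  return true;
}
```

Code that only knows about ordinary conses sees a normal cell, while
position-aware code can recover the line and column.  The cost, as the
thread notes, is that every transformation creating a new cons has to
copy the location across explicitly.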
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-16 14:14 UTC
To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Nov 15, 2018 at 13:01:49 -0500, Stefan Monnier wrote:
> >> Maybe the "big cons-cells" approach is not that bad after all, since it
> >> doesn't try to introduce new objects which are "equal but not": it just
> >> introduces a subtype of cons-cells and that's that, so it's semantically
> >> much simpler/cleaner.

> > I'm not sure about that.  We'd still have to modify EQ to cope with the
> > new structure no matter how we do it.

> No need to modify EQ for the big-cons cells: a big-cons-cell would be
> a normal cons-cell just with more fields added at its end.  It's not
> a "location + pointer to the real object" like we need to do for
> symbols, so EQ will do the expected thing on it.

Sorry, yes.  We'd need some way of distinguishing between the two types
of cons cell (which I think you already dealt with some while ago) and
we'd need to do an awful lot of transfer of old->new source information
in the transformation of forms.

In the meantime, I've got the symbols approach "working".  In
particular, I can byte compile the file from Roland Winkler's bug #9109,
and get the "free variable" warning message indicating the correct
source line (and, with a little more work to be done, the correct
column).  It is not quite ready to demonstrate, but quite near it.

Incidentally, why do we not print line and column numbers for warnings
in compile_defun?  It wouldn't be difficult.

>         Stefan

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Michael Heerdegen @ 2018-11-08 4:47 UTC
To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> The third idea is to amend the reader so that whereas it now produces a
> form, in a byte compiler special mode, it would produce the cons (form .
> offset).  So, for example, the text "(not a)" currently gets read into
> the form (not . (a . nil)).  The amended reader would produce (((not . 1)
> . ((a . 5) . (nil . 6))) . 0) (where 0, 1, 5, and 6 are the textual
> offsets of the elements coded).

BTW, an amended version of `read' might be beneficial for other stuff,
too.  When I designed el-search, I wanted something like that.

I'm not sure which kind of position info data I would like to have.  I
think it would be good to have additionally starting positions of
conses, for example.

Michael.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-08 11:07 UTC
To: Michael Heerdegen; +Cc: emacs-devel

Hello, Michael.

On Thu, Nov 08, 2018 at 05:47:15 +0100, Michael Heerdegen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > The third idea is to amend the reader so that whereas it now produces a
> > form, in a byte compiler special mode, it would produce the cons (form .
> > offset).  So, for example, the text "(not a)" currently gets read into
> > the form (not . (a . nil)).  The amended reader would produce (((not . 1)
> > . ((a . 5) . (nil . 6))) . 0) (where 0, 1, 5, and 6 are the textual
> > offsets of the elements coded).

> BTW, an amended version of `read' might be beneficial for other stuff,
> too.  When I designed el-search, I wanted something like that.

As it turned out, the above scheme would not be useful, because a macro
could not manipulate such a form.  The ideas are currently in flux, in a
discussion between Stefan and me, and we've come up with several ideas,
all bad.  ;-)  We're currently trying to select the least bad idea.

> I'm not sure which kind of position info data I would like to have.  I
> think it would be good to have additionally starting positions of
> conses, for example.

I came up with a way of doing this, using the spare value of Lisp_Type
in a Lisp_Object to indicate an indirection to a structure of two
Lisp_Objects.  The first would be the actual object, the second would be
position information.  The trouble with this is it would slow down Emacs
performance significantly (possibly as much as ~10%).  It would also be
difficult to implement, since at each transformation of the form being
compiled, position information would need to be copied to the new
version of form.

Stefan's latest suggestion is to use the above approach just on symbol
occurrences.  (Sorry!).  These are preserved through transformations
much more than cons cells are.  Also, the existing approach in the
compiler only tracks symbol occurrences, so we will not lose anything by
tracking only symbols, but more accurately.  Even so, this will be a lot
of work.

If some code wants to get the starting position of a cons, the source
code will surely be in a buffer somewhere.  As long as there is a symbol
in the cons (i.e., we don't have ()), surely the cons position can be
found from the contained symbol, together with backward-up-list in the
source buffer.  Or something like that.

> Michael.

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Michael Heerdegen @ 2018-11-09 2:06 UTC
To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> Stefan's latest suggestion is to use the above approach just on symbol
> occurrences.  (Sorry!).  These are preserved through transformations
> much more than cons cells are.

BTW, just to be sure, you know about the already existing variable
`read-with-symbol-positions', right?  Only a detail for what you need to
do, though.

> If some code wants to get the starting position of a cons, the source
> code will surely be in a buffer somewhere.  As long as there is a symbol
> in the cons (i.e., we don't have ()), surely the cons position can be
> found from the contained symbol, together with backward-up-list in the
> source buffer.  Or something like that.

Sure.  The problem is how to find the right cons when several such
places exist.  Likewise for strings etc.

My requirement is quite similar to yours, btw.  Say, in a buffer at some
position there is a list (X1 X2 X3) and you want to match that with
(i.e. el-search for) pattern `(,P1 ,P2 ,P3) with certain PATTERNS Pi.
In an ideal world, when the Pi are (tried to be) matched against the Xi,
the Pi would know the buffer location of Xi, so that Pi could e.g. use a
`guard' checking the "current" value of point.

Since patterns can do destructuring like above, similar to your case I
would want the position info to somehow survive transformations (mostly
list accessing functions in my case).

Thanks,

Michael.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Alan Mackenzie @ 2018-11-10 10:59 UTC
To: Michael Heerdegen; +Cc: emacs-devel

Hello, Michael.

On Fri, Nov 09, 2018 at 03:06:27 +0100, Michael Heerdegen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > Stefan's latest suggestion is to use the above approach just on symbol
> > occurrences.  (Sorry!).  These are preserved through transformations
> > much more than cons cells are.

> BTW, just to be sure, you know about the already existing variable
> `read-with-symbol-positions', right?  Only a detail for what you need to
> do, though.

Oh, yes, we know about this, all right!  It's because
read-with-symbol-positions doesn't work reliably (there are several bugs
open about warning messages reporting wrong positions) that we're trying
to develop something better.

> > If some code wants to get the starting position of a cons, the source
> > code will surely be in a buffer somewhere.  As long as there is a symbol
> > in the cons (i.e., we don't have ()), surely the cons position can be
> > found from the contained symbol, together with backward-up-list in the
> > source buffer.  Or something like that.

> Sure.  The problem is how to find the right cons when several such
> places exist.  Likewise for strings etc.

I'm not sure I follow you, here.  Surely the "right" cons is the one
containing the symbol occurrence whose position is known?  Or the one
containing that, and so on.

Literal strings could also be located, in much the same way as for
symbol occurrences.  Right at the moment, I don't know how much these
things will slow Emacs down by.

> My requirement is quite similar to yours, btw.  Say, in a buffer at some
> position there is a list (X1 X2 X3) and you want to match that with
> (i.e. el-search for) pattern `(,P1 ,P2 ,P3) with certain PATTERNS Pi.
> In an ideal world, when the Pi are (tried to be) matched against the Xi,
> the Pi would know the buffer location of Xi, so that Pi could e.g. use a
> `guard' checking the "current" value of point.

> Since patterns can do destructuring like above, similar to your case I
> would want the position info to somehow survive transformations (mostly
> list accessing functions in my case).

Yes, it sounds like this could use the "located symbols" feature, too.

Right at the moment I'm trying to get SYMBOLP to recognise both normal
symbols and "located symbols".  This is causing a segfault on building
such a test Emacs.  :-(

> Thanks,

> Michael.

--
Alan Mackenzie (Nuremberg, Germany).
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-10 13:20 UTC
To: emacs-devel

> Literal strings could also be located, in much the same way as for
> symbol occurrences.  Right at the moment, I don't know how much these
> things will slow Emacs down by.

For literal strings, since they aren't "uniquified" like symbols, we can
simply put the location on the object, e.g. as a text-property.

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Michael Heerdegen @ 2018-11-11 7:56 UTC
To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> > Sure.  The problem is how to find the right cons when several such
> > places exist.  Likewise for strings etc.
>
> I'm not sure I follow you, here.  Surely the "right" cons is the one
> containing the symbol occurrence whose position is known?  Or the one
> containing that, and so on.

Yes, if destructuring patterns also were able to destructure the
position info structure.  Otherwise, matching `(,P1 ,P2 ,P3) against
(a a a) has the problem that the Pi don't know which of the a's they
are matched against.

> Right at the moment I'm trying to get SYMBOLP to recognise both normal
> symbols and "located symbols".  This is causing a segfault on building
> such a test Emacs.  :-(

Then good luck!

Michael.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-08 13:45 UTC
To: emacs-devel

> BTW, an amended version of `read' might be beneficial for other stuff,
> too.  When I designed el-search, I wanted something like that.

Have you looked at edebug-read-*?

        Stefan
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Michael Heerdegen @ 2018-11-09 3:06 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > BTW, an amended version of `read' might be beneficial for other stuff,
> > too.  When I designed el-search, I wanted something like that.
>
> Have you looked at edebug-read-*?

No, thanks for the idea.  I hope it would be fast and reliable enough (I
already have enough bugs from the standard reader...)

Michael.
* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
From: Stefan Monnier @ 2018-11-09 16:15 UTC
To: emacs-devel

> No, thanks for the idea.  I hope it would be fast and reliable enough (I
> already have enough bugs from the standard reader...)

It's fast and reliable enough for Edebug, but being an Elisp emulation
of the C reader, it's obviously significantly slower than the C reader
and less reliable.  I think the reliability aspect should be good enough
(or easy to fix) for el-search, but w.r.t. speed that might be
a problem.

        Stefan