* Update 1 on Bytecode Offset tracking @ 2020-07-15 23:10 Zach Shaftel 2020-07-16 3:55 ` Stefan Monnier ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Zach Shaftel @ 2020-07-15 23:10 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/html, Size: 10496 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-15 23:10 Update 1 on Bytecode Offset tracking Zach Shaftel @ 2020-07-16 3:55 ` Stefan Monnier 2020-07-16 22:45 ` Zach Shaftel 2020-07-16 7:25 ` Andrea Corallo via Emacs development discussions. 2020-07-28 19:19 ` Update 2 " Zach Shaftel 2 siblings, 1 reply; 15+ messages in thread From: Stefan Monnier @ 2020-07-16 3:55 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel > The second branch saves the offset only before a call. Therefore, the > traceback on all of the functions other than the current one are > accurate, but the current one is not accurate if the error happens in > a byte op. IIUC this has a negligible performance impact. The info it provides is not 100% accurate, but I think it's a "sweet spot": it does provide the byte-offset info and is cheap enough to be acceptable into `master` with no real downside. I'd look at it as a "step" along the way: subsequent steps can be to make use of that info, or to improve the accuracy of that info. > The third branch bypasses invoking Ffuncall from within > exec_byte_code, and instead does essentially the same thing that > Ffuncall does, right in the Bcall ops. This would be useful in its own right. So I suggest you try and get this code into shape for `master` as well. I expect this will tend to suffer from some amount of code duplication. Maybe we can avoid it via refactoring, or maybe by "clever" macro tricks, but if the speedup is important enough, we can probably live with some amount of duplication. > All of them print the offset next to function names in the backtrace like this: > > > Debugger entered--Lisp error: (wrong-type-argument stringp t) > string-match(t t nil) > 13 test-condition-case() > load("/home/zach/.repos/bench-compare.el/test/test-debug...") > 78 byte-recompile-file("/home/zach/.repos/bench-compare.el/test/test-debug..."
nil 0 t) > 35 emacs-lisp-byte-compile-and-load() > funcall-interactively(emacs-lisp-byte-compile-and-load) > call-interactively(emacs-lisp-byte-compile-and-load record nil) > 101 command-execute(emacs-lisp-byte-compile-and-load record) Cool! > With respect to reporting offsets, using code from edebug we have > a Lisp-Expression reader that will track source-code locations and > store the information in a source-map-expression cl-struct. The code > in progress is here. How does the performance of this code compare to that of the "native" `read`? And to put it into perspective, have you looked at the relative proportion of time spent in `read` during a "typical" byte compilation? There's no doubt that preserving source code information will slow down byte-compilation but depending on how slow it gets we may find it's not "worth it". > Information currently saved is: > > * The expression itself > * The exact string that was read > * Begin and end points of the sexp in the buffer > * source-map-expression children (for conses and vectors) Sounds like a lot of information, which in turn implies a potentially high overhead (e.g. the "exact string" sounds like it might cost O(N²) in corner cases, yet provides redundant info that can be recovered from begin+end points). Note also that while `read` returns a sexp made exclusively of data coming from a particular buffer, the code after macro-expansion can include chunks coming from other buffers, so if we want to keep the same representation of "sexp with extra info" in both cases, we can't just assume "the buffer". Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
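The redundancy point above can be illustrated with a small sketch: if the begin and end positions are stored, the exact text that was read can be recovered from the buffer on demand rather than stored with every node. The names below (`demo-source-expr`, `demo-read-with-positions`, `demo-source-expr-text`) are hypothetical and are not the actual `source-map-expression` definition. The sketch also assumes that only whitespace (no comments) separates point from the form, and it leaves the `children` slot unfilled.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical struct for illustration, not the real `source-map-expression'.
(cl-defstruct (demo-source-expr (:constructor demo-source-expr-create))
  "A sexp plus the buffer positions it was read from."
  sexp begin end children)

(defun demo-read-with-positions (buffer)
  "Read one sexp from BUFFER at point and record its begin and end positions.
Assumes only whitespace (no comments) separates point from the sexp."
  (with-current-buffer buffer
    (skip-chars-forward " \t\n")
    (let* ((begin (point))
           (sexp (read buffer))        ; `read' advances point past the sexp
           (end (point)))
      (demo-source-expr-create :sexp sexp :begin begin :end end))))

(defun demo-source-expr-text (expr buffer)
  "Recover the exact text of EXPR from BUFFER using its stored positions."
  (with-current-buffer buffer
    (buffer-substring-no-properties (demo-source-expr-begin expr)
                                    (demo-source-expr-end expr))))
```

Storing two integers per node instead of a substring avoids copying the text of every nested form, which is presumably where the quadratic corner case would come from.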
* Re: Update 1 on Bytecode Offset tracking 2020-07-16 3:55 ` Stefan Monnier @ 2020-07-16 22:45 ` Zach Shaftel 2020-07-17 3:44 ` Eli Zaretskii 2020-07-17 16:20 ` Stefan Monnier 0 siblings, 2 replies; 15+ messages in thread From: Zach Shaftel @ 2020-07-16 22:45 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Hi Stefan, Stefan Monnier <monnier@iro.umontreal.ca> writes: >> The second branch saves the offset only before a call. Therefore, the >> traceback on all of the functions other than the current one are >> accurate, but the current one is not accurate if the error happens in >> a byte op. > > IIUC this has a negligible performance impact. The info it provides in > not 100% accurate, but I think it's a "sweet spot": it does provide the > byte-offset info and is cheap enough to be acceptable into `master` with > no real downside. Great! I just followed up on my copyright assignment as I still haven't finished that process. I don't know whether this could be exempt or if Rocky's assignment is sufficient, but hopefully I will hear back from copyright-clerk soon. > I'd look at it as a "step" along the way: subsequent steps can be to > make use of that info, or to improve the accuracy of that info. Absolutely, it would be great to have that in place as a basis for further improvement. >> The third branch bypasses invoking Ffuncall from within >> exec_byte_code, and instead does essentially the same thing that >> Ffuncall does, right in the Bcall ops. > > This would be useful in its own right. > So I suggest you try and get this code into shape for `master` as well. I will definitely continue work on this. > I expect this will tend to suffer from some amount of code duplication. > Maybe we can avoid it via refactoring, or maybe by "clever" macro > tricks, but if the speedup is important enough, we can probably live > with some amount of duplication. That seems to be the case. 
I'll keep looking to see if there's any low hanging fruit in terms of splitting up the funcall logic without slowing things down. More testing is necessary, but if a moderate chunk of duplicated code is acceptable then there may not be as much work needed on that branch as I had thought. >> All of them print the offset next to function names in the backtrace like this: >> >> >> Debugger entered--Lisp error: (wrong-type-argument stringp t) >> string-match(t t nil) >> 13 test-condition-case() >> load("/home/zach/.repos/bench-compare.el/test/test-debug...") >> 78 byte-recompile-file("/home/zach/.repos/bench-compare.el/test/test-debug..." nil 0 t) >> 35 emacs-lisp-byte-compile-and-load() >> funcall-interactively(emacs-lisp-byte-compile-and-load) >> call-interactively(emacs-lisp-byte-compile-and-load record nil) >> 101 command-execute(emacs-lisp-byte-compile-and-load record) > > Cool! > >> With respect to reporting offsets, using code from edebug we have >> a Lisp-Expression reader that will track source-code locations and >> store the information in a source-map-expression cl-struct. The code >> in progress is here. > > How does the performance of this code compare to that of the "native" `read? Rough tests indicate it's about three times slower. Running it on all 274 files in the `lisp` directory of the GNU Emacs sources takes ~11-12 seconds (after removing the string from the struct and not pretty-printing). A similar function which just calls `read` takes ~4 seconds. There are probably ways to further improve the performance of `source-map-read`, but I don't know how much more speed can realistically be gained. > And to put it into perspective, have you looked at the relative > proportion of time spent in `read` during a "typical" byte compilation? I have not yet, but I'll evaluate that and keep it in mind. > There's no doubt that preserving source code information will slow down > byte-compilation but depending on how slow it gets we may find it's not > "worth it".
> >> Information currently saved is: >> >> * The expression itself >> * The exact string that was read >> * Begin and end points of the sexp in the buffer >> * source-map-expression children (for conses and vectors) > > Sounds like a lot of information, which in turn implies a potentially > high overhead (e.g. the "exact string" sounds like it might cost O(N²) > in corner cases, yet provides redundant info that can be recovered from > begin+end points). Removing the string did improve performance, but not by as much as I expected. The function that constructs the tree of "children" may be slower than it needs to be, so I'll look into improving that. It may not be necessary to create the children for vectors since they're constants (outside of backquote, at least). >Note also that while `read` returns a sexp made exclusively of data >coming from a particular buffer, the code after macro-expansion can >include chunks coming from other buffers, so if we want to keep the >same representation of "sexp with extra info" in both cases, we can't >just assume "the buffer". Yes, and it won't be easy to maintain the read locations across macroexpansion, byte-opt and cconv. It's tough to say at this point how much the final product will slow down compilation, but I suspect it will be significant. > Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-16 22:45 ` Zach Shaftel @ 2020-07-17 3:44 ` Eli Zaretskii 2020-07-17 16:20 ` Stefan Monnier 1 sibling, 0 replies; 15+ messages in thread From: Eli Zaretskii @ 2020-07-17 3:44 UTC (permalink / raw) To: Zach Shaftel; +Cc: monnier, emacs-devel > From: Zach Shaftel <zshaftel@gmail.com> > Date: Thu, 16 Jul 2020 18:45:00 -0400 > Cc: emacs-devel@gnu.org > > I just followed up on my copyright assignment as I still haven't > finished that process. Let me know if I can help in any way. > I don't know whether this could be exempt or if Rocky's assignment > is sufficient, but hopefully I will hear back from copyright-clerk > soon. If you don't hear from them in a week or two, ping them and CC me. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-16 22:45 ` Zach Shaftel 2020-07-17 3:44 ` Eli Zaretskii @ 2020-07-17 16:20 ` Stefan Monnier 2020-07-17 20:19 ` Zach Shaftel 1 sibling, 1 reply; 15+ messages in thread From: Stefan Monnier @ 2020-07-17 16:20 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel > Great! I just followed up on my copyright assignment as I still haven't > finished that process. I don't know whether this could be exempt or if > Rocky's assignment is sufficient, but hopefully I will hear back from > copyright-clerk soon. While waiting for the paperwork to go through, you can prepare the patch and we can start discussing it. > That seems to be the case. I'll keep looking to see if there's any low > hanging fruit in terms of splitting up the funcall logic without slowing > things down. More testing is necessary, but if a moderate chunk of > duplicated code is acceptable then there may not be as much work needed > on that branch as I had thought. It's a tradeoff, so it's hard to say what is acceptable without seeing the actual patch along with the corresponding measurements of the performance impact. > Rough tests indicate it's about three times slower. Wow! That's a lot less than I expected. That makes it quite usable. This said, we'll probably still want to merge the feature into the C code simply to avoid the duplication (I expect that the Edebug reader has never been 100% faithful and that it has probably diverged over time). > Removing the string did improve performance, but not by as much as I > expected. The function that constructs the tree of "children" may be > slower than it needs to be, so I'll look into improving that. It may not > be necessary to create the children for vectors since they're constants > (outside of backquote, at least). I think vectors are rare enough that the performance benefit of special casing them is not worth the downside of losing source-location info when it'd be beneficial.
> Yes, and it won't be easy to maintain the read locations across > macroexpansion, byte-opt and cconv. For byte-opt and cconv it's mostly a question of labour. For macros, OTOH, it's really fundamentally hard (or impossible, in general). We could/should introduce some new way to define macros which knows about "source code annotated with locations". There's a lot of work on Scheme macros we could leverage for that. Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-17 16:20 ` Stefan Monnier @ 2020-07-17 20:19 ` Zach Shaftel 2020-07-17 22:08 ` Stefan Monnier 0 siblings, 1 reply; 15+ messages in thread From: Zach Shaftel @ 2020-07-17 20:19 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Great! I just followed up on my copyright assignment as I still haven't >> finished that process. I don't know whether this could be exempt or if >> Rocky's assignment is sufficient, but hopefully I will hear back from >> copyright-clerk soon. > > While waiting for the paperwork to go through, you can prepare the patch > and we can start discussing it. Sure, does that just mean the 'git format-patch -1' emailed to bug-gnu-emacs@gnu.org, as mentioned in CONTRIBUTE? If that's the gist of it then I can do that shortly. >> That seems to be the case. I'll keep looking to see if there's any low >> hanging fruit in terms of splitting up the funcall logic without slowing >> things down. More testing is necessary, but if a moderate chunk of >> duplicated code is acceptable then there may not be as much work needed >> on that branch as I had thought. > > It's a tradeoff, so it's hard to say what is acceptable without seeing > the actual path along with the corresponding measurements of the > performance impact. > >> Rough tests indicate it's about three times slower. > > Wow! That's a lot less than I expected. That makes it quite usable. I was able to speed that function up to the point that it's about the same as one using `read`. Those functions are doing a whole lot of IO (reading and writing hundreds of files) so it's not really a fair comparison. I've done more tests with functions that just read a whole buffer, collecting what they read into a list. 
In a 9600 line file with just over 500 sexps, the `read` version took about ~.02-.04 seconds (according to `benchmark-run-compiled`), and the `source-map-read` version took ~.08 seconds when it didn't GC, but unlike with `read` it did cause a GC 10-20% of the time. > This said, we'll probably still want to merge the feature into the > C code simply to avoid the duplication (I expect that the Edebug reader > has never been 100% faithful and that it has probably diverged over > time). I already had a couple minor issues with edebug's reader, though they were easily remedied. It's pretty hard to hook into it since it relies heavily on dynamic binding. So I agree edebug is not the ideal candidate for a final product. >> Removing the string did improve performance, but not by as much as I >> expected. The function that constructs the tree of "children" may be >> slower than it needs to be, so I'll look into improving that. It may not >> be necessary to create the children for vectors since they're constants >> (outside of backquote, at least). > > I think vectors are rare enough that the performance benefit of special > casing them is not worth the downside of losing source-location info > when it'd be beneficial. > >> Yes, and it won't be easy to maintain the read locations across >> macroexpansion, byte-opt and cconv. > > For byte-opt and cconv it's mostly a question of labour. > > For macros, OTOH, it's really fundamentally hard (or impossible, in > general). Helmut Eller mentioned before that most macros do use at least some of the original code in their expansion. A crude test with a pair of hash-tables indicates 120/594 of the conses in the definition of `cl-typep` remained after `macroexpand-all`ing. That's not much, but considering how macro-laden the function is, that doesn't seem totally impossible to work with. > We could/should introduce some new way to define macros which > knows about "source code annotated with locations". 
I've wondered about this too but don't know what the right approach would be. I doubt anyone would want to use something like macro-cons/list/append etc. functions, and somehow modifying the behavior of the real list operations during macroexpansion seems impractical. I'm sure someone can come up with a good approach to this. > There's a lot of work on Scheme macros we could leverage for that. Interesting, so far I've had some difficulty finding documentation about how other Lisps track source locations. If you know where I can find some more details about that I'd love to take a look. -Zach ^ permalink raw reply [flat|nested] 15+ messages in thread
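The kind of measurement described above (read a whole buffer, collect the forms into a list, time it with `benchmark-run-compiled`) could look roughly like the following sketch. The function name `demo-read-all-forms` is hypothetical; only `read`, `benchmark-run-compiled` and the `end-of-file` condition are standard Emacs facilities. This is an illustration of the method, not the code that produced the numbers quoted in the thread.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun demo-read-all-forms (buffer)
  "Read every top-level form in BUFFER with `read' and return them as a list."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      (let (forms)
        ;; `read' signals `end-of-file' once no further form can be read.
        (condition-case nil
            (while t
              (push (read buffer) forms))
          (end-of-file nil))
        (nreverse forms)))))
```

A call such as `(benchmark-run-compiled 100 (demo-read-all-forms (current-buffer)))` returns a list of the total elapsed time, the number of garbage collections, and the time spent in GC, so passing a repetition count both lengthens the measured interval and exposes how often each reader triggers a GC.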
* Re: Update 1 on Bytecode Offset tracking 2020-07-17 20:19 ` Zach Shaftel @ 2020-07-17 22:08 ` Stefan Monnier 2020-07-18 21:41 ` Zach Shaftel 0 siblings, 1 reply; 15+ messages in thread From: Stefan Monnier @ 2020-07-17 22:08 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel >> While waiting for the paperwork to go through, you can prepare the patch >> and we can start discussing it. > Sure, does that just mean the 'git format-patch -1' emailed to > bug-gnu-emacs@gnu.org, as mentioned in CONTRIBUTE? If that's the gist of > it then I can do that shortly. Pretty much, yes. You can add some text to give extra background on the design, the motivation for some of the choices, or ask questions about particular details, but that's not indispensable. You can also send an email that just refers to a branch in emacs.git. But for the discussion to work well, it's usually better to make sure this branch is "small" so people aren't discouraged to read the large diff ;-) > I was able to speed that function up to the point that it's about the > same as one using `read`. Those functions are doing a whole lot of IO > (reading and writing hundreds of files) so it's not really a fair > comparison. I've done more tests with functions that just read a whole > buffer, collecting what they read into a list. In a 9600 line file with > just over 500 sexps, the `read` version took about ~.02-.04 seconds > (according to `benchmark-run-compiled`), and the `source-map-read` > version took ~.08 seconds when it didn't GC, but unlike with `read` it > did cause a GC 10-20% of the time. IME when the time is in the sub-second range the measurements are very imprecise, so better measure the time to repeat the same `read` N times so the total time is a few seconds (and since it's the same `read`, it won't suffer from extra IO overhead). >> For macros, OTOH, it's really fundamentally hard (or impossible, in >> general). 
> Helmut Eller mentioned before that most macros do use at least some of > the original code in their expansion. We can definitely hope to use some heuristics that will preserve "most" source info for "most" existing macros, yes. But it's still a fundamentally impossible problem in general ;-) >> We could/should introduce some new way to define macros which >> knows about "source code annotated with locations". > I've wondered about this too but don't know what the right approach > would be. The first step is to define a `defmacro2` which works like `defmacro` but is defined to take as arguments (and to return) annotated-sexps instead of "bare sexps". It'll be less convenient to use, but In Scheme "annotated sexps" are called "syntax objects". > I doubt anyone would want to use something like macro-cons/list/append > etc. functions, Scheme avoids the problem by defining additional higher-level layers, where macros are defined in a more restrictive way using templates, so for most macros the programmer doesn't need to care very much about the difference between bare sexps and syntax objects. The main motivation for it was hygiene (the framework takes care of adding the needed `gensym`s where applicable) rather than tracking source-location, but fundamentally the issue is the same: an AST node is not just some random sexp. IOW "code and data aren't quite the same, after all" ;-) See for example `syntax-case` https://www.gnu.org/software/guile/manual/html_node/Syntax-Case.html Note that Scheme uses the #' notation for syntax objects. Adapting the example for `when` to an Elisp syntax could look like: (defmacro2 when (form) (elisp-case form ((_ test e e* ...) (elisp (if test (progn e e* ...)))))) [ Where I used `elisp` instead of Scheme's `syntax` since we already use the prefix "syntax-" for things related to syntax-tables.
] Notice how it's `elisp-case` which extracts `test`, `e`, and `e*` and then it's `elisp` which builds the new chunk of code, so all the replacement of `car` with `elisp-car` can be hidden within the definition of `elisp-case` and `elisp`. >> There's a lot of work on Scheme macros we could leverage for that. > Interesting, so far I've had some difficulty finding documentation about > how other Lisps track source locations. It's not really discussed, but the distinction between "sexp" and "syntax object" is the key. It's largely not discussed because Scheme macros have never officially included the equivalent of `defmacro` operating on raw sexps, so they've never really had to deal with the issue (tho Gambit does provide a `define-macro` which operates like our `defmacro` but it's rarely used so Gambit just punts on the source-location issue in that case). Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
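For comparison, the same destructure-then-rebuild shape can be written today on bare sexps with `pcase` and backquote. This is only an analogue: `defmacro2`, `elisp-case` and `elisp` do not exist, and `demo-expand-when` is a hypothetical helper that works on plain lists rather than on annotated syntax objects.

```elisp
;;; -*- lexical-binding: t -*-
(require 'pcase)

(defun demo-expand-when (form)
  "Expand FORM, a bare sexp shaped like (when TEST E E*...), into an `if' form."
  (pcase form
    ;; The pattern plays the role of the (_ test e e* ...) clause:
    ;; it names the pieces of the input form.
    (`(,_ ,test ,e . ,e*)
     ;; The template plays the role of (elisp ...): it rebuilds code
     ;; from those pieces without copying them.
     `(if ,test (progn ,e ,@e*)))
    (_ (error "Malformed `when' form: %S" form))))
```

Because the template splices the matched subforms in unchanged, the `test` and `e` objects in the expansion are `eq` to those in the input. A framework that owns both the pattern and the template steps could use that to carry location information across the expansion without the macro author handling it explicitly.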
* Re: Update 1 on Bytecode Offset tracking 2020-07-17 22:08 ` Stefan Monnier @ 2020-07-18 21:41 ` Zach Shaftel 2020-07-19 2:34 ` Stefan Monnier 0 siblings, 1 reply; 15+ messages in thread From: Zach Shaftel @ 2020-07-18 21:41 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >>> While waiting for the paperwork to go through, you can prepare the patch >>> and we can start discussing it. >> Sure, does that just mean the 'git format-patch -1' emailed to >> bug-gnu-emacs@gnu.org, as mentioned in CONTRIBUTE? If that's the gist of >> it then I can do that shortly. > > Pretty much, yes. You can add some text to give extra background on the > design, the motivation for some of the choices, or ask questions about > particular details, but that's not indispensable. > > You can also send an email that just refers to a branch in emacs.git. > But for the discussion to work well, it's usually better to make sure > this branch is "small" so people aren't discouraged to read the large > diff ;-) Sounds good, I'll proceed with that once I make sure there are no issues with that branch. >> I was able to speed that function up to the point that it's about the >> same as one using `read`. Those functions are doing a whole lot of IO >> (reading and writing hundreds of files) so it's not really a fair >> comparison. I've done more tests with functions that just read a whole >> buffer, collecting what they read into a list. In a 9600 line file with >> just over 500 sexps, the `read` version took about ~.02-.04 seconds >> (according to `benchmark-run-compiled`), and the `source-map-read` >> version took ~.08 seconds when it didn't GC, but unlike with `read` it >> did cause a GC 10-20% of the time. 
> > IME when the time is in the sub-second range the measurements are very > imprecise, so better measure the time to repeat the same `read` N times > so the total time is a few seconds (and since it's the same `read`, > it won't suffer from extra IO overhead). Sure, I'll do some more exhaustive testing. So far though, the results aren't great, the biggest issue being memory usage. The `source-map-read` can GC over 5 times more often than `read`. Obviously edebug isn't the answer. I could start trying to simplify and adapt the useful bits of edebug for the source-map reader directly, but I think it's more sensible to accept that a real implementation will have to be in C and this reader will just remain a prototype. >>> For macros, OTOH, it's really fundamentally hard (or impossible, in >>> general). >> Helmut Eller mentioned before that most macros do use at least some of >> the original code in their expansion. > > We can definitely hope to use some heuristics that will preserve "most" > source info for "most" existing macros, yes. > But it's still a fundamentally impossible problem in general ;-) > >>> We could/should introduce some new way to define macros which >>> knows about "source code annotated with locations". >> I've wondered about this too but don't know what the right approach >> would be. > > The first step is to define a `defmacro2` which works like `defmacro` > but is defined to take as arguments (and to return) annotated-sexps > instead of "bare sexps". It'll be less convenient to use, but > > In Scheme "annotated sexps" are called "syntax objects". > >> I doubt anyone would want to use something like macro-cons/list/append >> etc. functions, > > Scheme avoids the problem by defining additional higher-level layers, > where macros are defined in a more restrictive way using templates, so > for most macros the programmer doesn't need to use care very much about > the difference between bare sexps and syntax objects. 
> > The main motivation for it was hygiene (the framework takes care of > adding the needed `gensym`s where applicable) rather than tracking > source-location, but fundamentally the issue is the same: an AST node is > not just some random sexp. > > IOW "code and data aren't quite the same, after all" ;-) > > See for example `syntax-case` https://www.gnu.org/software/guile/manual/html_node/Syntax-Case.html > Note that Scheme uses the #' notation for syntax objects. Adapting the > example for `when` to an Elisp syntax could look like: > > (defmacro2 when (form) > (elisp-case form > ((_ test e e* ...) (elisp (if test (progn e e* ...)))))) > > [ Where I used `elisp` instead of Scheme's `syntax` since we already use > the prefix "syntax-" for things related to syntax-tables. ] > > Notice how it's `elisp-case` which extracts `test`, `e`, and `e*` and > then it's `syntax` which builds the new chunk of code, so all the > replacement of `car` with `elisp-car` can be hidden within the definition > of `elisp-case` and `elisp`. Aha, I had never even considered hygienic macros in Elisp (nor had I recognized how trivial it is to track their source-code). That would be an amazing development for Emacs Lisp, but is certainly a huge undertaking, not something I could fit into the GSoC timeline. I know that it has been done in Common Lisp (by Pascal Costanza), but I believe that implementation serves the sole purpose of capture avoidance and doesn't abstract syntax. For Emacs I assume this would have to be done in C, but I do wonder if an Elisp implementation would be possible. >>> There's a lot of work on Scheme macros we could leverage for that. >> Interesting, so far I've had some difficulty finding documentation about >> how other Lisps track source locations. > > It's not really discussed, but the distinction between "sexp" and > "syntax object" is the key. 
It's largely not discussed because Scheme > macros have never officially included the equivalent of `defmacro` > operating on raw sexps, so they've never really had to deal with the > issue (tho Gambit does provide a `define-macro` which operates like our > `defmacro` but it's rarely used so Gambit just punts on the > source-location issue in that case). Doing the similar thing in Elisp -- relegating source location tracking to code using only a specialized kind of macro, hygienic or otherwise -- would of course be a major loss, since it would take years for that new paradigm to become commonplace. -Zach ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-18 21:41 ` Zach Shaftel @ 2020-07-19 2:34 ` Stefan Monnier 2020-07-21 0:28 ` Zach Shaftel 0 siblings, 1 reply; 15+ messages in thread From: Stefan Monnier @ 2020-07-19 2:34 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel > Sure, I'll do some more exhaustive testing. So far though, the results > aren't great, the biggest issue being memory usage. The > `source-map-read` can GC over 5 times more often than `read`. Sounds fine for a prototype. > I think it's more sensible to accept that a real implementation will > have to be in C and this reader will just remain a prototype. Indeed. > Aha, I had never even considered hygienic macros in Elisp (nor had I > recognized how trivial it is to track their source-code). That would be > an amazing development for Emacs Lisp, but is certainly a huge > undertaking, not something I could fit into the GSoC timeline. No, I'm just discussing what the longer-run might look like. > I know that it has been done in Common Lisp (by Pascal Costanza), but > I believe that implementation serves the sole purpose of capture > avoidance and doesn't abstract syntax. For Emacs I assume this would > have to be done in C, but I do wonder if an Elisp implementation would > be possible. I haven't thought very much about it, but I can't see any reason why it would need to be done in C, no (tho I wouldn't be surprised if it could benefit from a bit of help from the C side, of course). > Doing the similar thing in Elisp -- relegating source location tracking > to code using only a specialized kind of macro, hygienic or otherwise -- > would of course be a major loss, since it would take years for that new > paradigm to become commonplace. Indeed, we'll need some fallback heuristic for all the existing `defmacro`s. 
Part of the issue is "tracking source location" but another important part is to take the annotated source code and "de-annotate" it (recursively) to pass it to the macro, since the macro expects a raw sexp. That's why we've been thinking about annotated representations which are "transparent" (i.e. can be used as if they weren't annotated). Either using "fat cons-cells" or using "fat symbols" or storing the annotations in an eq-hash-table. Another way to attack the problem is to rely on the Edebug spec: you can refrain from de-annotating all the parts marked as `form` or `body` (as long as the annotations themselves look sufficiently like normal code, at least). Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
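The `eq`-hash-table variant can be sketched as follows: every cons of the source form becomes a key in a side table, the macro still receives the raw sexp, and after expansion one checks which of the original conses are still present. The helper names (`demo-collect-conses`, `demo-surviving-conses`) are hypothetical, and the stored value `t` is a placeholder for what would be a source position in a real reader.

```elisp
;;; -*- lexical-binding: t -*-
(require 'macroexp)

(defun demo-collect-conses (form table)
  "Add every cons cell reachable from FORM as a key of the `eq' hash TABLE.
The value t stands in for a real annotation such as a buffer position."
  (while (and (consp form) (not (gethash form table)))
    (puthash form t table)
    (demo-collect-conses (car form) table)
    (setq form (cdr form)))
  table)

(defun demo-surviving-conses (form)
  "Return (SURVIVING . TOTAL) cons counts for FORM across `macroexpand-all'."
  (let ((before (demo-collect-conses form (make-hash-table :test 'eq)))
        (after (demo-collect-conses (macroexpand-all form)
                                    (make-hash-table :test 'eq)))
        (surviving 0))
    ;; A cons "survives" if the very same object appears in the expansion.
    (maphash (lambda (cell _annotation)
               (when (gethash cell after)
                 (setq surviving (1+ surviving))))
             before)
    (cons surviving (hash-table-count before))))
```

Evaluating `(demo-surviving-conses '(when (foo x) (bar y)))` gives a rough idea of how much of a form keeps its identity, and therefore its annotations, through expansion. The exact count depends on how each macro and `macroexpand-all` rebuild the code.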
* Re: Update 1 on Bytecode Offset tracking 2020-07-19 2:34 ` Stefan Monnier @ 2020-07-21 0:28 ` Zach Shaftel 2020-07-21 2:51 ` Stefan Monnier 0 siblings, 1 reply; 15+ messages in thread From: Zach Shaftel @ 2020-07-21 0:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> Sure, I'll do some more exhaustive testing. So far though, the results >> aren't great, the biggest issue being memory usage. The >> `source-map-read` can GC over 5 times more often than `read`. > > Sounds fine for a prototype. > >> I think it's more sensible to accept that a real implementation will >> have to be in C and this reader will just remain a prototype. > > Indeed. > >> Aha, I had never even considered hygienic macros in Elisp (nor had I >> recognized how trivial it is to track their source-code). That would be >> an amazing development for Emacs Lisp, but is certainly a huge >> undertaking, not something I could fit into the GSoC timeline. > > No, I'm just discussing what the longer-run might look like. > >> I know that it has been done in Common Lisp (by Pascal Costanza), but >> I believe that implementation serves the sole purpose of capture >> avoidance and doesn't abstract syntax. For Emacs I assume this would >> have to be done in C, but I do wonder if an Elisp implementation would >> be possible. > > I haven't thought very much about it, but I can't see any reason why it > would need to be done in C, no (tho I wouldn't be surprised if it could > benefit from a bit of help from the C side, of course). This is my gut feeling as well, but in the few discussions I've seen about implementing hygiene in an unhygienic macro system people suggest that it's nigh impossible without rewriting core parts of the language. I haven't seen a convincing defense of that argument in any of those discussions though. 
>> Doing the similar thing in Elisp -- relegating source location tracking >> to code using only a specialized kind of macro, hygienic or otherwise -- >> would of course be a major loss, since it would take years for that new >> paradigm to become commonplace. > > Indeed, we'll need some fallback heuristic for all the existing > `defmacro`s. > > Part of the issue is "tracking source location" but another important > part is to take the annotated source code and "de-annotate" it > (recursively) to pass it to the macro, since the macro expects > a raw sexp. > > That's why we've been thinking about annotated representations > which are "transparent" (i.e. can be used as if they weren't annotated). > Either using "fat cons-cells" or using "fat symbols" or storing the > annotations in an eq-hash-table. A hash-table seems like the most straightforward approach, which I'm working on now. Doing something like in the scratch/accurate-warning-pos branch adds a whole lot of complexity to both C and Lisp, and toggling byte-compilation versions of subrs feels clunky to me (though that's obviously just a prototype). A hash-table of conses will hopefully be enough, with or without the `source-map` stuff. > Another way to attack the problem is to rely on the Edebug spec: you can > refrain from de-annotating all the parts marked as `form` or `body` (as > long as the annotations themselves look sufficiently like normal code, > at least). Interesting, that's not something I had thought about. I suspect flawed edebug specs are common enough that this can't be relied upon. -Zach ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-21 0:28 ` Zach Shaftel @ 2020-07-21 2:51 ` Stefan Monnier 0 siblings, 0 replies; 15+ messages in thread From: Stefan Monnier @ 2020-07-21 2:51 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel > A hash-table seems like the most straightforward approach, which I'm > working on now. Doing something like in the scratch/accurate-warning-pos > branch adds a whole lot of complexity to both C and Lisp, and toggling > byte-compilation versions of subrs feels clunky to me (though that's > obviously just a prototype). A hash-table of conses will hopefully be > enough, with or without the `source-map` stuff. Preliminary measurements suggested that the hash-table way would be too slow. But I guess it depends on the granularity. Maybe if we only put such annotations on the "head-cons-cells", it might be cheap enough. > Interesting, that's not something I had thought about. I suspect flawed > edebug specs are common enough that this can't be relied upon. Actually, I don't think so: since Edebug rewrites all the parts marked as `form` or `body` with significant changes, such errors are rather unlikely. The more frequent problems will be when the Edebug spec is missing or fails to mark the relevant parts as `form` or `body`, in which case we'd only lose the annotations (at the cost of extra work to go and remove them). But it's a "safe" form of failure (a kind of "graceful degradation"). Stefan ^ permalink raw reply [flat|nested] 15+ messages in thread
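The "eq hash table on head cons cells" idea discussed above can be sketched roughly as follows. This is only an illustration, not code from any of the branches mentioned in the thread: the names `demo-positions` and `demo-read` are hypothetical, and the toy reader handles only atoms and plain proper lists (no comments, dotted pairs, or annotated quoted lists).

```elisp
;; Hypothetical side table: head cons cell -> buffer position of its "(".
;; Weak keys let entries disappear once the form itself is garbage.
(defvar demo-positions (make-hash-table :test 'eq :weakness 'key)
  "Side table mapping head cons cells to source positions (sketch only).")

(defun demo-read ()
  "Read one form at point, recording where each list starts.
Only atoms and plain proper lists are handled in this sketch."
  (skip-chars-forward " \t\n")
  (if (not (eq (char-after) ?\())
      ;; Atoms (and anything else) are left to the built-in reader.
      (read (current-buffer))
    (let ((start (point))
          (items '()))
      (forward-char 1)                  ; step over the open paren
      (skip-chars-forward " \t\n")
      (while (not (eq (char-after) ?\)))
        (push (demo-read) items)
        (skip-chars-forward " \t\n"))
      (forward-char 1)                  ; step over the close paren
      (let ((form (nreverse items)))
        ;; Annotate only the head cons cell; the empty list has none.
        (when (consp form)
          (puthash form start demo-positions))
        form))))

;; Usage: the form stays a raw sexp, so a macro can consume it directly.
(with-temp-buffer
  (insert "(when (foo 1)\n  (bar 2))")
  (goto-char (point-min))
  (let* ((form (demo-read))
         (expanded (macroexpand form)))
    (list (gethash form demo-positions)               ; the `when' form
          (gethash (nth 1 expanded) demo-positions)   ; (foo 1), passed through
          (gethash expanded demo-positions))))        ; new cons built by the macro
```

Because the annotations live outside the sexp, nothing has to be de-annotated before macro expansion. Sub-forms that the macro passes through uncopied are still found in the table afterwards, while cons cells the macro builds itself are not, which is where a fallback heuristic would be needed.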
* Re: Update 1 on Bytecode Offset tracking 2020-07-15 23:10 Update 1 on Bytecode Offset tracking Zach Shaftel 2020-07-16 3:55 ` Stefan Monnier @ 2020-07-16 7:25 ` Andrea Corallo via Emacs development discussions. 2020-07-17 0:24 ` Zach Shaftel 2020-07-28 19:19 ` Update 2 " Zach Shaftel 2 siblings, 1 reply; 15+ messages in thread From: Andrea Corallo via Emacs development discussions. @ 2020-07-16 7:25 UTC (permalink / raw) To: Zach Shaftel; +Cc: emacs-devel, Rocky Bernstein Hi Zach and Rocky, IMO having the exact offset for all functions in the stack except the last is already a measurable improvement. Reevaluating the top function and rerunning is not a huge deal; reevaluating N functions and rerunning each time, trying to figure out what is going wrong and where, can on the contrary be considerably more painful. Zach Shaftel <zshaftel@gmail.com> writes: > With respect to reporting offsets, using code from edebug we have a > Lisp-Expression reader that will track source-code locations and > store the information in a source-map-expression cl-struct. The code > in progress is here. > > Information currently saved is: > > The expression itself > The exact string that was read > Begin and end points of the sexp in the buffer > source-map-expression children (for conses and vectors) > > which can be generated for a whole lisp file with the function > source-map-file. We are testing this on lots of files such as the > lisp files in the GNU Emacs distribution. After this is done we will > try hooking this into the compilation process. Regarding the reader, I fear modifying the C one will be the only way if we want to have something fast enough to be used as the default. That said, having one to begin experimenting with is a very good start. I guess we'll want to have the 'children' as the key of a hash table where the rest is the value. Thank you and Rocky for the effort! Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Update 1 on Bytecode Offset tracking 2020-07-16 7:25 ` Andrea Corallo via Emacs development discussions. @ 2020-07-17 0:24 ` Zach Shaftel 2020-07-17 13:47 ` Rocky Bernstein 0 siblings, 1 reply; 15+ messages in thread From: Zach Shaftel @ 2020-07-17 0:24 UTC (permalink / raw) To: Andrea Corallo; +Cc: Rocky Bernstein, emacs-devel Hi Andrea, Andrea Corallo <akrl@sdf.org> writes: > Hi Zach and Rocky, > > IMO having the exact offset for all functions in the stack except the > last is already a measurable improvement. Reevaluating the top function > and rerunning is not a huge deal, reevaluating N functions and rerunning > each time trying to figure out what is going wrong and where on the > contrary can be considerably more painful. Absolutely, and since this deficiency arises when errors are signalled from byte ops other than `Bcall`, many backtraces still present accurate offsets top to bottom. > Zach Shaftel <zshaftel@gmail.com> writes: > >> With respect to reporting offsets, using code from edebug we have a >> Lisp-Expression reader that will track source-code locations and >> store the information in a source-map-expression cl-struct. The code >> in progress is here. >> >> Information currently saved is: >> >> The expression itself >> The exact string that was read >> Begin and end points of the sexp in the buffer >> source-map-expression children (for conses and vectors) >> >> which can be generated for a whole lisp file with the function >> source-map-file. We are testing this on lots of files such as the >> lisp files in the GNU Emacs distribution. After this is done we will >> try hooking this into the compilation process. > > Regarding the reader I fear modifying the C one will be the only way if > we want to have something sufficiently high performance to be used as > default. I suspect you're right that processing and saving this information in Lisp won't be fast enough to maintain reasonable compilation time. 
Perhaps it could be made available as an option, in the form of alternate byte-compile functions for example, but that would be an unfortunate compromise. Down the line, perhaps the native compiler could change that. > That said having one to begin with experimenting is a very > good start. I guess we'll want to have the 'children' as key of an hash > where the rest is the value. That's the plan! > Thanks you and Rocky for the effort! > > Andrea ^ permalink raw reply [flat|nested] 15+ messages in thread
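As a rough illustration of the kind of record described in this exchange, here is a minimal sketch of a source-map structure for top-level forms. It is not the project's actual `source-map-expression` definition: the struct name `demo-source-map`, its slot names, and the helpers `demo-map-top-level-forms` and `demo-source-text` are all hypothetical. The sketch assumes the source buffer stays available, so the exact text is recovered from the begin and end points instead of being stored.

```elisp
(require 'cl-lib)

(cl-defstruct demo-source-map
  "Hypothetical record tying a form to its location in a buffer."
  expression   ; the sexp returned by `read'
  begin        ; buffer position where the form starts
  end          ; buffer position just past the form
  children)    ; nested records; not populated in this sketch

(defun demo-map-top-level-forms ()
  "Return `demo-source-map' records for the current buffer's top-level forms."
  (save-excursion
    (goto-char (point-min))
    (let ((records '()))
      (condition-case nil
          (while t
            ;; Whitespace only: leading comments would be counted as
            ;; part of the following form in this sketch.
            (skip-chars-forward " \t\n")
            (let* ((begin (point))
                   (expr (read (current-buffer))))
              (push (make-demo-source-map :expression expr
                                          :begin begin
                                          :end (point))
                    records)))
        ;; `read' signals `end-of-file' when no form is left.
        (end-of-file nil))
      (nreverse records))))

(defun demo-source-text (record)
  "Recover RECORD's exact source text from the current buffer."
  (buffer-substring-no-properties (demo-source-map-begin record)
                                  (demo-source-map-end record)))
```

Calling `demo-map-top-level-forms` in a buffer of Lisp source returns one record per top-level form, and `demo-source-text` on a record gives back the characters that were read. Whether to store the text or recompute it is the trade-off raised elsewhere in the thread: recomputing saves memory but only works while the original buffer or file is unchanged.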
* Re: Update 1 on Bytecode Offset tracking 2020-07-17 0:24 ` Zach Shaftel @ 2020-07-17 13:47 ` Rocky Bernstein 0 siblings, 0 replies; 15+ messages in thread From: Rocky Bernstein @ 2020-07-17 13:47 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2847 bytes --] On Wed, 15 Jul 2020 23:55:19 -0400 Stefan Monnier wrote: > Sounds like a lot of information, which in turn implies a potentially > high overhead (e.g. the "exact string" sounds like it might cost O(N²) > in corner cases, yet provides redundant info that can be recovered from > begin+end points). Note also that while `read` returns a sexp made > exclusively of data coming from a particular buffer, the code after > macro-expansion can include chunks coming from other buffers, so if we > want to keep the same representation of "sexp with extra info" in both > cases, we can't just assume "the buffer". Yes, when I last looked, there was bloat in the way source mappings are done. But let me explain: as a Google Summer of Code project, the project has always been a bit behind. So the approach I had been taking was that if something is usable for now, go with it and move on to other uncharted territory. In other words, get something out, complete what remains and *only then* go back and iterate on the parts that need improving. The C changes were a little bit different because of the (necessarily) long lead time to get things into master and because one can't put something inefficient into the core. The source-code string is needed in the source map only at the top level. (Oddly, the member name for this is "code".) I had suggested that offsets should be relative to the beginning of the function, and the function node would have the position from the beginning of the container (e.g. file) that it is in. However, this isn't a big deal, since conversions are easily done. 
As for handling bits of S-expressions that represent the conglomeration of a number of containers/files, that's pretty easily handled inside the structure. I am not totally clear about how the container information is determined. I imagine some of it would be noticed in the parameters when the macro is defined, and some of it each time the macro is expanded. But once it is determined that certain S-expressions go with certain containers, it is trivial to add it to a source-map object. One cool thing about having the source string stored in the source-map object (whether just at the top level or in more places) is that in tracebacks exact information can be given without searching around. In fact, the source code may have *never* existed inside a file and this still works. Another great thing about this is that it can tolerate mismatches between the Elisp that was compiled and the Elisp that is available. If there were changes outside the toplevel object but not inside the object, then it is pretty easy to detect and correct for this. Even if the discrepancy is inside the object, the differences are also easily detected. Adjusting is a little more difficult, but still doable. [-- Attachment #2: Type: text/html, Size: 3769 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
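The mismatch tolerance described in this message could work roughly like the sketch below, under the assumption that the source map keeps the top-level form's exact text and its recorded start position. The function name `demo-locate-source` is hypothetical, and the fallback is a plain literal search, which covers only the easy case where the form itself is unchanged and edits elsewhere have merely shifted it.

```elisp
(defun demo-locate-source (text begin)
  "Return where TEXT now starts in the current buffer, or nil.
BEGIN is the position recorded when TEXT was read.  If the buffer
still holds TEXT at BEGIN, return BEGIN; otherwise search the whole
buffer for the first literal occurrence of TEXT."
  (save-excursion
    (let ((end (+ begin (length text)))
          (case-fold-search nil))       ; source text must match exactly
      (if (and (<= (point-min) begin)
               (<= end (point-max))
               (string= text (buffer-substring-no-properties begin end)))
          begin
        ;; The recorded position is stale: look for the form elsewhere.
        (goto-char (point-min))
        (when (search-forward text nil t)
          (match-beginning 0))))))
```

A nil result signals the harder case from the message, where the text inside the form has changed as well. Detecting that is easy (the lookup fails), but adjusting positions would need some form of diffing that this sketch does not attempt.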
* Update 2 on Bytecode Offset tracking 2020-07-15 23:10 Update 1 on Bytecode Offset tracking Zach Shaftel 2020-07-16 3:55 ` Stefan Monnier 2020-07-16 7:25 ` Andrea Corallo via Emacs development discussions. @ 2020-07-28 19:19 ` Zach Shaftel 2 siblings, 0 replies; 15+ messages in thread From: Zach Shaftel @ 2020-07-28 19:19 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/html, Size: 12296 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread